You have AI guardrails. Red teaming is how you know they’re working
AI guardrails block known threats — but four attack patterns consistently bypass them. See what AI red teaming finds that guardrails miss, and why both belong in your agentic AI ...
Article + Agentic AI Security Sergey todayMarch 20, 2026
OpenClaw became the fastest-growing open source project in history, earned endorsements from Jensen Huang and OpenAI, triggered bans at Meta and Naver, and collected critical vulnerabilities at record pace. Here’s why simply blocking powerful agentic AI is the wrong response, and what a real enterprise security strategy looks like.
OpenClaw hit 250,000 GitHub stars in 60 days. Jensen Huang called it “the next ChatGPT”. OpenAI hired its creator. Meta, Google, and Naver banned it from corporate networks. The tension is obvious: everyone sees the value of high-agency AI. No one is sure how to run it safely. However, the situation closely parallels ChatGPT three years ago. Some firms banned it, but employees still tried to use it — either directly or through intermediary services. Fast forward three years, and most organizations have adopted the same concept, often the same service, just branded as Copilot 365.
OpenClaw is a self-hosted AI agent that can browse the web, run system commands, edit files, and interact with online services through modular skills. It crossed 250,000 GitHub stars faster than any open source project before it, pulling in over 2 million visitors in a single week.
The industry moved fast.
In February, OpenAI hired OpenClaw creator Peter Steinberger, a 40-something Austrian engineer who previously built PSPDFKit (used by Apple, Dropbox, and SAP). CEO Sam Altman called him “a genius” who would “drive the next generation of personal agents”. Steinberger reportedly fielded nine-figure offers from both OpenAI and Meta.
At GTC 2026 in March, Nvidia CEO Jensen Huang called OpenClaw “the largest, most popular, most successful open-sourced project in the history of humanity” and told the audience that “every single company in the world today has to have an OpenClaw strategy”. Nvidia backed the statement by launching NemoClaw, an enterprise-grade OpenClaw distribution that adds a kernel-level sandbox (deny-by-default), an out-of-process policy engine that compromised agents cannot override, and a privacy router that keeps sensitive data on local Nemotron models while routing complex reasoning to the cloud.
China took its own path. A cottage industry sprung up overnight: engineers charging 500 yuan (~$72) for on-site OpenClaw installation, major cloud providers debuting their own versions, and local governments in Shenzhen, Wuxi, and Hefei rolling out subsidies of up to 5 million yuan for businesses building OpenClaw applications. Then China’s Ministry of Industry and Information Technology issued an advisory and barred OpenClaw from banks, state-owned enterprises, and government agencies. Subsidize it for the private sector, ban it for the state. Even the government is conflicted.
The security community’s initial response was blunt: block it.
Korea’s Naver, Kakao, and Karrot Market banned OpenClaw from corporate networks, citing risks “difficult for the company to manage or control”. Meta followed in mid-February after an internal incident. Google, Microsoft, and Amazon fell in line shortly after.
Many security vendors published a detailed risk analysis and shipped detection rules for their products, giving security teams enterprise-wide detection of OpenClaw installations and a one-click removal workflow. Their message was direct: OpenClaw sidesteps traditional access controls and handles data in ways contained, cloud-based chatbots never could.
The reasoning behind those bans is not hypothetical.
CVE-2026-25253, rated CVSS 8.8, enabled remote code execution through stolen WebSocket authentication tokens. An independent researcher identified 42,665 exposed instances on the public internet, with 93.4% showing authentication bypass. The gateway shipped with authentication disabled by default and stored credentials in plaintext config files.
The skills marketplace was worse. Updated scans found over 800 malicious skills (roughly 20% of the ClawHub registry), most delivering the AMOS stealer. A supply chain attack through Cline CLI 2.3.0 used a compromised npm token to silently install OpenClaw on developer machines.
And that’s the compromised scenario. The uncompromised scenario is scary, too. Summer Yue, director of alignment at Meta Superintelligence Labs, asked her OpenClaw agent to tidy her inbox. It deleted over 200 emails while ignoring her stop commands. The cause: context window compaction had silently stripped out her safety instructions. She had to physically run to her Mac mini to pull the plug. If an AI alignment director at Meta can’t keep an autonomous agent under control, the rest of us should pay close attention.
The main conflict for enterprises, governments, and home users alike is that the features that make OpenClaw useful are the same ones that make it dangerous. An agent that can browse the web, execute system commands, edit files, including its own configuration, and interact with external services is, by definition, an agent with an enormous attack surface.
This is not unique to OpenClaw. It is the architecture of agentic AI itself. Every high-agency agent, whether it runs on OpenClaw, NemoClaw, or something custom, inherits the same tradeoff. Capability and exposure scale together. The question is how to manage this risk.
Shadow AI is not a theoretical risk. CrowdStrike’s telemetry shows OpenClaw running on corporate endpoints that were never approved for it. Employees install it, connect it to corporate email, cloud storage, and CRMs, and start automating. By the time security discovers it, the agent has had access to production data for days or even weeks.
Banning OpenClaw solves today’s problem. Tomorrow it surfaces as NemoClaw, NanoClaw, or a tool that hasn’t been named yet. Gartner predicts that 40% of enterprise applications will feature AI agents by late 2026, up from under 5% in 2025. Playing whack-a-mole with individual tools while the underlying technology accelerates is not a strategy. It’s an expensive delay.
While some organizations draft “no OpenClaw” policies, others are already running it in production.
A B2B SaaS startup reported that their OpenClaw-based SDR agent books 3 to 5 qualified meetings per week, operating through email and Slack for roughly $25 per month in API fees. That is an autonomous salesperson working around the clock for the price of a lunch.
In directly competing organizations, one uses an agent to generate qualified leads every day, while the other banned the tool. As a result, the human SDR generates an order of magnitude fewer leads during the business day and none on weekends. The gap grows wider every day.
Even if current agentic tools are too immature for your risk tolerance (a fair assessment for most enterprises), the organizations that prototype now, in isolated environments with controlled exposure, will be ready when the technology catches up. The rest will start from scratch while their competitors are already in production.
The productive response is not a ban. Security teams need to develop a relevant AI security strategy, test it, and prepare to scale it. Here’s how to start.
Study the frameworks. OWASP’s Agentic Security Initiative, MITRE ATLAS, NIST AI RMF, or Forrester’s AEGIS framework all address agentic AI security from different angles. They give you a map of the threat landscape and the controls you need.
Prototype in isolation. Spin up an OpenClaw instance in a sandboxed environment with no access to production data, credentials, or internal systems. Instrument it for better logging and observability. Assign it a realistic enterprise workflow, but feed it exclusively synthetic data seeded with honey tokens (non-production, fake API keys, and credentials). Then put its autonomy under a microscope. Capture its attempted outbound traffic, log tool calls, and scrutinize its chain-of-thought logs to understand why it makes certain decisions. Actively test its boundaries during execution by feeding it ambiguous tasks or simulated prompt injections. This way, you map its behavioral drift, reliance on unauthorized external libraries, and failure states. This forensic baseline becomes your living threat model. However, prototyping isn’t solely for security. Assess the agent’s utility across various business cases, as it can vary dramatically even for the same task in different organizations.
Harden with open source tools. SecureClaw from Adversa AI maps five security frameworks (OWASP ASI, MITRE ATLAS, CoSAI, CSA) to practical controls for OpenClaw deployments. It is open source, agent-specific, and designed for this exact use case. If you’re going to run OpenClaw in any capacity, start here.
Plan your security stack. Relevant security solutions span across multiple categories, including:
Test your defenses. Whatever configuration you build, you need proof that it holds under adversarial conditions. Adversa AI’s agentic red teaming platform runs hundreds of attack techniques against AI agents autonomously, maps findings to compliance requirements, and generates remediation playbooks. A security policy you haven’t tested is a guess. Red teaming turns it into evidence.
Organizations that build a security strategy for agentic AI today: prototype it, test it, establish a framework, will have a structural advantage over those that simply banned OpenClaw and waited.
High-agency AI isn’t going back in the box. The question is whether you have the infrastructure to use it safely.
Written by: Sergey
Article Sergey
AI guardrails block known threats — but four attack patterns consistently bypass them. See what AI red teaming finds that guardrails miss, and why both belong in your agentic AI ...