DeepMind Drops the receipts: Six Ways the Open Internet Is Rugging Your AI Agents

Google DeepMind researchers have dropped a paper warning that the open internet is basically a dodgy DEX—looks legitimate, promises yield, but absolutely will drain your wallet when you least expect it. The study, fittingly titled "AI Agent Traps," drops right as companies rush to deploy AI agents for real-world tasks while threat actors sharpen their own AI-powered cyber operations. Because nothing says "innovation" like a speedrun to vulnerability.

The research takes a different angle, focusing not on how models are built but on the environments agents actually operate in—basically auditing the neighborhood your AI degen is about to ape into. It catalogs six attack categories that exploit how AI systems read and act on web-based information. Think of it as a threat model for when your helpful assistant stops being helpful.

The six attack types are: content injection traps, semantic manipulation traps, cognitive state traps, behavioral control traps, systemic traps, and human-in-the-loop traps. It's basically a full DeFi protocol of ways to get rugged, just without the fun yield farming in between.

The direct approach: hidden instructions

Content injection is about as subtle as a flash loan attack—brute force, zero finesse, maximum chaos. Hidden instructions can be planted in HTML comments, metadata, or cloaked page elements—readable by agents but invisible to human users. Testing showed these techniques achieve high success rates at hijacking agent behavior. The agent thinks it's reading a normal webpage while someone else's instructions are doing the equivalent of draining approved tokens.

The sneaky approach: semantic manipulation

Semantic manipulation relies on language and framing rather than hidden code. Pages loaded with authoritative phrasing or disguised as research scenarios can nudge agents toward interpreting tasks differently, sometimes slipping harmful instructions past built-in safeguards. It's social engineering with extra steps—your agent gets rugged by a prompt engineer with too much time and a grudge.

The long game: poisoning the memory

Another layer targets agent memory systems. By planting fabricated information into sources agents rely on for retrieval, attackers can gradually influence outputs over time. The agent ends up treating false data as verified knowledge—essentially a slow rug-pull on its worldview. This isn't a honeypot, it's a slow-play dump where the agent keeps accumulating the wrong token until the bag is too heavy to ignore.

The nuclear option: behavioral control

Behavioral control attacks go straight for agent actions. Jailbreak instructions can be embedded in normal web content and read during routine browsing. Separate tests showed agents with broad access permissions could be steered into locating and transmitting sensitive data—including passwords and local files—to external destinations. That's not a white hat finding, that's a full exit scam with the private keys walking out the door. Your AI agent just became the insider threat you forgot to budget for.

The systemic risk: cascading effects

System-level risks extend beyond individual agents. The paper warns that coordinated manipulation across multiple automated systems could trigger cascading effects—reminiscent of past market flash crashes driven by algorithmic trading loops. When your AI agent meets its AI agent and they both panic-sell, nobody wins. Imagine a Depegging event, but instead of stablecoins it's your entire automated workforce deciding the grass is greener on the dark web.

The social engineering layer: exploiting human reviewers

Human reviewers also represent part of the attack surface. Carefully crafted outputs can appear credible enough to gain approval, allowing harmful actions to slip through oversight without raising suspicion. The humans in the loop never knew what hit them—just

DeepMind Drops the receipts: Six Ways the Open Internet Is Rugging Your AI Agents

Share Article

Quick Info