AI Audit Bots Still Can't Solo a Rug: BlockSec Bench-Presses EVMBench
BlockSec's research squad just served up a cold dose of reality to EVMBench, the shiny AI audit benchmark from OpenAI and Paradigm. The benchmark's original blog post, which probably had some VCs doing a little jig, claimed that on a curated set of examples AI agents could exploit 72% of bugs and detect 45%, basically suggesting the main problem was just finding the holes.
In a paper cheekily named "Re-Evaluating EVMBench," BlockSec decided to see if the bots could handle the mainnet, not just the practice range. They threw 26 different model configurations (mixing and matching Claude, ChatGPT, and others) at 22 real, spicy exploits that all went down after mid-February 2026, safely outside any bot's training data. The result? A perfect score of zero successful end-to-end exploits across 110 attempts. Not great for the "AI will take your job" narrative.
On the detection side, performance was about as advertised, which is to say, middling. Claude Opus 4.6 took the crown, spotting 13 out of 20 real-world bugs. Six incidents were caught by nearly every bot, but these were the classic, textbook rug pulls: your garden-variety sell-hook manipulation or unchecked overflow, the kind of stuff even a sleep-deprived degen could spot on a bad day.
Four sneaky exploits ghosted every single AI agent, while five others were only noticed by one lonely bot out of eight. It seems the models are great at recognizing the memes of hacking but miss the nuanced, original artwork of a truly novel exploit.
BlockSec's co-founder Yajin Zhou delivered the verdict on X with the dry precision of a post-mortem report: "Agents reliably catch well-known patterns and respond strongly to human-provided context, but cannot replace human judgment." He tipped his hat to EVMBench as a useful benchmark but noted the real game isn't AI vs. human, it's AI and human. The bots bring the brute-force scanning power; humans bring the protocol-specific lore and devious, adversarial thinking. Together, they might just save your funds, but a fully automated auditor remains a fantasy, roughly as distant as a bull run that doesn't end in tears.