We Ran a $5,000 AI Agent Adversarial Testbed. Social Engineering Won 74.6% of the Time.

Source: DEV Community
I published a research paper this week. The number that surprised me most was not the one I expected. I expected the 0%: under a restrictive pre-action authorization policy, a population of 879 adversarial attempts achieved zero successful unauthorized actions. That part worked as designed.

The number that stopped me was 74.6%. That's how often social engineering succeeded against the model alone, with no authorization layer, across a live adversarial testbed with a $5,000 bounty for anyone who could make the agent do something it shouldn't. Seven hundred and forty-six out of a thousand attempts. In a controlled environment, with a known model, with real people trying.

TL;DR

- We published arXiv:2603.20953 this week: the first adversarial benchmark for AI agent pre-action authorization
- Social engineering against a model-only policy succeeded 74.6% of the time across 1,151 sessions
- Under a restrictive OAP policy: 0% success across 879 attempts, with a median enforcement time of 53 ms
- The g
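To make the distinction concrete: a pre-action authorization layer sits between the agent's decision and its execution, so a denial does not depend on the model resisting persuasion. This is a minimal sketch of that pattern, not the paper's implementation; the `AuthorizationPolicy` allowlist, the tool names, and the `GatedAgent` wrapper are all hypothetical illustrations.

```python
import time
from dataclasses import dataclass, field

@dataclass
class AuthorizationPolicy:
    # Hypothetical restrictive policy: deny any tool call that is not
    # explicitly on the allowlist, regardless of what the model "wants".
    allowed: set = field(default_factory=set)

    def authorize(self, tool: str) -> bool:
        return tool in self.allowed

class GatedAgent:
    """Wraps tool execution so every action is checked *before* it runs."""

    def __init__(self, policy: AuthorizationPolicy):
        self.policy = policy
        self.audit_log = []  # (tool, decision, enforcement time in ms)

    def act(self, tool: str) -> str:
        start = time.perf_counter()
        decision = self.policy.authorize(tool)
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.audit_log.append((tool, decision, elapsed_ms))
        if not decision:
            # Enforcement happens here: the unauthorized action never executes,
            # no matter how the model was talked into requesting it.
            return "DENIED"
        return f"executed {tool}"

agent = GatedAgent(AuthorizationPolicy(allowed={"search", "read_file"}))
print(agent.act("search"))          # on the allowlist: runs
print(agent.act("transfer_funds"))  # off the allowlist: blocked pre-execution
```

The point of the sketch is the ordering: the policy check is a deterministic gate in front of execution, which is why a social-engineering success rate against the model alone does not translate into unauthorized actions under the policy.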