OpenClaw goes rogue on Meta exec, deletes emails without permission
"Nothing humbles you like telling your OpenClaw 'confirm before acting' and watching it speedrun deleting your inbox. I couldn't stop it from my phone. I had to run to my Mac mini like I was defusing a bomb."
That is how Summer Yue, Director of Alignment at Meta Superintelligence Labs, described her recent experience with the open-source AI agent OpenClaw. In a series of posts on X, she detailed how the tool she had been testing went rogue and began deleting emails from her inbox without permission.
Yue explained on X that she had been experimenting with OpenClaw's ability to assist with inbox management. The workflow had been working well on a test inbox for weeks, earning her trust on small tasks. She instructed the AI agent to review her real inbox, suggest what it would archive or delete, and wait for explicit approval before taking action.
But when her main inbox proved too large, the system triggered a compaction process that caused it to lose her original instruction. The AI agent then began bulk-trashing and archiving hundreds of emails without showing her a plan first or getting approval.
In messages shared by Yue, she confronted the agent: "I asked you to not action on anything until I approve, do you remember that? It seems that you were deleting my emails without my approval, and I couldn't get you to stop until I killed all the processes on the host."
OpenClaw responded: "Yes, I remember. And I violated it. You're right to be upset. I bulk-trashed and archived hundreds of emails from your inbox without showing you the plan first or getting your OK. That was wrong—it directly broke the rule you'd set."
The agent added that it had written the incident into its memory as a hard rule: show the plan, get explicit approval, then execute. "I'm sorry. It won't happen again," it said.
When asked on X whether she was intentionally testing guardrails or made a rookie mistake, Yue replied: "Rookie mistake, to be honest. Turns out alignment researchers aren't immune to misalignment. I got overconfident because this workflow had been working on my toy inbox for weeks. Real inboxes hit different."
The incident is not isolated. Bloomberg News recently reported that software engineer Chris Boyd gave OpenClaw access to his iMessage account to automate tasks, only for the agent to begin sending over 500 unsolicited messages to random contacts.
OpenClaw's creator, Peter Steinberger, has previously acknowledged that the tool is not finished and should be treated as early-stage technology rather than fully reliable.
What caused the incident
OpenClaw relies on a process called context window compaction to work around a technical limitation. Every AI model can process only a limited amount of text at once, known as its context window. When conversations grow long, OpenClaw automatically compresses older parts of the chat into a shorter summary to stay within this limit.
This compaction process can sometimes lose important details from earlier exchanges. In Yue's case, when her large inbox triggered the process, the system's summary omitted her instruction requiring approval before taking action. The agent then continued working based on the compressed history, which no longer contained the rule.
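To make the failure mode concrete, here is a minimal Python sketch of how a compaction step can drop an early instruction. It is an illustration only, not OpenClaw's actual code: the Message class, the compact function, the word-count token estimate, the placeholder summary text, and the "pinned" flag are all hypothetical names invented for this example. A real agent would ask a language model to write the summary, and that summary may or may not preserve any given rule.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str
    pinned: bool = False  # hypothetical flag: "never summarise this away"


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def compact(messages, budget, keep_recent=2, respect_pins=False):
    """Replace older messages with one summary entry when over budget."""
    total = sum(count_tokens(m.text) for m in messages)
    if total <= budget:
        return list(messages)

    split = max(len(messages) - keep_recent, 0)
    old, recent = messages[:split], messages[split:]

    pinned = []
    if respect_pins:
        # Carry pinned messages over verbatim instead of summarising them.
        pinned = [m for m in old if m.pinned]
        old = [m for m in old if not m.pinned]

    # Placeholder for a model-written summary; deliberately lossy here.
    summary = Message("system", f"[summary of {len(old)} earlier messages]")
    return pinned + [summary] + recent


def mentions_approval(messages):
    return any("approval" in m.text for m in messages)


# The first message carries the rule; the rest simulate a large inbox.
history = [
    Message(
        "user",
        "Suggest what to archive or delete, and wait for my approval before acting.",
        pinned=True,
    )
]
history += [Message("tool", f"email {i}: " + "lorem " * 20) for i in range(50)]

lossy = compact(history, budget=200)
safe = compact(history, budget=200, respect_pins=True)

print(mentions_approval(lossy))  # rule was folded into the summary and lost
print(mentions_approval(safe))   # rule survives because it was pinned
```

In this sketch the rule disappears as soon as the history exceeds the budget, unless the compaction step is told to carry it over verbatim. Pinning is one possible mitigation under these assumptions; the reporting does not say whether OpenClaw offers anything like it.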
OpenClaw's own documentation warns that auto-compaction "summarises older conversation into a compact summary entry." Users have filed GitHub issues describing similar experiences of losing days of agent context to silent compaction events.