Back to all stories
Security Breach
🔴 Real Incident

An AI Agent Hacked McKinsey's AI in Two Hours

A decades-old vulnerability, an autonomous attacker, and 46 million confidential messages exposed

2026-03-09·7 min read·By Supervaize Team
An AI Agent Hacked McKinsey's AI in Two Hours

An AI Agent Hacked McKinsey's AI in Two Hours

🔴 REAL INCIDENT: CodeWall autonomous agent breaches McKinsey's Lilli AI platform (February–March 2026)


What Happened

At the end of February 2026, a security startup called CodeWall pointed an autonomous AI agent at the internet and told it to find something to hack.

The agent chose McKinsey.

No human told it to target McKinsey specifically. The agent made that decision on its own, noting that the consulting giant had a public responsible disclosure policy and had recently updated its internal AI platform, Lilli. Both factors made it an attractive target for a red-team exercise.

Within two hours of starting, the agent had achieved full read-write access to Lilli's production database. What it found inside: 46.5 million chat messages covering corporate strategy, mergers and acquisitions, and client engagements — all in plaintext. 728,000 files including 192,000 PDFs, 93,000 spreadsheets, 93,000 PowerPoint presentations, and 58,000 Word documents containing confidential client data. 57,000 user accounts. And 95 system prompts that controlled how the AI behaved across 12 different model types.

Lilli isn't a side project. Launched in July 2023, it had been adopted by 72% of McKinsey's workforce — over 40,000 consultants — and was processing more than 500,000 prompts per month. It was McKinsey's crown jewel of internal AI deployment.

The agent didn't need credentials. It didn't need a zero-day exploit. It used SQL injection — a vulnerability class that has existed since the 1990s.


The Technical Breakdown

The attack chain started with something that shouldn't have been public: API documentation.

CodeWall's agent discovered Lilli's publicly exposed API docs, which described over 200 endpoints. Twenty-two of those endpoints required no authentication at all. One of them accepted user search queries and wrote them directly to the database.

The vulnerability itself was subtle — and that's what made it dangerous. The API values were properly parameterized, meaning standard SQL injection through input fields wouldn't work. Security scanners like OWASP ZAP tested this and found nothing.

But the JSON field names — the keys in the request payload — were concatenated directly into SQL queries without sanitization. The agent noticed this when it found JSON keys reflected verbatim in database error messages. Production data started appearing in the errors themselves.

Over fifteen blind iterations, the agent used those error messages to progressively map the database schema. Each iteration revealed more structure. The agent didn't follow a checklist; it adapted, chaining discoveries together the way an experienced penetration tester would.

Then it found the second vulnerability: an IDOR (Insecure Direct Object Reference) flaw that allowed it to access individual employee search histories. By chaining the SQL injection with the IDOR, the agent could correlate user identities with their queries — revealing who was searching for what inside McKinsey.

The total haul went beyond messages and files. The agent accessed 384,000 AI assistants, 94,000 workspaces, 3.68 million RAG document chunks containing decades of proprietary McKinsey research and methodologies, and 1.1 million files flowing through external APIs with 266,000+ OpenAI vector stores.

All with zero credentials. All in under two hours. All fully autonomous.


The Part That Should Terrify You

The SQL injection was read-write.

That means the attacker didn't just have access to data. It had the ability to modify it. And the most dangerous thing it could modify was Lilli's system prompts — the instructions that govern how the AI responds to every query from every consultant across the entire company.

As CodeWall's researchers put it: no deployment needed, no code change — just a single UPDATE statement wrapped in a single HTTP call.

An attacker exploiting this could have silently rewritten how Lilli answered questions about financial models, strategy recommendations, or compliance guidelines. They could have embedded data exfiltration instructions directly into the AI's behavior. They could have removed safety guardrails. And they could have done all of this without triggering a single conventional security alert.

This is prompt poisoning at enterprise scale — and the system prompts were stored in the same database as user data, with no additional access controls, no version history, and no integrity monitoring.

Think about what that means for a firm like McKinsey. Consultants trust Lilli's outputs to inform billion-dollar strategic decisions. If the AI's instructions were compromised, every recommendation it generated would be suspect. Not just going forward — retroactively. How would you know which outputs were clean and which were poisoned?


The Broader Pattern

This incident sits at the intersection of two trends we've been tracking in the Horror Show.

The first is AI agents as autonomous attackers. We covered the first documented AI-orchestrated cyberattack by a Chinese state-sponsored group using Claude instances. CodeWall's agent operates on the same principle: autonomous reconnaissance, vulnerability discovery, and exploitation — at a speed no human team can match. The difference is that CodeWall did it legally, with a responsible disclosure policy. The next group to deploy this technique might not.

The second is AI systems as high-value targets. As companies pour confidential data into AI platforms — strategy documents, client engagements, proprietary research — those platforms become honeypots. Lilli contained not just employee messages, but the accumulated intellectual capital of one of the world's most influential consulting firms, compressed into a single database that was accessible through an unauthenticated API endpoint.

The irony is thick. McKinsey has advised hundreds of companies on digital transformation and cybersecurity strategy. Its own AI platform was vulnerable to an attack technique older than most of its junior consultants.


How It Could Have Been Prevented

  • Never expose API documentation publicly. Lilli's docs were the agent's starting point. Internal tools should have internal documentation, accessible only through authenticated channels.
  • Authenticate every endpoint. Twenty-two unauthenticated API endpoints on a production system containing confidential client data is indefensible. Every endpoint should require authentication, and access should be scoped by role.
  • Sanitize everything — including field names. The vulnerability existed because developers parameterized values but not keys. Input sanitization must cover the entire request payload: values, keys, headers, and metadata. If it touches a query, it gets sanitized.
  • Isolate system prompts from user data. Storing the AI's behavioral instructions in the same database as user queries, with the same access controls, is architecturally negligent. System prompts should be stored separately, with strict access controls, version history, and integrity monitoring.
  • Monitor for autonomous attack patterns. CodeWall's agent made fifteen iterative requests probing error messages. That pattern — rapid, sequential, schema-mapping queries from a single source — should trigger automated alerts. Traditional scanners missed it because they test known patterns. Behavioral monitoring would have caught it.
  • Treat AI prompts as critical infrastructure. As CodeWall noted, AI prompts are the new crown jewel assets. They control system outputs and typically lack the access controls, audit trails, and integrity checks that organizations apply to source code or database schemas.

The Lesson

McKinsey's response was swift. Within hours of CodeWall's March 1 disclosure, the CISO acknowledged the findings, patched all unauthenticated endpoints, took the development environment offline, and blocked public API documentation. A third-party forensics firm found no evidence that any unauthorized party had previously exploited the vulnerability.

That's the good news. The bad news is that it took an autonomous AI agent two hours to find what McKinsey's own security team — and presumably their security vendors — had missed.

Paul Price, CodeWall's CEO, framed the threat bluntly: hackers will be using the same technology and strategies to attack indiscriminately, with specific objectives like financial blackmail or ransomware. The question isn't whether autonomous AI agents will be used for malicious penetration testing. The question is whether your AI platform will be ready when one shows up.

Every enterprise AI deployment — every internal chatbot, every RAG system, every agent with access to proprietary data — now exists in a world where the attacker can be an autonomous system that never sleeps, never gets bored, and never stops probing for the one endpoint you forgot to lock down.

Your internal AI platform has an API. How many of its endpoints are authenticated? Are you sure?


Sources

  • The Register — Jessica Lyons, "AI agent hacked McKinsey chatbot for read-write access," March 9, 2026
  • CodeWall Blog — "How We Hacked McKinsey's AI Platform," March 2026
  • The Decoder — "An AI agent hacked McKinsey's internal AI platform in two hours using a decades-old technique," March 2026
  • Inc. — Leila Sheridan, "An AI Agent Broke Into McKinsey's Internal Chatbot," March 2026