site/blog/mckinsey-lilli-appsec-vs-ai-jailbreak.md
McKinsey's Lilli looks, on the public record, like an application-security incident that reached an AI system, not a model jailbreak. CodeWall's March 9, 2026 writeup says its autonomous agent found exposed API documentation, unauthenticated endpoints, a SQL injection condition, and cross-user access. McKinsey told The Register on March 9, 2026 that it fixed the issues within hours and that a third-party forensic investigation found no evidence that client data or client confidential information were accessed by the researcher or any other unauthorized third party.
The exact payloads were not published, so the public record does not independently prove every reported row count or every step of exploitation. It does, however, support the shape of the incident. The initial foothold appears to have been a familiar AppSec chain: exposed API surface, missing authentication, unsafe SQL construction, and broken object-level authorization.
The architectural issue is straightforward. If prompts, routing rules, and retrieval settings live as mutable application data, then database write access can change model behavior without a code deploy. Much of what gets called AI security is still software security, data security, and configuration governance.
<!-- truncate -->CodeWall also says the agent found cross-user access after the SQLi step. OWASP's current term for that pattern is BOLA, broken object-level authorization: the application accepts an object identifier and returns a record without verifying that the caller is allowed to see it. Older writeups often use the term IDOR (insecure direct object reference) for the same class of failure.
Because CodeWall did not publish the exact payloads, the public cannot reconstruct each query or iteration step by step. It can still reconstruct the class of bug: public routes, backend injection, and missing object-level authorization.
The AI-specific part was not the entry point. It was the blast radius. If the same backend stored prompts, routing rules, retrieval metadata, and user history, then backend access reached the system that shaped Lilli's answers.
That changes the meaning of a database compromise. A write can become a prompt change. A metadata edit can change what the system retrieves. A permissions flaw can let the assistant synthesize another employee's history into a normal-looking response. The model does not need to be tricked in the usual jailbreak sense if the surrounding system feeds it altered instructions, altered context, or altered permissions.
This is why the incident mattered beyond McKinsey. The more enterprise assistants are built as thin layers over ordinary web APIs, databases, and access-control systems, the more their failures will follow ordinary software patterns. McKinsey has described Lilli as a firmwide system; in public case studies, it said 72 percent of the firm was active on the platform and that Lilli handled more than 500,000 prompts a month, and that it had answered more than 4.5 million queries over more than 200,000 documents.
The practical lesson is to audit the ordinary control points that determine what the assistant can see, write, and retrieve:
The easy mistake is to classify incidents like this as model failures because the model is what users see. The more useful framing is simpler: the model became the interface to a compromised application.
As more enterprise assistants store prompts, retrieval policy, and user context in ordinary backend systems, more "AI incidents" will start the same way. They will begin as familiar software bugs and end as changes in model behavior.