The Three Things That Break When Agents Touch Production

If you talk to enough teams who have shipped a real agentic feature into production, a strange thing starts to happen. The stories begin to rhyme.

The technology is different in every case. The use case is different. The model is different. The team is different. And yet the postmortems sound suspiciously similar. The thing that broke was rarely the thing the team was worried about going in.

This post is about three of those failure modes. They are not the only ones, but they are the most common, and they are the ones that almost every team rediscovers independently. They share a structure: each is a property the team assumed the agent would handle, and each turned out to be a property of the layer between the agent and the world - a layer that, for most teams, did not exist as a deliberate piece of infrastructure.

1. Identity collapses

The first thing that breaks is identity.

In a traditional system, the question "who did this?" has a clean answer. A user logged in. A request arrived with their session. The action was performed in their name. Audit, authorization, and accountability all hang off that single thread.

Agentic systems start out looking the same. A user opens a chat. They tell the agent to do something. The agent does it. The action is performed by - well, by whom, exactly?

The agent? Agents are not principals in any system you can hold accountable. Compliance teams do not accept "the agent did it" as an answer. Neither do customers, neither do regulators, neither, in the end, do engineers trying to debug what happened.

The user? Sometimes. But the user said "send the proposal," and the agent decided which proposal, to whom, with what attachments, on what schedule, with what tone. Saying the user took those actions stretches the word "user" past its useful boundary. If anything goes wrong, the user will (correctly) point out that they didn't make those choices.

The application? The application didn't choose either. The application invoked the agent. Holding the application accountable for what the agent decided is the same kind of fiction with extra steps.

The collapse of identity is not a labeling problem. It is the absence of a layer that knows the difference between three statements: "the user authorized this kind of action," "the agent decided this specific action," and "this credential was used to make this specific call against this specific system." Those three things are different. They happen at different times in the lifecycle. They belong to different parties. And in a system designed for human-driven actions, they were always implicitly the same, so nobody had to separate them.

Agentic systems force the separation. If the layer where the action happens does not represent that separation honestly, the answer to "who did this?" devolves to "the agent did," which means: nobody.
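One way to picture the separation is as three distinct records that are linked but never merged. The sketch below is illustrative only: the class names, fields, and the `attribute` helper are hypothetical, not part of any particular product or library.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Authorization:
    """What the user granted: a kind of action, not a specific one."""
    user_id: str
    action_kind: str
    granted_at: datetime


@dataclass(frozen=True)
class Decision:
    """What the agent chose to do under that authorization."""
    agent_id: str
    action_kind: str
    parameters: dict
    decided_at: datetime


@dataclass(frozen=True)
class CredentialUse:
    """Which credential made which call against which system."""
    credential_id: str
    target_system: str
    call: str
    used_at: datetime


@dataclass(frozen=True)
class Attribution:
    """The three facts, kept separate but linked."""
    authorization: Authorization
    decision: Decision
    credential_use: CredentialUse


def attribute(auth: Authorization, decision: Decision,
              use: CredentialUse) -> Attribution:
    # Refuse to link a decision to an authorization that does not cover it.
    if decision.action_kind != auth.action_kind:
        raise PermissionError(
            f"user {auth.user_id} authorized '{auth.action_kind}', "
            f"not '{decision.action_kind}'"
        )
    return Attribution(auth, decision, use)


def who_did_this(a: Attribution) -> str:
    # The answer names all three parties instead of collapsing them.
    return (
        f"user {a.authorization.user_id} authorized "
        f"'{a.authorization.action_kind}'; "
        f"agent {a.decision.agent_id} decided the specifics; "
        f"credential {a.credential_use.credential_id} called "
        f"{a.credential_use.target_system}"
    )
```

With records shaped like this, "who did this?" has a three-part answer, and each part can be checked against the party it belongs to.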

2. Policy lives in the wrong place

The second thing that breaks is governance, and it almost always breaks in the same way.

A team builds an agent. They notice that the agent will sometimes do things they don't want it to do. They write rules into the prompt. Don't refund more than $500 without checking. Don't email external domains. Don't modify accounts older than five years. Don't take any action you are unsure about. Each rule is added in response to a specific incident or a specific worry.

For a while, this works well enough that nobody questions it. The model is good at following instructions. The rules are short. The agent behaves.

Then one of three things happens.

The model gets smarter and starts arguing with the rules. The user said it was urgent. They said they had authority. The conversation makes it clear this is an exception. The agent talks itself out of the constraint, because the constraint exists in the same medium as the conversation, and the conversation is a medium where every statement is negotiable.

Or the model is replaced. A new version, a new vendor, a new fine-tune. The old prompt-level rules were tested against a model whose response shape is now slightly different. Nobody re-runs the evaluation. The rules still look like they're being followed. They're not.

Or the rules grow. What was four bullet points is now forty. They contradict each other. New developers don't know which ones still matter. The prompt becomes a sedimentary record of past incidents that nobody fully understands. Reasoning about whether the agent will or won't do something requires reading a 4,000-token policy document and trusting that the model parsed it the same way you did.

The pattern in all three cases is the same. Policy that lives inside the agent - in its prompt, in its context, in its instructions - is policy that runs in the same probabilistic substrate as the rest of the agent's behavior. It is not enforcement. It is suggestion, and an agent can be talked out of a suggestion.

Real policy has to live somewhere the agent cannot see, cannot read, and cannot argue with. It has to be evaluated between the agent's decision and the system's response. It has to be deterministic, inspectable, and version-controlled like any other piece of operational logic. And it has to apply equally regardless of which agent, on which model, with which prompt, made the request.
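As a rough sketch of what "deterministic, inspectable, and version-controlled" could look like, here are two of the prompt rules from earlier rewritten as plain code that runs outside the agent. Everything here is an assumption for illustration: the `Action` and `Verdict` types, the `POLICY_VERSION` string, and the `example.com` internal domain are all made up.

```python
from dataclasses import dataclass

POLICY_VERSION = "example-v1"               # hypothetical version tag
REFUND_LIMIT_USD = 500.0                    # from the "$500" prompt rule
INTERNAL_DOMAINS = frozenset({"example.com"})  # assumed internal domain


@dataclass(frozen=True)
class Action:
    """A request the agent wants to make, as seen by the policy layer."""
    kind: str
    amount_usd: float = 0.0
    recipient_domain: str = ""


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str
    policy_version: str = POLICY_VERSION


def evaluate(action: Action) -> Verdict:
    # The function sees only the structured action, never the conversation,
    # so nothing the user or the agent said can change the outcome.
    if action.kind == "refund" and action.amount_usd > REFUND_LIMIT_USD:
        return Verdict(False, "refund exceeds limit; needs review")
    if (action.kind == "send_email"
            and action.recipient_domain not in INTERNAL_DOMAINS):
        return Verdict(False, "external recipient domain")
    return Verdict(True, "no rule matched")
```

The same input always yields the same verdict, whichever model or prompt produced the request, and the rules can be diffed and reviewed like any other code.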

The policy engine evaluates four axes: Identity, Semantic, Fiscal, and Risk. The verdict is their conjunction: every axis must pass, and no axis compensates for another.
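The conjunction can be sketched in a few lines. The axis names come from the post; what each axis actually checks is not specified here, so the four lambdas below, the `Request` fields, and the thresholds are placeholder assumptions chosen only to make the example run.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Request:
    principal_known: bool   # hypothetical input to the Identity axis
    action_kind: str        # hypothetical input to the Semantic axis
    cost_usd: float         # hypothetical input to the Fiscal axis
    risk_score: float       # hypothetical input to the Risk axis


Axis = Callable[[Request], bool]

# Placeholder checks; a real engine would define these very differently.
AXES: dict[str, Axis] = {
    "identity": lambda r: r.principal_known,
    "semantic": lambda r: r.action_kind in {"refund", "send_email"},
    "fiscal": lambda r: r.cost_usd <= 500.0,
    "risk": lambda r: r.risk_score < 0.7,
}


def verdict(request: Request) -> tuple[bool, list[str]]:
    # Every axis is evaluated independently; there is no weighted score,
    # so a strong pass on one axis cannot offset a failure on another.
    failed = [name for name, check in AXES.items() if not check(request)]
    return (not failed, failed)
```

For example, `verdict(Request(True, "refund", 900.0, 0.1))` returns `(False, ["fiscal"])`: identity, semantics, and risk all pass, and the request is still denied.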

When teams discover this, they often try to bolt it on. They add a wrapper service. They write a few rules. The wrapper grows. The rules multiply. The wrapper becomes a policy engine, badly, in a corner of the codebase that nobody wanted to maintain. This is fine, until it isn't. The next post in this series talks about what comes after that.

3. The audit trail is a fiction

The third thing that breaks is audit, and it usually breaks much later than the other two - which is what makes it the most expensive of the three.

For the first months of a real agentic deployment, nobody asks for the audit trail. The system is working. The numbers are good. The team is celebrating. There is no pressing need for evidence of what happened.

Then a customer calls about an action they don't remember authorizing. Or compliance asks how a particular sensitive operation got approved. Or legal needs to reconstruct a sequence of events for a contract dispute. Or - most commonly - an internal stakeholder asks "wait, the agent did what last quarter? Across how many accounts?"

And the team discovers, with a sinking feeling, that the audit trail they thought they had is a collection of things that were never designed to be an audit trail.

Application logs, which were designed for debugging and which routinely get rotated.

Provider logs, which live in five different SaaS dashboards and which only some of those providers retain past 30 days.

Model logs, which sometimes record the prompt and sometimes don't, and which never record the agent's actual decision in a structured form.

A spreadsheet someone started keeping after the first close call.

None of these things, separately or together, answer the question that audit needs to answer. The question is not "what calls did we make?" or "what did the agent say?" The question is: at this specific point in time, what was the user's intent, what did the agent decide to do about it, what policy ran on that decision, what was the verdict, who approved it if a human approved it, what was the actual call made against the external system, and what came back?

A real audit trail represents the entire chain - intent, decision, evaluation, approval, execution, response - as a single record. Append-only, time-stamped, immutable, and queryable. Not as a debug tool. As evidence.
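A minimal sketch of such a record, assuming an in-memory log. The field names mirror the chain above; the hash chaining is one common way to make tampering detectable and is an addition for illustration, not something the post prescribes. `AuditLog` and its methods are hypothetical names.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str
    intent: str              # what the user asked for
    decision: str            # what the agent decided to do
    evaluation: str          # which policy ran and its verdict
    approval: Optional[str]  # who approved, if a human did
    execution: str           # the actual call against the external system
    response: str            # what came back
    prev_hash: str           # links this record to the one before it
    record_hash: str = ""


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self._records: list[AuditRecord] = []

    def append(self, *, intent: str, decision: str, evaluation: str,
               approval: Optional[str], execution: str,
               response: str) -> AuditRecord:
        prev_hash = self._records[-1].record_hash if self._records else "genesis"
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "intent": intent, "decision": decision,
            "evaluation": evaluation, "approval": approval,
            "execution": execution, "response": response,
            "prev_hash": prev_hash,
        }
        record = AuditRecord(**body, record_hash=_digest(body))
        self._records.append(record)
        return record

    def verify(self) -> bool:
        # Recompute every hash; an edited or reordered record breaks the chain.
        prev = "genesis"
        for record in self._records:
            body = asdict(record)
            stored = body.pop("record_hash")
            if record.prev_hash != prev or _digest(body) != stored:
                return False
            prev = stored
        return True

    def query(self, **criteria: object) -> list[AuditRecord]:
        return [r for r in self._records
                if all(getattr(r, k) == v for k, v in criteria.items())]
```

The point of the shape is that one `append` call captures the whole chain at the moment of the action. A production version would need durable, write-once storage rather than a Python list.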

You cannot reconstruct that record after the fact. You have to capture it at the time of the action, in the place where the action happened, and you have to design for it from the beginning. By the time you need it, it is too late to start.

What these have in common

These three failure modes look like different problems. They are actually the same problem, with three faces.

The problem is this: the layer between the agent and the world has work to do that nobody is doing. What happens when that work is left undone (identity collapsing, policy drifting, audit becoming a fiction) is not a set of bugs. It is the predictable consequence of building agentic systems on top of a layer that doesn't exist.

The good news is that the work is well-defined. Identity, policy, deferral, execution, audit. These are not new categories. They are the same primitives that infrastructure has always provided to the things built on top of it. They just need to be present in this layer, in a form that fits the new shape of the system above them.

Key takeaways

Identity, policy, and audit don't break at the model — they break at the layer between the agent and the world.
Policy that lives inside the prompt is suggestion, not enforcement — it has to be evaluated where the agent can't argue with it.
An audit trail has to be captured at the moment of the action; you cannot reconstruct it later.

That is the layer we are building. It exists because someone has to.

