A pattern keeps showing up in conversations with companies building real agentic features.
They start with a model. They prompt it carefully. They evaluate it. They wire it into a chat surface and watch it answer questions about their data, summarize their documents, draft their emails. It is, by any reasonable measure, working.
Then someone, usually a senior engineer or a product manager with a long memory, asks the question that ends the honeymoon: "What happens when we let it do things?"
Not summarize. Not recommend. Do. Send the contract. Issue the refund. Update the deal. Provision the account. Close the ticket.
And the room goes quiet for a moment, because everyone understands that the project just got an order of magnitude harder.
Conversation is cheap. Action is expensive.
For most of the last few years, the interesting questions about AI have been questions about models. Can it understand this request? Can it reason about that document? Can it write code, plan a trip, explain a concept, hold a thread across a long conversation?
Those questions are largely answered. Not perfectly, not for every domain, but the trajectory is obvious. The cost of generating a paragraph of plausible text is approaching zero, and the cost of generating a paragraph of useful text is following close behind.
But conversation, however good, is a closed loop. The model says something, you read it, you decide what to do. If the model is wrong, you notice. If the model is overconfident, you push back. If the model invents a fact, you catch it before anything happens in the real world.
The moment you remove the human from that loop - the moment you let the model take the action itself - every property of the system changes.
A wrong sentence is embarrassing. A wrong action is operational.
The thing nobody owns
Here is the uncomfortable observation. In most companies that have shipped real agentic features, the layer between the model and the production system was built by someone who didn't think they were building infrastructure.
It is a few hundred lines of glue code. It holds the API keys for half a dozen SaaS systems. It has a function called do_the_thing that calls those APIs based on what the model decided. It has some inline guards - don't refund more than $500 without checking, don't email an external domain, don't delete production data - that grew organically as failures happened.
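For readers who have not seen one of these layers up close, here is a hypothetical sketch of what it tends to look like. Only the name do_the_thing comes from the description above; the action shapes, the internal domain, and the return strings are invented for illustration, and the real API calls are left as comments.

```python
# Hypothetical illustration of ad-hoc glue code. Everything here is an
# assumption made for the example except the function name do_the_thing.

INTERNAL_DOMAIN = "example.com"  # assumed internal email domain
REFUND_LIMIT = 500               # the "$500" guard mentioned in the text


def do_the_thing(action: dict) -> str:
    """Dispatch whatever the model decided, with guards bolted on over time."""
    kind = action.get("kind")

    if kind == "refund":
        # Guard added after some past failure: large refunds need a check.
        if action["amount"] > REFUND_LIMIT:
            return "blocked: refund over limit"
        # A real version would call the billing API here.
        return "refunded"

    if kind == "email":
        # Guard: never email an external domain.
        if not action["to"].endswith("@" + INTERNAL_DOMAIN):
            return "blocked: external recipient"
        # A real version would call the mail API here.
        return "sent"

    if kind == "delete":
        # Guard: never delete production data.
        if action.get("env") == "production":
            return "blocked: production data"
        return "deleted"

    return "blocked: unknown action"
```

Each guard is reasonable on its own. The trouble is that the policy, the credentials, and the dispatch logic all live in one function that nobody set out to design.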
Nobody designed this layer. Nobody owns it. It exists because someone needed to ship the agentic feature, and there was no other way to make the model actually do anything.
This is exactly where the infrastructure of every previous era lived before it became infrastructure. Load balancers were once a few if statements at the top of an Apache config. API gateways were once a service called routing.py that grew tentacles. Identity systems were once a users table and a hand-rolled login form. Every category we now think of as load-bearing started as glue.
The boundary between AI agents and the systems they act in is on exactly that trajectory. It is the layer that a thousand teams are quietly building, badly, for the first time.
Why this layer is different
You could be forgiven for thinking this is just an API gateway with a new name. That instinct is half-right and worth taking seriously: like a gateway, this layer sits in front of production systems and every call passes through it.
What's actually different is the agent on the other side. A traditional API client is a deterministic program written by a human. It calls the same endpoints, in the same order, with the same arguments, every time. When something changes, a human changes it.
An agent is none of those things. It decides what to call. It decides when to call. It decides what arguments to pass. Its behavior depends on the prompt, the model version, the tool descriptions, the context window, the user's last message, and a half-dozen other variables that drift continuously. Two runs of the "same" agent are not the same.
That has consequences for the boundary. A traditional gateway can assume that the calls coming through it were intended by the humans who wrote them. The new boundary cannot assume that. Every action arriving at it is, technically, intended by no one. It is the output of a probabilistic system that nobody fully audited.
Which means the boundary has to do work that traditional infrastructure never had to do. It has to evaluate whether the action makes sense. It has to know who is allowed to do what. It has to know when to pause and ask a human. It has to record every decision in a way a regulator, a customer, or your own legal team can later read.
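Those four jobs can be made concrete with a minimal sketch. This is not Formael's API or anyone's production design; the names Boundary, Action, and Decision, the permission table, and the refund threshold are all assumptions chosen for illustration. It shows one decision point that checks who is acting, evaluates the action against a policy, escalates to a human when a rule says so, and records every decision.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"  # pause and ask a human


@dataclass(frozen=True)
class Action:
    agent_id: str
    kind: str
    params: dict


@dataclass
class Boundary:
    # Hypothetical permission table: agent id -> action kinds it may attempt.
    permissions: dict[str, set[str]]
    # Hypothetical policy: refunds above this amount need human review.
    refund_review_threshold: float = 500.0
    audit_log: list[dict] = field(default_factory=list)

    def decide(self, action: Action) -> Decision:
        allowed = self.permissions.get(action.agent_id, set())
        if action.kind not in allowed:
            # Who is allowed to do what.
            decision, reason = Decision.DENY, "agent not permitted for this action"
        elif (action.kind == "refund"
              and action.params.get("amount", 0) > self.refund_review_threshold):
            # When to pause and ask a human.
            decision, reason = Decision.ESCALATE, "refund above review threshold"
        else:
            decision, reason = Decision.ALLOW, "within policy"

        # Record every decision, including denials, in a readable form.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent_id": action.agent_id,
            "kind": action.kind,
            "params": dict(action.params),
            "decision": decision.value,
            "reason": reason,
        })
        return decision
```

A caller would execute the action only on Decision.ALLOW and route Decision.ESCALATE to a person. The design choice worth noticing is that the audit record is written inside decide, so no path through the boundary can skip it.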
It is not just a gateway. It is the place where intent becomes consequence, and the only place in the system where you can govern that translation.
What we're building toward
The thesis is simple. Every company that lets AI act in production systems will eventually have something that does the work the boundary requires. Most of them will build it themselves first, because there will be no alternative. Some of them will build it well. Many will not. A small number will discover, after a near-miss or an incident, that they were running production agency on top of a layer nobody designed.
The point of Formael is to make that layer something you can adopt instead of build. One place that authenticates your agents. One place that evaluates your policies. One place that holds your credentials. One place that records everything. One place to change when the rules change.
Not a model. Not a framework. Not a set of best practices. A piece of infrastructure that sits where the boundary is, does the boundary's work, and lets the rest of the system stop pretending it doesn't exist.
That layer is going to exist in every serious agentic system within a few years. The only real question is whether each company builds its own version of it, alone, in the dark, the night before something breaks - or whether they treat it the way they already treat databases, queues, and identity, and adopt something built for the job.
We think the second option is the future. We are building it.