This is the story of a codebase under siege, a very thorough otter, and the tool we built because nothing else existed.
It started with a hallucinated import. A developer on a team we know well was using an AI coding tool to speed up a feature. The code looked great. The tests passed. The reviewer approved. It merged on a Friday afternoon.
"The linter passed. The syntax checker passed. The reviewer approved it on a Friday. The library did not exist. The app crashed on Monday."
That kind of thing used to be rare enough to laugh about. It is not rare anymore. Teams using AI tools are shipping code at speeds that human review was never designed to handle, and the tools that are supposed to catch problems were all built before AI-generated code was a thing anyone needed to worry about.
Nobody had built the tool that actually runs the code. That checks whether the imports exist. That maps the blast radius before the merge. So we decided to build it ourselves.
The hallucinated import was not a fluke. It was a symptom of something structural. AI coding tools are extraordinary. They let teams ship faster than ever before. But speed without a reliable gate is just a faster way to break things.
The volume of code going out has exploded. The human capacity to carefully evaluate it has not moved at all. And the tools that are supposed to catch problems were built for a different era.
This is not a story about bad developers. Every team we have talked to cares about quality. The problem is structural. The tools have not kept up with how code is actually being written now.
AI tools invent library names, function signatures, and API endpoints with total confidence. They look real. They pass every static check. They fail the moment the code runs, because the thing they are calling does not exist.
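In principle, checking whether an import actually exists is mechanical. Here is a minimal sketch in Python of the idea, not autter's actual implementation: parse the source, collect the imported module names, and ask the interpreter's own import machinery whether each one resolves to anything installed. The module name `totally_real_http_lib` is an invented stand-in for a hallucinated package.

```python
import ast
import importlib.util

def find_phantom_imports(source: str) -> list[str]:
    """Return top-level module names imported by `source` that do not
    resolve to any installed package."""
    tree = ast.parse(source)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    # find_spec returns None when the import machinery cannot locate the module
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)

code = "import json\nimport totally_real_http_lib  # hallucinated\n"
print(find_phantom_imports(code))  # → ['totally_real_http_lib']
```

A linter reads the name and moves on; this check asks whether the name points at anything real. That is the entire difference between passing a syntax check and surviving contact with the runtime.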
A two-line change to an authentication function can silently break ten other parts of the system. Reviewers looking only at the diff have no way to know that without a complete map of the codebase. Nobody holds that map in their head for every file in every repo they review.
Traditional scanners find vulnerabilities after code is merged, deployed, and live on servers real users are hitting. By then the cleanup costs more than the original feature was worth, and the trust damage is already done.
A tool that raises fifty uncertain findings on every PR trains teams to dismiss everything. Real issues hide in the noise. The false positive problem is not a tradeoff. It is a product failure that makes the tool actively harmful.
There is no fast, PR-level system that answers the question teams actually need answered: if this change is wrong, what breaks, how far does the damage spread, and how risky is it to ship right now?
The codebase is the town. Every pull request is a ship at the dock.
There was once a town that lived by the harbour. Carefully built, with each district depending on the next. The people who lived there had spent years learning its patterns. They knew which bridges bore the most weight, which streets led to the square, which wells supplied the bakeries and which ones fed the mill.
Ships arrived every day. The familiar ones from nearby ports were waved through without much fuss. Their captains had docked before. Their cargo was trusted.
Then came the machines.
Tools so powerful that almost anyone could now assemble a ship in hours that once would have taken months. The harbour received more goods than ever before. New ideas arrived from new places. But the machine-built ships often carried grave flaws, and the machines that built them were extraordinarily confident about the things they had invented.
A machine-built ship might carry a crate labelled with a merchant's name, except that merchant had never existed. The label looked authentic. The crate looked sealed. Everything about it passed a casual inspection. But open it, and there was nothing inside. Or worse: something inside that would rot the warehouse it was stored in.
The harbour master could not read every manifest, open every crate, and test every length of rope for every ship that arrived. There were too many ships. There was only one harbour master. Something in that equation had to change.
His name was Patch. He had been at the harbour long enough to remember every ship by name and every captain by face. He had watched the change come. He had seen the inspection process grow thin. And the occasional disaster that followed when something wrong slipped through.
He did not complain about it. He was not the sort to complain. He was the sort to stand at the gate and start checking.
He did not check the manifest. He did not take the captain's word. He opened the crates, unpacked the cargo, and ran the mechanisms to see whether they worked under load. Reading code and running it are completely different problems.
He kept a complete, current, always-updated map of the town. He knew which road connected to which bridge, which district depended on which supply. He consulted the map before opening a single crate. The map told him where to look hardest.
A barrel going to one kitchen received one level of attention. A barrel supplying the whole district's water received another. Two ships could carry the same flaw and represent entirely different levels of risk, depending on where their cargo was going.
He was not trying to slow the ships down. He was trying to ensure that what reached the town was what it claimed to be. No more, no less, and nothing hidden.
Other inspectors were not lazy. They were doing what they had always done: reading manifests, looking at crates from the outside, asking the captain if everything was in order. This had worked for a long time. The machine-built ships broke that assumption. The old checks, applied to this new kind of ship, were not catching what needed to be caught.
Every library name, every function reference, every listed dependency is confirmed against what actually exists. Phantom cargo does not survive this check. AI tools invent things with complete confidence. Patch checks whether the confidence is warranted.
In a sealed testing environment, the code is actually run. Test suites execute and compare against the base branch. If something will fail, it fails here, in isolation, before it ever touches the production system. There is a whole class of bugs that look fine on paper and break immediately when executed. The only reliable way to find them is to execute.
The structural parts that are not visible at a glance. The concealed connections. The things that will matter later, in production, when the stakes are real. Vulnerabilities, exposed credentials, authentication removed from a gate that really needed it. A ship whose hull looks sound but whose keel is secretly cracked is not a sound ship.
Where is this change going, and how many parts of the system depend on it? A vulnerability in a file that nothing depends on is a different problem to the same vulnerability in a file that forty services call. Risk is not just what is broken. It is how far the damage spreads. The answer determines how deeply Patch looks.
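The "how far does the damage spread" question is, underneath the allegory, a reverse-dependency traversal. A toy sketch of the idea (the module names and graph shape here are invented for illustration): invert the import graph so edges point from a dependency to its dependents, then walk outward from the changed file.

```python
from collections import deque

# who imports whom (hypothetical modules, for illustration only)
imports = {
    "api": ["auth", "db"],
    "billing": ["auth", "db"],
    "auth": ["crypto"],
    "db": [],
    "crypto": [],
}

def blast_radius(changed: str, imports: dict[str, list[str]]) -> set[str]:
    """Every module that transitively depends on `changed`."""
    # invert the edges: dependency -> set of direct dependents
    dependents: dict[str, set[str]] = {m: set() for m in imports}
    for mod, deps in imports.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(mod)
    # breadth-first walk outward from the changed module
    seen, queue = set(), deque([changed])
    while queue:
        for parent in dependents.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(blast_radius("crypto", imports)))  # → ['api', 'auth', 'billing']
```

In this toy graph, a flaw in `crypto` reaches three other modules, while the same flaw in `db` reaches two. Same defect, different blast radius, different level of scrutiny.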
When Patch finds something wrong, he does not simply refuse the cargo and send the ship away. He comes back with an explanation. Here is the exact problem. Here is why it matters. Here is the specific fix. Not a verdict. A path forward.
And when something cannot be fixed at the dock, the berth is blocked. Not a suggestion. Not a recommendation. A gate, closed.
The gate is not there to stop the ships. It is there so that when a ship reaches the town, the town knows what arrived.
A security check you can ignore when a deadline arrives is not a security check. It is advice. autter blocks the merge button. That is intentional, and it is non-negotiable.
There is a whole class of bugs that look fine on paper and break immediately when executed. The only reliable way to find them before production is to actually run the code. Every other tool skips this step. We do not.
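A minimal version of "actually run it" looks like this sketch: execute the snippet in a fresh interpreter process with a timeout, and trust the exit code rather than the source text. A real sandbox adds filesystem and network isolation on top; this only illustrates the principle, and the module name is again an invented stand-in for a hallucination.

```python
import subprocess
import sys

def run_isolated(code: str, timeout: float = 10.0) -> tuple[int, str]:
    """Execute `code` in a separate interpreter process; return (exit code, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode, proc.stderr

# a phantom import passes every static read, but fails the moment it runs
rc, err = run_isolated("import totally_real_http_lib_xyz")
print(rc != 0, "ModuleNotFoundError" in err)  # → True True
```

The exit code is the verdict no amount of reading can give you: the code either survived execution or it did not.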
A vulnerability in a file nothing depends on is a different problem to the same vulnerability in a file forty services call. Risk is not just what is broken. It is how far the damage spreads. We measure both before surfacing anything.
A tool that raises too many false alarms gets turned off. Every finding autter surfaces has to be real, explained, and fixable. We would rather catch five real issues than flag fifty uncertain ones.
We build for the people who own codebases, not the people who submit to them. The maintainer carries the risk when something goes wrong. Features that reduce maintainer burden ship first.
Every other tool in this space was designed before AI-generated code was something anyone needed to worry about. We started from scratch in 2025, building for the world as it actually is, not the world these tools were designed for.
autter is Patch. A very thorough otter, and a check that actually means something. The gate is real. Come see for yourself.