I once helped ship software for CT scanners with only about ten percent of it tested.

Not because anyone was careless. It was GE Healthcare, and the people there cared a great deal. The problem was arithmetic. In each two-week release cycle, the human QA team could manually check only a small fraction of the test cases. The rest went out the door untested. On the software that runs medical imaging machines.

Ten Percent Coverage Is a Math Problem, Not an Effort Problem

It is easy to read “10% tested” and assume someone was cutting corners. The opposite was true. The team was working flat out. The bottleneck was the work itself.

Validating a new build meant comparing CT images and raw scan data against the previous version, by hand, looking for differences a human eye can barely register. One case could take a long time. Multiply that across every configuration and use case, inside a fixed two-week window, and the math only allowed for a thin slice of the whole. No amount of overtime changes that equation. You need a different equation.

I Did Not Build a System. I Built One Small Tool.

Everyone expected a big automation platform. That is not how it happened, and I would argue it never should.

It started with one narrow tool that compared two CT images pixel by pixel and flagged the differences. That alone removed the slowest manual step. Then a tool that compared two whole directories of images instead of two files. Then a tool that filtered the raw test data down to the cases that actually mattered. Then a viewer for reading raw scan-file headers. Then a version-comparison tool for different use cases. Then automated reporting that emailed the results.
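Here is a minimal Python sketch of what the first two tools might look like. It is not the original GE tooling. It assumes images have already been decoded into 2D lists of integer pixel values, and it leaves out reading real scan formats such as DICOM. The names `diff_pixels` and `compare_sets` and the `tolerance` parameter are hypothetical. The set comparison is built directly on top of the pixel diff, which mirrors how one tool's output became the next tool's input.

```python
# Hypothetical representation: an image is a list of rows, each row a list of
# integer pixel values. Decoding real CT files into this form is assumed done.
Image = list[list[int]]


def diff_pixels(a: Image, b: Image, tolerance: int = 0) -> list[tuple[int, int]]:
    """Return (row, col) for every pixel whose values differ by more than tolerance."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images have different dimensions")
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(a, b))
        for c, (pa, pb) in enumerate(zip(row_a, row_b))
        if abs(pa - pb) > tolerance
    ]


def compare_sets(
    baseline: dict[str, Image], candidate: dict[str, Image], tolerance: int = 0
) -> dict[str, list]:
    """Compare two named sets of images (e.g. two builds' outputs) using diff_pixels."""
    report: dict[str, list] = {
        "missing": sorted(baseline.keys() - candidate.keys()),  # only in the old build
        "extra": sorted(candidate.keys() - baseline.keys()),    # only in the new build
        "changed": [],  # (name, number of differing pixels)
        "resized": [],  # same name, different dimensions
    }
    for name in sorted(baseline.keys() & candidate.keys()):
        try:
            diffs = diff_pixels(baseline[name], candidate[name], tolerance)
        except ValueError:
            report["resized"].append(name)
            continue
        if diffs:
            report["changed"].append((name, len(diffs)))
    return report
```

For example, comparing `{"head.img": [[0, 10], [20, 30]], "chest.img": [[5, 5]]}` against `{"head.img": [[0, 10], [20, 33]], "knee.img": [[1]]}` with `tolerance=2` reports `chest.img` as missing, `knee.img` as extra, and one changed pixel in `head.img`. A report like this is the kind of output that a later reporting step could format and email.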

Around forty tools in the end. Each one small. Each one solving a single concrete annoyance. And each one made the next one possible, because the output of one became the input of another.

What “Done” Actually Looked Like

The result was not “faster manual testing.” It was a different way of working. By the time I left, the QA team’s job was no longer two weeks of squinting at images. It was roughly two hours of kicking off the automated runs and reading the results as they landed in their inbox.

The testing itself happened automatically. The two hours were the human part: click run, wait, review what came back. Coverage went from ten percent to one hundred percent. The team could finally stand behind every release, not a sample of it.

Why a QA Story From Years Ago Matters in 2026

Two lessons, and both apply directly to how teams are adopting AI right now.

Real leverage compounds; it does not arrive. Everyone wants the one big AI system that automates everything in a single move. That is rarely how it works. The durable wins come from a stack of small, sharp tools, each one unlocking the next. Start with the single most painful manual step and remove it. Then the next. The compounding is the strategy.

The prize is the invisible work, not the visible work. The point was never to do the existing ten percent faster. It was to reach the ninety percent nobody was testing, the work a human could not physically get to. Most teams today are racing to automate the obvious, demoable tasks. The real value is sitting in the work you are silently not doing because no one has the hours. In regulated, high-stakes systems, that gap is the whole game.

Let’s Talk

If you are looking at a mountain of manual work and assuming you need one big system to fix it, there is usually a faster, smaller way in. Finding the first tool that unlocks the next is the kind of problem I take on, working asynchronously at a senior level, for teams that need coverage they cannot currently reach. If that sounds like your situation, reach out.