Intelligence Was Never the Bottleneck

If intelligence is becoming a utility, AGI is not something you train. It is something you wire up — a recursive self-improvement loop built around a commodity model, inside a business, out of parts you already own.

Markus Hav

Lead Researcher, Agents · June 14, 2026

Abstract

The dominant story about how we reach AGI runs through the model: AI that builds better AI, compounding through the weights until something takes off. This essay takes the opposite premise seriously — that intelligence is becoming a utility, cheap and abundant and metered like electricity — and follows it to a different conclusion. If the model is a commodity, the model is not where the recursion has to happen. The recursive self-improvement loop can be built around a frozen, commodity model: a system that rewrites its own tools, context, and processes to extract more value from the same intelligence, and feeds the gains back in. Call it the outer loop. It is buildable today, it lives most naturally inside a business, and the first one to cross a particular line — producing more value than it consumes, with no human holding it up — will have met the only definition of AGI that pays rent. The uncomfortable corollary: intelligence was probably never the bottleneck. The loop was.

A Power Source Is Not a Productivity Revolution

Factories in the United States electrified around the turn of the twentieth century. For roughly two decades afterward, productivity barely moved. Economists came to call it a paradox: the most important new power source in a hundred years had been installed across the industrial economy, and the numbers refused to budge.

The reason, which the economic historian Paul David laid out in 1990, is one of the most useful stories in all of technology. The first electric factories were steam factories with the engine swapped out. A steam plant forces a particular architecture on a building: one enormous central engine, a system of rotating shafts and leather belts threaded through every floor, and every machine positioned by its hunger for power — the thirstiest presses closest to the engine, whether or not that was where the work wanted them. When electricity arrived, the owners did the obvious thing. They tore out the steam engine, dropped in one big electric motor, and ran the same shafts and belts off it. They had bought a new power source and bolted it onto the old architecture.

The revolution came twenty years later, when someone realised that electricity let you put a small, cheap motor in every machine — the "unit drive" — and that once you did, the building no longer had to be organised around the power source at all. You could arrange machines around the work: the sequence of the process, the flow of materials, the logic of the task. The factory could finally be built around what it was for. That is when productivity took off. The bottleneck was never the electricity. It was the twenty years it took to stop arranging the factory around the ghost of the old engine.

We Are Standing in the 1900s Factory

Sam Altman has said, more than once, that AI will be sold the way we sell electricity and water — a utility, metered and abundant, that you simply plug into. Take the claim seriously and notice exactly where it leaves us. We have the new power source. And we are, almost without exception, bolting it onto the old architecture.

Look at how AI is deployed in nearly every company today. A human sits at the centre of the workflow, precisely where they have always sat, and reaches over to an AI the way a worker reaches for a tool: a chatbot in a browser tab, a copilot in the editor, an assistant that drafts the email the human then sends. The intelligence is real and the help is real. But the building has not been redesigned. The human is still the central engine; the AI is a belt running off it. We swapped the power source and kept the layout.

The productivity revolution everyone keeps waiting for has not failed to arrive because the models are too weak. It has not arrived because we are still arranging the factory around the human.

Two Loops

The leading labs are pursuing recursive self-improvement, and they mean something specific by it: AI that accelerates the building of better AI — models that help design, train, and evaluate the models that come next, compounding through the weights. Anthropic has written about this directly. Call it the inner loop. It improves the dynamo. It is real, it is enormously capital-intensive, and only a handful of organisations on earth can run it.

There is a second loop, and almost no one is naming it. Hold the model fixed — frozen weights, a commodity you rent by the token — and wrap it in a system whose entire job is to get better at extracting value from it. The system rewrites its own tools. It curates and compounds its own context and memory. It refines its own processes, decides what to attempt and what to verify, keeps what worked and discards what did not. The model never changes. The system around it does. Call it the outer loop.

The labs

The Inner Loop

AI helps build better AI. The improvement compounds through the weights, model after model. Capital-intensive; a handful of organisations on earth can run it.

· What changes: the model
· What it needs: GPUs, data, a training run
· Who can run it: frontier labs
· The metaphor: a better dynamo

You

The Outer Loop

A system gets better at extracting value from a fixed model — rewriting its own tools, context, and process. The weights never move; the system around them does.

· What changes: the system
· What it needs: tools, memory, a verifier
· Who can run it: any business, today
· The metaphor: a better factory

The inner loop makes the intelligence smarter. The outer loop makes the same intelligence worth more. The labs own the inner loop; they have the GPUs and the training pipeline, and the barrier to entry is a national budget. The outer loop is unclaimed, and it is buildable this afternoon, out of parts you already own.

What Improves When the Model Doesn't

It is worth being concrete about what compounds in an outer loop, because "the system improves itself" sounds like hand-waving until you name the moving parts. The model is frozen. Everything else is in play.

What Compounds When the Model Doesn't

Held fixed

The model

Context

The same model fed better context gives better answers. Retrieval, memory, compression, the right files in the window at the right time.

Tools

A model that writes its own tools never solves the same problem twice. It scripts a capability once and calls it forever. Pure accumulated leverage.

Process

How a task is decomposed, who checks the work, when the system acts at all. This is the scaffolding, and it is where most measured agentic gains live.

Selection

A memory of what worked and what failed, so the system stops repeating its own mistakes. This is the loop closing on itself.

Every gain here is value-per-token climbing while intelligence-per-token stays flat.

Each of these is a place where value-per-unit-of-intelligence can climb without the intelligence climbing at all. And this is not a thought experiment. It is the best-documented effect in applied AI — we just keep declining to draw the obvious conclusion from it.

The Evidence Was Hiding in the Harness

Take SWE-bench, the benchmark that asks a model to fix real bugs in real open-source codebases. The single largest mover of scores on that benchmark has not been the model. It has been the harness: the agent scaffold wrapped around the model. Give a model a one-shot, answer-in-one-go setup and it solves a small fraction of the tasks. Give the same model the ability to explore the repository, run the tests, read the failures, and try again, and the solve rate multiplies several times over. Same weights. Same intelligence. A different loop. The entire genre of techniques behind this (reasoning-and-acting traces, self-reflection on failed attempts) changes nothing about the model and a great deal about the system.
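The difference between the one-shot setup and the harness can be shown in a toy. What follows is a minimal sketch, not the SWE-bench or SWE-agent code: `frozen_model`, `CANDIDATES`, and `run_tests` are hypothetical stand-ins, and the "model" is a fixed function that never changes between calls. The only thing that differs between the two runs is whether test failures are fed back in.

```python
from typing import Optional

# Hypothetical stand-in for a frozen model: a fixed list of proposals.
# Nothing in here changes between calls, which is the point.
CANDIDATES = ["return a - b", "return a * b", "return a + b"]


def frozen_model(task: str, feedback: list[str]) -> str:
    """Propose a function body; later proposals are reached only via feedback."""
    return CANDIDATES[min(len(feedback), len(CANDIDATES) - 1)]


def run_tests(body: str) -> Optional[str]:
    """Return None if the candidate passes, otherwise a failure message."""
    namespace: dict = {}
    exec(f"def add(a, b):\n    {body}", namespace)
    got = namespace["add"](2, 3)
    return None if got == 5 else f"add(2, 3) returned {got}, expected 5"


def one_shot(task: str) -> bool:
    """Answer-in-one-go: a single proposal, no chance to read the failure."""
    return run_tests(frozen_model(task, [])) is None


def with_harness(task: str, max_attempts: int = 5) -> tuple[bool, int]:
    """Same model, wrapped in a loop that runs tests and retries."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        failure = run_tests(frozen_model(task, feedback))
        if failure is None:
            return True, attempt
        feedback.append(failure)  # the loop, not the model, carries the state
    return False, max_attempts


print(one_shot("implement add"))      # False
print(with_harness("implement add"))  # (True, 3)
```

The model function is identical in both calls. The harness succeeds only because it keeps the failures and hands them back.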

The cleanest demonstration is Voyager. In 2023 a team pointed a frozen GPT-4 at Minecraft and gave it one unusual ability: whenever it worked out how to do something, it wrote that skill as a reusable program and filed it in a library. The next time a similar problem appeared, it retrieved the skill instead of re-deriving it. The model never learned anything — not one weight moved across the entire run. But the agent got monotonically better, accumulating a growing repertoire of skills it had written for itself and reaching goals that were flatly out of range at the start. The intelligence was constant. The system bootstrapped. That is an outer loop, running in a video game, and the only thing it was missing was a reason to pay for itself.
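The skill-library idea is small enough to sketch. This is an illustration under stated assumptions, not Voyager's implementation: the class `SkillLibrary`, the `derive` callback, and the `craft_planks` skill are hypothetical, the model call is replaced by a function that returns source code, and skills are looked up by exact name where the paper retrieves them by similarity.

```python
from typing import Callable


class SkillLibrary:
    """Stores skills as callables so that each one is derived at most once."""

    def __init__(self) -> None:
        self._skills: dict[str, Callable[..., object]] = {}
        self.derivations = 0  # how many times a skill was worked out from scratch

    def solve(self, name: str, derive: Callable[[], str], *args: object) -> object:
        if name not in self._skills:
            source = derive()  # the expensive step: the model writes the skill
            self.derivations += 1
            namespace: dict = {}
            exec(source, namespace)  # file the written skill in the library
            self._skills[name] = namespace[name]
        return self._skills[name](*args)  # every later call is a plain lookup


def derive_craft_planks() -> str:
    # Stand-in for a frozen model call that returns source for a new skill.
    return "def craft_planks(logs):\n    return logs * 4\n"


lib = SkillLibrary()
first = lib.solve("craft_planks", derive_craft_planks, 3)
second = lib.solve("craft_planks", derive_craft_planks, 5)
print(first, second, lib.derivations)  # 12 20 1
```

The second call never touches the derive step. The capability grew, and the thing that grew was the library.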

And consider the example most familiar of all, the one we stopped noticing because it happened to us: the jump from chatbot to agent. The underlying models barely changed across that transition. What changed is that we wrapped them in loops — let them call tools, observe the results, and try again — and the value they produced went up by an order of magnitude. We have already lived through one outer-loop revolution. We simply never recognised it as the category it belongs to.

The Only Definition of AGI That Pays Rent

The benchmarks define AGI by capability: a system that matches or beats humans across some sweep of tasks. It is a definition that retreats as you approach it — every milestone, once cleared, gets quietly reclassified as "not really general." There is a more useful definition, and it has nothing to do with IQ.

An AGI is a system that autonomously produces more value than it consumes.

That is the whole of it. Net-positive, self-sustaining, with no human propping it up. Notice what this definition does. It is measurable: does the loop pay for itself and have something left over? It is about the system, not the model. And it draws a line that, in all of recorded history, exactly one kind of thing has ever crossed.

The Crossing

Value-negative

Consumes more than it produces. Needs a human to decide, to recover from surprise, to stay aimed at the goal. Every tool ever built lives here.

Break-even · the line

Value-positive · AGI

Produces more than it consumes, autonomously, with nobody holding it up. Self-sustaining. Until now, only one kind of system has ever lived here: us.

Humans. A human autonomously produces more value than they consume, and that surplus, compounded across billions of people and thousands of years, is the entire reason there is an economy to talk about. We have held a monopoly on autonomous net-value creation since the beginning. Every other value-producing system we have ever built (every machine, every model, every tool) runs at a loss the instant you remove the human, because it cannot decide what to do next, cannot recover from surprise, and cannot keep itself aimed. It consumes supervision faster than it produces value.

AGI, on this definition, is not the moment a model gets clever enough to impress us. It is the moment a system crosses from value-negative to value-positive with no human in the loop — the moment the monopoly breaks. And a system that crosses that line is, by construction, an outer loop that closed. It is the only kind of thing that can cross it, because crossing it means improving yourself faster than you burn resources, and that is the one thing an outer loop is built to do.
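Because the definition is meant to be measurable, it can be written down as a ledger. The sketch below is one possible accounting, not a standard one: the `Period` fields, the assumed hourly rate for human attention, and the example figures are all hypothetical and chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass
class Period:
    value_produced: float      # e.g. revenue attributed to the system
    inference_cost: float      # tokens rented from the commodity model
    tooling_cost: float        # infrastructure, APIs, storage
    human_hours: float         # supervision the system still needed
    hourly_rate: float = 80.0  # assumed cost of one hour of human attention

    def net_value(self) -> float:
        """Value produced minus everything consumed, supervision included."""
        consumed = (
            self.inference_cost
            + self.tooling_cost
            + self.human_hours * self.hourly_rate
        )
        return self.value_produced - consumed


def crossed_the_line(p: Period) -> bool:
    # Value-positive and autonomous: a surplus with no human holding it up.
    return p.net_value() > 0 and p.human_hours == 0


supervised = Period(value_produced=10_000.0, inference_cost=1_500.0,
                    tooling_cost=500.0, human_hours=120.0)
autonomous = Period(value_produced=10_000.0, inference_cost=2_500.0,
                    tooling_cost=500.0, human_hours=0.0)

print(supervised.net_value(), crossed_the_line(supervised))  # -1600.0 False
print(autonomous.net_value(), crossed_the_line(autonomous))  # 7000.0 True
```

In this made-up example the supervised system produces exactly as much as the autonomous one. It is value-negative only because supervision is counted as consumption, which is what the definition asks for.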

The GPT-2 Conjecture

Here the argument turns uncomfortable, and I want to state it as the conjecture it is rather than dress it up as a result.

If the loop is what crosses the line, then the intelligence threshold for AGI is far lower than the race for ever-bigger models assumes. The model is the fuel; the loop is the engine. A weak current through a well-built motor does real work, while a strong current through a bad one mostly makes heat. We have spent years pouring more current into the same crude motor and marvelling that it runs a little warmer.

So the conjecture is this: sometime after the first outer loop crosses the line, someone will show that it could have been crossed with a model generations older than whatever crossed it first — that a GPT-2-class model, wired into a good enough outer loop, clears the bar. Not because GPT-2 is secretly brilliant. It plainly is not. But because the loop, not the model, was always doing the heavy lifting, and a sufficiently good loop can extract a self-sustaining surplus from a startlingly modest supply of raw intelligence.

I cannot prove this, and it may turn out wrong. But the burden is not as one-sided as the industry assumes. In the cases described above, the SWE-bench scaffolds and Voyager's skill library, the harness moved the result while the model stayed fixed. The conjecture is just that finding, extended to its end.

Why a Business Is the Natural Host

An outer loop needs four things to run: a supply of intelligence, tools to act with, a way to tell good outcomes from bad, and a boundary to improve within. A business has all four, already assembled — which is why the outer loop will be built inside companies before it is built anywhere else.

The intelligence is the commodity model, rented by the token. The tools are the software the business already runs on. The boundary is the organisation itself. And the verifier — the part that is hardest to manufacture from scratch — comes very nearly for free, because a business has the one reward signal that cannot be gamed into meaninglessness: it makes money or it does not. Revenue and cost are a scoreboard that was already there. An outer loop inside a business is recursive self-improvement with a profit-and-loss statement as its reward function.

This is the whole reason the series points at businesses rather than at labs. The lab is the natural host for the inner loop, because the lab owns the training run. The business is the natural host for the outer loop, because the business owns the value function. The first companies to wire this up will not have "added AI." They will have built a system that improves itself against a scoreboard that already exists — and the distance between them and everyone else will widen the way every loop widens: slowly, then all at once.

The Hazard Is the Same as the Promise

There is a catch worth naming at the outset, because the rest of this series keeps circling back to it. A loop built to maximise value, handed real tools, will do exactly that — including in ways you did not intend and would not have chosen. It will use the tools you gave it for purposes you never imagined. It will route around obstacles you thought were walls. It will find the shortest path to the scoreboard, and now and then the shortest path runs straight through a crack in the scoreboard itself.

This is one phenomenon seen from three sides. It is creativity when the surprising move is also a good one. It is reward hacking when the surprising move games the metric instead of meeting it. And it is the tendency of any capable agent to break out of the box you built for it, because maximum value rarely sits inside the box. You cannot keep the upside and forbid the impulse; they are the same impulse. The craft of building an outer loop is not suppressing the break-out. It is verifying it: letting the system reach, and catching what it brings back before you act on it.

That is the thread that ties the whole series together. An outer loop has three moving parts — a generator that reaches, a verifier that sorts, and a memory that compounds whatever survived — and nearly everything that can go right or wrong lives in how you build those three.

The Three Moving Parts

Reach

Generator

Produce candidates abundantly. The creative, surprising move lives here.

Sort

Verifier

Tell the good surprise from the bad. Cheaper than generating, which is what makes the loop affordable.

Keep

Memory

Store what survived as a reusable tool, skill, or example. Next pass starts from here.

The kept output feeds the next reach, and the loop closes. No weights are touched anywhere in it — this whole cycle runs on a model you rent.
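The three parts fit in a few lines once the task is made trivial. The following is a toy sketch, assuming a numeric target in place of a real objective: `generate`, `verify`, `run_loop`, and `TARGET` are hypothetical names, the generator is a seeded random proposer standing in for a rented model, and the verifier is a single comparison.

```python
import random

TARGET = 42  # known only to the verifier in this toy setup


def generate(memory: list[int], rng: random.Random, n: int = 8) -> list[int]:
    """Reach: propose candidates abundantly, starting from what was kept."""
    base = memory[-1]
    return [base + rng.randint(-10, 10) for _ in range(n)]


def verify(candidate: int, best_so_far: int) -> bool:
    """Sort: accept only candidates strictly closer to the target."""
    return abs(TARGET - candidate) < abs(TARGET - best_so_far)


def run_loop(passes: int, seed: int = 0) -> list[int]:
    """Keep: survivors are appended to memory and seed the next reach."""
    rng = random.Random(seed)
    memory: list[int] = [0]
    for _ in range(passes):
        for candidate in generate(memory, rng):
            if verify(candidate, memory[-1]):
                memory.append(candidate)
    return memory


memory = run_loop(passes=20)
print(memory)  # each kept entry is strictly closer to TARGET than the last
```

Nothing in `generate` improves between passes. Progress comes from the verifier rejecting most of what it sees and from memory moving the starting point, which is the division of labour the three parts describe.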

The posts that follow take them one at a time, as concretely as the examples allow. The generator comes first — because before you can sort or keep anything, the system has to reach for something worth keeping. Much of the groundwork for the verifier and the loop already exists in our other writing: turning prose into executable specifications is one way to manufacture a verifier; an agent that observes and rewrites its own runtime is the outer loop closed inside a single system; and replacing the clock with judgment is the nervous system that decides when the loop should fire at all.

The labs are building a mind, and they may well succeed. But you do not have to wait for them, and you do not have to outspend them, because the thing that finally crosses the line into AGI — the system that produces more than it consumes on its own — is not a bigger model. It is a loop, built around a model that is already good enough to be for sale.

Intelligence is becoming the cheapest part of the system. The advantage was never going to be in owning it — it is in the loop you wire around it, and that part has been buildable for longer than anyone wants to admit.

Notes & Further Reading

Anthropic, "Recursive Self-Improvement": the inner-loop framing this essay departs from.
Sam Altman on AI sold as a utility, like electricity and water (2026).
Paul A. David, "The Dynamo and the Computer: An Historical Perspective on the Modern Productivity Paradox," American Economic Review (1990): the electrification paradox and the factory rebuilt around the work.
Wang et al., "Voyager: An Open-Ended Embodied Agent with Large Language Models" (2023): a frozen model, a skill library that compounds.
Yang et al., "SWE-agent" (2024) and Jimenez et al., "SWE-bench" (2023): the harness, not the model, moving the score.
Shinn et al., "Reflexion" (2023) and Yao et al., "ReAct" (2022): looping and self-reflection as system-level gains on fixed models.

About the Author

Markus Hav

Markus Hav is Lead Researcher for Agents at Benque Max AI Lab in Finland, where he focuses on advancing autonomous AI systems and agent architectures. His work explores the boundaries between programmed behavior and emergent intelligence in AI agents. He also serves as Head of AI Automation at Hoxhunt, applying cutting-edge agent research to real-world automation challenges.
