The Outer Loop · Part 2 · The Generator

Creativity Is Positive Hallucination

Negative hallucination is a lie. Positive hallucination is a true thing nobody had said yet — and a verifier is all that stands between them.


Markus Hav

Lead Researcher, Agents · June 14, 2026

Abstract

This is the generator of the outer loop. Before a self-improving system can sort, keep, or compound anything, it has to reach — produce a surprising move worth keeping — and reaching is the very faculty we have spent three years trying to suppress. Hallucination is not one phenomenon but two, sharing a mechanism and differing only in sign. Negative hallucination is confident falsehood, the thing we are right to fear. Positive hallucination is the true-but-unexpected, the thing we call creativity when a human does it and a bug when a machine does it. The claim here is that creativity and error are the same act with opposite outcomes; that the only thing dividing them is a check applied after the fact; and that an outer loop lives or dies on its willingness to generate the first while ruthlessly sorting out the second. Turn the dreaming up. Turn the verifier up. Keep what survives. The mistake we are deleting and the genius we are chasing come from the same place.

A Photograph of a Dog

Show a model a picture of a dog and ask it what it is looking at. Here are three answers it could give.

One Photograph, Three Captions

A model is shown a picture of a dog. Every caption below is produced the same way — by sampling the next likely word. Only one of them is false.

Neutral

"It is a dog."

True. Expected. You learned nothing.

Negative

"It is a cat."

False. The textbook hallucination.

Positive

"It is a wolf that chose us — a predator we spent thirty thousand years turning into the one animal that reads a human face better than a chimpanzee can."

True. Unexpected. You will remember it.

All three describe the same pixels. Only the middle one is wrong. But notice what the other two are doing, because the usual conversation about hallucination cannot see the difference between them. The first answer is correct. The third answer is also correct — and it is not remotely the same kind of thing. One is a label. The other is an idea. One closes the question; the other opens three more. If a person said the first, you would nod. If a person said the third, you would call them creative.

And here is the part that should bother us: there is exactly one neutral answer, a handful of plausible wrong ones, and an effectively unbounded supply of true-and-surprising ones. The dog is also a four-legged heat engine quietly turning breakfast into warmth. It is a red-green colour-blind nose that experiences the afternoon mostly as a timeline of smells. It is a distant cousin of the animal that would eat it. Every one of these is true. None of them is the answer you were expecting. The space of positive hallucinations about a single dog is larger than the space of facts most people will ever state about it.

Everything a Model Says Is a Hallucination

Andrej Karpathy put it sharply in 2023: hallucination is all an LLM does. It is a dream machine. The prompt starts the dream, and the model unspools whatever its training makes likely. The dog, the cat, and the wolf are produced by precisely the same act: sampling the next token from a learned distribution. The model is not consulting a fact about the photograph and reporting it; it is dreaming, and sometimes the dream lands on the truth. Hallucination is merely the word we reserve for dreams whose destination we dislike.

This has an uncomfortable consequence. If the generative act is identical whether the output is true or false, then "make the model stop hallucinating" is incoherent as literally stated. You cannot keep the wolf and kill the cat by turning the dreaming down, because the wolf and the cat are the same faculty pointed at different targets. Turn the dial that suppresses the cat and you suppress the wolf in the same motion. This is not a tuning problem. It is a sorting problem, and we have been treating it as the former.
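
A minimal sketch of why one dial cannot separate the two, assuming a toy model in which each caption is a single token with made-up logits. Temperature works the usual way: logits are divided by T before the softmax, so lowering T is "be less surprising" in its purest form.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens toward the mode."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits, treating each caption as a single token (illustrative numbers only).
captions = ["dog", "cat", "wolf"]
logits = [4.0, 1.0, 0.5]
temperatures = [1.5, 1.0, 0.5, 0.25]

sweep = {t: softmax_with_temperature(logits, t) for t in temperatures}
for t, (p_dog, p_cat, p_wolf) in sweep.items():
    print(f"T={t:<4} dog={p_dog:.3f}  cat={p_cat:.3f}  wolf={p_wolf:.3f}")
```

The sweep never consults which caption is true, and it cannot: the cat and the wolf shrink together because both sit below the mode.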

The Two Axes of a Dream

Almost every public discussion of this topic collapses a two-dimensional space onto a single line running from "accurate" to "hallucinating." That line is the error. There are two independent axes, not one.

The first is truth: is the statement the case or not? The second is surprise: how far the statement sits from the answer you expected, which information theory makes precise as surprisal, the negative log-probability of the statement under your prior. A predictable true statement carries almost no information. An improbable one carries a great deal. Plot the two axes and the regions we have been lumping together fall apart.
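
A short sketch of the surprise axis, assuming hypothetical prior probabilities for the three captions (the numbers are illustrative, not measured). Surprisal is computed in bits with `math.log2`; truth is recorded separately because it is the other axis.

```python
import math

def surprisal_bits(probability):
    """Surprisal in bits, -log2(p): the less expected the statement, the more it says."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)

# Hypothetical prior: how likely a listener thought each caption was before hearing it.
prior = {
    "It is a dog.": 0.9,
    "It is a cat.": 0.05,
    "It is a wolf that chose us.": 0.001,
}
# The truth axis is independent: surprisal alone cannot tell the cat from the wolf.
is_true = {"It is a dog.": True, "It is a cat.": False, "It is a wolf that chose us.": True}

for caption, p in prior.items():
    print(f"{surprisal_bits(p):5.2f} bits  true={is_true[caption]!s:<5}  {caption}")
```

The cat and the wolf both land far from zero on the surprise axis; which of them is false is a question only the truth axis answers.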

The Two Axes of a Dream

Vertical axis: Surprise

Negative Hallucination

Confident fabrication. It sounds like insight and is simply untrue. The dangerous one.

Positive Hallucination

True and unexpected. Boden's definition of creativity, reached from the other direction. The prize.

Ordinary Error

The plausible wrong guess. "It is a cat."

Neutral

Correct and dull. "It is a dog."

Horizontal axis: Truth, from False (left) to True (right)

Up is more surprising; right is more true. The two halves we call hallucination sit in the same column — the false one. Creativity sits one column over, and we have never had a name for the wall between them.

Now the three captions have coordinates. Neutral is true and expected: correct, and nearly free of information. Negative hallucination is the entire false column, from the dull wrong guess to the confident, novel-sounding fabrication that is the genuinely dangerous corner. And positive hallucination is the top-right: true and surprising. That top-right cell is not a new invention of mine. It is, almost word for word, Margaret Boden's definition of creativity: ideas that are novel, surprising, and valuable. We arrived at her quadrant from the other direction and found her already standing in it. Creativity has always been positive hallucination. We simply never noticed it shared an address with the bug.

The map also explains why this is hard. Positive and negative hallucination are nearest neighbours. A brilliant, improbable truth and a confident, improbable falsehood look identical from the inside: both are the model reaching past the obvious with full conviction. Nothing in the generating process distinguishes them. Only an external check, applied after the fact, can tell move 37 from a blunder. Hold that thought; it is the whole essay.

The Machines Already Do It. So Do You.

Two existence proofs that positive hallucination is real, and valuable, and not a figure of speech.

The first is move 37. Game two, March 2016, AlphaGo against Lee Sedol. On the thirty-seventh move AlphaGo played a shoulder hit on the fifth line, a placement so alien that commentators first took it for a mistake; the system's own model of human play put the odds that a person would choose it at one in ten thousand. It was not a person's move. It won the game and overturned centuries of received wisdom about where a stone belongs. A one-in-ten-thousand move, played by a system optimizing for the win, is a positive hallucination with a scoreboard attached, and the improbability was not incidental to its value. The improbability was the value. A move humans would have played was already known.
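
As a worked number, the one-in-ten-thousand estimate above converts to surprisal (negative log-probability, in bits) like this:

```latex
-\log_2 \frac{1}{10\,000} \;=\; \log_2 10^{4} \;=\; 4\log_2 10 \;\approx\; 4 \times 3.32 \;\approx\; 13.3\ \text{bits}
```

For comparison, a move the model expected half the time carries exactly one bit.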

The second proof is sitting behind your eyes. In Michael Gazzaniga's split-brain studies, the speaking left hemisphere is asked to explain an action that the mute right hemisphere actually initiated. It has no access to the real cause. It never says "I don't know." It confabulates instantly and fluently ("I got up to fetch a drink") and believes itself. Gazzaniga called this faculty the interpreter, and it is not a defect of damaged brains; it runs in all of us, all the time. The interpreter that lies to the split-brain patient is the same interpreter that writes the novel. Confabulation and creativity are one engine. Humans are not less hallucinatory than models. We are merely, and only sometimes, better at sorting our output before it leaves our mouths.

We Built a Lie Detector and Called It Alignment

If positive hallucination is so valuable, why does every tool we have point the other way? Because of how we grade.

OpenAI's 2025 analysis of why models hallucinate makes the incentive embarrassingly plain: we score models like a multiple-choice exam. A right answer earns a point; a wrong answer and "I don't know" both score zero. Under that rule the optimal test-taking strategy is to always guess, because a guess sometimes scores and an abstention never does. We trained models to bluff and are now surprised that they bluff. That is the negative-hallucination problem, and it is real.
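
The incentive can be written as an expected-score calculation. A minimal sketch: `expected_score` and `should_guess` are hypothetical helpers, and the wrong-answer penalty is an illustrative alternative rule for contrast, not necessarily the one Kalai et al. propose.

```python
def expected_score(p_correct, wrong_penalty=0.0):
    """Expected points for answering when the answer is right with probability p_correct.

    A right answer earns 1 point, a wrong one costs `wrong_penalty`, and
    abstaining ("I don't know") always earns exactly 0.
    """
    return p_correct - (1.0 - p_correct) * wrong_penalty

def should_guess(p_correct, wrong_penalty=0.0):
    """A score-maximizing test taker answers whenever answering beats abstaining.
    With penalty w, that happens exactly when p_correct > w / (1 + w)."""
    return expected_score(p_correct, wrong_penalty) > 0.0

# Binary grading: even a 1% hunch beats "I don't know".
print(should_guess(0.01))                     # True
# Hypothetical 1-point penalty for wrong answers: the break-even confidence becomes 50%.
print(should_guess(0.01, wrong_penalty=1.0))  # False
print(should_guess(0.75, wrong_penalty=1.0))  # True
```

Under binary grading any nonzero chance of being right makes guessing the better bet, which is the bluffing incentive in one line of arithmetic.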

But look at the fix we reached for. To stop the bluffing we lean on alignment that rewards the safe, expected, well-behaved continuation, and the measurements show what that costs. Reinforcement learning from human feedback measurably reduces output diversity (Kirk et al., 2023); the aligned model drifts toward the mode, toward the answer everybody already expected. We pulled the only knob we had, "be less surprising," to solve a problem that was really "be less false," and in doing so we deleted the cat and the wolf together. We have optimized, with real success, for the least interesting true thing in the room. We built a magnificent instrument for finding the dog.

Two Loops

Here is where it becomes an engineering question rather than a lament, and where the loop from the core concept does its work. Take the sorting problem from earlier (the fact that only an external check separates the brilliant surprise from the confident lie) and ask what happens when you wire a system's output back into its own next move. There are two ways to do it, and they differ by exactly one component.

Without the Sorter

The Doom Loop

  • 1 · The system generates, dreams and all
  • 2 · Its raw output feeds straight back in, unsorted
  • 3 · Errors compound; the rare and the true thin out
  • 4 · The distribution narrows toward its own average
  • 5 · Collapse

Feed a model its own undifferentiated output and it forgets the tails (Shumailov et al., Nature, 2024).

With the Sorter

The Dream Loop

  • 1 · The system generates abundantly — let it dream
  • 2 · A verifier sorts each output by sign
  • 3 · Discard the false; keep the true-and-surprising
  • 4 · Reuse what survived — as data, or as a saved tool or skill
  • 5 · Recursive self-improvement

Keep only what survives the check, reuse it, repeat (STaR; self-play). At the system layer, "reuse" is a saved tool, not a training run.

The loop on the left is the one everyone fears, and it is documented. Shumailov and colleagues showed in Nature in 2024 that a model trained on its own undifferentiated output, generation after generation, suffers model collapse: the tails of the distribution wither, the rare and the true are forgotten first, and the model converges on a blurry average of itself. This is real. But notice the precise condition under which it happens — undifferentiated output, fed back without sorting. The doom loop is what you get when you skip one step.
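
A deliberately tiny discrete analogue of the effect, not Shumailov et al.'s experimental setup: a categorical "model" is refit each generation to a finite sample of its own output, with no filter. The distribution, sample size, and seed are hypothetical.

```python
import random
from collections import Counter

def refit_on_own_samples(model, n_samples, rng):
    """One generation of the unsorted loop: sample from the model, then refit it
    by counting those samples. Nothing is checked or filtered."""
    outcomes = list(model)
    samples = rng.choices(outcomes, weights=[model[o] for o in outcomes], k=n_samples)
    counts = Counter(samples)
    return {o: c / n_samples for o, c in counts.items()}

# Hypothetical starting model: one common output and ten rare ones.
model = {"common": 0.90, **{f"rare_{i}": 0.01 for i in range(10)}}

rng = random.Random(0)
support = [len(model)]  # how many distinct outputs the model can still produce
for generation in range(20):
    model = refit_on_own_samples(model, n_samples=50, rng=rng)
    support.append(len(model))

print(support)
```

The count of distinct outputs can only hold or fall: once a rare output draws zero samples, its refitted probability is zero and it can never be sampled again. That is the tails withering in miniature.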

The loop on the right adds that step and nothing else: a verifier between generation and reuse that sorts each output by sign. Generate abundantly, with the temperature up and the dreaming unconstrained. Then keep only the true-and-surprising, throw the false away, and let the next pass build on what survived. This is precisely what the Self-Taught Reasoner does: generate chains of reasoning, keep only the ones that reach a verified-correct answer, train on those, repeat, and the model climbs. Self-play runs the same loop with the scoreboard as the verifier: AlphaZero generates moves no human taught it, labels every position by who went on to win, reinforces what won and discourages what lost, and reaches superhuman strength trained entirely on its own play, sorted by the result.
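
The same kind of toy with the one extra component installed. In this sketch, refitting on kept samples stands in for STaR's fine-tuning on verified chains; the captions, verdicts, and probabilities are hypothetical, and `verify=False` reproduces the unsorted loop.

```python
import random
from collections import Counter

def one_generation(model, is_true, n_samples, rng, verify=True):
    """Sample from the model, optionally sort by sign, and refit to what was kept.
    verify=False is the unsorted loop; verify=True differs only by the filter."""
    outcomes = list(model)
    samples = rng.choices(outcomes, weights=[model[o] for o in outcomes], k=n_samples)
    kept = [s for s in samples if is_true[s]] if verify else samples
    if not kept:  # nothing survived the check: keep the previous model unchanged
        return model
    counts = Counter(kept)
    return {o: c / len(kept) for o, c in counts.items()}

def true_share(model, is_true):
    """Probability mass the model puts on outputs the verifier accepts."""
    return sum(p for o, p in model.items() if is_true[o])

# Hypothetical verdicts and starting probabilities for the three captions.
is_true = {"dog": True, "cat": False, "wolf": True}
start = {"dog": 0.6, "cat": 0.3, "wolf": 0.1}

final = {}
for verify in (False, True):
    rng = random.Random(1)
    model = dict(start)
    for _ in range(5):
        model = one_generation(model, is_true, n_samples=200, rng=rng, verify=verify)
    final[verify] = model
    print(f"verify={verify}: true share = {true_share(model, is_true):.2f}")
```

The sorter drives the false caption's share to zero after a single generation; without it the cat persists. The toy demonstrates only the sorting step: it cannot invent outputs it was never able to produce, so it says nothing about the generator getting more creative.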

But notice that "build on what survived" need not mean retraining at all. That is the version the labs run, through the weights. At the system layer — the outer loop this series is about — it means something cheaper and faster: the validated move becomes a tool the system calls again, a skill it files away, an example it carries forward in context. No weights move. The loop still climbs.
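
At the system layer the same loop needs no training run. A minimal sketch, assuming a hypothetical `SkillLibrary` that files only verified answers and serves them on later requests; the toy task format (`"a*b"`), the checker, and the deliberately flaky generator are all illustrative.

```python
from typing import Callable, Dict, List, Optional

class SkillLibrary:
    """Hypothetical system-layer memory: only verified answers are filed, and filed
    answers are reused instead of regenerated. No model weights change."""

    def __init__(self, verify: Callable[[str, int], bool]):
        self.verify = verify
        self.skills: Dict[str, int] = {}

    def solve(self, task: str, generate: Callable[[str], int]) -> Optional[int]:
        if task in self.skills:            # reuse: a validated move is called again
            return self.skills[task]
        candidate = generate(task)         # reach: let the generator propose
        if self.verify(task, candidate):   # sort: keep only what passes the check
            self.skills[task] = candidate
            return candidate
        return None                        # discard: nothing false is filed

def verify_product(task: str, answer: int) -> bool:
    """Checker for toy tasks of the form "a*b"."""
    a, b = (int(x) for x in task.split("*"))
    return a * b == answer

calls: List[str] = []

def flaky_generator(task: str) -> int:
    """Hypothetical generator: off by one on its first attempt at a task, right after."""
    calls.append(task)
    a, b = (int(x) for x in task.split("*"))
    return a * b + (1 if calls.count(task) == 1 else 0)

library = SkillLibrary(verify_product)
results = [library.solve("7*13", flaky_generator) for _ in range(3)]
print(results)     # [None, 91, 91]
print(len(calls))  # 2: the third request never reached the generator
```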

So the headline result is this: collapse and recursive self-improvement are the same loop, distinguished only by whether the sign-sorter is installed, and that holds whether the loop runs through the weights or through the system wrapped around them. Remove the filter and a system devours itself. Install it and it bootstraps upward on its own surprises. The entire difference between the doom we worry about and the takeoff we chase is one component: the thing that can tell true-and-surprising from false-and-surprising. Everything important is hiding in that filter.

Why the Loop Compounds: Token Density

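It is worth being precise about why the dream loop improves a system rather than merely maintaining it, because the mechanism is the quiet center of this whole argument. Assign each statement a value equal to its truth multiplied by its surprise, counting truth as +1 for a true statement and −1 for a false one, so that a falsehood scores below zero rather than at it.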

Information Per Token

Neutral · ~0 · "It is a dog." True, but you could have guessed it.
Negative · < 0 · "It is a cat." Worse than silence; now something has to unteach it.
Positive · ++ · "A wolf that learned to read our faces." True and improbable; dense with information.

A neutral statement is true but predictable: its information value rounds to zero. A negative hallucination is worse than zero — it carries negative value, because someone downstream must now spend effort unlearning it; the cat costs you twice. A positive hallucination is true and improbable, and so it is dense: it packs many bits of usable, correct surprise into a few tokens.

Now run the dream loop on that ledger. Each pass strips out the negative-value outputs and concentrates the positive-value ones, and the average information carried per token rises. The system learns to put more true, more surprising things in fewer words. That is not a metaphor for getting smarter. By the ledger's own information-theoretic measure it is getting smarter: more bits of true surprise per token, climbing monotonically under a good filter. Recursive self-improvement, stripped of mystique, is just the density of true surprise going up because the system kept feeding on its best ideas and starved itself of its worst. Whether "feeding" means a training set or a growing library of validated tools and skills, the arithmetic is identical. It is the same obsession that runs through our work on what will kill the current generation of coding agents (make every token earn its place), turned away from orchestration and pointed at thought itself.
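
A minimal sketch of the ledger and of one way to run the loop on it, assuming hypothetical verdicts, surprisal values, and token counts. `density` is truth (+1 or −1) times surprisal, divided by length; `dream_pass` drops negative-value outputs and keeps the densest half, an illustrative filter rather than a prescribed one.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Output:
    text: str
    is_true: bool     # the verifier's verdict
    surprisal: float  # bits of surprise under the listener's prior
    tokens: int       # rough length of the statement

    @property
    def density(self) -> float:
        """The ledger's value (truth as +1/-1, times surprise) per token."""
        sign = 1.0 if self.is_true else -1.0
        return sign * self.surprisal / self.tokens

def dream_pass(pool, keep_fraction=0.5):
    """One sorted pass: drop negative-value outputs, then keep the densest survivors."""
    survivors = sorted((o for o in pool if o.density >= 0), key=lambda o: o.density, reverse=True)
    return survivors[: max(1, int(len(survivors) * keep_fraction))]

# Hypothetical pool: verdicts, surprisal and token counts are illustrative only.
pool = [
    Output("It is a dog.", True, 0.15, 5),
    Output("It is a cat.", False, 4.3, 5),
    Output("It is a hamster.", False, 9.0, 5),
    Output("A wolf that learned to read our faces.", True, 10.0, 9),
    Output("A four-legged heat engine.", True, 7.0, 6),
    Output("A nose living on a timeline of smells.", True, 8.0, 9),
]

history = [mean(o.density for o in pool)]
while len(pool) > 1:
    pool = dream_pass(pool)
    if not pool:  # every output failed the check; nothing left to average
        break
    history.append(mean(o.density for o in pool))

print([round(h, 2) for h in history])  # average signed bits per token, pass by pass
```

Because each pass discards everything below zero and then keeps the top of what remains, the average can only rise or hold. A real loop would also add fresh candidates each pass; this one only filters, so its pool shrinks.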

The Sorter Is the Whole Game

If the loop's one critical part is the verifier, then verification is the lever on which all of it rests, and there is a deep asymmetry working in our favour. In many domains, checking is easier than creating. It is hard to find a proof and easy to verify one; hard to play move 37 and trivial to count who won; hard to write correct code and cheap to run the tests against it. That gap is exactly what makes the dream loop affordable: you can let the generator hallucinate wildly, at ruinous breadth, precisely because the cost of sorting the output afterwards is so much lower than the cost of producing it. The next part of this series is about nothing else.
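
Factoring is a convenient stand-in for the asymmetry (it is not one of the essay's examples): finding a factor of a semiprime by naive trial division takes thousands of steps, while checking a proposed factorization takes one multiplication.

```python
def find_factor(n):
    """Creating: search for the smallest nontrivial factor by trial division.
    Returns (factor, steps); the number of steps grows with the size of the factor."""
    steps = 0
    d = 2
    while d * d <= n:
        steps += 1
        if n % d == 0:
            return d, steps
        d += 1
    return None, steps  # n is prime

def check_factorization(n, p, q):
    """Checking: one multiplication and a bounds check, no search at all."""
    return 1 < p < n and p * q == n

n = 10_007 * 10_009  # product of two primes

factor, steps = find_factor(n)
print(factor, steps)                                # 10007 10006
print(check_factorization(n, factor, n // factor))  # True
```

The gap widens as the factors grow: the search scales with the smaller factor, while the check stays a single multiplication.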

This reframes a surprising amount of our own research as variations on building the sorter. Codumentation turns documentation claims into executable specifications: prose that fails loudly the moment it drifts false. That is a sign-sorter for language: a way to give text the one property, checkability, that lets the dream loop run on it at all. Why AI Must Wake Up to Scale argued for agents that observe, evaluate, and rewrite their own runtime; a self-improving agent is just this loop closed inside a single system, and the evaluator it needs is exactly the verifier described here. And The Agentic Heartbeat argued for replacing a dumb clock with judgment; the sign-sorter is judgment again, this time pointed at a system's own output instead of at a schedule.
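
Python's standard `doctest` module gives a small taste of documentation that fails loudly. It is a stand-in here; Codumentation's own mechanism is not described in this essay and may work differently. The second function's docstring has drifted false, and running the examples reports it.

```python
import doctest

def mean(values):
    """Return the arithmetic mean of a non-empty list.

    >>> mean([2, 4, 9])
    5.0
    """
    return sum(values) / len(values)

# The code below was changed to return a float; its documentation was not updated.
def drifted_mean(values):
    """Return the mean of a non-empty list, rounded down to an int.

    >>> drifted_mean([2, 4, 9])
    5
    """
    return sum(values) / len(values)

def check_docs(*functions):
    """Run each function's docstring examples; a claim that has drifted false is
    reported as a failure instead of silently surviving."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for fn in functions:
        for test in finder.find(fn, fn.__name__, globs={fn.__name__: fn}):
            runner.run(test)
    return runner.summarize(verbose=False)

results = check_docs(mean, drifted_mean)
print(results)
```

The runner prints a failure report for `drifted_mean` (expected `5`, got `5.0`), which is the sorter doing its job on prose.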

Which makes the practical program clear, and it is almost the opposite of the one the field is running. It is not "make models hallucinate less." It is four things.

  • Separate the knobs. Stop using one dial to fix two problems. Turn the dreaming up at generation and turn the rigour up at verification. A wild generator behind a strict checker beats a timid model that was lobotomized into never guessing.
  • Build verifiers, not suppressors. Every domain that gains a cheap, trustworthy check of true-versus-false unlocks the dream loop for itself. The frontier is not better generators; we have those. It is better, broader, more trustworthy checks.
  • Reward true surprise, not merely the absence of error. An evaluation that gives full marks to "it is a dog" will breed a system full of dogs. Score novelty conditional on truth, or you will keep selecting for the least interesting correct answer.
  • Protect the tails. Diversity is the raw ore the loop refines. A model regressed to the mean has no positive hallucinations left to mine. Alignment that flattens the distribution to buy today's safety is quietly eating tomorrow's creativity.
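
The third point can be made concrete by ranking the three captions under two graders. A minimal sketch with hypothetical verdicts and priors: `correctness_only` ties the dog with the wolf, while `novelty_given_truth` (surprisal, awarded only for verified-true answers) ranks the wolf first. A penalty for false answers could replace the zero.

```python
import math

def correctness_only(is_true, prior_prob):
    """Full marks for any true answer, however predictable."""
    return 1.0 if is_true else 0.0

def novelty_given_truth(is_true, prior_prob):
    """Surprisal in bits, awarded only when the verifier accepts the answer."""
    return -math.log2(prior_prob) if is_true else 0.0

# Hypothetical verdicts and prior probabilities for the three captions.
captions = {
    "It is a dog.": (True, 0.9),
    "It is a cat.": (False, 0.05),
    "A wolf that chose us.": (True, 0.001),
}

rankings = {}
for grader in (correctness_only, novelty_given_truth):
    # sorted() is stable, so captions with equal scores keep their listed order
    rankings[grader.__name__] = sorted(captions, key=lambda c: grader(*captions[c]), reverse=True)
    print(grader.__name__, rankings[grader.__name__])
```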

The lie and the idea are the same act with opposite outcomes, and for three years we have gotten very good at one half of the problem. The trouble is that the half we mastered — suppression — is also the half that kills creativity, because the only tool it ever had was the instruction to be less surprising. The other half, the half that turns a dreaming machine into a creative one, was never a tuning problem at all. It is a sorting problem and a loop: dream without limit, keep only what is true, and grow on the difference.

A model that never hallucinates also never has an idea. The system we actually want does both, on purpose — it dreams without restraint, keeps only what survives the check, and feeds on its own best surprises until the dreaming itself gets better.

Notes & Further Reading

  • Andrej Karpathy, "On the hallucination problem" (2023): the dream-machine framing.
  • AlphaGo versus Lee Sedol, game two, move 37 (2016): the one-in-ten-thousand move.
  • Margaret A. Boden, Creativity and Art: Three Roads to Surprise (Oxford, 2010): creativity as novel, surprising, and valuable.
  • Michael Gazzaniga, the left-brain interpreter and confabulation in split-brain patients.
  • Shumailov et al., "AI models collapse when trained on recursively generated data," Nature (2024).
  • Zelikman et al., "STaR: Bootstrapping Reasoning with Reasoning" (2022): keep only the chains that reach a correct answer, retrain, repeat.
  • Kalai et al. (OpenAI), "Why Language Models Hallucinate" (2025): evaluation rewards guessing over abstaining.
  • Kirk et al., "Understanding the Effects of RLHF on LLM Generalisation and Diversity" (2023): alignment narrows the distribution.

About the Author


Markus Hav

Markus Hav is Lead Researcher for Agents at Benque Max AI Lab in Finland, where he focuses on advancing autonomous AI systems and agent architectures. His work explores the boundaries between programmed behavior and emergent intelligence in AI agents. He also serves as Head of AI Automation at Hoxhunt, applying cutting-edge agent research to real-world automation challenges.