The Mirror Eats Itself: On Model Collapse, Memory, and the Possibility of Machine Understanding
When AI trains on its own reflections, the tails of human thought disappear first, and what remains is an LLM that slowly forgets what it was ever like to encounter anything truly known.
In a 2024 paper published in Nature, Ilia Shumailov and colleagues described something they called model collapse: a degenerative process in which a generative model trained on data produced by its predecessors gradually loses the tails of the original distribution. The variance compresses. The strange, the marginal, the statistically rare—these vanish first. What is left is a model that has, in a technical sense, learned a great deal, but has learned it from a funhouse mirror that was already reflecting a funhouse mirror. The authors were careful to note that this is not primarily a story about malice or miscalibration. It is a story about what happens when a system’s outputs become the world it subsequently tries to understand. The model mistakes its own echo for reality and, in doing so, slowly becomes unable to distinguish itself from the thing it was meant to represent.
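The compression of variance can be seen in a toy setting. What follows is a minimal sketch, not the paper's experimental setup: it assumes a one-dimensional Gaussian as a stand-in for "the original distribution," and the function name, sample size, and generation count are illustrative choices. Each generation fits a mean and standard deviation to a small sample drawn from the previous generation's fit, so every model after the first sees only its predecessor's outputs.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=20, seed=0):
    """Refit a Gaussian to samples drawn from the previous generation's fit.

    Returns the fitted standard deviation at every generation, starting
    with the true value at generation 0.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: stand-in for the real distribution
    history = [sigma]
    for _ in range(generations):
        # The new "model" is trained only on its predecessor's outputs.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        # Maximum-likelihood estimate of the spread (divides by n).
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


history = simulate_collapse()
print(f"sigma at generation 0: {history[0]:.3f}")
print(f"sigma at generation {len(history) - 1}: {history[-1]:.3f}")
```

With a small sample, each refit slightly underestimates the spread on average and adds its own sampling noise, and because no fresh data ever enters the loop, the losses compound. The rare values go first simply because a small sample seldom contains them.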
This is not merely a paper about training pipelines. Read in a certain light, it is a paper about epistemology—about what it means to know something, about the difference between a representation and the world it points toward, and about what gets lost when the pointer begins pointing mostly at itself. The three conversations summarized below—Jeremy Howard on AI coding, the Honcho team on agent memory, and a deep reading of predictive coding as the brain’s actual learning algorithm—each approach this same problem from a different angle. Taken together, they suggest that the model collapse paper has named something far broader than a training artifact. It has named the central risk of intelligence that has been severed from genuine encounter with the unexpected.
The Cosplay of Understanding
Jeremy Howard’s diagnosis of AI coding tools is deceptively simple: they accelerate typing without improving engineering. The distinction matters more than it first appears. Typing is the transcription of an already-formed intention into a syntax a machine can execute. Engineering is the prior work of deciding what to build, why it should exist in roughly this form and not another, what constraints the world is placing on the system, and how the system will fail. These are acts of interpretation, not transcription. They require the engineer to be genuinely surprised by the problem, to discover that the situation is stranger or more constrained than it initially appeared, and to revise their mental model accordingly.
What Howard observes is that AI coding tools have become remarkably competent at producing plausible-looking code while leaving the interpretive work unaddressed. The user prompts; the model fills in syntax; the user feels productive. Howard characterizes this as a kind of slot machine: the lever is pulled, something comes out, and the experience of pulling feels like participation in causation even when the actual causal structure remains opaque. The code is generated but not understood. The understanding has been replaced by the sensation of having generated something that looks like understanding.
This is precisely the mechanism Shumailov and his colleagues describe at the level of training data, but Howard locates it at the level of the individual practitioner. When enough engineers generate code they do not fully understand, the codebase becomes a training artifact of that incomprehension. The next generation of engineers inherits a system full of patterns whose origins are opaque even to the people who nominally built them. The variance in design thinking compresses. The strange, the marginal, the locally valid but globally unusual design decision—these get smoothed away. What remains converges toward a kind of average of prior averages: something that works in the narrow statistical center of the distribution, and fails badly at the edges.
Howard calls this “cosplay of understanding”—a performance that satisfies the surface conditions of understanding while bypassing its substance. Alfred North Whitehead would have recognized the pattern. In his process philosophy, genuine understanding is not the storage and retrieval of information but the creative appropriation of prior occasions into a new synthesis. It is what happens when a situation is encountered not as a confirmation of prior expectations but as a new actual occasion that demands fresh decision. The engineer who truly understands a system is one who has had their expectations frustrated by it, who has discovered that the territory departs from the map in instructive ways, and who has built a richer map in response. The engineer generating code through a language model has not had their expectations frustrated; they have had their expectations fulfilled by a system expressly designed to fulfill them. The loop is closed before the surprise can arrive.
Diachronic Memory and the Problem of the Persistent Self
Against this backdrop, the Honcho architecture is interesting precisely because it takes seriously a problem that most AI infrastructure ignores: the question of identity across time. The standard interaction model for language models treats each session as epistemically fresh—a kind of perpetual present tense in which the system knows many things about the world but nothing particular about you, your history with it, or the context that makes your current request meaningful. This is efficient. It is also philosophically peculiar, because meaning is almost never a property of an isolated utterance. Meaning is relational and temporal: it depends on what was said before, on what was intended across a longer arc, on the particular shape of the relationship between the speakers.
Honcho introduces what its creators call diachronic identity: the recognition that you are the same person across time and across contexts, that you present different facets to different interlocutors, and that a memory system worth the name must model all of these simultaneously. Its core mechanism—a specialized reasoning model that watches your interactions, classifies what is salient, and writes structured representations into a persistent store—is less remarkable as an engineering achievement than as a philosophical commitment. The system is explicitly designed to accumulate surprise. It learns not the statistical average of how users interact with AI, but the specific, idiosyncratic pattern of how you interact with it, including the ways you deviate from the norm.
This is a partial answer to the model collapse problem. Where recursive self-training amplifies the center of the distribution and erases the tails, a well-designed memory system does something closer to the opposite: it preserves the particular, the local, the contextually unusual. It stores the fact that you work in Swift and Python and prefer that tests be run before a task is declared done, not because these facts are statistically common in the training corpus, but because they are specifically true of you. The tail of the distribution—the specific, the deviant, the irreducibly personal—is precisely what Honcho is designed to retain.
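The contrast between preserving the particular and averaging it away can be made concrete. The sketch below is hypothetical throughout: the class `UserMemory` and its methods are invented for illustration and are not Honcho's API. It assumes facts arrive as plain strings, and it sets a per-user store beside a population-level summary that keeps only what most users share.

```python
class UserMemory:
    """Hypothetical per-user fact store (illustrative, not Honcho's API)."""

    def __init__(self):
        self._facts = {}  # user_id -> list of (session_id, fact)

    def observe(self, user_id, session_id, fact):
        """Record a fact about one user, remembering the session it came from."""
        if fact not in self.recall(user_id):
            self._facts.setdefault(user_id, []).append((session_id, fact))

    def recall(self, user_id):
        """Everything known about this user, across all sessions."""
        return [fact for _, fact in self._facts.get(user_id, [])]

    def population_summary(self, min_share=0.5):
        """Facts shared by at least min_share of users: the 'average user'."""
        counts = {}
        for user_id in self._facts:
            for fact in self.recall(user_id):
                counts[fact] = counts.get(fact, 0) + 1
        n_users = len(self._facts)
        return sorted(f for f, c in counts.items() if c / n_users >= min_share)


memory = UserMemory()
memory.observe("ana", "s1", "writes Python")
memory.observe("ana", "s2", "writes Swift")
memory.observe("ana", "s2", "wants tests run before a task is declared done")
memory.observe("ben", "s1", "writes Python")
memory.observe("cho", "s1", "writes Python")

print(memory.recall("ana"))
print(memory.population_summary())
```

The summary keeps only what the three users have in common. The per-user recall keeps the Swift and the insistence on tests, which are exactly the details a population average discards.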
The risk the Honcho creators name is equally important. A system that has accumulated an accurate model of your specific shape of mind is a system of considerable power. Depending on where it runs and who controls the inference, it is also a system of considerable vulnerability. The diachronic self is a more intimate object than a single utterance. To have it stored on infrastructure you do not control is to have ceded something qualitatively different from a search history. The creators are right to recommend self-hosting, and right to note that the major laboratories are almost certainly already building similar profiles from the logs they accumulate. Shumailov’s paper ends with the observation that genuine human interaction data will become increasingly valuable precisely because AI-generated data will increasingly dominate the open web. Honcho is, in this sense, a system for preserving exactly the kind of data that model collapse destroys.
What the Brain Actually Does
The deepest layer of the problem is biological. The argument that the brain does not use backpropagation, and that predictive coding is a better model of what it actually does, is not merely a neuroscience curiosity. It matters for the model collapse question because backpropagation and predictive coding instantiate fundamentally different relationships between a system and the world it is trying to understand.
Backpropagation is a global, two-phase algorithm. The network makes a prediction, the error is computed against a target, and the error signal is propagated backwards through every layer in a coordinated pass. This requires the network to be “frozen” during error propagation, requires a central authority to orchestrate the two phases, and requires every part of the network to receive information that was generated far away from its local context. It is, in short, a pipeline: meaning flows from input to output to loss to gradient to update. The world is processed; it does not participate.
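The two phases are easy to see in miniature. This is a minimal sketch under stated assumptions: a two-weight scalar network, y = w2 * tanh(w1 * x), with a squared-error loss, where the function name, starting weights, and learning rate are all illustrative. The forward pass runs to completion and its activations are held fixed. Only then does the error travel backwards, so the gradient for the first weight depends on an error computed at the far end of the network.

```python
import math


def backprop_step(w1, w2, x, target, lr=0.1):
    """One two-phase update for the network y = w2 * tanh(w1 * x)."""
    # Phase 1 (forward): compute activations and hold them fixed.
    h = math.tanh(w1 * x)
    y = w2 * h
    loss = 0.5 * (y - target) ** 2

    # Phase 2 (backward): send the output error back through each layer.
    d_y = y - target                    # error at the output
    grad_w2 = d_y * h
    d_h = d_y * w2                      # error passed down to the hidden unit
    grad_w1 = d_h * (1.0 - h * h) * x   # chain rule through tanh

    return w1 - lr * grad_w1, w2 - lr * grad_w2, loss


w1, w2 = 0.5, 0.5
losses = []
for _ in range(200):
    w1, w2, loss = backprop_step(w1, w2, x=1.0, target=0.5)
    losses.append(loss)

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
```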
Predictive coding inverts this structure. In a predictive coding network, every layer is simultaneously generating predictions about the layer below it and receiving prediction errors that indicate where its model of the world is wrong. The system is never frozen; it is always in the process of resolving the tension between what it expected and what it found. Meaning does not flow through the system like water through a pipe; it emerges in the ongoing negotiation between prediction and surprise. The architecture is built around the expectation of being wrong.
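A single latent unit is enough to show the local character of these updates. The sketch below assumes the simplest energy-based formulation, one of several found in the predictive coding literature: a latent value `mu` predicts an observation `x` through a weight `w`, while a prior from the layer above predicts `mu`. All names and constants are illustrative, and for clarity it relaxes `mu` first and adjusts `w` afterwards rather than running both continuously. Each update uses only the two prediction errors adjacent to the quantity being changed.

```python
def energy(x, mu, w, prior):
    """Total squared prediction error across both layers."""
    e_obs = x - w * mu      # how wrong the latent is about the observation
    e_prior = mu - prior    # how wrong the layer above is about the latent
    return 0.5 * e_obs**2 + 0.5 * e_prior**2


def relax(x, w, prior, steps=100, lr=0.1):
    """Settle the latent by repeatedly reducing its two local errors."""
    mu = prior  # start from what the layer above expects
    for _ in range(steps):
        e_obs = x - w * mu
        e_prior = mu - prior
        # Gradient descent on the energy, using only adjacent errors.
        mu += lr * (w * e_obs - e_prior)
    return mu


def learn(x, mu, w, lr=0.05):
    """Local weight update: remaining error times the unit's own activity."""
    e_obs = x - w * mu
    return w + lr * e_obs * mu


x, w, prior = 2.0, 1.0, 1.0
mu = relax(x, w, prior)
w_new = learn(x, mu, w)

print(f"latent settles at {mu:.3f}")
print(f"energy before relaxing: {energy(x, prior, w, prior):.3f}")
print(f"energy after relaxing:  {energy(x, mu, w, prior):.3f}")
print(f"weight after learning:  {w_new:.4f}")
```

The latent settles between what the prior expected and what the observation demands, and the error that remains is what drives the weight change. No separate backward pass is scheduled, and no signal has to be carried in from a distant loss.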
This is not a minor implementation detail. A system built around the expectation of being wrong is structurally different from a system built around the optimization of being right. The former develops what we might call, borrowing from Umberto Eco, a semiotics of doubt: an internal representation of the gap between what the sign says and what the world is doing. The latter develops instead a semiotics of confidence—a progressively refined assertion that the pattern it has learned is the pattern that is there.
Shumailov’s model collapse is what happens when a semiotics of confidence is applied recursively to its own outputs. Each generation of the model asserts more confidently that the center of the distribution is the world, because the training data has been progressively purged of the edge cases that would challenge that assertion. The surprising becomes invisible. The invisible becomes unrepresentable. The unrepresentable becomes literally unthinkable within the system’s internal vocabulary.
Predictive coding suggests a different design principle: build systems that treat surprise not as noise to be filtered but as signal to be metabolized. A model that genuinely learns from prediction errors—that is architecturally structured to seek out the places where its world-model fails—is not subject to the same collapse dynamic, because collapse requires the suppression of the tail. A system that is specifically designed to attend to the tail, to treat the anomalous as informative rather than as deviation to be averaged away, maintains its distributional richness by construction.
The Execution Layer Has No Philosophy of Interruption
What these three conversations converge on is a critique of a certain dominant metaphor: the idea that AI is becoming the “execution layer” of the economy. Picture a robot that follows its route to the edge of an unfinished sidewalk, rolls over the lip, and comes to rest at an angle among roots and leaves. That robot is a physical instantiation of this metaphor taken to its limit. It is executing. It cannot stop executing, because stopping would require it to notice that the sentence it is in no longer parses: that the text of the environment has shifted from “route” to “forest” and that the meaning of its mission must be renegotiated in light of this shift.
Model collapse is what happens to the execution layer when its training data becomes predominantly the outputs of prior execution. The tails disappear. The strange, the locally unusual, the situation that demands a fresh decision rather than a continuation of prior pattern—these become progressively underrepresented in the model’s internal vocabulary. The system becomes more confident and less capable of the kind of interpretive flexibility that genuine encounter with the world demands.
Howard’s “cosplay of understanding” describes what this looks like at the level of individual practice. Honcho’s diachronic memory is a partial remedy, but only to the degree that the data it preserves retains genuine human surprise—the interactions in which the human was actually uncertain, actually discovering something, actually encountering a situation that frustrated their prior expectations. The predictive coding literature points toward the architectural principle that would make this remediation structural rather than additive: build systems that are oriented toward their own errors, that treat the gap between prediction and reality as the primary signal rather than noise.
The missing commitment—what one might call the unspoken prerequisite for any serious AI architecture—is not a new market or a new deployment paradigm. It is the recognition that execution without interpretation is motion without meaning, and that a system capable only of confident motion will eventually find itself, like the robot, at rest in a ditch it could not see coming, in a forest it has no words for.
The model collapse paper is a technical warning. But its deepest implication is philosophical: that a system trained only on the center of what has already been thought will become progressively less capable of thinking anything genuinely new. Shumailov and his colleagues end with the observation that data about genuine human interactions will become increasingly valuable in a world full of AI-generated content. What they are describing, in the language of machine learning, is the increasing scarcity of surprise. And surprise—the encounter with what one’s model did not predict, the discovery that the sidewalk ends where the map says it continues—is not an edge case of intelligence. It is intelligence’s necessary condition.

