Neural networks could become dumber within a few years, but it can be stopped

Neural networks like ChatGPT and Gemini consume an astronomical amount of energy and learn from human-written texts. But what happens when high-quality content written by people becomes critically scarce? A new study has shown that if models begin training primarily on texts generated by other models, they will inevitably start to degrade. The good news is that scientists have found a surprisingly simple way to prevent this.

What Is Model Collapse in Neural Networks in Simple Terms

Imagine you make a photocopy of a document, then a photocopy of that copy, then a copy of the copy of the copy. With each generation, the text becomes less and less legible, until it turns into a blurry smudge. Roughly the same thing happens to neural networks when they learn from texts they themselves generated.

This phenomenon is called model collapse. The term gained wide attention in 2024, when a study describing it was published in the scientific journal Nature. When a new model trains on data created by a previous model, it loses some information. With each repeated training cycle, the diversity of responses shrinks, and errors accumulate.

This has a mathematical basis. Each new model is fit to a finite sample drawn from the previous one, so sampling errors compound from generation to generation: variance tends to shrink, and rare but critically important patterns are the first to vanish. Research in text, code, and image generation is consistent with this theoretical picture and reports measurable degradation after just five generations.
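The loss of diversity described above can be reproduced in a toy experiment. The sketch below is not taken from any of the studies mentioned here. It assumes the simplest possible "model", a Gaussian fitted to ten samples, and refits it again and again to data drawn from its own previous version. The function name recursive_fit and all parameter values are illustrative choices.

```python
import random
import statistics


def recursive_fit(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples and track the fitted spread.

    Toy illustration only: the "model" is just a mean and a standard
    deviation, and every generation sees nothing but synthetic data.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 stands in for the human data
    sigmas = [sigma]
    for _ in range(generations):
        # The current model generates a small synthetic training set.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # The next model is fit only to that synthetic data.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        sigmas.append(sigma)
    return sigmas


if __name__ == "__main__":
    history = recursive_fit()
    for generation in (0, 5, 50, 200):
        print(f"generation {generation:3d}: sigma = {history[generation]:.3g}")
```

In this setup the fitted spread drifts toward zero as generations pass, which is the toy counterpart of a model whose answers grow ever more uniform. Real language models are far more complex, so the sketch only illustrates the direction of the effect, not its speed.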

Why the Internet Is Running Out of Human Content

Neural networks learn from texts on the internet: books, articles, forums, Wikipedia, and scientific papers. But this resource is finite. According to estimates by the research group Epoch AI, the total supply of high-quality public text is approximately 300 trillion tokens, and at current rates, language models will completely exhaust it between 2026 and 2032.

At the same time, AI consumes data faster than humans create it. Text generators produce billions of words daily, image generators fill stock photo libraries, and AI assistants write code that ends up in public repositories. All this artificial content inevitably flows back into the training datasets of new models, creating a closed loop of degradation.

Companies have already started buying content from publishers and media outlets. OpenAI and Google are racing to sign licensing deals for high-quality data sources. But this is only a delay, not a solution to the problem.

How AI Content Differs from Human Content

At first glance, text written by a neural network can look indistinguishable from human writing. But from a training perspective, the difference is enormous.

Human texts carry lived experience: doubts, mistakes, compromises, taboos, and unspoken rules by which people actually make decisions. Content from neural networks doesn’t provide this. It’s merely a retelling of what the neural network already knows, so there is almost no new experience in it.

Human data contains nuances that synthetic texts lose

If a neural network repeatedly learns from its own texts, it starts reinforcing its own mistakes. Training data becomes poorer and more monotonous, and biases intensify. As a result, AI responses may become increasingly formulaic, bland, and inaccurate over time. In the worst case, the model begins confidently presenting fabrications as facts.

How to Prevent Neural Network Model Collapse

On May 14, 2026, a study published in the journal Physical Review Letters proposed an unexpectedly elegant solution to the problem.

The researchers studied so-called closed-loop training, a process in which a model repeatedly trains on data it generated itself.

It turned out that a tiny amount of real data can be enough to save a model. Even a single real example added to the training dataset can keep the model from sliding into collapse, even when almost all the other data is artificial.

According to Professor Yasser Roudi, the authors deliberately chose a simple model to understand the mechanism itself without excessive mathematics. This way they demonstrated that even a tiny portion of real information acts as an anchor, preventing the model from generating nonsense.
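The anchoring effect can be illustrated with a toy simulation. This is a minimal sketch and not the model used in the study: it assumes a Gaussian fitted to ten samples per generation, and it assumes that the "real example" is one freshly drawn sample from the original distribution in every generation. The function closed_loop and its parameters are hypothetical names chosen for this example.

```python
import random
import statistics


def closed_loop(real_per_generation, generations=200, sample_size=10, seed=0):
    """Track the fitted spread of a Gaussian retrained on its own output.

    real_per_generation is the number of fresh samples from the original
    ("human") distribution mixed into each training set. Assumption of
    this sketch: the original distribution is a standard normal.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    sigmas = [sigma]
    for _ in range(generations):
        n_synthetic = sample_size - real_per_generation
        # Data produced by the current model.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n_synthetic)]
        # A few real data points that act as the anchor.
        real = [rng.gauss(0.0, 1.0) for _ in range(real_per_generation)]
        data = synthetic + real
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        sigmas.append(sigma)
    return sigmas


if __name__ == "__main__":
    closed = closed_loop(real_per_generation=0)
    anchored = closed_loop(real_per_generation=1)
    print(f"synthetic only, final sigma:    {closed[-1]:.3g}")
    print(f"one real sample, late-run mean: {statistics.fmean(anchored[-50:]):.3g}")
```

With synthetic data only, the spread decays toward zero. With one real sample per generation it keeps fluctuating around a finite value instead of vanishing, because each real point pulls the fit back toward the original distribution. Whether the same holds for large language models is exactly what the researchers say still has to be tested.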

Why the Future of Neural Networks Depends on Humans

It’s important to emphasize that model collapse has not yet been observed at full scale in real-world production systems. But that doesn’t mean the problem doesn’t exist. The conditions for it are already taking shape, and users keep doing exactly what makes it worse: generating content with AI en masse and publishing it openly.

The current study is a first step. The team hopes that by demonstrating patterns on simple but powerful models, they can formulate principles for preventing collapse in more complex language models, like those behind ChatGPT.

The next stage is to verify whether the principle works on complex models. If it does, this could become a practical tool for developers. As Roudi noted, engineers building the next ChatGPT could use the team’s results to develop models that don’t collapse.

Ultimately, the more powerful neural networks become, the more they depend on humans. Machines can generate billions of words a day, but without human content, those words gradually lose meaning. And as the study suggests, even a single example from a living person may be enough to keep AI from self-destruction.