AI Model Collapse: Are We Hitting a Ceiling?

Naive scaling of AI models on generic web text is running into limits from several directions at once: high-quality human-generated training data may be effectively exhausted by 2026, hardware memory and energy constraints are tightening, and recursive training on synthetic data risks "model collapse", a progressive narrowing and degradation of model outputs that more compute cannot fix. The picture is not a simple ceiling, however. Inference-time scaling, targeted synthetic data for specific domains, retrieval-augmented systems, and improved data curation still offer substantial headroom, so the crisis is specific to one phase of scaling rather than to AI progress broadly. A counterargument from non-Western researchers, largely absent from the discussion, reframes "collapse" not as a universal technical limit but as a symptom of over-reliance on homogenized, English-centric corpora. On this view, work in China, India, South Korea, and elsewhere shows that domain-specific, community-anchored, and civilizationally grounded datasets can outperform brute-scale approaches on locally relevant tasks without requiring larger models.
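The "progressive narrowing" described above can be made concrete with a toy sketch. It is not the method of any cited paper. It assumes, purely for illustration, that each model generation is trained on text its predecessor produced with top-p style truncation, so the rare tail is dropped and the remainder renormalized. The function names (`entropy`, `next_generation`, `simulate`), the Zipf-like starting distribution, and all parameter values are hypothetical. Real collapse also involves finite-sample estimation error, which this deterministic sketch leaves out.

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def next_generation(p, top_p=0.9):
    """One hypothetical 'train on your own output' step.

    Keeps the most probable outcomes until their mass reaches top_p,
    zeroes out the rest, and renormalizes. This stands in for a model
    that never sees the tail its predecessor failed to emit.
    """
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += p[i]
        if mass >= top_p:
            break
    q = [0.0] * len(p)
    for i in kept:
        q[i] = p[i] / mass
    return q


def simulate(generations=10, vocab=50, top_p=0.9):
    """Return [(support_size, entropy_bits), ...] for generation 0..generations."""
    # Assumed starting point: a Zipf-like distribution over `vocab` outcomes.
    weights = [1.0 / (rank + 1) for rank in range(vocab)]
    total = sum(weights)
    p = [w / total for w in weights]

    history = [(sum(1 for x in p if x > 0), entropy(p))]
    for _ in range(generations):
        p = next_generation(p, top_p)
        history.append((sum(1 for x in p if x > 0), entropy(p)))
    return history


if __name__ == "__main__":
    for gen, (support, bits) in enumerate(simulate()):
        print(f"generation {gen:2d}: {support:2d} outcomes, {bits:.3f} bits")
```

In this sketch the number of outcomes with non-zero probability and the entropy can only stay flat or fall from one generation to the next, and nothing in the loop can restore an outcome once it is dropped. That is the sense in which more compute alone does not help: recovering the tail requires new data from outside the loop.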

