Latest Scaling Laws Research Papers
The newest Scaling Laws papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Scaling Laws so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib · Abhijoy Sarkar, Aarchi Singh Thakur · arXiv · Jun 9, 2026
Resistance to first-line osimertinib in EGFR-mutant non-small-cell lung cancer (NSCLC) is the canonical example of predictable clonal evolution under therapeutic pressure, yet no public benchmark exists for training or evaluating computatio…
- An Agency-Transferring Model-Free Policy Enhancement Technique · Anton Bolychev, Georgiy Malaniya, Sinan Ibrahim, Pavel Osinenko · arXiv · Jun 8, 2026
Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems already have a functional but suboptimal polic…
- Looped Diffusion Language Models · Sanghyun Lee, Chunsan Hong, Seungryong Kim, Jonghyun Lee et al. · arXiv · May 25, 2026
Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models for language modeling, yet the effective design of transformer architectures for MDMs remains underexplored. In this paper, we show that selecti…
- Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency · Matthew L. Smith, Jonathan P. Shock, Samuel T. Segun, Iyiola E. Olatunji et al. · arXiv · May 18, 2026
While scaling laws govern aggregate large language model performance, no scaling law has linked factual recall to both model size and training-data composition. We evaluated 38 models on over 8,900 scholarly references evaluated by an autom…
- Safety and accuracy follow different scaling laws in clinical large language models · Sebastian Wind, Tri-Thien Nguyen, Jeta Sopa, Mahshad Lotfinia et al. · arXiv · May 5, 2026
Clinical LLMs are often scaled by increasing model size, context length, retrieval complexity, or inference-time compute, with the implicit expectation that higher accuracy implies safer behavior. This assumption is incomplete in medicine, …
- Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models · Gongbo Zhang, Wen Wang, Ye Tian, Li Yuan · arXiv · Apr 29, 2026
Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. While existing distillation methods for dLLMs reduce inference…
- Spend Less, Fit Better: Budget-Efficient Scaling Law Fitting via Active Experiment Selection · Sijie Li, Shanda Li, Haowei Lin, Weiwei Sun et al. · arXiv · Apr 24, 2026
Scaling laws are used to plan multi-million-dollar training runs, but fitting those laws can itself cost millions. In modern large-scale workflows, assembling a sufficiently informative set of pilot experiments is already a major budget-all…
- VLA Foundry: A Unified Framework for Training Vision-Language-Action Models · Jean Mercat, Sedrick Keh, Kushal Arora, Isabella Huang et al. · arXiv · Apr 21, 2026
We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize on the action training stage, often stitching together incompatible pretraining pipelines…
- Scaling-Law Analysis of SignSGD: From Feature-Space Linear Regression to LLM Pre-training · Binghui Li, Jianan Wang, Jinbo Wang, Lean Wang et al. · Sci4DL 2026 · Mar 2, 2026
Despite their widespread use in deep learning, the mechanisms underlying the effectiveness of adaptive gradient methods in large-scale training remain poorly understood. In this work, we provide a scaling-law analysis of SignSGD, a minimal …
- Configuration-to-Performance Scaling Law with Neural Ansatz · Huaqing Zhang, Kaiyue Wen, Tengyu Ma · Sci4DL 2026 · Mar 2, 2026
Researchers build scaling laws to forecast the training performance of expensive large-scale runs with larger model size $N$ and data size $D$. These laws assume that other training hyperparameters are optimally chosen, which can require si…
- Configuration-to-Performance Scaling Law with Neural Ansatz · Huaqing Zhang, Kaiyue Wen, Tengyu Ma · ICLR 2026 Workshop DATA-FM · Mar 2, 2026
Researchers build scaling laws to forecast the training performance of expensive large-scale runs with larger model size $N$ and data size $D$. These laws assume that other training hyperparameters are optimally chosen, which can require si…
- DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving · Yingyan Li, Shuyao Shang, Weisong Liu, Bing Zhan et al. · arXiv.org · Oct 14, 2025
Scaling Vision-Language-Action (VLA) models on large-scale data offers a promising path to achieving a more generalized driving intelligence. However, VLA models are limited by a "supervision deficit": the vast model capacity is supervise…
- Towards a Comprehensive Scaling Law of Mixture-of-Experts · Guoliang Zhao, Yuhan Fu, Shuaipeng Li, Xingwu Sun et al. · arXiv.org · Sep 28, 2025
Mixture-of-Experts (MoE) models have become the consensus approach for enabling parameter-efficient scaling and cost-effective deployment in large language models. However, existing scaling laws for dense models are inapplicable to MoE mode…
- Relative-Based Scaling Law for Neural Language Models · Baoqing Yue, Jinyuan Zhou, Zixi Wei, Jingtao Zhan et al. · ICLR 2026 Conference Withdrawn Submission · Sep 20, 2025
Scaling laws aim to accurately predict model performance across different scales. Existing scaling-law studies almost exclusively rely on cross-entropy as the evaluation metric. However, cross-entropy provides only a partial view of perform…
- Scaling Law for Code: A More Data-Hungry Regime · Xianzhen Luo, Wenzhen Zheng, Qingfu Zhu, Rongyi Zhang et al. · ICLR 2026 Conference Withdrawn Submission · Sep 19, 2025
The training of large language models (LLMs) for code generation incurs substantial computational costs, yet the resource allocation strategies are often guided by scaling laws derived from natural language (NL). Given the distinct statisti…
- LayerMix Law: Scaling Law for Large Language Models on Quality-Weighted Mixture Data with Repetition · Fengze Liu, Weidong Zhou, Binbin Liu, Ping Guo et al. · ICLR 2026 Conference Withdrawn Submission · Sep 19, 2025
Upweighting high-quality data in large language model (LLM) pretraining typically improves performance. However, the limited availability of high-quality data—particularly in overtrained regimes—means that stronger upweighting often increas…
- P-Law: Predicting Quantitative Scaling Law with Entropy Guidance in Large Recommendation Models · Tingjia Shen, Hao Wang, Chuhan Wu, Jin Yao Chin et al. · NeurIPS 2025 poster · Sep 18, 2025
With the growing size of data and models in Large Recommendation Models, the time required for debugging has become increasingly prohibitive, underscoring the urgent need for effective guidance in parameter configuration. The Scaling Law (S…
- Predictable Scale (Part II) --- Farseer: A Refined Scaling Law in LLMs · Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding et al. · NeurIPS 2025 spotlight · Sep 18, 2025
Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innov…
- Kinetics: Rethinking Test-Time Scaling Law · Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Beidi Chen · NeurIPS 2025 poster · Sep 18, 2025
We rethink test-time scaling laws from a practical efficiency perspective, revealing that the effectiveness of smaller models is significantly overestimated. Prior work, grounded in compute-optimality, overlooks critical memory access bottl…
- Parallel Scaling Law for Language Models · Mouxiang Chen, Binyuan Hui, Zeyu Cui, Jiaxi Yang et al. · NeurIPS 2025 poster · Sep 18, 2025
It is commonly believed that scaling language models should commit a significant space or time cost, by increasing the parameters (parameter scaling) or output tokens (inference-time scaling). We introduce another and more inference-efficie…
- Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning · Xiaojun Wu, Xiaoguang Jiang, Huiyang Li, Jucai Zhai et al. · arXiv.org · Aug 13, 2025
Large language models (LLMs) demonstrate remarkable reasoning capabilities in tasks such as algorithmic coding and mathematical problem-solving. Recent methods have improved reasoning through expanded corpus and multistage training combinin…
- Scaling Law for Quantization-Aware Training · Mengzhao Chen, Chaoyi Zhang, Jing Liu, Yutao Zeng et al. · arXiv.org · May 20, 2025
Large language models (LLMs) demand substantial computational and memory resources, creating deployment challenges. Quantization-aware training (QAT) addresses these challenges by reducing model precision while maintaining performance. Howe…
- Parallel Scaling Law for Language Models · Mouxiang Chen, Binyuan Hui, Zeyu Cui, Jiaxi Yang et al. · arXiv.org · May 15, 2025
It is commonly believed that scaling language models should commit a significant space or time cost, by increasing the parameters (parameter scaling) or output tokens (inference-time scaling). We introduce the third and more inference-effic…
- Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving · Xinji Mai, Haotian Xu, Zhong-Zhi Li, W. Xing et al. · Semantic Scholar · May 12, 2025
Large Language Models (LLMs) often struggle with mathematical reasoning tasks requiring precise, verifiable computation. While Reinforcement Learning (RL) from outcome-based rewards enhances text-based reasoning, understanding how agents au…
- A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law · Qianjun Pan, Wenkai Ji, Yuyang Ding, Junsong Li et al. · arXiv.org · May 5, 2025
This survey explores recent advancements in reasoning large language models (LLMs) designed to mimic "slow thinking," a reasoning process inspired by human cognition, as described in Kahneman's Thinking, Fast and Slow. These models, like Ope…
- a1: Steep Test-time Scaling Law via Environment Augmented Generation · Lingrui Mei, Shenghua Liu, Yiwei Wang, Baolong Bi et al. · arXiv.org · Apr 20, 2025
Large Language Models (LLMs) have made remarkable breakthroughs in reasoning, yet continue to struggle with hallucinations, logical errors, and inability to self-correct during complex multi-step tasks. Current approaches like chain-of-thou…
- Unsourced Random Access in MIMO Quasi-Static Rayleigh Fading Channels: Finite Blocklength and Scaling Law Analyses · Junyuan Gao, Yongpeng Wu, Giuseppe Caire, Wei Yang et al. · IEEE Transactions on Information Theory · Mar 21, 2025
This paper considers the unsourced random access (URA) problem with a random and unknown number of active users in multiple-input multiple-output (MIMO) quasi-static Rayleigh fading channels. We derive non-asymptotic achievability bounds on…
- L2M: Mutual Information Scaling Law for Long-Context Language Modeling · Zhuo Chen, Oriol Mayné i Comas, Zhuotao Jin, Di Luo et al. · arXiv.org · Mar 6, 2025
We present a universal theoretical framework for understanding long-context language modeling based on a bipartite mutual information scaling law that we rigorously verify in natural language. We demonstrate that bipartite mutual informatio…
- Predictable Scale: Part I, Step Law -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining · Houyi Li, Wenzhen Zheng, Qiufeng Wang, Hanshan Zhang et al. · Semantic Scholar · Mar 6, 2025
The impressive capabilities of Large Language Models (LLMs) across diverse tasks are now well established, yet their effective deployment necessitates careful hyperparameter optimization. Although existing methods have explored the influenc…
- Unlocking Scaling Law in Industrial Recommendation Systems with a Three-step Paradigm based Large User Model · Bencheng Yan, Shilei Liu, Zhiyuan Zeng, Zihao Wang et al. · Web Search and Data Mining · Feb 12, 2025
Recent advancements in autoregressive Large Language Models (LLMs) have achieved remarkable progress, largely driven by their scalability—commonly formalized as the scaling law. Inspired by these successes, there has been growing interest i…