Latest Scaling Laws Research Papers

The newest Scaling Laws papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Scaling Laws so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.

Recent papers

TriA Pipeline: A Large-Scale Automatic Audio Annotation Pipeline For Audio Classification In Specific Scenarios
Hong Lyu, Mingru Yang, Qianhua He, Yanxiong Li et al. · arXiv · Jul 7, 2026
There are some datasets of varying scales for audio classification (AC) applied to different tasks. However, annotated data is limited for most scenarios, such as domestic environments. To address this challenge, we propose an $\textbf{A}$u…
How Width and Data Shape Generalization Scaling Laws in Quadratic Neural Networks
Julius Girardin, Emanuele Troiani, Yizhou Xu, Vittorio Erba et al. · arXiv · Jun 26, 2026
Understanding how performance scales jointly with model size and data is a central problem in modern machine learning. Existing theoretical works on scaling laws typically describe generalization as a function of data or compute, often in f…
The Inference-Compute Frontier and a Latency-Efficient Architecture for Limit Order Book Prediction
C. Evans Hedges · arXiv · Jun 24, 2026
We study whether a scaling-law-style inference-compute frontier appears in limit order book prediction. Using FI-2010 and a suite of models ranging from small decision trees to neural LOB architectures, we find that the realized empirical f…
Hedgementation = Hedgerow Segmentation: A Remote Sensing Benchmark
Nathan Senyard, Salem Hamdani, Astrid Zhang, Derek Wang et al. · arXiv · Jun 22, 2026
We propose Hedgementation: a new benchmark to evaluate machine learning models for hedgerow mapping from remote sensing data at country scale and 10m$^2$ spatial resolution. We combine and harmonize multiple remote sensing data products and…
Scaling Linear Mode Connectivity and Merging to Billion Parameter Pretrained Transformers
Tianyi Li, Zhiqiang Shen · arXiv · Jun 22, 2026
Linear mode connectivity (LMC) provides a promising foundation for understanding and merging independently trained neural networks, but existing methods typically optimize the interpolation path from only one model endpoint, limiting their …
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks
Mengyu Zheng, Kai Han, Boxun Li, Haiyang Xu et al. · arXiv · Jun 10, 2026
General-purpose agents such as OpenClaw are increasingly used as autonomous tool users, but their coding ability is difficult to measure under SWE-bench: a generic agent does not by itself satisfy the clean Docker workspace, patch, and pred…
OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib
Abhijoy Sarkar, Aarchi Singh Thakur · arXiv · Jun 9, 2026
Resistance to first-line osimertinib in EGFR-mutant non-small-cell lung cancer (NSCLC) is the canonical example of predictable clonal evolution under therapeutic pressure, yet no public benchmark exists for training or evaluating computatio…
An Agency-Transferring Model-Free Policy Enhancement Technique
Anton Bolychev, Georgiy Malaniya, Sinan Ibrahim, Pavel Osinenko · arXiv · Jun 8, 2026
Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems already have a functional but suboptimal polic…
Looped Diffusion Language Models
Sanghyun Lee, Chunsan Hong, Seungryong Kim, Jonghyun Lee et al. · arXiv · May 25, 2026
Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models for language modeling, yet the effective design of transformer architectures for MDMs remains underexplored. In this paper, we show that selecti…
Scaling-Law Analysis of SignSGD: From Feature-Space Linear Regression to LLM Pre-training
Binghui Li, Jianan Wang, Jinbo Wang, Lean Wang et al. · Sci4DL 2026 · Mar 2, 2026
Despite their widespread use in deep learning, the mechanisms underlying the effectiveness of adaptive gradient methods in large-scale training remain poorly understood. In this work, we provide a scaling-law analysis of SignSGD, a minimal …
Configuration-to-Performance Scaling Law with Neural Ansatz
Huaqing Zhang, Kaiyue Wen, Tengyu Ma · Sci4DL 2026 · Mar 2, 2026
Researchers build scaling laws to forecast the training performance of expensive large-scale runs with larger model size $N$ and data size $D$. These laws assume that other training hyperparameters are optimally chosen, which can require si…
Configuration-to-Performance Scaling Law with Neural Ansatz
Huaqing Zhang, Kaiyue Wen, Tengyu Ma · ICLR 2026 Workshop DATA-FM · Mar 2, 2026
Researchers build scaling laws to forecast the training performance of expensive large-scale runs with larger model size $N$ and data size $D$. These laws assume that other training hyperparameters are optimally chosen, which can require si…
DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving
Yingyan Li, Shuyao Shang, Weisong Liu, Bing Zhan et al. · arXiv.org · Oct 14, 2025
Scaling Vision-Language-Action (VLA) models on large-scale data offers a promising path to achieving a more generalized driving intelligence. However, VLA models are limited by a "supervision deficit": the vast model capacity is supervise…
Towards a Comprehensive Scaling Law of Mixture-of-Experts
Guoliang Zhao, Yuhan Fu, Shuaipeng Li, Xingwu Sun et al. · arXiv.org · Sep 28, 2025
Mixture-of-Experts (MoE) models have become the consensus approach for enabling parameter-efficient scaling and cost-effective deployment in large language models. However, existing scaling laws for dense models are inapplicable to MoE mode…
Relative-Based Scaling Law for Neural Language Models
Baoqing Yue, Jinyuan Zhou, Zixi Wei, Jingtao Zhan et al. · ICLR 2026 Conference Withdrawn Submission · Sep 20, 2025
Scaling laws aim to accurately predict model performance across different scales. Existing scaling-law studies almost exclusively rely on cross-entropy as the evaluation metric. However, cross-entropy provides only a partial view of perform…
Scaling Law for Code: A More Data-Hungry Regime
Xianzhen Luo, Wenzhen Zheng, Qingfu Zhu, Rongyi Zhang et al. · ICLR 2026 Conference Withdrawn Submission · Sep 19, 2025
The training of large language models (LLMs) for code generation incurs substantial computational costs, yet the resource allocation strategies are often guided by scaling laws derived from natural language (NL). Given the distinct statisti…
LayerMix Law: Scaling Law for Large Language Models on Quality-Weighted Mixture Data with Repetition
Fengze Liu, Weidong Zhou, Binbin Liu, Ping Guo et al. · ICLR 2026 Conference Withdrawn Submission · Sep 19, 2025
Upweighting high-quality data in large language model (LLM) pretraining typically improves performance. However, the limited availability of high-quality data—particularly in overtrained regimes—means that stronger upweighting often increas…
P-Law: Predicting Quantitative Scaling Law with Entropy Guidance in Large Recommendation Models
Tingjia Shen, Hao Wang, Chuhan Wu, Jin Yao Chin et al. · NeurIPS 2025 poster · Sep 18, 2025
With the growing size of data and models in Large Recommendation Models, the time required for debugging has become increasingly prohibitive, underscoring the urgent need for effective guidance in parameter configuration. The Scaling Law (S…
Predictable Scale (Part II) --- Farseer: A Refined Scaling Law in LLMs
Houyi Li, Wenzhen Zheng, Qiufeng Wang, Zhenyu Ding et al. · NeurIPS 2025 spotlight · Sep 18, 2025
Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innov…
Kinetics: Rethinking Test-Time Scaling Law
Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Beidi Chen · NeurIPS 2025 poster · Sep 18, 2025
We rethink test-time scaling laws from a practical efficiency perspective, revealing that the effectiveness of smaller models is significantly overestimated. Prior work, grounded in compute-optimality, overlooks critical memory access bottl…
Parallel Scaling Law for Language Models
Mouxiang Chen, Binyuan Hui, Zeyu Cui, Jiaxi Yang et al. · NeurIPS 2025 poster · Sep 18, 2025
It is commonly believed that scaling language models should commit a significant space or time cost, by increasing the parameters (parameter scaling) or output tokens (inference-time scaling). We introduce another and more inference-efficie…
Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning
Xiaojun Wu, Xiaoguang Jiang, Huiyang Li, Jucai Zhai et al. · arXiv.org · Aug 13, 2025
Large language models (LLMs) demonstrate remarkable reasoning capabilities in tasks such as algorithmic coding and mathematical problem-solving. Recent methods have improved reasoning through expanded corpus and multistage training combinin…
Revisiting the theory of van Driest: a general scaling law for the skin-friction coefficient of high-speed turbulent boundary layers
Zhiye Zhao, Lin Fu · Journal of Fluid Mechanics · May 29, 2025
The skin-friction coefficient is a dimensionless quantity defined by the wall shear stress exerted on an object moving in a fluid, and it decreases as the Reynolds number increases for wall-bounded turbulent flows over a flat plate…
Scaling Law for Quantization-Aware Training
Mengzhao Chen, Chaoyi Zhang, Jing Liu, Yutao Zeng et al. · arXiv.org · May 20, 2025
Large language models (LLMs) demand substantial computational and memory resources, creating deployment challenges. Quantization-aware training (QAT) addresses these challenges by reducing model precision while maintaining performance. Howe…
Parallel Scaling Law for Language Models
Mouxiang Chen, Binyuan Hui, Zeyu Cui, Jiaxin Yang et al. · arXiv.org · May 15, 2025
It is commonly believed that scaling language models should commit a significant space or time cost, by increasing the parameters (parameter scaling) or output tokens (inference-time scaling). We introduce the third and more inference-effic…
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai, Haotian Xu, Zhong-Zhi Li, W. Xing et al. · Semantic Scholar · May 12, 2025
Large Language Models (LLMs) often struggle with mathematical reasoning tasks requiring precise, verifiable computation. While Reinforcement Learning (RL) from outcome-based rewards enhances text-based reasoning, understanding how agents au…
A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law
Qianjun Pan, Wenkai Ji, Yuyang Ding, Junsong Li et al. · arXiv.org · May 5, 2025
This survey explores recent advancements in reasoning large language models (LLMs) designed to mimic "slow thinking", a reasoning process inspired by human cognition, as described in Kahneman's Thinking, Fast and Slow. These models, like Ope…
a1: Steep Test-time Scaling Law via Environment Augmented Generation
Lingrui Mei, Shenghua Liu, Yiwei Wang, Baolong Bi et al. · Annual Meeting of the Association for Computational Linguistics · Apr 20, 2025
Large Language Models (LLMs) have made remarkable breakthroughs in reasoning, yet continue to struggle with hallucinations, logical errors, and inability to self-correct during complex multi-step tasks. Current approaches like chain-of-thou…
Unsourced Random Access in MIMO Quasi-Static Rayleigh Fading Channels: Finite Blocklength and Scaling Law Analyses
Junyuan Gao, Yongpeng Wu, Giuseppe Caire, Wei Yang et al. · IEEE Transactions on Information Theory · Mar 21, 2025
This paper considers the unsourced random access (URA) problem with a random and unknown number of active users in multiple-input multiple-output (MIMO) quasi-static Rayleigh fading channels. We derive non-asymptotic achievability bounds on…
L2M: Mutual Information Scaling Law for Long-Context Language Modeling
Zhuo Chen, Oriol Mayné i Comas, Zhuotao Jin, Di Luo et al. · arXiv.org · Mar 6, 2025
We present a universal theoretical framework for understanding long-context language modeling based on a bipartite mutual information scaling law that we rigorously verify in natural language. We demonstrate that bipartite mutual informatio…
