Latest Foundation Models Research Papers
The newest Foundation Models papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Foundation Models so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design
Tong Xie, Yuanhao Ban, Yunqi Hong, Sohyun An et al. · arXiv · Jun 9, 2026
Supervised fine-tuning (SFT) typically maximizes the likelihood of every token in a demonstrated trajectory. However, an observed token can be non-unique, noisy, or misaligned with the model prior. Strictly fitting toward this one-hot targe…
- OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib
Abhijoy Sarkar, Aarchi Singh Thakur · arXiv · Jun 9, 2026
Resistance to first-line osimertinib in EGFR-mutant non-small-cell lung cancer (NSCLC) is the canonical example of predictable clonal evolution under therapeutic pressure, yet no public benchmark exists for training or evaluating computatio…
- Twelve quick tips for designing AI-driven HPC workflows
Jamie J. Alnasir · arXiv · Jun 5, 2026
High-performance computing (HPC) clusters remain the backbone of large-scale scientific computation, traditionally executing deterministic, linear pipelines optimised for predictable performance. However, the pervasive integration of artifi…
- Time series Foundation Models based on Physics-Informed Synthetic Histories for Cold-Start Photovoltaic Forecasting
Lorenzo Longarini, Alessandro Rongoni, Simone Silenzi, Emanuele Frontoni et al. · arXiv · Jun 5, 2026
At commissioning time, Photovoltaic (PV) operators must forecast production before target-site observations are available, limiting the direct use of standard supervised forecasters. This cold-start setting is addressed with a zero-shot pip…
- Pretraining Recurrent Networks without Recurrence
Akarsh Kumar, Phillip Isola · arXiv · Jun 4, 2026
Training recurrent neural networks (RNNs) requires assigning credit across long sequences of computations. Standard backpropagation through time (BPTT) addresses this problem poorly: it is sequential in time, limiting parallelism, and suffe…
- On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters
Mind Lab: Song Cao, Vic Cao et al. · arXiv · Jun 1, 2026
Parameter-efficient fine-tuning (PEFT) is usually treated as a cheaper alternative to full fine-tuning. We study a broader role: small trainable adapters as persistent local state on top of strong shared foundation models. In this framing, …
- LLMSurgeon: Diagnosing Data Mixture of Large Language Models
Yaxin Luo, Jiacheng Cui, Xiaohan Zhao, Xinyi Shang et al. · arXiv · May 28, 2026
The pretraining data mixture of Large Language Models (LLMs) constitutes their "digital DNA", shaping model behaviors, capabilities, and failure modes. Yet this composition is rarely disclosed, making post-hoc auditing of data combination o…
- PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective
Yangyi Huang, Ruotian Peng, Zeju Qiu, Jiale Kang et al. · arXiv · May 27, 2026
Parameter-efficient finetuning (PEFT) has become the standard approach for adapting large language models, yet evaluations largely emphasize downstream accuracy while overlooking the retention of pretrained capabilities. We argue that PEFT …
- OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration
Xinchen Zhang, Bowei Liu, Jiale Liu, Chufan Shi et al. · arXiv · May 27, 2026
Visual outcomes are increasingly central to multimodal large language models, making reliable and fine-grained verification essential for scaling generalist foundation models. In this work, we investigate multimodal meta-verification, which…
- From Model Scaling to System Scaling: Scaling the Harness in Agentic AI
Shangding Gu · arXiv · May 25, 2026
This paper studies the next major bottleneck in agentic AI as system scaling, not only model scaling: the design of auditable, persistent, modular, and verifiable architectures around foundation models. We refer to this shift as scaling the…
- CITYREP: A Unified Benchmark for Urban Representations Across Cities, Tasks, and Modalities
Junyuan Liu, Xinglei Wang, Zichao Zeng, Jiazhuang Feng et al. · arXiv · May 25, 2026
Urban representation learning encodes complex urban environments into general-purpose embeddings for diverse downstream tasks and emerging urban foundation models. However, current evaluations are limited, typically focusing on one or two c…
- A Multimodal 3D Foundation Model for Light Sheet Fluorescence Microscopy Enables Few-Shot Segmentation, Classification, and Deblurring
Adina Scheinfeld, Haotan Zhang, Shang Mu, Rudolf L. M. van Herten et al. · arXiv · May 25, 2026
Light sheet fluorescence microscopy (LSM) enables high-resolution, three-dimensional (3D) imaging of biological specimens, providing rich volumetric data for studying cellular organization, pathology, and vascular networks. However, the siz…
- A foundation model for electrodermal activity data
Leonardo Alchieri, Matteo Garzon, Lidia Alecci, Francesco Bombassei De Bona et al. · SD4H ICML 2026 · May 23, 2026
Foundation models have recently extended beyond natural language and vision to time‑series domains, including physiological signals. However, progress in electrodermal activity (EDA) modeling is hindered by the absence of large‑scale, curat…
- CogAdapt: Transferring Clinical ECG Foundation Models to Wearable Cognitive Load Assessment via Lead Adaptation
Amir Mousavi, Mohammad Sadegh Sirjani, Erfan Nourbakhsh, Mimi Xie et al. · arXiv · May 21, 2026
Real-time cognitive load assessment is essential for adaptive human-computer interaction but remains challenging due to limited labeled data and poor cross-subject generalization. Recent ECG foundation models pre-trained on millions of clin…
- Distilling Tabular Foundation Models for Structured Health Data
Aditya Tanna, Nassim Bouarour, Mohamed Bouadi, Vinay Kumar Sankarapu et al. · arXiv · May 18, 2026
Tabular foundation models (TFMs) achieve strong performance on health datasets, but their inference cost and infrastructure requirements limit practical use. We study whether their predictive behavior can be transferred to lightweight tabul…
- Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap
Aditya Tanna, Yash Desai, Pratinav Seth, Mohamed Bouadi et al. · arXiv · May 18, 2026
Tabular foundation models (TFMs) now match or beat tuned gradient-boosted trees on a growing fraction of tabular tasks, but no single TFM wins on every dataset. Ensembling is the go-to fix here, and it works less well than expected. Six mod…
- Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing
Ellwil Sharma, Arastu Sharma · arXiv · May 14, 2026
Scaling Scientific Machine Learning (SciML) toward universal foundation models is bottlenecked by negative transfer: the simultaneous co-training of disparate partial differential equation (PDE) regimes can induce gradient conflict, unstabl…
- MeMo: Memory as a Model
Ryan Wei Heng Quek, Sanghyuk Lee, Alfred Wei Lun Leong, Arun Verma et al. · arXiv · May 14, 2026
Large language models (LLMs) achieve strong performance across a wide range of tasks, but remain frozen after pretraining until subsequent updates. Many real-world applications require timely, domain-specific information, motivating the nee…
- Causal Foundation Models with Continuous Treatments
Christopher Stith, Medha Barath, Vahid Balazadeh, Jesse C. Cresswell et al. · arXiv · May 14, 2026
Causal inference, estimating causal effects from observational data, is a fundamental tool in many disciplines. Of particular importance across a variety of domains is the continuous treatment setting, where the variable of intervention has…
- Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime
Albert Alcalde, Leon Bungert, Konstantin Riedl, Tim Roith · arXiv · May 11, 2026
Transformers with self-attention modules as their core components have become an integral architecture in modern large language and foundation models. In this paper, we study the evolution of tokens in deep encoder-only transformers at infe…
- V4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy Prediction
Marcin Kostrzewa, Sebastian Tomczak, Roman Furman, Anna Poberezhna et al. · arXiv · May 11, 2026
Corporate bankruptcy prediction is a high-stakes financial task characterized by severe class imbalance and multi-horizon forecasting demands. Public datasets supporting it remain scarce and small: widely used free benchmarks contain betwee…
- Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less
Yuxing Liu, Jianyu Wang, Tong Zhang · arXiv · May 7, 2026
Optimizers play an important role in both pretraining and finetuning stages when training large language models (LLMs). In this paper, we present an observation that full finetuning with the same optimizer as in pretraining achieves a bette…
- When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds
Hongyi Tao, Dingzhi Yu, Lijun Zhang · arXiv · May 7, 2026
Sign-based optimization algorithms, such as SignSGD and Muon, have garnered significant attention for their remarkable performance in training large foundation models. Despite this empirical success, we still lack a theoretical understandin…
- Label-Efficient School Detection from Aerial Imagery via Weakly Supervised Pretraining and Fine-Tuning
Zakarya Elmimouni, Fares Fourati, Mohamed-Slim Alouini · arXiv · May 5, 2026
Accurate school detection is essential for supporting education initiatives, including infrastructure planning and expanding internet connectivity to underserved areas. However, many regions around the world face challenges due to outdated,…
- Pretrained Model Representations as Acquisition Signals for Active Learning of MLIPs
Eszter Varga-Umbrich, Shikha Surana, Paul Duckworth, Jules Tilly et al. · arXiv · May 5, 2026
Training machine learning interatomic potentials (MLIPs) for reactive chemistry is often bottlenecked by the high cost of quantum chemical labels and the scarcity of transition state configurations in candidate pools. Active learning (AL) c…
- Explainable Load Forecasting with Covariate-Informed Time Series Foundation Models
Matthias Hertel, Alexandra Nikoltchovska, Sebastian Pütz, Ralf Mikut et al. · arXiv · Apr 30, 2026
Time Series Foundation Models (TSFMs) have recently emerged as general-purpose forecasting models and show considerable potential for applications in energy systems. However, applications in critical infrastructure like power grids require …
- Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
Gongbo Zhang, Wen Wang, Ye Tian, Li Yuan · arXiv · Apr 29, 2026
Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. While existing distillation methods for dLLMs reduce inference…
- Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training
Mahya Ramezani, Holger Voos · arXiv · Apr 29, 2026
This paper presents a hierarchical decision-making framework for unmanned aerial vehicle (UAV) missions motivated by search-and-rescue (SAR) scenarios under limited simulation training. The framework combines a fixed rule-based high-level a…
- Long-Context Aware Upcycling: A New Frontier for Hybrid LLM Scaling
Parsa Ashrafi Fashi, Utkarsh Saxena, Mehdi Rezagholizadeh, Aref Jafari et al. · arXiv · Apr 27, 2026
Hybrid sequence models that combine efficient Transformer components with linear sequence modeling blocks are a promising alternative to pure Transformers, but most are still pretrained from scratch and therefore fail to reuse existing Tran…
- Benchmarking Pathology Foundation Models for Breast Cancer Survival Prediction
Fredrik K. Gustafsson, Constance Boissin, Johan Vallon-Christersson, David A. Clifton et al. · arXiv · Apr 27, 2026
Pathology foundation models (PFMs) have recently emerged as powerful pretrained encoders for computational pathology, enabling transfer learning across a wide range of downstream tasks. However, systematic comparisons of these models for cl…