
Latest Foundation Models Research Papers

The newest Foundation Models papers from across the field (arXiv, NeurIPS, CVPR, Nature, and more), refreshed daily and ranked by relevance. Distill AI tracks Foundation Models so you don’t have to: get the standout work delivered to your inbox every morning, with two-sentence summaries and the option to chat with any paper.


Recent papers

Graph Learning on Ensembles of Cyclic Peptides: An Investigation of Molecular Ensemble Modeling
Aaron Feller, Kris Deibler, Maxim Secor · arXiv · Jul 23, 2026
Molecular property prediction from structure often uses a single representative conformation, even though many molecules exist as conformational ensembles in solution. We introduce EnsembleEGNN, a molecular ensemble foundation model that en…
The Blessing of Dimensionality: How Near-Orthogonality in High-Dimensional Spaces Explains Temporal Portability
Abigail Woodring, Adrian Chan, Rana Muhammad Shahroz Khan, Sukwon Yun et al. · arXiv · Jul 22, 2026
Fine-tuning has been widely used to adapt large language models (LLMs) for domain-specific tasks. Parameter efficient fine-tuning (PEFT) methods such as low-rank adaptation (LoRA) are frequently used to reduce computational costs. PortLLM i…
Self-supervision drives representational convergence in medical foundation models more than clinical supervision
Soroosh Tayebi Arasteh, Sebastian Ziegelmayer, Mahshad Lotfinia, Lisa Adams et al. · arXiv · Jul 22, 2026
Medical image encoders from different groups are increasingly treated as interchangeable, on the assumption that scale and clinical supervision concentrate their representations onto a shared structure. Whether this convergence is real, wha…
DBMol: Design of High-Affinity, Target-Specific Small Molecules through Structure Prediction Models
Yiming Qin, Kai Yi, Miruna Cretu, Sjors H. W. Scheres et al. · arXiv · Jul 21, 2026
Designing small molecule ligands that bind with high affinity to specific protein pockets is a fundamental goal in drug discovery, as small molecules constitute a major fraction of approved therapeutics. Recent breakthroughs in structure pr…
RoboTTT: Context Scaling for Robot Policies
Yunfan Jiang, Yevgen Chebotar, Ruijie Zheng, Fengyuan Hu et al. · arXiv · Jul 16, 2026
Recent robot foundation models operate with single-step or short-history visuomotor context. We introduce Test-Time-Training Robot Policies (RoboTTT), a robot model and training recipe that scale visuomotor context to 8K timesteps, three or…
The Spectrum Is Not Enough: When Context Helps Time-Series Forecasting
Mert Onur Cakiroglu, Mehmet Dalkilic, Hasan Kurban · arXiv · Jul 14, 2026
A growing family of indices scores how predictable a series is from its spectrum. Practitioners increasingly read these scores as answering a different question: whether adding context, a longer lookback, a retrieval plug-in, or a pr…
CoCoT-EEG: Contrastive-Pretrained Multiscale Convolutional Transformer for EEG Decoding
Gabriel Mahuas, Victoria Shevchenko, Ugo Tanielian, Yassir Bendou et al. · arXiv · Jul 10, 2026
Self-supervised pretrained foundation models (FM) have shown early promise for non-invasive electroencephalogram (EEG) decoding applications. Many recent large-scale models converged on the approach of tokenizing raw EEG followed by masked …
A Sovereign, Open-Source Foundation Model for German and English
The Soofi-Team: Benedikt Droste, David Fitzek et al. · arXiv · Jul 10, 2026
We present Soofi S 30B-A3B, a sovereign, open-source Mixture-of-Experts (MoE) hybrid Mamba Transformer foundation model for German and English. Its hybrid design activates only 3B of 30B parameters per token and keeps the inference cache ne…
Co-LMLM: Continuous-Query Limited Memory Language Models
Yair Feldman, Linxi Zhao, Nathan Godey, Dongyoung Go et al. · arXiv · Jul 8, 2026
Limited memory language models (LMLMs) externalize factual knowledge during pretraining to a knowledge base (KB), rather than memorizing it in their weights. During generation, the model then fetches knowledge from the KB as needed. This re…
MedPMC: A Systematic Framework for Scaling High-Fidelity Medical Multimodal Data for Foundation Models
Hyunjae Kim, Dain Kim, Pan Xiao, Serina S. Applebaum et al. · arXiv · Jul 8, 2026
Medicine is inherently multimodal, requiring clinicians to synthesize information across diverse data streams. Yet the development of multimodal foundation models is constrained by limited access to large-scale, high-quality clinical data. …
ELSA3D: Elastic Semantic Anchoring for Unified 3D Understanding and Generation
Tianjiao Yu, Xinzhuo Li, Yifan Shen, Onkar Susladkar et al. · arXiv · Jul 7, 2026
Unified 3D foundation models aspire to generate 3D assets and reason about them in language within a single backbone, but their text-3D interaction remains largely implicit. Existing methods concatenate text and 3D tokens into a flat sequen…
Canopy: A Heterograph Foundation Model for Metabolic Engineering
Jake Bowden, Laurence Legon, Satnam Surae · arXiv · Jul 7, 2026
Designing microbial strains that produce high-value chemicals at commercially viable titers remains a central challenge in metabolic engineering. Existing computational approaches either rely on stoichiometric constraint-based models that c…
TriA Pipeline: A Large-Scale Automatic Audio Annotation Pipeline For Audio Classification In Specific Scenarios
Hong Lyu, Mingru Yang, Qianhua He, Yanxiong Li et al. · arXiv · Jul 7, 2026
There are some datasets of varying scales for audio classification (AC) applied to different tasks. However, annotated data is limited for most scenarios, such as domestic environments. To address this challenge, we propose an Au…
The State-Prediction Separation Hypothesis
Giovanni Monea, Nathan Godey, Kianté Brantley, Yoav Artzi · arXiv · Jul 1, 2026
Transformers use the same forward computation stream to both predict the next token and store useful state for future token predictions. We formulate the state-prediction separation hypothesis: disentangling the two roles yields bett…
TiRex-2: Generalizing TiRex to Multivariate Data and Streaming
Patrick Podest, Marco Pichler, Elias Bürger, Levente Zólyomi et al. · arXiv · Jul 1, 2026
We introduce TiRex-2, a recurrent xLSTM-based time series foundation model that generalizes the univariate TiRex to multivariate forecasting with both past and future covariates. Real-world forecasting is inherently sequential: observations…
One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining
Philip Zmushko, Egor Petrov, Nursultan Abdullaev, Mikhail Khrushchev et al. · arXiv · Jun 29, 2026
Modern large-scale LLM pretraining benefits from utilizing Pipeline Parallelism; however, synchronous implementations leave GPUs idle during pipeline bubbles, wasting computational resources. Asynchronous Pipeline Parallelism eliminates the…
Beyond the Hard Budget: Sparsity Regularizers for More Interpretable Top-k Sparse Autoencoders
Nathanaël Jacquier, Maria Vakalopoulou, Mahdi S. Hosseini · arXiv · Jun 25, 2026
Sparse autoencoders (SAEs) have become a leading tool for interpreting the representations of vision foundation models, decomposing their polysemantic activations into a larger set of sparse, more monosemantic features. The Top-k SAE, a n…
How Good Can Linear Models Be for Time-Series Forecasting?
Lang Huang, Jinglue Xu, Luke Darlow · arXiv · Jun 25, 2026
Time-series forecasting research has been moving steadily toward larger architectures, from specialized transformers to general-purpose foundation models, on the assumption that capacity is what unlocks accuracy. We take the opposite positi…
Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining
Juliana Li, Diya Sreedhar · arXiv · Jun 24, 2026
Midway through an ordinary pretraining run, a small language model learns the pronoun-gender rule: cued with a girl's name ("Sue cried because"), it resolves the next pronoun to she, generalizing to held-out probes (0.94 by step 925). By st…
A Fair Evaluation of Graph Foundation Models for Node Property Prediction
Oleg Platonov, Gleb Bazhenov, Dmitry Eremeev, Liudmila Prokhorenkova · arXiv · Jun 23, 2026
Due to the wide use of graph-structured data in different fields of industry and science, the development of Graph Foundation Models (GFMs) has recently attracted a lot of attention. While many different types of models are called GFMs, par…
DiT-Reward: Generative Representations for Text-to-Image Reward Modeling
Yuanming Yang, Guoqing Ma, Bo Wang, Yuan Zhang et al. · arXiv · Jun 22, 2026
Can representations learned for image generation also support the evaluation of generated images? We study text-to-image reward prediction as a downstream task of generative representation learning. To this end, we introduce DiT-Reward, whi…
Hedgementation = Hedgerow Segmentation: A Remote Sensing Benchmark
Nathan Senyard, Salem Hamdani, Astrid Zhang, Derek Wang et al. · arXiv · Jun 22, 2026
We propose Hedgementation: a new benchmark to evaluate machine learning models for hedgerow mapping from remote sensing data at country scale and 10m² spatial resolution. We combine and harmonize multiple remote sensing data products and…
Scaling Linear Mode Connectivity and Merging to Billion Parameter Pretrained Transformers
Tianyi Li, Zhiqiang Shen · arXiv · Jun 22, 2026
Linear mode connectivity (LMC) provides a promising foundation for understanding and merging independently trained neural networks, but existing methods typically optimize the interpolation path from only one model endpoint, limiting their …
UNIEGO: Proxies as Mediators for Unified Egocentric Video Representation Learning
Wenhao Chi, Arkaprava Sinha, Dominick Reilly, Hieu Le et al. · arXiv · Jun 18, 2026
Egocentric video understanding is inherently limited by the narrow perspective of wearable cameras: a single viewpoint, a single modality, a single model cannot capture the full richness of human action. We argue that a truly expressive ego…
Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models
Nikita Kachaev, Andrey Moskalenko, Matvey Skripkin, Nikita Kurlaev et al. · arXiv · Jun 17, 2026
Embodied Vision-Language-Action (VLA) models are typically obtained by fine-tuning powerful pretrained VLMs on robotics data, yet it is unclear how much commonsense and factual knowledge they retain after adaptation. Failures on knowledge-s…
Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models
Ramprasath Ganesaraja, Sahil Dilip Panse, Swathika N · arXiv · Jun 16, 2026
State Space Models (SSMs) such as Mamba-2 offer linear-time inference but their memory footprint limits edge deployment. Prior ternary SSM work (Slender-Mamba) trains from scratch on 150B tokens; we show a pretrained checkpoint suffices, re…
Exact Posterior Score Estimation for Solving Linear Inverse Problems
Abbas Mammadov, Ozgur Kara, Kaan Oktay, Iskander Azangulov et al. · arXiv · Jun 15, 2026
Diffusion and flow-based models learn powerful data priors by training a denoiser to reverse Gaussian corruption. To use this prior to solve a linear inverse problem, one needs to sample from the posterior, but the score that the prior prov…
Geometric Action Model for Robot Policy Learning
Jisang Han, Seonghu Jeon, Jaewoo Jung, René Zurbrügg et al. · arXiv · Jun 15, 2026
Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inheri…
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
Tongyan Fang, Siyuan Huang, Naiyu Fang, Ganlong Zhao et al. · arXiv · Jun 15, 2026
When pretrained VLA policies are fine-tuned through online RL, each rollout episode produces only a single binary outcome (success or failure), yet the actor update requires per-transition supervision. Existing approaches commonly reduce th…
Beyond task performance: Decoding bioacoustic embeddings with speech features
Ines Nolasco, Jules Cauzinille, Marius Miron, Gagan Narula et al. · arXiv · Jun 12, 2026
Pretrained audio embeddings are standard in bioacoustics, yet little is known about which acoustic features these models encode, nor which are useful for a given task. This hinders transparency and limits extension to rare species or data-s…
