Latest World Models Research Papers

The newest World Models papers from across the field (arXiv, NeurIPS, CVPR, Nature, and more), refreshed daily and ranked by relevance. Distill AI tracks World Models so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.

Recent papers

Concept-Guided Spatial Regularization for World Models in Atari Pong
Yukuan Lu, Zaishuo Xia, Weyl Lu, Yubei Chen · arXiv · Jul 16, 2026
World models are usually evaluated as components of model-based reinforcement learning (MBRL) systems, while the world models themselves are rarely studied in isolation. We examine five representative visual world-model agents in Atari Pong…
DriftWorld: Fast World Modeling through Drifting
Susie Lu, Haonan Chen, Weirui Ye, Yilun Du · arXiv · Jul 16, 2026
Predictive world models enable robots to plan by imagining the outcomes of their actions, but their value for control hinges on generating many rollouts quickly. This creates a bottleneck for diffusion-based world models: multistep sampling…
TriA Pipeline: A Large-Scale Automatic Audio Annotation Pipeline For Audio Classification In Specific Scenarios
Hong Lyu, Mingru Yang, Qianhua He, Yanxiong Li et al. · arXiv · Jul 7, 2026
Audio classification (AC) datasets of varying scales exist for different tasks. However, annotated data is limited for most scenarios, such as domestic environments. To address this challenge, we propose an Au…
WattGPU: Predicting Inference Power and Latency on Unseen GPUs and LLMs
Mauricio Fadel Argerich, Jonathan Fürst, Marta Patiño-Martínez · arXiv · Jul 2, 2026
Large Language Model (LLM) inference workloads are a rapidly growing contributor to data center energy consumption. Optimizing these deployments requires matching specific LLMs to the most efficient GPUs, but operators currently lack the to…
Hallucination in World Models is Predictable and Preventable
Nicklas Hansen, Xiaolong Wang · arXiv · Jun 25, 2026
Modern generative world models render increasingly realistic action-controllable futures, yet they frequently hallucinate: rollouts remain visually fluent while drifting from the ground-truth dynamics. We hypothesize that hallucination conc…
Hedgementation = Hedgerow Segmentation: A Remote Sensing Benchmark
Nathan Senyard, Salem Hamdani, Astrid Zhang, Derek Wang et al. · arXiv · Jun 22, 2026
We propose Hedgementation: a new benchmark to evaluate machine learning models for hedgerow mapping from remote sensing data at country scale and 10 m² spatial resolution. We combine and harmonize multiple remote sensing data products and…
Scaling Linear Mode Connectivity and Merging to Billion Parameter Pretrained Transformers
Tianyi Li, Zhiqiang Shen · arXiv · Jun 22, 2026
Linear mode connectivity (LMC) provides a promising foundation for understanding and merging independently trained neural networks, but existing methods typically optimize the interpolation path from only one model endpoint, limiting their …
Looped World Models
Hongyuan Adam Lu, Z. L. Victor Wei, Qun Zhang, Jinrui Zeng et al. · arXiv · Jun 16, 2026
Current world models face a fundamental tension: faithful long-horizon simulation demands deep computation, but deeper models are expensive to deploy and prone to compounding errors. We resolve this by introducing Looped World Models (LoopW…
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks
Mengyu Zheng, Kai Han, Boxun Li, Haiyang Xu et al. · arXiv · Jun 10, 2026
General-purpose agents such as OpenClaw are increasingly used as autonomous tool users, but their coding ability is difficult to measure under SWE-bench: a generic agent does not by itself satisfy the clean Docker workspace, patch, and pred…
OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib
Abhijoy Sarkar, Aarchi Singh Thakur · arXiv · Jun 9, 2026
Resistance to first-line osimertinib in EGFR-mutant non-small-cell lung cancer (NSCLC) is the canonical example of predictable clonal evolution under therapeutic pressure, yet no public benchmark exists for training or evaluating computatio…
Echo-Memory: A Controlled Study of Memory in Action World Models
Wayne King, Zeyue Xue, Yuxuan Bian, Jie Huang et al. · arXiv · Jun 8, 2026
We present Echo-Memory, a controlled study of memory mechanisms in action-conditioned world models. These models generate multi-segment videos from a first frame, text prompt, and camera-action sequence, but their central failure i…
Zero Touch Predictive Orchestration: Automating Time-Series Models for the Cloud-Edge Continuum
Abd Elghani Meliani, Arora Sagar, Adlen Ksentini, Raymond Knopp · arXiv · Jun 8, 2026
The Cloud-Edge Continuum (CEC) enables latency-critical applications by distributing resources to the far edge, but its extreme volatility makes proactive Zero Touch Management via time-series forecasting essential. However, orchestrators f…
Policy and World Modeling Co-Training for Language Agents
Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu et al. · arXiv · Jun 1, 2026
Reinforcement learning (RL) improves large language model (LLM) agents by teaching them which actions lead to high rewards, but provides little supervision on what those actions do to the environment. World modeling (WM) can fill this gap, …
TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks
Andrej Tschalzev, Nick Erickson, Yuyang Wang, Huzefa Rangwala et al. · arXiv · Jun 1, 2026
Progress in tabular machine learning has largely focused on increasingly sophisticated model architectures. At the same time, feature engineering remains a critical yet underexplored component of real-world modeling pipelines that is entire…
Statistical Embeddings for Similarity, Retrieval, and Interpretable Alignment of Numeric Tabular Datasets
M. Ross Kunz, John Merickel, Keith Wilson · arXiv · May 28, 2026
Numeric tabular datasets are the dominant data format in scientific practice, yet large language models lack native mechanisms for representing numeric datasets in a meaningful way across heterogeneous feature spaces. Existing approaches ei…
Affective Music Recommendation: A Rollout-Based World Model for Offline Preference Optimization
Audrey Chan, Aaron Labbé, Jacob Lavoie, Jordan Bannister et al. · arXiv · May 27, 2026
Functional music applications, from consumer focus and sleep aids to clinical interventions, share a distinctive recommendation problem: success is defined by the listener's affective state, but online experimentation on emotion is ethicall…
World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry
Yuejiang Liu, Fan Feng, Lingjing Kong, Weifeng Lu et al. · ICLR 2026 Workshop World Models · Mar 2, 2026
General-purpose world models promise scalable policy evaluation, optimization, and planning, yet achieving the required level of robustness remains challenging. Unlike policy learning, which primarily focuses on optimal actions, a world mode…
Consistent Video World Model With Geometry-Aware Rotary Position Embedding
Chendong Xiang, Jiajun Liu, Jintao Zhang, Xiao Yang et al. · ICLR 2026 Workshop World Models · Mar 2, 2026
Predictive world models that simulate future observations under explicit camera control are fundamental to interactive AI. Despite rapid advances, current systems lack spatial persistence: they fail to maintain stable scene structures over …
Computer-Using World Model
Yiming Guan, Rui Yu, John Zhang, Lu Wang et al. · ICLR 2026 Workshop World Models · Mar 2, 2026
Agents operating in complex software environments benefit from reasoning about the consequences of their actions, as even a single incorrect user interface (UI) operation can derail long, artifact-preserving workflows. This challenge is par…
Action Shapley: A training data selection metric for Training World Models for Reinforcement Learning
Rajat Ghosh, Debojyoti Dutta · ICLR 2026 Workshop World Models · Mar 2, 2026
World models are central to model-based reinforcement learning, enabling agents to predict environment dynamics and reason about future outcomes. In real-world settings, however, training high-fidelity world models is often constrained by l…
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Yanjiang Guo, Lucy Xiaoyang Shi, Jianyu Chen, Chelsea Finn · ICLR 2026 Workshop World Models · Mar 2, 2026
Generalist robot policies can now perform a wide range of manipulation skills, but evaluating and improving their ability with unfamiliar objects and instructions remains a significant challenge. Rigorous evaluation requires a large number …
Towards Practical World Model-based Reinforcement Learning for Vision-Language-Action Models
Zhilong Zhang, Haoxiang Ren, Yihao Sun, Yifei Sheng et al. · ICLR 2026 Workshop World Models · Mar 2, 2026
Vision-Language-Action (VLA) models show strong generalization for robotic control, but finetuning them with reinforcement learning (RL) is constrained by the high cost and safety risks of real-world interaction. Training VLA models in inte…
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Yanjiang Guo, Lucy Xiaoyang Shi, Jianyu Chen, Chelsea Finn · ICLR 2026 Poster · Jan 26, 2026
Generalist robot policies can now perform a wide range of manipulation skills, but evaluating and improving their ability with unfamiliar objects and instructions remains a significant challenge. Rigorous evaluation requires a large number …
Motus: A Unified Latent Action World Model
Hongzhe Bi, Hengkai Tan, Shenghao Xie, Zeyu Wang et al. · arXiv · Dec 15, 2025
While a general embodied agent must function as a unified system, current methods are built on isolated models for understanding, world modeling, and control. This fragmentation prevents unifying multimodal generative capabilities and hinde…
RELIC: Interactive Video World Model with Long-Horizon Memory
Yicong Hong, Yiqun Mei, Chongjian Ge, Yiran Xu et al. · arXiv · Dec 3, 2025
A truly interactive world model requires three key ingredients: real-time long-horizon streaming, consistent spatial memory, and precise user control. However, most existing approaches address only one of these aspects in isolation, as achi…
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
Fangqi Zhu, Zhengyang Yan, Zicong Hong, Quanxin Shou et al. · arXiv · Nov 12, 2025
Vision-Language-Action (VLA) models have shown strong potential for general-purpose robotic manipulation, but their reliance on expert demonstrations limits their ability to learn from failures and perform self-corrections. Reinforcement le…
World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training
Junjin Xiao, Yandan Yang, Xinyuan Chang, Ronghan Chen et al. · arXiv · Sep 29, 2025
Vision-Language-Action (VLA) models trained via imitation learning suffer from significant performance degradation in data-scarce scenarios due to their reliance on large-scale demonstration datasets. Although reinforcement learning (RL)-ba…
WoW: Towards a World omniscient World model Through Embodied Interaction
Xiaowei Chi, Peidong Jia, Chunkai Fan, Xiaozhu Ju et al. · arXiv · Sep 26, 2025
Humans develop an understanding of intuitive physics through active interaction with the world. This approach is in stark contrast to current video models, such as Sora, which rely on passive observation and therefore struggle with grasping…
Remote Sensing-Oriented World Model
Yuxi Lu, Biao Wu, Zhidong Li, Kunqi Li et al. · Submitted to ICLR 2026 · Sep 12, 2025
World models have shown potential in artificial intelligence by predicting and reasoning about world states beyond direct observations. However, existing approaches are predominantly evaluated in synthetic environments or constrained scene …
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
Xianglong He, Chunli Peng, Zexiang Liu, Boyang Wang et al. · arXiv · Aug 18, 2025
Recent advances in interactive video generation have demonstrated diffusion models' potential as world models by capturing complex physical dynamics and interactive behaviors. However, existing interactive world models depend on bidirectio…
