Latest Embodied AI Research Papers

The newest Embodied AI papers from across the field (arXiv, NeurIPS, CVPR, Nature, and more), refreshed daily and ranked by relevance. Distill AI tracks Embodied AI so you don’t have to: get the standout work delivered to your inbox every morning, with two-sentence summaries and the option to chat with any paper.

Recent papers

AI Study Buddy: Does Physical Presence Help?
Jacob Dishman, Colton Grimshaw, Garrett Ashcroft, Ryan Schuetzler · Journal of the Association ... · Aug 15, 2026
This study will examine whether physical embodiment enhances the effectiveness of AI as a study companion. Drawing on Social Presence Theory and body doubling, we compare four conditions in a 30-minute university self-study session: (1) cha…
AXIS: A Growable Community-Driven Data Engine for Scalable Robot Manipulation
Mengfei Zhao, Dihong Huang, Yikai Tang, Peihao Li et al. · arXiv · Jul 23, 2026
Learning effective robot manipulation policies requires diverse, high-quality demonstrations, yet existing data pipelines are often difficult to scale because they rely on specialized hardware, centralized operators, or fixed task suites. W…
Scale Up Strategically: Learning Compositional Generalization via Bias-Aware Evaluation and Data Collection for Robotic Manipulation
Yu Qi, Zhang Ye, Xinyi Xu, Yuxuan Lu et al. · arXiv · Jul 23, 2026
Compositional generalization is essential for robots to follow diverse instructions. However, pretrained policies are known to take shortcuts, deferring to salient cues rather than grounding language. We introduce a diagnostic framework that…
Beyond Episodic Evaluation: Memory Architectural Bottlenecks in Sequential Embodied Question Answering
Zikui Cai, Kaushal Janga, Tan Dat Dao, Seungjae Lee et al. · arXiv · Jul 23, 2026
Embodied question answering (EQA) is traditionally evaluated under an episodic formulation, where agents solve each task independently and reset internal state between episodes. However, real-world robots operate continuously and must accum…
VoLN: Vision-Only Long-Horizon Navigation---Paradigm, Benchmark, and Method
Jiabin Lou, Haopeng Wang, Yuanshuai Wang, Xinyu Liu et al. · arXiv · Jul 23, 2026
Vision-and-Language Navigation (VLN) enables embodied agents to follow natural-language instructions. However, route-level instructions commonly encode spatial priors, such as orientation, distance, and layout, that are not explicitly avail…
Factorized Spatio-Temporal Convolutions for Human Pose Estimation from Planar Lidar
Simone Arreghini, Mirko Nava, Nicholas Carlotti, Antonio Paolillo et al. · arXiv · Jul 23, 2026
Localizing nearby humans and estimating their facing direction are key capabilities for safe navigation and socially aware human-robot interaction. Many pose-estimation pipelines target cameras and 3D LiDAR or assume GPU-class compute, wher…
TransBiolab: A Real-World Multi-View Dataset of Cluttered Transparent Biomedical Objects
Ke Ma, Yifei Wang, Meng Wang, Tian Xia · arXiv · Jul 23, 2026
Autonomous biomedical laboratories increasingly rely on visual perception to recognize, localize, and manipulate transparent plasticware, yet high-quality real-world datasets for this setting remain limited. The scarcity of domain-relevant …
GuidedAttention: Interpretable and Correctable Visual Attention for OOD-Robust Robot Manipulation via Imitation Learning
Masaki Murooka, Ryoichi Nakajo, Keisuke Shirai, Tomohiro Motoda et al. · arXiv · Jul 23, 2026
End-to-end visuomotor policies provide little opportunity for humans to understand or correct the policy's visual attention. We propose GuidedAttention, a visuomotor imitation learning framework that introduces interpretable and correctable…
A Real-Time Generalized Nash Equilibrium Framework for Interaction-Aware Autonomous Driving in Mixed Traffic
Nouhed Naidja, Mohamed-Cherif Rahal, Steve Pechberti, Stéphane Font et al. · arXiv · Jul 23, 2026
Safe and efficient navigation in mixed-traffic environments remains a critical challenge for Autonomous Vehicles (AVs), primarily due to the complex interdependence between the AV's decisions and the unpredictable reactions of human drivers…
ZONDA: Zero-shot Object Navigation with Dynamic Avoidance in Multi-floor Environments
Shaomin Liang, Xuanhong Liao, Shiyao Zhang · arXiv · Jul 23, 2026
In the Object Goal Navigation task, existing methods are typically restricted to static and single-floor environments, ignoring cross-floor topologies and dynamic pedestrians, which limits their real-world deployment. To address these limitation…
TableVerse: A Large-scale Tabletop Dataset with Real-world Grounded Layouts for Generalizable Manipulation
Boyuan Wang, Yue Zhang, Xutao Xue, Xueyu Song et al. · arXiv · Jul 23, 2026
The development of generalizable robotic manipulation policies is inherently bounded by the availability of large-scale, high-fidelity scene data. While recent automated synthesis methods attempt to bridge this gap via text-to-layout halluc…
Towards Miniature Humanoid Tele-Loco-Manipulation Using Virtual Reality and Reinforcement Learning
Nicolas Kosanovic, Jordan Dowdy, Jean Chagas Vaz · arXiv · Jul 22, 2026
Full-sized humanoid robot capabilities have grown exponentially in recent years, aiming towards general-purpose deployment in human environments. A popular control method used by manufacturers utilizes Virtual Reality for upper-body teleope…
ReferTrack: Referring Then Tracking for Embodied Visual Tracking
Hanjing Ye, Tianle Zeng, Jiazhao Zhang, Shaoan Wang et al. · arXiv · Jul 22, 2026
Embodied visual tracking (EVT) requires a mobile agent to continuously follow a specific target described in natural language using only onboard vision. While recent vision-language-action (VLA) policies unify target identification and traj…
Robots Acquire Manipulation Skills in Seconds from a Single Human Video
Guangyan Chen, Meiling Wang, Te Cui, Zichen Zhou et al. · arXiv · Jul 22, 2026
The ability to acquire skills rapidly and effortlessly while retaining those already mastered is essential for robots. However, current methods still rely on a cumbersome training-time loop that is costly and slow, while eroding skills alre…
Unified Prediction and Planning via Conflict-Aware Disjoint Parameter Training
Taewon Seo, Seonae Jeon, Giwon Lee, Kuk-Jin Yoon et al. · arXiv · Jul 22, 2026
Accurate motion prediction of surrounding agents and safe motion planning are two closely coupled key tasks for social robot navigation in crowded environments. Deploying these systems on resource-constrained edge devices necessitates compa…
EA-Nav: Learning Safe Visual Navigation Policies with Embodiment Awareness
Jialu Zhang, Yong Du, Xianda Guo, Shunwang Sun et al. · arXiv · Jul 22, 2026
Cross-embodiment navigation is a key challenge in embodied intelligence. Due to differences in embodiment, the same visual observation may imply different actions for different agents, making prediction ambiguous when relying solely on visi…
KineBench: Benchmarking Embodied World Models via IDM-Free Kinematic Grounding
Zeyu Liu, Zhangzhe Zhu, Yang Zhang, Chenyou Fan et al. · arXiv · Jul 22, 2026
Evaluating the physical consistency of embodied world models (EWMs) is a critical open challenge. While closed-loop evaluation via simulator rollouts offers a more faithful assessment of physical plausibility than open-loop alternatives, exi…
SOPD-SocialNav: Selective On-Policy Distillation for Vision-Language Social Navigation
Xinyu Zhang, Zishuo Wang, Ling Xiao · arXiv · Jul 22, 2026
Vision-language models have shown strong potential for social robot navigation by leveraging rich semantic understanding of complex environments and human behaviors. However, large-scale VLMs are difficult to deploy on resource-constrained …
Clinical Pathways as Safety Specifications for Physical AI in Hospital Wards
Gabriele Franchini, Giulio Mallardi, Michele De Carolis, Filippo Lanubile · arXiv · Jul 22, 2026
Ensuring safety in Physical AI systems operating in real-world environments is a critical challenge, particularly in hospital wards where vulnerable patients, clinical staff, medical devices, and assistive robots coexist. In this paper, we …
Balancing centralized learning and distributed self-organization: a hybrid model for embodied morphogenesis
Takehiro Ishikawa · Royal Society Open Science · Jul 22, 2026
Background: Both embodied intelligence and developmental morphogenesis depend on a division of labour between centralized guidance and distributed material dynamics, but the amount of top-down control needed to steer self-organizat…
Artificial Intelligence Embedding and Enterprise Competitiveness in the Embodied Intelligence Industry: The Mediating Role of Competitive Structure Reconfiguration
Janet Zhu, Jinguo Xin, Ning Zhang · Administrative Sciences · Jul 22, 2026
Artificial intelligence is increasingly integrated into physical products, industrial scenarios, and innovation ecosystems, yet management research has focused mainly on AI adoption rather than the depth of organizational integration. This …
Masked Visual Actions for Unified World Modeling
Hadi Alzayer, Wenlong Huang, Haonan Chen, Christopher Luey et al. · arXiv · Jul 21, 2026
Video models absorb rich priors over how the visual world moves, interacts, and responds to contact, making them promising substrates for robotic world modeling. The central challenge is how to communicate action to such models in a form al…
No Training, Better Flights: Test-Time Scaled VLMs for UAV Navigation
Feinan Cheng, Dongliang Xu, Wenli Nong, Zhiheng Zhang et al. · arXiv · Jul 21, 2026
Test-time scaling offers a promising method to improve the inference performance of Vision-Language Models (VLMs) without additional training. Existing approaches to vision-language navigation (VLN) for Unmanned Aerial Vehicles (UAVs) typical…
Eversion-based robots can enable safe access, steering and endoscopic imaging within the spinal subarachnoid space
Zicong Wu, Panagiotis Kalozoumis, S. M. Hadi Sadati, Aminul I. Ahmed et al. · arXiv · Jul 21, 2026
Safe navigation within the spinal subarachnoid space is constrained by its narrow, compliant, and delicate anatomy. Conventional catheters and continuum robots rely on proximal pushing, generating friction and shear along the tissue device …
The Twist Decomposition of Serial Robots Under Lower-Mobility Tasks
Luc Baron, Damien Chablat · arXiv · Jul 21, 2026
This paper introduces a twist decomposition framework for serial manipulators performing lower-mobility tasks. Rather than relying on Jacobian null-space projections, the method separates the end-effector twist into task and redundant compo…
WorldScape Policy 2.0: Empowering Steerable World Action Modeling with Reasoning-Augmented Memory
Haisheng Su, Zongdai Liu, Xin Jin, Haoxuan Dou et al. · arXiv · Jul 21, 2026
World Action Models (WAMs) offer a promising paradigm for robotic manipulation by jointly modeling visual state transitions and robot actions. However, existing WAMs are constrained by limited temporal context, coarse episode-level language…
Beyond Transformers: Linear Attention Policy for Open-Vocabulary Object Goal Navigation
Jiahong Zhang, Yifan Lin, Yandong Zhang, Sijun Shen et al. · arXiv · Jul 21, 2026
Open-Vocabulary Object Goal Navigation (OVON) requires agents to operate under partial observability, making effective internal state updates critical for navigation performance. This update is implemented by the policy network, where recen…
RoboInter1.5: A Holistic Intermediate Representation Suite for Embodied World Modeling and Robotic Manipulation
Ziqin Wang, Hao Li, Weijun Wang, Junhao Cai et al. · arXiv · Jul 21, 2026
Existing robot datasets remain expensive to curate, embodiment-specific, and insufficiently annotated with the fine-grained structure required for generalizable reasoning, execution, or long-horizon environment dynamics simulation. Building…
AI-driven evolution in surgical robotics: A comprehensive review of assisted and autonomous surgery
Jinfang Wang, Xinkang Zhang, Xinrong Chen · Journal of Mechanics in Med... · Jul 21, 2026
The increasing demand for high-quality surgical care, together with the uneven distribution of experienced surgeons and advanced medical infrastructure, has motivated growing interest in AI-enabled surgical robotics. Recent advances in mach…
Project LINGXI: The Implicit Archive of Human Civilization — White Paper v1.0
J Y Zhao · Zenodo (CERN European Organ... · Jul 19, 2026
This white paper proposes Project LINGXI, a universal archival framework for tacit knowledge—the embodied skills and cognitive styles that have always vanished with their bearers. It introduces the .hha format protocol, a multimodal contain…
