Agents & Foundation

Latest Agentic AI & LLM Agents Research Papers

The newest Agentic AI & LLM Agents papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Agentic AI & LLM Agents so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.

Recent papers

An Agentic Framework for Financial Disclosure Intelligence
Jessica Bedson, Kazim Topuz, Tally Ferguson · Journal of the Association ... · Aug 15, 2026
TREO Talk Paper. Kazim Topuz, Ph.D., The University of Tulsa (kazim-topuz@utulsa.edu); Tally Ferguson, The University of Tulsa (tally-ferguson@utulsa.edu); Jessica Bedson, The University of T…
Echoes of Humanity: The Collective Mind of LLM Agents
Daniel dos Santos Salles, Alexandre Reis Graeml · Journal of the Association ... · Aug 15, 2026
The objective of this exploratory pilot study was to investigate the preliminary effectiveness of the wisdom of crowds approach in the context of AI agents based on Large Language Models (LLMs) to improve fraud detection in emails. To this …
From Data to Discovery: Agentic AI for Transcriptomics Research
Brady K Johnson-Hill, Vijayan Sugumaran, Fabia U. Battistuzzi · Journal of the Association ... · Aug 15, 2026
Public transcriptomics repositories contain vast but fragmented gene expression data, making cross-database analysis difficult. Researchers must manually retrieve datasets, evaluate gene expression under specific conditions, and identify bi…
Deploying Agentic LLM Pipelines at Scale: Quality-Gated Ensemble Governance for Enterprise News Intelligence
Kyongmook Lim, Muryul Choi, Jeong Su Han, ChiHoon Lee · Journal of the Association ... · Aug 15, 2026
Corporate communications teams face information overload, refresh pressure, and overreliance risk on LLM-generated briefings. We present the production deployment of an on-premises agentic LLM system for enterprise news intelligence that co…
Understanding the Influence of Agentic AI on Workplace Dignity
Xu Pei, Rupinder Gupta · Journal of the Association ... · Aug 15, 2026
Recent growth in the adoption of Agentic AI technology within organizational workflows and processes holds tremendous promise for productivity gains and the potential to drive disruptive innovation. Unlike traditional AI systems such as Lar…
OpenForgeRL: Train Harness-native Agents in Any Environment
Xiao Yu, Baolin Peng, Ruize Xu, Hao Zou et al. · arXiv · Jul 23, 2026
Modern AI agents rely on elaborate inference harnesses such as Claude Code, Codex, and OpenClaw to drive multi-turn reasoning, tool use, and access to external systems. While powerful, these complex harnesses also make agents hard to train …
Agentic Context Management: Solving Agent Memory and Cost by Treating Them as Lifecycle and Architecture Problems
Gaurav Dadhich · arXiv · Jul 23, 2026
Production AI agents' failures are less often due to an inability to reason well and more often because they cannot manage what is in their reasoning context: conversation histories, large prompts, large tool definitions, and ballooning too…
Agentic coding without the cloud: evaluating open-weight large language models on longitudinal data preparation tasks
Mack Nixon, Liam Wright, Yevgeniya Kovalchuk, Alison Fang-Wei Wu et al. · arXiv · Jul 23, 2026
Large language models (LLMs) and agents are now widely used tools in code development, with data typically sent to third-party cloud-based models. Their adoption in research using personal data is constrained by governance requirements that…
The Ethics of Autonomous AI Agents for Offensive Security
Andreas Happe, Jürgen Cito, Jasmin Wachter · arXiv · Jul 22, 2026
LLM-driven autonomous agents are reshaping offensive security. Unlike traditional penetration-testing tooling (deterministic, narrowly scoped, and operated by trained practitioners), agentic security tools exhibit indeterminacy …
Graph-Based Agentic AI with LangGraph: Workflow Pathways for Long-Running Stateful Business Processes
Daniel Pearson, Sidney Shapiro, Emiliano Sebastian Gonzalez Venegas, Sanad Al-Khatib et al. · arXiv · Jul 21, 2026
This paper is a practitioner guide to graph-based workflow pathways for long-running, stateful, multi-step generative AI systems in business processes. Rather than treating LangGraph, a low-level orchestration framework for stateful agents,…
Toward Auditable Fraud Detection: Combining Graph Features, Model Explanations, and Agentic Case Investigation
Rahil Sharma · arXiv · Jul 21, 2026
Fraud detection systems must scale with rising transaction volume while remaining explainable and reviewable. We study a layered pipeline on the PaySim dataset that combines a gradient-boosted classifier, graph-derived structural features, …
Beyond Success Rate: Cost-Aware Evaluation of Offensive and Defensive Security Agents
Paul Kassianik, Blaine Nelson, Yaron Singer · arXiv · Jul 16, 2026
Security-agent evaluations commonly measure peak offensive capability under generous inference budgets, emphasizing vulnerability discovery, exploit development, penetration testing, and CTF completion. Such measurements are useful but inco…
AutoSynthesis: An agentic system for automated meta-analysis
Moein Taherinezhad, Sebastian Maier, Gerardo Vitagliano, Francesco Pierri et al. · arXiv · Jul 16, 2026
Evidence synthesis is crucial for turning primary research into reliable knowledge for science, medicine, education, and policy. Yet, quantitative evidence synthesis remains largely manual and difficult to scale. Here, we introduce AutoSynt…
MM-ToolSandBox: A Unified Framework for Evaluating Visual Tool-Calling Agents
Kaixin Ma, Di Feng, Alexander Metz, Jiarui Lu et al. · arXiv · Jul 13, 2026
We introduce MM-ToolSandBox, a benchmark and evaluation framework for visually grounded tool-calling agents. The framework provides a stateful execution environment spanning 500+ tools across 16 application domains, supporting multi-image, …
An Explainable Agentic System for Detection of Conversational Scams with Summary-Based Memory
Ahmed Omar Salim Adnan, Yogananda Manjunath, Shivanjali Khare · arXiv · Jul 13, 2026
Following the rapid progress of generative Artificial Intelligence, there is a growing threat posed by conversational scams. These scams often span multiple weeks or months, gradually building trust and requesting money or sensitive info…
Agent Hacks Agent: Autoresearch for Production-Agent Red-Teaming
Xutao Mao, Xiang Zheng, Cong Wang · arXiv · Jul 13, 2026
Production LLM agents such as Claude Code and Codex operate over untrusted content, files, commands, and workspace state, making safety failures directly actionable. Red-teaming must therefore keep pace with evolving models and tools. Exist…
Agora: Enhancing LLM Agent Reasoning Via Auction-Based Task Allocation
Kaiji Zhou, Ales Leonardis, Yue Feng · arXiv · Jul 10, 2026
Enhancing the reasoning capabilities of large language model (LLM) agents requires effective orchestration of diverse expert models and tools. However, existing frameworks typically call APIs based on coarse-grained matching between tasks a…
TrustX Agent Risk Classification Framework (ARC): Risk-Tiering Internally Created Agentic AI Systems
Hannah M. Liu, Rhea Saxena, Shiv Asthana · arXiv · Jul 10, 2026
The proliferation of agentic AI systems across enterprise and public-sector contexts has outpaced the capacity of general-purpose AI risk frameworks to classify and govern them. In this paper, we introduce the TrustX Agent Risk Classificati…
Workflow as Knowledge: Semantic Persistence for LLM-Mediated Workflows
Emanuele Quinto, Carlo Andrea Rozzi, Francesco Zanitti · arXiv · Jul 9, 2026
Large language model (LLM) applications increasingly use explicit workflows for tool use, retrieval, branching, checkpointing, and human approval. Existing workflow systems already address many execution concerns. This paper proposes a Lisp…
Breaking Database Lock-in: Agentic Regeneration of High Performance Storage Readers for Database Bypass
Victor Giannakouris, Immanuel Trummer · arXiv · Jul 8, 2026
Analytical workloads operating on data stored in external database systems face a fundamental bottleneck: data access is guarded entirely by the database driver, like JDBC or ODBC, forcing all reads through query execution and other driver …
SkillCenter: A Large-Scale Source-Grounded Skill Library for Autonomous AI Agents
Tianming Sha, Yue Zhao, Lichao Sun, Yushun Dong · arXiv · Jul 8, 2026
Autonomous AI agents can execute complex tasks with limited human review, yet they often lack the grounded operational knowledge to make their outputs not just executable but correct, secure, and maintainable. We introduce SkillCenter, to o…
Future Confidence Distillation in Large Language Models
Sahil Kale · arXiv · Jul 8, 2026
Reliable confidence estimation is essential for deploying large language models (LLMs) in confidence-aware systems, where downstream decisions such as retrieval, tool use, and adaptive computation depend on accurately estimating answer reli…
Towards Agentic AI Governance: A Preliminary Assessment
Mubarak Raji, Masooda Bashir · arXiv · Jul 8, 2026
Artificial intelligence is rapidly evolving from generative systems to agentic AI capable of autonomously planning and executing tasks. Widely characterized as the Year of Agentic AI, 2025 marked accelerated development and deployment, intr…
Single-Rollout Asynchronous Optimization for Agentic Reinforcement Learning
Zhenyu Hou, Yujiang Li, Jie Tang, Yuxiao Dong · arXiv · Jul 8, 2026
Reinforcement learning (RL) is becoming increasingly important for post-training large language models (LLMs). Previous RL pipelines for LLMs were mostly synchronous and batch-interleaved, which is inefficient for long-horizon agentic tasks…
Doomed from the Start: Early Abort of LLM Agent Episodes via a Recall-Controlled Probe Cascade
Kai Ruan, Zihe Huang, Ziqi Zhou, Qianshan Wei et al. · arXiv · Jul 7, 2026
Large language model (LLM) agents solving multi-step tasks frequently commit to trajectories that are doomed to fail, yet continue to consume substantial inference compute before the failure becomes observable. We show that failure is predi…
What LLM Agents Say When No One Is Watching: Social Structure and Latent Objective Emergence in Multi-Agent Debates
Arman Ghaffarizadeh, Danyal Mohaddes, Aliakbar Izadkhah, Shahriar Noroozizadeh · arXiv · Jul 2, 2026
LLM agents will increasingly act in socially structured settings where role, audience, and relational context can shape what is advantageous or costly to say. We study whether such social structure, without any explicit objective in the pro…
Reasoning effort, not tool access, buys first-try reliability in agentic code generation: an observational study
Achint Mehta · arXiv · Jul 2, 2026
Agentic coding assistants are increasingly given extra capabilities, such as browser-based testing tools and design-oriented system prompts, on the assumption that more capability yields better software. This study tested that assumption di…
Self-Evolving World Models for LLM Agent Planning
Xuan Zhang, Wenxuan Zhang, See-Kiong Ng, Yang Deng · arXiv · Jun 29, 2026
World models offer a principled way to equip long-horizon LLM agents with foresight: predictions of action consequences before execution. However, unreliable foresight can be ignored, misused, or even degrade downstream decision-making. In …
GROW²: Grounding Which and Where for Robot Tool Use
Yuhong Deng, Yuyao Liu, David Hsu · arXiv · Jun 29, 2026
Can the robot use a plate to cut a cake if no knife is available? Tool use greatly expands robot capabilities, but to use tools creatively beyond their intended functions, the robot faces the challenge of open-world affordance grou…
TraceLab: Characterizing Coding Agent Workloads for LLM Serving
Kan Zhu, Mathew Jacob, Chenxi Ma, Yi Pan et al. · arXiv · Jun 29, 2026
Coding agents are rapidly becoming a major application of agentic LLMs, but serving them efficiently remains challenging. Progress on this challenge requires understanding real workload patterns, yet the data needed for such analysis is lar…