Latest AI for Code Research Papers
The newest AI for Code papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks AI for Code so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- Hot Fixing in the Wild · Carol Hanna, Karine Even-Mendoza, W. B. Langdon, Mar Zamorano López et al. · arXiv · Apr 29, 2026
Despite the operational importance of hot fixes, large-scale evidence on how they reshape routine maintenance workflows, particularly in the era of autonomous coding agents, remains limited. We analyse hot fixes present in over 61,000 GitHu…
- TDD Governance for Multi-Agent Code Generation via Prompt Engineering · Tarlan Hasanli, Shahbaz Siddeeq, Bishwash Khanal, Pyry Kotilainen et al. · arXiv · Apr 29, 2026
Large language models (LLMs) accelerate software development but often exhibit instability, non-determinism, and weak adherence to development discipline in unconstrained workflows. While test-driven development (TDD) provides a structured …
- An Empirical Study of Speculative Decoding on Software Engineering Tasks · Yijia Li, Junkai Chen, Xing Hu, Xin Xia · arXiv · Apr 29, 2026
Large Language Models (LLMs) have become widely used for Software Engineering (SE) tasks, spanning from function-level code generation to complex repository-level workflows. However, the high latency of autoregressive inference remains a si…
- Agentic AI in the Software Development Lifecycle: Architecture, Empirical Evidence, and the Reshaping of Software Engineering · Happy Bhati · arXiv · Apr 29, 2026
The arrival of large language models (LLMs) capable of multi-step reasoning, tool use, and long-horizon planning has produced a qualitative shift in software engineering. Where earlier code-completion tools such as GitHub Copilot operated a…
- Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses · Jiahang Lin, Shichun Liu, Chengjun Pan, Lizhi Lin et al. · arXiv · Apr 28, 2026
Harnesses have become a central determinant of coding-agent performance, shaping how models interact with repositories, tools, and execution environments. Yet automating harness engineering is hard: a heterogeneous action space, sparse and …
- Agentic AI-assisted coding offers a unique opportunity to instill epistemic grounding during software development · Magnus Palmblad, Jared M. Ragland, Benjamin A. Neely · arXiv · Apr 23, 2026
The capabilities of AI-assisted coding are progressing at breakneck speed. Chat-based vibe coding has evolved into fully fledged AI-assisted, agentic software development using agent scaffolds where the human developer creates a plan that a…
- DryRUN: On the Role of Public Tests in LLM-Driven Code Generation · Kaushitha Silva, Srinath Perera · arXiv · Apr 23, 2026
Multi-agent frameworks are widely used in autonomous code generation and have applications in complex algorithmic problem-solving. Recent work has addressed the challenge of generating functionally correct code by incorporating simulation-d…
- Assessing the Impact of Requirement Ambiguity on LLM-based Function-Level Code Generation · Di Yang, Xinou Xie, Xiuwen Yang, Ming Hu et al. · arXiv · Apr 23, 2026
Software requirement ambiguity is ubiquitous in real-world development, stemming from the inherent imprecision of natural language and the varying interpretations of stakeholders. While Large Language Models (LLMs) have demonstrated impress…
- SWE-chat: Coding Agent Interactions From Real Users in the Wild · Joachim Baumann, Vishakh Padmakumar, Xiang Li, John Yang et al. · arXiv · Apr 22, 2026
AI coding agents are being adopted at scale, yet we lack empirical evidence on how people actually use them and how much of their output is useful in practice. We present SWE-chat, the first large-scale dataset of real coding agent sessions…
- Vibe-Coding: Feedback-Based Automated Verification with no Human Code Inspection, a Feasibility Study · Michal Töpfer, František Plášil, Tomáš Bureš, Petr Hnětynka · arXiv · Apr 16, 2026
Vibe coding inherently assumes iterative refinement of LLM-generated code through feedback loops. While effective for conventional software tasks, its reliability in runtime-adaptive systems is unclear -- especially when generated code is n…
- Learned or Memorized ? Quantifying Memorization Advantage in Code LLMs · Djiré Albérick Euraste, Kaboré Abdoul Kader, Jordan Samhi, Earl T. Barr et al. · arXiv · Apr 15, 2026
The lack of transparency about code datasets used to train large language models (LLMs) makes it difficult to detect, evaluate, and mitigate data leakage. We present a perturbation-based method to quantify memorization advantage in code LLM…
- CollabCoder: Plan-Code Co-Evolution via Collaborative Decision-Making for Efficient Code Generation · Duy Tung Doan, Quang Huy Phung, Dzung Nguyen, Khac-Hoai Nam Bui · arXiv · Apr 15, 2026
Automated code generation remains a persistent challenge in software engineering, as conventional multi-agent frameworks are often constrained by static planning, isolated execution, high computational overhead, and limited adaptability to …
- Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey · Junqiao Wang, Zeng Zhang, Yangfan He, Zihao Zhang et al. · ICLR 2026 Workshop LLM Reasoning · Mar 8, 2026
With the rapid evolution of large language models (LLM), reinforcement learning (RL) has emerged as a pivotal technique for code generation and optimization in various domains. This paper presents a systematic survey of the application of R…
- CodeGenGuard: A Watermark for Code Generation Models · Borui Yang, Mingxuan Ma, Liyao Xiang, Nan Chen et al. · ICLR 2026 Poster · Jan 26, 2026
Code language models (LMs) represent valuable intellectual property (IP) as their training involves immense investments, including large-scale code corpora, proprietary annotations, extensive computational resources, and specialized designs…
- VERINA: Benchmarking Verifiable Code Generation · Zhe Ye, Zhengxu Yan, Jingxuan He, Timothe Kasriel et al. · ICLR 2026 Poster · Jan 26, 2026
Large language models (LLMs) are increasingly integrated in software development, but ensuring correctness in LLM-generated code remains challenging and often requires costly manual review. Verifiable code generation---jointly generating co…
- From Code Generation to Code Reasoning: A Survey of Inference-Time Methods in LLM-Based Code Generation · ACL ARR 2026 January Submission · Jan 6, 2026
Large language models (LLMs) have rapidly advanced the state of code generation, evolving from prompt-based function synthesis to iterative, execution-guided, and agentic software engineering systems. While recent progress has led to impres…
- Process Supervision-Guided Policy Optimization for Code Generation · Ning Dai, Zheng Wu, Renjie Zheng, Ziyun Wei et al. · Submitted to ICLR 2026 · Sep 20, 2025
Reinforcement learning (RL) with unit test feedback has enhanced large language models’ (LLMs) code generation, but relies on sparse rewards provided only after complete code evaluation, limiting learning efficiency and incremental improvem…
- Verina: Benchmarking Verifiable Code Generation · Zhe Ye, Zhengxu Yan, Jingxuan He, Timothe Kasriel et al. · AI4Math@ICML25 Poster · Jul 9, 2025
Large language models (LLMs) are being increasingly integrated in software development, but ensuring correctness in LLM-generated code remains challenging, which often requires manual review. Verifiable code generation---jointly generating …
- Multi-Turn Code Generation Through Single-Step Rewards · Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M Rush et al. · ICML 2025 Spotlight Poster · May 1, 2025
We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple ye…
- Multi-Turn Code Generation Through Single-Step Rewards · Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M Rush et al. · SSI-FM Poster · Mar 8, 2025
We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple ye…
- Multi-Turn Code Generation Through Single-Step Rewards · Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M Rush et al. · ICLR 2025 Workshop VerifAI Poster · Mar 6, 2025
We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple ye…
- Multi-Turn Code Generation Through Single-Step Rewards · Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M Rush et al. · Reasoning and Planning for LLMs @ ICLR2025 · Mar 5, 2025
We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple ye…
- Improve Code Generation with Feedback · Zhi Xu, Yun Fu · Submitted to ICLR 2025 · Sep 27, 2024
As advancements in Large Language Models (LLMs) continue to accelerate, an increasing number of researchers are exploring the potential of these models to assist in everyday tasks. Despite their remarkable achievements in various downstream…
- VersiCode: Towards Version-controllable Code Generation · Tongtong Wu, Weigang Wu, Xingyu Wang, Kang Xu et al. · Submitted to ICLR 2025 · Sep 26, 2024
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development, marked by frequent library updates. This gap significantly limits LLMs' de…
- Improving Code Style for Accurate Code Generation · Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez et al. · SyntheticData4ML 2023 Poster · Oct 30, 2023
Natural language to code generation is an important application area of LLMs and has received wide attention from the community. The majority of relevant studies have exclusively concentrated on increasing the quantity and functional corre…
- Grounding Code Generation with Input-Output Specifications · Yeming Wen, Pengcheng Yin, Kensen Shi, Henryk Michalewski et al. · Submitted to ICLR 2024 · Sep 23, 2023
Large language models (LLMs) have demonstrated significant potential in code generation. However, the code generated by these models occasionally deviates from the user's intended outcome, resulting in executable but incorrect code. To miti…