Latest AI for Code Research Papers

The newest AI for Code papers from across the field (arXiv, NeurIPS, CVPR, Nature, and more), refreshed daily and ranked by relevance. Distill AI tracks AI for Code so you don’t have to: get the standout work delivered to your inbox every morning, with two-sentence summaries and the option to chat with any paper.


Recent papers

MicroRNAs-mediated environmental adaptation in marine microalgae: from physiological responses to biotechnology applications
Gabriele De Falco · Open Research Online (The O... · Jan 1, 2028
MicroRNAs (miRNAs) are small non-coding RNAs that can regulate gene expression post-transcriptionally by interacting with mRNAs. They are fine-tune modulators that have a pleiotropic effect, interacting with different targets. Thus, they ar…
Towards Knowledge Alignment in Code LLMs: Contrastive Unlearning for Evolving APIs
Huy Q. Tran, Dang H. Vu, Tuyen N. Dinh, Anh H. D. Nguyen et al. · arXiv · Jun 29, 2026
Large Language Models (LLMs) have recently achieved strong performance in code generation. However, due to knowledge cut-off and the rapid evolution of software libraries, they often generate deprecated API usages that lead to unreliable an…
To Tab or Not to Tab: Measuring Critical Engagement in AI Code Completion Tools Using Behavioral Signals and Attention Checks
Jessica Hutchison, Ian Tyler Applebaum, Kenneth Angelikas, Kush Rakesh Patel et al. · Science · Jun 29, 2026
AI code completion tools, such as GitHub Copilot, provide students with code suggestions to help them write programs. However, recent qualitative studies suggest that students fail to critically evaluate these suggestions. We present Clover…
The Illusion of Agentic Complexity in README.md Generation: Evaluating Single-Agent vs. Multi-Agent RAG Systems
Abu Saleh, Tesfay Welegebreal Tesfay, Phuong T. Nguyen, Juri Di Rocco et al. · arXiv · Jun 29, 2026
Large Language Models (LLMs) are increasingly utilized to automate several software engineering tasks, including code completion, code summarization, testing, and the generation of repository-level documentation. While Multi-Agent Systems (…
SWE-Together: Evaluating Coding Agents in Interactive User Sessions
Yifan Wu, Zhuokai Zhao, Songlin Li, Ho Hin Lee et al. · arXiv · Jun 29, 2026
Most coding-agent benchmarks are static: an agent receives a complete task description up front and is judged only by its final code. Real coding assistance is interactive, with users clarifying goals, adding constraints, and correcting mis…
Citation Discipline in Spec-Driven Development: A Cross-Model Empirical Study of Output Determinism and Automated Hallucination Detection in LLM-Generated Code
Subham Panda · arXiv · Jun 28, 2026
Spec-Driven Development (SDD) frameworks guide Large Language Model (LLM)-powered code generation through formal specifications, yet they differ fundamentally in how they enforce traceability between requirements and generated code. This pa…
From Code Generation to Code Reasoning: A Survey of Inference-Time Methods in LLM-Based Code Generation
ACL ARR 2026 May Submission · May 26, 2026
Large language models (LLMs) have rapidly advanced code generation, evolving from prompt-based function synthesis to iterative, execution-guided, tool-integrated, and agentic software engineering systems. While recent progress has produced …
Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey
Junqiao Wang, Zeng Zhang, Yangfan He, Zihao Zhang et al. · ICLR 2026 Workshop LLM Reasoning · Mar 8, 2026
With the rapid evolution of large language models (LLMs), reinforcement learning (RL) has emerged as a pivotal technique for code generation and optimization in various domains. This paper presents a systematic survey of the application of R…
CodeGenGuard: A Watermark for Code Generation Models
Borui Yang, Mingxuan Ma, Liyao Xiang, Nan Chen et al. · ICLR 2026 Poster · Jan 26, 2026
Code language models (LMs) represent valuable intellectual property (IP) as their training involves immense investments, including large-scale code corpora, proprietary annotations, extensive computational resources, and specialized designs…
VERINA: Benchmarking Verifiable Code Generation
Zhe Ye, Zhengxu Yan, Jingxuan He, Timothe Kasriel et al. · ICLR 2026 Poster · Jan 26, 2026
Large language models (LLMs) are increasingly integrated in software development, but ensuring correctness in LLM-generated code remains challenging and often requires costly manual review. Verifiable code generation---jointly generating co…
From Code Generation to Code Reasoning: A Survey of Inference-Time Methods in LLM-Based Code Generation
ACL ARR 2026 January Submission · Jan 6, 2026
Large language models (LLMs) have rapidly advanced the state of code generation, evolving from prompt-based function synthesis to iterative, execution-guided, and agentic software engineering systems. While recent progress has led to impres…
CWM: An Open-Weights LLM for Research on Code Generation with World Models
FAIR CodeGen team, Jade Copet, Quentin Carbonneaux, Gal Cohen, Jonas Gehring et al. · arXiv.org · Sep 30, 2025
We release Code World Model (CWM), a 32-billion-parameter open-weights LLM, to advance research on code generation with world models. To improve code understanding beyond what can be learned from training on static code alone, we mid-train …
Process Supervision-Guided Policy Optimization for Code Generation
Ning Dai, Zheng Wu, Renjie Zheng, Ziyun Wei et al. · Submitted to ICLR 2026 · Sep 20, 2025
Reinforcement learning (RL) with unit test feedback has enhanced large language models’ (LLMs) code generation, but relies on sparse rewards provided only after complete code evaluation, limiting learning efficiency and incremental improvem…
A Survey on Code Generation with LLM-based Agents
Yihong Dong, Xue Jiang, Jiaru Qian, Tian Wang et al. · arXiv.org · Jul 31, 2025
Code generation agents powered by large language models (LLMs) are revolutionizing the software development paradigm. Distinct from previous code generation techniques, code generation agents are characterized by three core features. 1) Aut…
On the Effectiveness of LLM-as-a-Judge for Code Generation and Summarization
Giuseppe Crupi, Rosalia Tufano, Alejandro Velasco, A. Mastropaolo et al. · IEEE Transactions on Software Engineering · Jul 22, 2025
Large Language Models (LLMs) have been recently exploited as judges for complex natural language processing tasks, such as Q&A (Question & Answer). The basic idea is to delegate to an LLM the assessment of the “quality” of the output provid…
Verina: Benchmarking Verifiable Code Generation
Zhe Ye, Zhengxu Yan, Jingxuan He, Timothe Kasriel et al. · AI4Math@ICML25 Poster · Jul 9, 2025
Large language models (LLMs) are being increasingly integrated in software development, but ensuring correctness in LLM-generated code remains challenging, which often requires manual review. Verifiable code generation---jointly generating …
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
Shansan Gong, Ruixiang Zhang, Huangjie Zheng, Jiatao Gu et al. · arXiv.org · Jun 25, 2025
Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising models operate over the entire sequence. The global planning and iterative refinement features of dLLMs are particular…
MAGE: A Multi-Agent Engine for Automated RTL Code Generation
Yujie Zhao, Hejia Zhang, Hanxian Huang, Zhongming Yu et al. · Design Automation Conference · Jun 22, 2025
The automatic generation of RTL code (e.g., Verilog) through natural language instructions has emerged as a promising direction with the advancement of large language models (LLMs). However, producing RTL code that is both syntactically and…
CLEVER: A Curated Benchmark for Formally Verified Code Generation
A. Thakur, Jasper Lee, G. Tsoukalas, M. Sistla et al. · arXiv.org · May 20, 2025
We introduce CLEVER, a high-quality, curated benchmark of 161 problems for end-to-end verified code generation in Lean. Each problem consists of (1) the task of generating a specification that matches a held-out ground-trut…
Multi-Turn Code Generation Through Single-Step Rewards
Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M Rush et al. · ICML 2025 Spotlight Poster · May 1, 2025
We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple ye…
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Minju Seo, Jinheon Baek, Seongyun Lee, Sung Ju Hwang · arXiv.org · Apr 24, 2025
Despite the rapid growth of machine learning research, corresponding code implementations are often unavailable, making it slow and labor-intensive for researchers to reproduce results and build upon prior work. In the meantime, recent Larg…
Type-Constrained Code Generation with Language Models
Niels Mündler, Jingxuan He, Hao Wang, Koushik Sen et al. · Proc. ACM Program. Lang. · Apr 12, 2025
Large language models (LLMs) have achieved notable success in code generation. However, they still frequently produce uncompilable output because their next-token inference procedure does not model formal aspects of code. Although constrain…
RTLCoder: Fully Open-Source and Efficient LLM-Assisted RTL Code Generation Technique
Shang Liu, Wenji Fang, Yao Lu, Jing Wang et al. · IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems · Apr 1, 2025
The automatic generation of RTL code (e.g., Verilog) using natural language instructions and large language models (LLMs) has attracted significant research interest recently. However, most existing approaches heavily rely on commercial LLM…
Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs
Yuqi Zhu, Ge Li, Xue Jiang, Jia Li et al. · arXiv.org · Mar 19, 2025
Chain-of-Thought (CoT) reasoning has been demonstrated as an effective technique for improving the problem-solving capabilities of large language models (LLMs) in the context of code generation. However, existing CoT methods often exhibit a…
Beyond Code Generation: LLM-supported Exploration of the Program Design Space
J.D. Zamfirescu-Pereira, Eunice Jun, Michael Terry, Qian Yang et al. · International Conference on Human Factors in Computing Systems · Mar 10, 2025
In this work, we explore explicit Large Language Model (LLM)-powered support for the iterative design of computer programs. Program design, like other design activity, is characterized by navigating a space of alternative problem formulatio…
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
Wei Li, Xin Zhang, Zhongxin Guo, Shaoguang Mao et al. · Annual Meeting of the Association for Computational Linguistics · Mar 9, 2025
Implementing new features in repository-level codebases is a crucial application of code generation models. However, current benchmarks lack a dedicated evaluation framework for this capability. To fill this gap, we introduce FEA-Bench, a b…
Multi-Turn Code Generation Through Single-Step Rewards
Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M Rush et al. · SSI-FM Poster · Mar 8, 2025
We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple ye…
Multi-Turn Code Generation Through Single-Step Rewards
Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M Rush et al. · ICLR 2025 Workshop VerifAI Poster · Mar 6, 2025
We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple ye…
Multi-Turn Code Generation Through Single-Step Rewards
Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M Rush et al. · Reasoning and Planning for LLMs @ ICLR2025 · Mar 5, 2025
We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple ye…
S*: Test Time Scaling for Code Generation
Dacheng Li, Shiyi Cao, Chengkun Cao, Xiuyu Li et al. · Conference on Empirical Methods in Natural Language Processing · Feb 20, 2025
Increasing test-time compute for LLMs shows promise across domains but remains underexplored in code generation, despite extensive study in math. In this paper, we propose S*, the first hybrid test-time scaling framework that substantially …

