Latest Text Summarization Research Papers

The newest Text Summarization papers from across the field (arXiv, NeurIPS, CVPR, Nature, and more), refreshed daily and ranked by relevance. Distill AI tracks Text Summarization so you don’t have to: the standout work is delivered to your inbox every morning, with two-sentence summaries and the option to chat with any paper.

Recent papers

Text summarization via global structure awareness
Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Yibei Liu et al. · CoRR 2026 · Dec 31, 2026
Text summarization is a fundamental task in natural language processing (NLP), and the information explosion has made long-document processing increasingly demanding, making summarization essential. Existing research mainly focuses on model…
Transformer-Assisted LLM-Based Source Code Summarisation: to Enable More Secure Software Development
Jesse Phillips, Tracy Hall, Paul Rayson, Mo El-Haj · arXiv · Jul 23, 2026
Neural Source Code Summarisation (NSCS) aims to generate natural language summaries of source code to improve developers' and maintainers' understanding of code. Source code summaries are vital during the maintenance phase of the Secure Sof…
When Does Knowledge Distillation Hurt? Reliability-Aware Distillation for Low-Resource Language Summarization
Dipto Sumit, Ankan Kumar Roy Srizon, Sadia Khair Rodela, Atia Haque Asha et al. · arXiv · Jul 22, 2026
Knowledge distillation (KD) is a standard approach for compressing sequence-to-sequence models, but its per-sample effects are rarely examined. On the BanSum Bangla summarization benchmark, we find that standard KD improves ROUGE-L by only …
Dialogue Summarization with Emotion Dynamics Using Topic- and Participant-Centric Decomposition
Linyun Xiang, Mark Neerincx, Stephanie Tan · arXiv · Jul 16, 2026
Existing text summarization research has focused much on monologic information (e.g., newspaper articles, reports) without accounting for the interaction between speakers or authors. In contrast, dialogues are a rich communication channel w…
Where Should RL Post-Training Compute Go? Model Size, Search, Learning, and Feedback
Patrick Wilhelm, Odej Kao · arXiv · Jul 15, 2026
Reinforcement Learning (RL) post-training is increasingly used to adapt foundation models for reasoning, planning, and feedback-driven robot-learning pipelines, but constrained post-training resources are often summarized by a single total …
Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent
Lei Bai, Zongsheng Cao, Yang Chen, Zhiyao Cui et al. · arXiv · Jun 29, 2026
We introduce Agents-A1, a 35B Mixture-of-Experts Agentic Model that reaches trillion-parameter-level performance by scaling the agent horizon. We investigate agent-horizon scaling from two perspectives: scaling long-horizon trajectories and…
Coverage-Driven KV Cache Eviction for Efficient and Improved Inference of LLM
Shuvendu Roy, Mengyao Zhai, Hossein Hajimirsadeghi, Golnoosh Samei · arXiv · Jun 28, 2026
Large language models (LLMs) excel at complex tasks like question answering and summarization, thanks to their ability to handle long-context inputs. However, deploying LLMs is costly, not only due to the high computational demands of quadr…
A Tree-of-Thoughts Inspired Hybrid Approach for Legal Case Judgement Summarization using LLMs
Aniket Deroy, Kripabandhu Ghosh, Saptarshi Ghosh · arXiv · Jun 26, 2026
In recent times, Large Language Models (LLMs) are increasingly being used for legal case judgement summarization. Most prior works have tried traditional extractive and abstractive summarization of case judgements. However, hybrid or extrac…
Textual Belief States for World Models: Identifiable Representation Learning Under Strict Mediation
Xiang Gao, Kaiwen Dong, Yuguang Yao, Padmaja Jonnalagedda et al. · arXiv · Jun 26, 2026
World models in partially observed environments rely on latent representations that summarize interaction history, but in many modern LLM-based architectures predictive performance fails to reflect representation quality due to history bypa…
DiARC: Distinguishing Positive and Negative Samples Helps Improving ARC-like Reasoning Ability of Large Language Models
Yuxuan Yang, Feiyang Li, Yile Wang · arXiv · Jun 25, 2026
The Abstraction and Reasoning Corpus (ARC; Chollet, 2019) contains tasks that require summarizing patterns from limited grid samples and predicting output grids. Recently, many large language model based approaches have attem…
Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning
Poojitha Thota, Shirin Nilizadeh · arXiv · Jun 24, 2026
Training-time data poisoning during fine-tuning poses a significant threat to large language models (LLMs) deployed for abstractive text summarization, where small task-specific datasets exert disproportionate influence on model behavior. I…
Optimizing Abstractive Summarization With Fine-Tuned PEGASUS
Sadiul Arefin Rafi, Naimur Rahman, Kazi Nazibul Islam, Ha-mim Ahmad et al. · arXiv · Jun 24, 2026
Abstractive text summarization is the technique of generating a short and concise summary comprising the salient ideas of a source text without making a subset of the salient sentences from the source text. The introduction of transformer m…
Less is More: Quality-Aware Training Data Selection for Scientific Summarization
Maria Nefeli Paraskevopoulou, Tatiana Passali, Grigorios Tsoumakas · arXiv · Jun 23, 2026
Scientific long-document summarization datasets commonly treat author-written abstracts as gold reference summaries, although their quality and alignment with the source article vary. At the same time, publicly available scientific summariz…
Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning
Shiding Zhu, Yudi Qi, Yajie Wang, Jiaze Li et al. · arXiv · Jun 23, 2026
Experience-driven self-evolution is critical for large language model (LLM) agents to improve through open-world interaction. However, existing experience learning methods mostly rely on single-agent loops, where the same agent executes tas…
Faithful by Construction: Claim-Anchored Attribution for Multi-Document Summarization
Shuo Guan · arXiv · Jun 22, 2026
End-to-end large language models (LLMs) produce fluent multi-document summaries but remain prone to hallucination, and the attributions they offer are typically coarse (whole documents or passages) and generated post hoc, leaving each summa…
TriggerBench: Investigating Prospective Memory for Large Language Models
Tianhua Zhang, Xinjiang Wang, Qianxi Zhang, Qi Chen et al. · arXiv · Jun 22, 2026
While Large Language Models (LLMs) are increasingly deployed in long interactions, existing evaluations focus predominantly on retrospective memory (RM) via explicit queries. Prospective memory (PM), the critical ability to spontaneously re…
Measuring & Mitigating Over-Alignment for LLMs in Multilingual Criminal Law Courts
Arthur Wuhrmann, Gaetan Stein, Daniel Brunner, Andrei Kucharavy · arXiv · Jun 22, 2026
While the wider applicability of LLMs in the legal field is currently debated due to their reliability and the gravity of any errors, narrow uses with well-understood and mitigated risks have emerged. Notably the Swiss Federal Supreme Court…
Plans Don't Persist: Why Context Management Is Load Bearing for LLM Agents
Aman Mehta, Anupam Datta · arXiv · Jun 22, 2026
Long-horizon agents depend on context management: systems compress, summarize, and evict old tokens so tasks can continue beyond finite windows. That is safe only when dropped information is no longer needed or has been internalized. Plans …
A BART-based approach with hierarchical strategy for Vietnamese abstractive multi-document summarization
Vu Nguyen Nguyen Xuan, Huy Ngo Quang · arXiv · Jun 17, 2026
In this technical report, we focus on solving the challenge of Vietnamese multi-document abstractive summarization, introduced in the International Workshop on Vietnamese Language and Speech Processing (VLSP) 2022. We choose to follow the p…
ScholarSum: Student-Teacher Abstractive Summarization via Knowledge Graph Reasoning and Reflective Refinement
Bohou Zhang, Xiaoyu Tao, Mingyue Cheng, Huijie Liu et al. · arXiv · Jun 17, 2026
Abstractive summarization plays a crucial role in enabling efficient understanding of scientific literature, yet it inherently demands both linguistic fluency and factual faithfulness. Existing approaches often fail to reconcile these two r…
Possible or Definite? A Benchmark for Evaluating Diagnostic Uncertainty Preservation in Clinical Text
Hongbo Du, Zixin Lu, Jiaming Qu · arXiv · Jun 16, 2026
Large language models (LLMs) are increasingly used for clinical text tasks such as summarization and revision. While most studies evaluate the fluency and coherence of LLM-generated text, whether LLMs correctly preserve diagnostic uncertain…
The Slop Paradox: How Synthetic Standardization Erodes Clinical Uncertainty and Cross-Modal Alignment in AI-Rewritten Radiology Reports
Samar Ansari · arXiv · Jun 16, 2026
AI-assisted clinical documentation tools increasingly summarize, standardize, and reformat radiology reports using large language models (LLMs). We present a controlled measurement of the resulting information degradation. Using 450 chest X…
daVinci-kernel: Co-Evolving Skill Selection, Summarization, and Utilization via RL for GPU Kernel Optimization
Dayuan Fu, Mohan Jiang, Tongyu Wang, Dian Yang et al. · arXiv · Jun 15, 2026
GPU kernel optimization represents a paradigm where functional correctness is assumed and execution efficiency is the objective. We present daVinci-kernel, a reinforcement learning framework that couples skill discovery with skill exploitat…
Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization
Mariia Onyshchuk, Maksym-Vasyl Tarnavskyi, Marta Sumyk · arXiv · Jun 11, 2026
Optimal transport (OT) has been shown to detect hallucinations in neural machine translation (NMT) by measuring the geometric distance between cross-attention distributions and a reference distribution, without any supervision. We extend th…
NTS-CoT: Mitigating Hallucinations in LLM-based News Timeline Summarization with Chain-of-Thought Reasoning
Feng Lyu, Huiqin Yan, Sijing Duan, Hao Wu et al. · arXiv · Jun 11, 2026
The rapid updates of online news make tracking event developments challenging, highlighting the need for timeline summarization (TLS). Hallucinations, where LLM-generated content deviates from source news, still remain a critical issue in L…
Context-Driven Incremental Compression for Multi-Turn Dialogue Generation
Yeongseo Jung, Jaehyeok Kim, Eunseo Jung, Jiachuan Wang et al. · arXiv · Jun 10, 2026
Modern conversational agents condition on an ever-growing dialogue history at each turn, incurring redundant attention and encoding costs that grow with conversation length. Naive truncation or summarization degrades fidelity, while existin…
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks
Mengyu Zheng, Kai Han, Boxun Li, Haiyang Xu et al. · arXiv · Jun 10, 2026
General-purpose agents such as OpenClaw are increasingly used as autonomous tool users, but their coding ability is difficult to measure under SWE-bench: a generic agent does not by itself satisfy the clean Docker workspace, patch, and pred…
Agreement in Representation Space for Open-Ended Self-Consistency
Paula Ontalvilla, Gorka Azkune, Aitor Ormazabal · arXiv · Jun 10, 2026
Self-consistency improves LLM reasoning by sampling multiple outputs and selecting the most consistent answer, but existing formulations largely rely on exact matching and therefore remain limited to tasks with categorical outputs. In this …
Detecting Speculative Language in Biomedical Texts using Recurrent Neural Tensor Networks
Dhruv Dixit · arXiv · Jun 9, 2026
In this investigation, we delve into the automated detection of speculative language within biomedical articles by utilizing distributed sentence representations and advanced deep learning techniques. The implications of such identification…
RealMath-Eval: Why SOTA Judges Struggle with Real Human Reasoning
Yiteng Mao, Kenan Xu, Yijia Lyu, Wenhao Li et al. · arXiv · Jun 8, 2026
While Large Language Models (LLMs) have achieved near-perfect performance in solving high-school mathematics, their ability to evaluate the diverse reasoning processes of real human students remains under-examined. To bridge t…
