Latest Text Summarization Research Papers
The newest Text Summarization papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Text Summarization so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- Text summarization via global structure awareness
Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Yibei Liu et al. · CoRR 2026 · Dec 31, 2026
Text summarization is a fundamental task in natural language processing (NLP), and the information explosion has made long-document processing increasingly demanding, making summarization essential. Existing research mainly focuses on model…
- Detecting Speculative Language in Biomedical Texts using Recurrent Neural Tensor Networks
Dhruv Dixit · arXiv · Jun 9, 2026
In this investigation, we delve into the automated detection of speculative language within biomedical articles by utilizing distributed sentence representations and advanced deep learning techniques. The implications of such identification…
- RealMath-Eval: Why SOTA Judges Struggle with Real Human Reasoning
Yiteng Mao, Kenan Xu, Yijia Lyu, Wenhao Li et al. · arXiv · Jun 8, 2026
While Large Language Models (LLMs) have achieved near-perfect performance in solving high-school mathematics, their ability to evaluate the diverse reasoning processes of real human students remains under-examined. To bridge t…
- Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition
Iosif Tsangko, Andreas Triantafyllopoulos, Björn W. Schuller · arXiv · Jun 5, 2026
Instruction-following audio language models (ALMs) can be augmented with explicit acoustic cues, yet it remains unclear whether such cues are used in a grounded way when the raw audio is already available. We study this question in speech e…
- When Large Language Models Fail in Healthcare: Evaluating Sensitivity to Prompt Variations
Mahdi Alkaeed · arXiv · Jun 5, 2026
Large Language Models (LLMs) are increasingly used in healthcare for tasks such as clinical question answering, diagnosis support, and report summarization. Despite their promise, these models remain highly sensitive to subtle prompt pertur…
- Forgive or forget: Understanding the context of hate in audio retrieval systems
Arghya Pal, Sailaja Rajanala, Raphael C.-W. Phan, Shekhar Nayak · arXiv · Jun 4, 2026
Handling toxic retrieval in text-to-audio systems is challenging due to contextual dependencies. Existing strategies (e.g., rephrasing, summarization) risk altering intent or omitting details. We propose a post hoc causal debiasing framewor…
- Towards Multidisciplinary Summarization of Hospital Stays: Efficient Sentence-Level Clinical Provenance Categorization
Baris Karacan, Vaibhav Bhargava, Barbara Di Eugenio, Natalie Parde et al. · arXiv · Jun 1, 2026
Effective "all-team" summarization in high-complexity settings like the Neonatal Intensive Care Unit (NICU) requires aggregating insights from diverse disciplines (physicians, nurses, therapists) spread across hundreds of clinical free-text…
- From Facts to Insights: A Persona-Driven Dual Memory Framework and Dataset for Role-Playing Agents
Rongsheng Zhang, Ruofan Hu, Weijie Chen, Jiji Tang et al. · arXiv · May 25, 2026
While role-playing agents excel in short-term interactions, long-term conversations overwhelm context windows, motivating external memory frameworks. Current systems typically rely on persona-agnostic summarization, which records facts with…
- Structure Retention in Embedding Spaces as a Predictor of Benchmark Performance
Amanda Myntti, Jenna Kanerva, Veronika Laippala, Filip Ginter · arXiv · May 21, 2026
In this paper, we show that high-performing embedding models organize their embedding spaces in a consistent way. We evaluate 25 contemporary embedding models on five MTEB tasks spanning four diverse task categories (retrieval, bitext minin…
- The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought
Moritz Brösamle, Stephan Eckstein · arXiv · May 18, 2026
Existing expressivity results for transformers typically rely on hardmax attention, high precision, and other architectural modifications that disconnect them from the models used in practice. We bridge this gap by analyzing standard transf…
- Beyond Transcripts: Iterative Peer-Editing with Audio Unlocks High-Quality Human Summaries of Conversational Speech
Kaavya Chaparala, Thomas Thebaud, Jesús Villalba López, Laureano Moro-Velazquez et al. · arXiv · May 17, 2026
There are not enough established benchmarks for the task of speech summarization. Creating new benchmarks demands human annotation, as LLMs could embed systemic errors and bias into datasets. We test ten annotation workflows varying input m…
- Herculean: An Agentic Benchmark for Financial Intelligence
Xueqing Peng, Zhuohan Xie, Yupeng Cao, Haohang Li et al. · arXiv · May 14, 2026
As AI agents improve, the central question is no longer whether they can solve isolated well-defined financial tasks, but whether they can reliably carry out financial professional work. Existing financial benchmarks offer only a partial vi…
- BOOKMARKS: Efficient Active Storyline Memory for Role-playing
Letian Peng, Ziche Liu, Yiming Huang, Longfei Yun et al. · arXiv · May 13, 2026
Memory systems are critical for role-playing agents (RPAs) to maintain long-horizon consistency. However, existing RPA memory methods (e.g., profiling) mainly rely on recurrent summarization, whose compression inevitably discards important …
- Multimodal Abstractive Summarization of Instructional Videos with Vision-Language Models
Maham Nazir, Muhammad Aqeel, Richong Zhang, Francesco Setti · arXiv · May 12, 2026
Multimodal video summarization requires visual features that align semantically with language generation. Traditional approaches rely on CNN features trained for object classification, which represent visual concepts as discrete categories …
- Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation
Sike Xiang, Shuang Chen, Kevin Qinghong Lin, Jialin Yu et al. · arXiv · May 12, 2026
Clinical check-up reports are multimodal documents that combine page layouts, tables, numerical biomarkers, abnormality flags, imaging findings, and domain-specific terminology. Such heterogeneous evidence is difficult for laypersons to int…
- Tracing Uncertainty in Language Model "Reasoning"
Nils Grünefeld, Bertram Højer, Philipp Mondorf, Barbara Plank et al. · arXiv · May 8, 2026
Language model (LM) "reasoning", commonly described as Chain-of-Thought or test-time scaling, often improves benchmark performance, but the dynamics underlying this process remain poorly understood. We study these dynamics through the lens …
- Canon Formation in the Age of AI: Metadata Packet for Disambiguation, Training-Layer Selection, and Retrocausal Reception (v1.1)
Lee Sharks · Zenodo (CERN European Organ... · May 8, 2026
AI does not merely represent an existing canon; through training, indexing, retrieval, summarization, and citation, AI systems participate in canon formation by altering which texts become visible, reusable, and culturally actionable. This …
- Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets
Yllias Chali, Deen Abdullah · arXiv · May 6, 2026
Large-scale datasets are widely used to perform summarization tasks, but they may not include queries alongside documents and summaries. In the search for suitable datasets for Query-Focused Summarization (QFS), we identify two research que…
- S^2tory: Story Spine Distillation for Movie Script Summarization
Mingzhe Lu, Yanbing Liu, Qihao Wang, Jiarui Zhang et al. · arXiv · May 5, 2026
Movie scripts pose a fundamental challenge for automatic summarization due to their non-linear, cross-cut narrative structure, which makes surface-level saliency methods ineffective at preserving core story progression. To address this, we …
- PubMed-Ophtha: An open resource for training ophthalmology vision-language models on scientific literature
Verena Jasmin Hallitschke, Carsten Eickhoff, Philipp Berens · arXiv · May 4, 2026
Vision-language models hold considerable promise for ophthalmology, but their development depends on large-scale, high-quality image-text datasets that remain scarce. We present PubMed-Ophtha, a hierarchical dataset of 102,023 ophthalmologi…
- The Compliance Gap: Why AI Systems Promise to Follow Process Instructions but Don't
Kwan Soo Shin · arXiv · May 3, 2026
An auditor instructs an AI assistant: "open each file individually using the Read tool -- no scripts, no agents." The AI replies "Yes" -- then issues a single batched call summarizing all fifty files at once. We call this the Compliance Gap…
- APPSI-139: A Parallel Corpus of English Application Privacy Policy Summarization and Interpretation
Pengyun Zhu, Qiheng Sun, Long Wen, Yanbo Wang et al. · arXiv · Apr 30, 2026
Privacy policies are essential for users to understand how service providers handle their personal data. However, these documents are often long and complex, as well as filled with technobabble and legalese, causing users to unknowingly acc…
- Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
Gongbo Zhang, Wen Wang, Ye Tian, Li Yuan · arXiv · Apr 29, 2026
Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. While existing distillation methods for dLLMs reduce inference…
- OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory
Jinze Li, Yang Zhang, Xin Yang, Jiayi Qu et al. · arXiv · Apr 29, 2026
Autonomous LLM agents increasingly operate in long-horizon, interactive settings where success depends on reusing experience accumulated over extended histories. However, existing agent memory systems are fundamentally constrained by text-c…
- LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation
Huyen Nguyen, Haoxuan Zhang, Yang Zhang, Junhua Ding et al. · arXiv · Apr 28, 2026
Reliable evaluation of large language model (LLM)-generated summaries remains an open challenge, particularly across heterogeneous domains and document lengths. We conduct a comprehensive meta-evaluation of 14 automatic summarization metric…
- LongSumEval: Question-Answering Based Evaluation and Feedback-Driven Refinement for Long Document Summarization
Huyen Nguyen, Haoxuan Zhang, Yang Zhang, Haihua Chen et al. · arXiv · Apr 28, 2026
Evaluating long document summaries remains the primary bottleneck in summarization research. Existing metrics correlate weakly with human judgments and produce aggregate scores without explaining deficiencies or guiding improvement, prevent…
- Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks
Lawrence Keunho Jang, Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov · arXiv · Apr 27, 2026
Existing web agent benchmarks have largely converged on short, single-site tasks that frontier models are approaching saturation on. However, real world web use consists of long-horizon, multi-site workflows. Common web navigation tasks, su…
- DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference
Zahra Dehghanighobadi, Asja Fischer · arXiv · Apr 27, 2026
Long-context reasoning is a critical capability of large language models (LLMs), enabling applications such as long-document understanding, summarization, and code generation. However, efficient autoregressive inference relies on the key-va…
- ReLeVAnT: Relevance Lexical Vectors for Accurate Legal Text Classification
Ishaan Gakhar, Harsh Nandwani · arXiv · Apr 24, 2026
The classification of legal documents from an unstructured data corpus has several crucial applications in downstream tasks. Documents relevant to court filings are key in use cases such as drafting motions, memos, and outlines, as well as …
- A Multimodal Text- and Graph-Based Approach for Open-Domain Event Extraction from Documents
Praval Sharma · arXiv · Apr 23, 2026
Event extraction is essential for event understanding and analysis. It supports tasks such as document summarization and decision-making in emergency scenarios. However, existing event extraction approaches have limitations: (1) closed-doma…