Language & NLP

Latest Large Language Models Research Papers

The newest Large Language Models papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Large Language Models so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.

Get the latest Large Language Models papers in your inbox — free →

Recent papers

Constraint-Based Physicalism
Kendall Taylor · Zenodo (CERN European Organ... · Feb 17, 2028
DISCLOSURE: This paper presents the author's own original philosophical framework, refined through hundreds of iterative exchanges and adversarial critiques, with the author directing each stage of revision. The final text was generated with …
Defying the Catholic Secondary School Enrollment Decline: A Case Study Exploring Strategic Enrollment Management Practices in an All-Boys Catholic High School
Traci A Koval · Seton Hall University eRepo... · May 15, 2027
Catholic secondary schools in the United States continue to face persistent enrollment decline, closures, and consolidations, creating an urgent need for sustainable and mission-centered strategies. This qualitative case study examined Hill…
How Large Language Models Are Reshaping Skills and Job Requirements for Public Health Professionals in Saudi Arabia
Mulfi Alkhinjar · Scholarship @ Claremont (Th... · Jan 1, 2027
Context: Large Language Models (LLMs) such as ChatGPT, Gemini, and DeepSeek are transforming professional work across sectors by enhancing information processing and decision support. In public health, these technologies offer the potential…
The Sociolinguistics of Machine Identity: LLM Personality and Ideology Propagation
Guangni Li · Knowledge Commons (Lakehead... · Dec 31, 2026
Do large language models (LLMs) possess a measurable "personality," and how do the linguistic properties of training corpora shape their cognitive style and downstream reasoning? This paper approaches these questions from a sociolinguistic …
Surprisal Theory is Tautological (without Rational Grounding)
Ryan Cotterell · arXiv · Jul 23, 2026
Surprisal theory holds that the human processing difficulty of a linguistic unit in context is an affine function of its surprisal under some language model. I argue this claim is a tautology without further constraint: for any non-negative…
MedGame: Storytelling Gamification Empowered by Large Language Models for Medical Education
Qian Wu, Xinrong Zhou, Zizhan Ma, Kai Chen et al. · arXiv · Jul 23, 2026
Large Language Models (LLMs) show promise for medical education, but most existing systems focus on localized interactions such as question answering or single-turn feedback, rather than organizing an entire clinical case into a decision-ce…
Artificial Epanorthosis: Why large language models overuse a classical rhetorical figure, and how to mitigate it
Federico Boggia · arXiv · Jul 23, 2026
A rhetorical figure that Cicero and Quintilian catalogued two thousand years ago reappears, systematically, in the text of large language models: epanorthosis, the self-correction of the specimen «This is not a course. It is a journey of tr…
What, Where, and How: Disentangling the Roles of Task, Language, and Model in Code Model Representations
Piotr Wilam · arXiv · Jul 23, 2026
Do independently trained language models come to represent the same thing in the same way? We answer for code, extending a recently introduced concept-circuit extraction method to a 2x2 design -- Python and Rust crossed with Qwen2.5-Coder-7…
Agentic coding without the cloud: evaluating open-weight large language models on longitudinal data preparation tasks
Mack Nixon, Liam Wright, Yevgeniya Kovalchuk, Alison Fang-Wei Wu et al. · arXiv · Jul 23, 2026
Large language models (LLMs) and agents are now widely used tools in code development, with data typically sent to third-party cloud-based models. Their adoption in research using personal data is constrained by governance requirements that…
RUMBA: Russian User Memory Benchmark
Elizaveta Shevtsova, Inna Glebkina, Mark Baushenko, Pavel Gulyaev et al. · arXiv · Jul 23, 2026
The ability to handle long-term memory in LLMs is becoming increasingly critical, yet existing benchmarks remain English-centric and rely on aggregate retrieval metrics, failing to capture interactions between long-range context, temporal i…
When Trivia Is Not Trivial: Everyday Knowledge Failures in Multilingual LLMs
Anna Mosolova, Djamé Seddah · arXiv · Jul 23, 2026
Quiz rooms, trivia nights, and quiz shows challenge human knowledge across a wide range of topics, from canonical facts to everyday culture. In this paper, we examine whether large language models (LLMs) can perform competitively in such se…
Euclid-MCP: A Model Context Protocol Server for Deterministic Logical Reasoning via Prolog
Bartolomeo Bogliolo · arXiv · Jul 23, 2026
Large Language Models (LLMs) excel at natural language understanding and generation but remain unreliable for multi-step logical reasoning, especially in safety-critical or compliance-sensitive domains. Recent neuro-symbolic approaches addr…
Capital Markets LLM Reliability Score (CM-LRS): From Plausible to Bankable
Prerit Ahuja · arXiv · Jul 23, 2026
In capital-markets workflows the question is rarely whether a large language model can produce a fluent draft, but whether the draft is bankable: defensible in front of a counter-party or a regulator, with the documents in hand. Existing me…
GRADRAG: Cross-Component Prompt Adaptation for Coordinated Multi-Agent RAG
Paolo Pedinotti, Enrico Santus · arXiv · Jul 23, 2026
Retrieval-Augmented Generation (RAG) systems increasingly employ multiple LLM agents. Yet, most prior work optimizes components in isolation rather than coordinating improvements across the pipeline. We introduce GRADRAG, a framework for cr…
AI Assistants Overassist
Verona Teo, Raghav Jain, Tobias Gerstenberg, Max Kleiman-Weiner · arXiv · Jul 23, 2026
Large language models (LLMs) are increasingly used as tutors and thought partners, helping users reason through problems. While guidance from AI assistants can scaffold thinking and foster learning, such benefits depend on how they help--fo…
Adaptive Depth Sparse Framework: Similarity-Driven Resource Allocation for Pre-Trained LLMs
Yidu Wu, Xiang Wang, Kejie Zhao, Zhangchi Wang et al. · arXiv · Jul 23, 2026
Large language models (LLMs) achieve strong generation and reasoning performance, but the Transformer architecture incurs high inference cost. Existing acceleration methods often rely on task-specific fine-tuning or training from scratch, i…
news-crawler-LM: A Small Long-Context Model For High-Quality News Crawling
Pascal Stolzenburg, Jonas Golde, Max Dallabetta, Alan Akbik · arXiv · Jul 23, 2026
Extracting structured content from news pages remains challenging due to heterogeneous HTML layouts, inconsistent markup, and substantial boilerplate such as navigation elements and advertisements. Rule-based news crawlers can achieve high …
A Unified Moral-Value Dataset for Instruction Tuning
Zhaohui Zeng, Florian Mai · arXiv · Jul 23, 2026
Large language models (LLMs) have developed rapidly and become valuable tools in everyday life. However, how to align LLMs to a particular set of human values is still an open problem. Recent studies show that instruction tuning has strong …
A Comparative Evaluation of Embeddings and LLMs in a Greek Book Publisher Setting - The CUP Dataset
Katerina Papantoniou, Panagiotis Papadakos, Theodore Patkos, Dimitris Garefalakis et al. · arXiv · Jul 23, 2026
We present CUP, a Greek book retrieval benchmark consisting of 868 catalog records and 104 expert-annotated queries with graded relevance judgments. We evaluate sparse (BM25), dense (sentence-transformers), hybrid, and LLM-assisted retrieva…
One More Turn, Less Regret: A Regret-Based Multi-Turn Benchmark for LLMs' Clarification Policies
Minh Ngoc Ta, My Anh Tran Nguyen, Duong D. Nguyen, Yuxia Wang et al. · arXiv · Jul 23, 2026
Ambiguous user requests make clarification a sequential decision problem for conversational LLM assistants: they must decide whether to ask, what to ask, when to stop, and when to answer. We introduce RegretBench, a multi-turn benchmark tha…
Notes to Self: Can LLMs Benefit from Experiential Abstractions?
Chang Liu, Xinyu Li, Artur Dubrawski · arXiv · Jul 22, 2026
Humans distill experience into reusable abstractions, e.g., strategies and cautionary reminders, and apply them to gradually solve problems more effectively. We study whether Large Language Models (LLMs) can similarly benefit from such expe…
Test-Time Training for Modality Order Consistency in Vision-Language Models
Aditi Gupta, Yossi Gandelsman · arXiv · Jul 22, 2026
We find that vision-language models are sensitive to a specific semantically irrelevant change: the order in which the image and question are presented. Across three models and three benchmarks, image-first prompting consistently outperform…
The Blessing of Dimensionality: How Near-Orthogonality in High-Dimensional Spaces Explains Temporal Portability
Abigail Woodring, Adrian Chan, Rana Muhammad Shahroz Khan, Sukwon Yun et al. · arXiv · Jul 22, 2026
Fine-tuning has been widely used to adapt large language models (LLMs) for domain-specific tasks. Parameter efficient fine-tuning (PEFT) methods such as low-rank adaptation (LoRA) are frequently used to reduce computational costs. PortLLM i…
Sound Probabilistic Safety Bounds for Large Language Models
Mahdi Nazeri, Anne-Kathrin Schmuck, Sadegh Soudjani, Alessandro Abate · arXiv · Jul 22, 2026
We propose a novel framework for computing rigorous bounds on the probability that a large language model (LLM) generates harmful output to a given prompt. We study a new application of the Clopper-Pearson confidence intervals to obtain pro…
Which Values Do LLMs Confuse? A Schwartz-Based Recognition Study
Andrei Chetvergov, Stepan Ukolov, Timofei Sivoraksha, Alexander Evseev et al. · arXiv · Jul 22, 2026
Large language models are increasingly evaluated through the values they endorse, but such evaluations presuppose that models can identify the value expressed in a concrete situation. We study this prerequisite as controlled top-1 recogniti…
The Maskability Index: Predicting Task-Objective Alignment in Pretrained Language Models
Ahmad Pouramini, Mahsa Afsharzadeh · arXiv · Jul 22, 2026
Large-scale pretrained language models such as T5 and BERT have demonstrated strong capabilities for generating structured knowledge. However, their performance depends on how closely the prompting strategy matches the objectives used durin…
Exposure is Optional: Learning Unlike Coordination in Language Models
Jiamu Luo, Shane Steinert-Threlkeld · arXiv · Jul 22, 2026
Coordination, a fundamental linguistic structure, remains a subject of intense debate, and its exact nature continues to elude theoretical linguistics. A common view holds that only same-category constituents can be conjoined, which has bee…
On the Systematic Challenges of Culturally Loaded Machine Translation: Dream of the Red Chamber as the Cultural Lens
Yiming Wang, Jiayuan Di · arXiv · Jul 22, 2026
Culturally loaded translation poses unique challenges for machine translation (MT), as meanings are deeply embedded in socio-cultural contexts beyond surface linguistic forms. Although large language models (LLMs) have enabled MT systems to…
Surprisal is Not a Theory
Andrés Buxó-Lugo, Aniello De Santo, Morgan Grobol, Ryan J. Hubbard et al. · arXiv · Jul 22, 2026
Surprisal Theory is often characterized as a computational-level explanation per Marr (1982). We argue in this work that, even though a computational-level narrative has been used to support "representation-agnostic research" within comput…
Gotta Catch Them All: The Modes of Sycophancy
Shreyans Jain, Alexandra Yost, Amirali Abdullah · arXiv · Jul 22, 2026
Large language models often align with users' beliefs at the expense of factual accuracy, a behavior known as sycophancy. Prior mechanistic studies largely treat sycophancy as a single behavioral dimension that can be uniformly amplified or…
