Latest Dialogue Systems Research Papers
The newest Dialogue Systems papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Dialogue Systems so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models
Atsumoto Ohashi, Neil Zeghidour, Alexandre Défossez, Eugene Kharitonov · arXiv · Jun 9, 2026
Full-duplex spoken dialogue models can listen and speak simultaneously, making them a promising architecture for natural conversation. However, current models are trained solely with supervised learning through token-level likelihood maximi…
- ConvMemory v2: A Recall-Preserving Top-10 Evidence Reranker for Conversational Memory Retrieval
Taiheng Pan · arXiv · Jun 9, 2026
We describe ConvMemory v2, an opt-in token-evidence reranker that sits after the lightweight ConvMemory v1 reranker and reorders only v1's protected top-10 candidate set. v2 is a fine-tuned ms-marco-MiniLM-L-6-v2 cross-encoder (22,713,601 p…
- When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models
Sai Kartheek Reddy Kasu, Nils Lukas, Samuele Poppi · arXiv · Jun 9, 2026
Failures in multi-turn reasoning models are largely invisible to terminal-score evaluation. A model can lock onto an unsafe stance early in a long dialogue, yet its final-turn refusal rate may appear indistinguishable from a robustly aligne…
- Detecting Knowledge Gaps from Conversational AI Interactions Using Curriculum Prerequisite Graphs
Youssef Medhat, Junsoo Park, Ploy Thajchayapong, Ashok K. Goel · arXiv · Jun 9, 2026
Large online courses generate thousands of student questions directed at conversational AI teaching assistants, yet these interaction logs remain largely untapped as diagnostic signals. We present a pipeline that maps student questions from…
- ParaBridge: Bridging Paralinguistic Perception and Dialogue Behavior in Speech Language Models
Yuxiang Wang, Qinke Ni, Shengbo Cai, Wan Lin et al. · arXiv · Jun 9, 2026
Speech carries more information than just words: a child's voice, a fearful tone, or a noisy background should all lead a sufficiently competent spoken-dialogue assistant to different replies. Current Speech Language Models (SLMs) can recog…
- Expert-Level Crisis Detection in Mental Health Conversations
Grace Byun, Abigail Lott, Rebecca Lipschutz, Sean T. Minton et al. · arXiv · Jun 9, 2026
Real-world crisis intervention is inherently conversational, yet existing research largely focuses on static texts. When appl…
- Catching One in Five: LLM-as-Judge Blind Spots in Production Multi-Turn Transaction Agents
Sawyer Zhang, Alexander Wang, Sophie Lei · arXiv · Jun 9, 2026
LLM-as-judge is the default instrument for evaluating conversational agents, yet its reliability is almost always reported as agreement with human ratings, not recall of real defects. We study a deployed multi-turn food-and-beverage orderin…
- UXBench: Benchmarking User Experience in AI Assistants
Mengze Hong, Xia Zeng, Zeyang Lei, Sheng Wang et al. · arXiv · Jun 8, 2026
As AI assistants serve millions of users daily, evaluating user experience (UX) beyond general model capability has become increasingly important. We present UXBench, the first user-centric benchmark grounded in real user feedback signals f…
- One Model, Multiple Goals: Adaptive Multi-Objective Learning for E-commerce Dialogue Systems
Mingzhe Li, Jing Xiang, Enguo Zhou, Lang Gao et al. · arXiv · Jun 8, 2026
Dialogue systems in e-commerce scenarios often need to satisfy multiple objectives: accurately reasoning over user profiles (e.g., eligibility, credit limit) to ensure correct decision-making and user state interpretation, while also genera…
- Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning
Daoyu Wang, Mingyue Cheng, Qingchuan Li, Shuo Yu et al. · arXiv · Jun 8, 2026
Agentic reinforcement learning (RL) has become an important post-training paradigm for turning LLMs from static chatbots into interactive agents, giving rise to representative applications such as OpenClaw. Existing work mainly focuses on p…
- Bridging the Agent-World Gap: Text World Models for LLM-based Agents
Yixia Li, Hongru Wang, Peng Lai, Zhiwen Ruan et al. · arXiv · Jun 8, 2026
Large language model (LLM)-based agents are increasingly used in interactive textual environments, from web navigation and code editing to tool use and long-horizon dialogue. Yet many remain largely reactive, mapping observations to actions…
- M$^3$Exam: Benchmarking Multimodal Memory for Realistic User-Agent Interactions
Zhengjun Huang, Wenxuan Liu, Zhoujin Tian, Wei Chen et al. · arXiv · Jun 5, 2026
Language agents are increasingly deployed over accumulating multimodal information, yet existing benchmarks assume a human-human form with sparse visuals and straightforward content, evaluating neither reasoning over authentic multimodal fi…
- An Expanded Synthetic Conversation Dataset for Multi-Turn Smishing Detection
Carl Lochstampfor, Ayan Roy · arXiv · Jun 5, 2026
Our prior work introduced COVA, a synthetically generated multi-turn conversational smishing dataset of 3,201 labeled conversations, establishing baseline detection benchmarks across eight models. While XGBoost with TF-IDF features achieved…
- Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions
Xinnong Zhang, Wanting Shan, Hanjia Lyu, Zhongyu Wei et al. · arXiv · Jun 4, 2026
Large language models are increasingly used to simulate social media users and infer how individuals may respond to online discussions. However, it remains unclear whether these simulations reflect precise user-specific beliefs or whether t…
- Staying with the Uncertainty: Uncertainty-Scaffolding Strategies for Artificial Moral Advisors in LLM-to-LLM Simulated Conversations
Salvatore Greco, Hainiu Xu, Jacopo Domenicucci, Yulan He et al. · arXiv · Jun 4, 2026
LLMs are increasingly deployed as Artificial Moral Advisors (AMA) in a variety of contexts: what kind of conversational patterns should they display? In this paper, we study how AMA can help their interlocutors "stay with the uncertainty". …
- WAXAL-NET: Finetuned Edge ASR Across 19 African Languages
Victor Tolulope Olufemi, Oreoluwa Babatunde, Ramsey Njema, Bolarinwa Gbotemi et al. · arXiv · Jun 1, 2026
We evaluate whether compact domain-specialized ASR models can outperform massively multilingual foundation models for conversational African speech across 19 languages in the WAXAL corpus. Fine-tuned edge models achieve a macro-averaged WER…
- The Image Reconstruction Game: Drawing Common Ground Through Iterative Multimodal Dialogue
Sherzod Hakimov, Mattia D'Agostini, Ivan Samodelkin, David Schlangen · arXiv · Jun 1, 2026
We introduce the Image Reconstruction Game, a fully automated benchmark in which a vision-language model issues corrective instructions to an image generator across multiple turns, making accumulated common ground directly observable as a r…
- THRD: A Training-Free Multi-Turn Defense Framework for Jailbreak Attacks on Large Language Models
Zhiqing Ma, Zhonghao Xu, Dong Yu, Chen Kang et al. · arXiv · Jun 1, 2026
Multi-turn jailbreak attacks pose a growing threat to LLMs by exploiting conversational dynamics such as gradual escalation and cross-turn coordination. Existing defenses either rely on costly retraining -- often degrading model utility -- …
- RCEM: Embedder Equipped with Query Rewriting Skill for Robust Conversational Search in Distributional Shift
Kilho Son, Paul Hsu, Cha Zhang, Dinei Florencio · arXiv · Jun 1, 2026
Conversational search has become increasingly important in retrieval-augmented generation (RAG) systems, where users interact with AI assistants through multi-turn conversations containing context-dependent queries. We propose RCEM, a conve…
- VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Conversational Agents
Amrita Mazumdar, Seonwook Park, Rajarshi Roy, Nikhil Srihari et al. · arXiv · May 28, 2026
Natural human conversation is full-duplex and audio-visual: people simultaneously speak and listen while continuously interpreting and producing nonverbal cues, such as nods, smiles, and gestures. To support successful human-agent interacti…
- Dial HEALTHDIAL for Advice: A Multilingual and Multi-Parallel Spoken Dialogue Dataset for Knowledge-Grounded Information Seeking
Songbo Hu, Yinhong Liu, Ej Zhou, Evgeniia Razumovskaia et al. · arXiv · May 28, 2026
Creating spoken dialogue datasets is methodologically challenging, and these challenges are amplified when the goal is to build multilingual, multi-parallel datasets at scale. This work introduces HEALTHDIAL, a large-scale, multilingual, an…
- Who Am I? History-Aware Profiles for Student Simulation in Tutoring Dialogues
Zhangqi Duan, Shuyan Huang, Alexander Scarlatos, Jaewook Lee et al. · arXiv · May 28, 2026
A key part of developing large language model (LLM)-powered, automated tutoring tools is student simulation, i.e., using LLMs to role-play as students, which can facilitate tutor model evaluation and training. Existing work mostly focuses o…
- User-Aware Active Knowledge Acquisition for Emotional Support Dialogue
Mufan Xu, Kehai Chen, Jiahao Hu, Xinchao Xu et al. · arXiv · May 28, 2026
Emotional support plays an important role in dialogue systems, and its success depends on adapting to a user's evolving and implicit needs across multi-turn interactions while leveraging the strong reasoning capacity of large language model…
- GRASP: Gated Regression-Aware Skill Proposer for Self-Improving LLM Agents
Johannes Moll, Jean-Philippe Corbeil, Jiazhen Pan, Martin Hadamitzky et al. · arXiv · May 28, 2026
LLM agents acting in structured environments fail in operational rather than conversational ways, and reliability depends on procedural knowledge of the environment. Prior self-improvement methods accumulate natural-language guidance withou…
- When Seekers Are Hard to Help: Evaluating Emotional Support Dialogue Systems in Worst-Case Interactions
Jiajie Yang, Yangchun Li, Guanyi Chen, Rui Fan et al. · arXiv · May 27, 2026
Emotional Support Dialogue Systems (ESDSes) are increasingly evaluated and trained with LLM-simulated seekers. However, such simulated seekers often behave as cooperative, average-case users who disclose clearly, respond constructively, and…
- ConvMemory: A Lightweight Learned Memory Reranker, a Negative Attribution Result, and a Research-Preview Conflict Editor
Taiheng Pan · arXiv · May 27, 2026
We describe ConvMemory, a small 3.6M-parameter learned reranker for conversational long-term memory retrieval, trained with cross-encoder teacher supervision over fused dense and lexical features. On the LongMemEval memory family, ConvMemor…
- MemCog: From Memory-as-Tool to Memory-as-Cognition in Conversational Agents
Zihan Li, Xingyu Fan, Feifei Li, Wenhui Que · arXiv · May 27, 2026
Existing agent memory systems universally follow what we term a Memory-as-Tool paradigm where a single query triggers one-shot retrieval of flat passage lists, suffering from passive invocation, reasoning-retrieval decoupling, and structura…
- Personality, Role, and Expressive Style in Large Language Models: An Interactionist Analysis
Moe Nagao, Koichiro Terao, Mikio Nakano, Naoto Iwahashi · arXiv · May 27, 2026
Prompt-based personality control is a key technique for designing large language model (LLM) dialogue agents that behave consistently across social contexts. However, specifying Big Five personality traits (BFTs) in a prompt does not ensure…
- Forgotten Words: Benchmarking NeoBERT for Dementia Detection in Low-Resource Conversational Filipino and English Speech
Rez Samantha Z. Floresca, Edric Castel C. Hao, Hannah Grachiella Buñales, Chelsea Dominique E. Temprosa et al. · arXiv · May 25, 2026
Dementia detection from spontaneous speech offers a scalable approach to cognitive screening, yet NLP systems remain predominantly English-centric. This limitation is especially acute in the Philippines, where Filipino-English code-switchin…
- SafeCtrl-RL: Inference-Time Adaptive Behaviour Control for LLM Dialogue via RL-Driven Prompt Optimisation
Michael Orme, Yanchao Yu, Zhiyuan Tan · arXiv · May 25, 2026
Ensuring safe and contextually appropriate behaviour in Large Language Models (LLMs) remains a critical challenge for real-world deployment. We present SafeCtrl-RL, an inference-time behavioural control framework that enables adapt…