Latest Machine Translation Research Papers
The newest Machine Translation papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Machine Translation so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- Who Brought Easter Eggs to Eid? Auditing Cultural Translation of Math Word Problems Across Diverse Languages and Regions
Parisa Suchdev, Juniper Lovato · arXiv · Jun 9, 2026
Large language models are increasingly used to adapt math word problems for personalized learning at scale, but it remains an open question whether those adaptations are consistent across models, preserve cultural diversity at scale, and re…
- Multilingual Word-Level Forced Alignment with Self-Supervised Representations and Learned Dynamic Programming
Roy Weber, Meidan Zehavi, Rotem Rousso, Joseph Keshet · arXiv · Jun 9, 2026
We present a method for accurate multilingual word-level forced alignment, consisting of an alignment encoder and a learned alignment decoder. The encoder integrates two representations: one from the Massively Multilingual Speech (MMS) mode…
- Enhancing Multilingual LLM-based ASR with Mixture of Experts and Dynamic Downsampling
Guodong Lin, Ziqi Chen, Yuxiang Fu, Ke Li et al. · ICASSP · Jun 9, 2026
The rapid progress of large language models (LLMs) has opened up a new frontier for automatic speech recognition (ASR), making their effective integration a critical and challenging research direction. To this end, this work proposes a proj…
- Which LoRA? An Empirical Study on the Effectiveness of LoRA Techniques During Multilingual Instruction Tuning
Thamali Wijewardhana, Napoleon H. Reyes, Surangika Ranathunga · arXiv · Jun 9, 2026
We investigate whether commonly available LoRA variants have an advantage over basic LoRA in multilingual instruction tuning. Experiments involving LoRA and four other variants on two datasets across diverse target languages show that there…
- OpenRTLSet: A Fully Open-Source Dataset for Large Language Model-based Verilog Module Design
Jinghua Wang, Lily Jiaxin Wan, Sanjana Pingali, Scott Smith et al. · arXiv · Jun 9, 2026
OpenRTLSet introduces the largest fully open-source dataset for hardware design, offering over 131,000 diverse Verilog code samples to the research community and industry. Our dataset uniquely combines Verilog code from GitHub repositories …
- Data Synthesis and Parameter-Efficient Fine-Tuning for Low-Resource NMT: A Case Study on Q'eqchi' Mayan
Alexander Chulzhanov, Soeren Eberhardt, Arjun Mukherjee · arXiv · Jun 8, 2026
Neural machine translation for digitally low-resource Indigenous languages is often hindered by extreme data scarcity, prompting reliance on extractive web-scraping. To ensure data sovereignty, this study introduces a data synthesis methodo…
- Beyond Accuracy: Community Perspectives on Machine Translation
Yujun Wang, Ehud Reiter, Shimei Pan, Steffen Eger et al. · arXiv · Jun 8, 2026
Despite remarkable progress in machine translation (MT), non-AI communities have raised growing concerns about MT systems, suggesting a noticeable gap between technical advancement and the needs of real-world users. For instance, while NLP …
- OpenBibleTTS: Large-Scale Speech Resources and TTS Models for Low-Resource Languages
David Guzmán, Luel Hagos Beyene, Jesujoba Oluwadara Alabi, Yejin Jeon et al. · arXiv · Jun 8, 2026
Recent advances in neural text-to-speech (TTS) and multilingual speech generation have substantially improved synthetic speech quality, yet these gains remain unevenly distributed across the world's languages. Existing models are still domi…
- Overcoming Decoder Inconsistencies in Whisper for Dravidian and Low-Resource Languages
Chowdam Venkata Kumar, Kumud Tripathi, Pankaj Wasnik · arXiv · Jun 8, 2026
Multilingual ASR models such as Whisper perform well on high-resource languages but exhibit substantially higher Word Error Rates (WER) for Dravidian languages compared to Indo-Aryan ones. Through linguistic and dataset analysis, we show th…
- Reasoning without Gold Standards: A Proxy-Judge Theory of Autoformalization
Lei Xu, Xin Quan, André Freitas · arXiv · Jun 8, 2026
Complex reasoning tasks increasingly require systems to produce outputs whose correctness cannot be judged by exact match against a single reference. Autoformalization (AF) is a representative example; it asks a model to translate informal …
- MUDIDI: A Two-Stage Framework for Multilingual Dictionary Digitization with Language Models
David Setiawan, Temuulen Khishigsuren, Milind Agarwal, Pagnarith Pit et al. · arXiv · Jun 8, 2026
Multilingual dictionaries are among the most valuable documentary resources for low-resource and endangered languages, yet many remain available only as scans. For many decades, their digitization and conversion into a machine-readable form…
- Introducing multiplex semantic networks as multifaceted representations of creative associative knowledge across multilingual samples
Edith Haim, Kurt Haim, Roger E. Beaty, Cynthia S. Q. Siew et al. · arXiv · Jun 8, 2026
Creativity is a complex cognitive ability that relies on knowledge organisation and retrieval from semantic memory. Yet most research uses a single task to measure it, capturing only a fraction of this complexity. This study investigates mu…
- Culturally-Adapted Red-Teaming Across East and Southeast Asian Contexts: A Methodological and Comparative Analysis
Hyeji Choi, Yongtaek Lim, Minwoo Kim · arXiv · Jun 8, 2026
Multilingual safety evaluation of large language models (LLMs) has predominantly relied on direct translation (DT) of English benchmarks into target languages - an approach that converts surface-level linguistic form while failing to reflec…
- KIT's Submission to Cross-Lingual Voice Cloning in IWSLT 2026
Seymanur Akti, Alexander Waibel · arXiv · Jun 5, 2026
Cross-lingual voice cloning aims to generate speech in a target language while preserving speaker identity from a source-language reference. This task is central to speech translation and is the focus of the IWSLT 2026 Cross-Lingual Voice C…
- UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding
Ahmer Tabassum, Sarfraz Ahmad, Hasan Iqbal, Owais Aijaz et al. · arXiv · Jun 5, 2026
Meaningful multilingual evaluation must test models in the target language and educational context. Urdu, spoken by more than 230 million people, lacks a broad MMLU-style benchmark built from native educational sources. We introduce UrduMML…
- Style or Content? Evaluating Style Classifiers with Controlled Content Overlap
Zhuo Liu, Haozheng Du, Xiangxiang Xu, Hangfeng He · arXiv · Jun 5, 2026
Style classifiers can use content cues that correlate with style labels in naturally collected data, yet we lack a systematic way to measure this reliance. We study this problem with a controlled content overlap setup built on parallel Bibl…
- mmPISA-bench: Do LLMs Reason Equally Well Across 43 Languages?
Yerzhan Sapenov, Jaromir Savelka · arXiv · Jun 5, 2026
We introduce mmPISA-bench, a compact high-quality multilingual reasoning benchmark derived from the OECD Programme for International Student Assessment (PISA). The benchmark consists of 25 multiple-choice questions that require reasoning in…
- MADE: Beyond Scoring via a Multilingual Agentic Diagnosing Engine for Fine-Grained Evaluation Insights
Yilun Liu, Miao Zhang, Shimin Tao, Minggui He et al. · arXiv · Jun 5, 2026
Multilingual and multicultural benchmarks now cover dozens of languages and model families, but the resulting score landscapes remain metric-rich and insight-poor, necessitating fine-grained multilingual post-evaluation diagnosis. However, …
- Translate-R1: Cost-Aware Translation Tool Use via Reinforcement Learning
Pratik Jayarao, Chaitanya Dwivedi, Himanshu Gupta, Neeraj Varshney et al. · arXiv · Jun 5, 2026
The performance gap across languages in LLMs is well documented, and closing it natively requires pretraining or fine-tuning on corpora that, for most languages, do not exist. Translation offers an alternative: converting an input into the …
- Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
Hanxu Hu, Zdeněk Šnajdr, Pinzhen Chen, Jannis Vamvas et al. · arXiv · Jun 4, 2026
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific lan…
- A Komi-Yazva–Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation
Petr Parshakov · arXiv · Jun 4, 2026
We present the first Komi-Yazva–Russian parallel corpus together with an explicit evaluation protocol for studying LLM translation in an endangered, extremely low-resource setting. The dataset contains 457 aligned sentence pairs from 74 na…
- "Chi nas dal soch el sent de legn" – Auditing Text Corpora for Lombard
Edoardo Signoroni, Pavel Rychlý · arXiv · Jun 4, 2026
Several of the world's languages are still under-resourced in terms of Natural Language Processing (NLP) tools. This is mostly due to the lack of high-quality datasets to train, develop, and evaluate systems and models for several tasks, su…
- Ouvia: A User-centered Framework for Measuring Usability of Speech Translation in Real-World Communication Scenarios
Giuseppe Attanasio, Beatrice Savoldi, Daniel Chechelnitsky, Matteo Negri et al. · arXiv · Jun 4, 2026
Speech translation (ST) is increasingly adopted in user applications, yet its evaluation largely focuses on decontextualized testbeds and holistic quality, rather than end users' communication needs. We introduce Ouvia, an evaluation framew…
- Automatic Labelling of Speech Translation Errors
Dominik Macháček, Maike Züfle, Ondrej Klejch · arXiv · Jun 4, 2026
Errors in speech translations reduce trustworthiness of Speech Translation (ST) systems and can have serious consequences. Yet currently there is no established methodology for evaluating confidence and quality estimation of speech translat…
- English-to-Prakrit Machine Translation via Multilingual Transfer Learning
Om Choksi, Smit Kareliya, Shrikant Malviya, Pruthwik Mishra · arXiv · Jun 4, 2026
We study English-to-Prakrit machine translation in a low-resource setting where the target language is unsupported by IndicTrans2. We adapt the multilingual model by mapping Prakrit to the Hindi language tag (hin_Deva) without modifying the…
- Better Literary Translation: A Multi-Aspect Data Generation and LLM Training Approach
Zhihao Lin, Ziqi Zhu, Hao Huang, Guanghui Wang et al. · arXiv · Jun 4, 2026
Literary translation poses unique challenges due to the scarcity of high-quality annotated data and the need to balance expression fluency with literary effect. We present a multi-aspect iterative refinement framework that generates high-qu…
- Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs
Gio Paik, Hyunseo Shin, Soungmin Lee · arXiv · Jun 4, 2026
Automatic Speech Recognition (ASR) has become a key technology for human–AI interaction. However, code-switching ASR (CS-ASR) remains particularly challenging due to the severe scarcity of multilingual CS speech resources across diverse la…
- SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation
Priyaranjan Pattnayak · arXiv · Jun 1, 2026
Word Error Rate (WER) is the dominant metric for automatic speech recognition (ASR), but it can overestimate errors when references and hypotheses encode the same words in different scripts. This issue is common in multilingual settings whe…
- Learning When to Translate for Multilingual Reasoning
Deokhyung Kang, Hyounghun Kim, Gary Geunbae Lee · arXiv · Jun 1, 2026
Reasoning language models (RLMs) achieve strong performance on complex reasoning tasks, but still exhibit substantial multilingual reasoning gaps, largely due to language-understanding failures in non-English inputs. English translation can…
- WAXAL-NET: Finetuned Edge ASR Across 19 African Languages
Victor Tolulope Olufemi, Oreoluwa Babatunde, Ramsey Njema, Bolarinwa Gbotemi et al. · arXiv · Jun 1, 2026
We evaluate whether compact domain-specialized ASR models can outperform massively multilingual foundation models for conversational African speech across 19 languages in the WAXAL corpus. Fine-tuned edge models achieve a macro-averaged WER…