
Latest Efficient ML Research Papers

The newest Efficient ML papers from across the field (arXiv, NeurIPS, CVPR, Nature, and more), refreshed daily and ranked by relevance. Distill AI tracks Efficient ML so you don't have to: get the standout work delivered to your inbox every morning, with two-sentence summaries and the option to chat with any paper.


Recent papers

Security-as-a-service: enhancing cloud security through managed security solutions
Zainab S. Attarbashi, Azana Hafizah Mohd Aman, Salem Sati, Nur-Adib Maspo et al. · The International Islamic U... · Dec 1, 2026
Cloud computing plays an important role in modern businesses by enabling flexible, efficient storage, analysis, and access to data and applications. However, this reliance also introduces new security challenges. Ensuring cloud security and…
Ultrasonic phased array measurement & compression for in-process weld bevel estimation
Angelos Dimakos, Nina E. Sweeney, Charalampos Loukas, Ewan Nicolson et al. · Strathprints: The Universit... · Aug 1, 2026
This work presents a real-time ultrasonic inspection pipeline for weld bevel orientation measurement in robotic welding applications, addressing two major barriers to industrial deployment: reliable bevel geometry estimation in challenging …
X$^3$-OPD: Distilling Reasoning into Large Audio-Language Models via On-Policy Alignment
Dongjie Fu, Di Cao, Xize Cheng, Zihan Zhang et al. · arXiv · Jul 23, 2026
While large audio-language models have achieved remarkable progress in auditory perception, they still lag behind text-based large language models in deep logical reasoning, primarily due to the scarcity of high-quality audio reasoning data…
Windowed-MTP: Removing the Full-Context Draft-KV Tax at Million-Token Context
Alagappan Valliappan · arXiv · Jul 23, 2026
Speculative decoding accelerates autoregressive generation by having a cheap draft propose tokens that a target verifies in parallel. Frontier models increasingly ship a built-in Multi-Token-Prediction (MTP/NEXTN) draft head under the assum…
KroQuant: Kronecker-Structured Block Transforms for Efficient Post-Training Quantization of Diffusion Transformers
Yann Bouquet, Alireza Khodamoradi, Kristof Denolf, Mathieu Salzmann · arXiv · Jul 23, 2026
Post-training quantization (PTQ) of diffusion transformers (DiTs) to W4A4 severely degrades output quality, because activations entering each linear layer contain outliers that 4-bit formats cannot represent. The standard fix applies an inv…
Classical Hardware Acceleration of Quantum Autoencoders for Real-Time Anomaly Detection in Collider Experiments
Ivan Ge, Sagar Addepalli, Abhilasha Dave, Julia Gonski · arXiv · Jul 22, 2026
Quantum machine learning (QML) algorithms in high energy physics (HEP) can efficiently represent and leverage long-range, high-order correlations in high-dimensional collider data, potentially with fewer parameters and favorable scaling rel…
The Blessing of Dimensionality: How Near-Orthogonality in High-Dimensional Spaces Explains Temporal Portability
Abigail Woodring, Adrian Chan, Rana Muhammad Shahroz Khan, Sukwon Yun et al. · arXiv · Jul 22, 2026
Fine-tuning has been widely used to adapt large language models (LLMs) for domain-specific tasks. Parameter-efficient fine-tuning (PEFT) methods such as low-rank adaptation (LoRA) are frequently used to reduce computational costs. PortLLM i…
User-Centric Modeling of Transactional Sequences with Explainable State Space Models
Ivan Palagin · arXiv · Jul 22, 2026
We propose a hybrid approach for user-centric modeling of transactional event sequences that combines contrastive representation learning (CoLES) with State Space Models (SSMs). While contrastive methods yield high-quality compressed user r…
CircuitKIT: Circuit Discovery, Evaluation, and Application Toolkit for Mechanistic Interpretability
Pratinav Seth, Hem Gosalia, Aditya Kasliwal, Vinay Kumar Sankarapu · arXiv · Jul 21, 2026
Circuit analysis can support not only model explanation but also downstream interventions such as pruning, editing, steering, and selective fine-tuning. However, conducting such analyses currently requires stitching together separate implem…
Thermodynamics-Informed Input Reparameterization for Neural Prediction of Real-Fluid Thermodynamic Properties in Supercritical Combustion
Haoze Zhang, Han Li, Ke Xiao, Yangchen Xu et al. · arXiv · Jul 21, 2026
Real-fluid thermodynamic property evaluation is a major computational cost in supercritical combustion simulations. In the enthalpy-based pressure-correction formulation, the closure evaluates temperature $T$, density $\rho$, and compressibility…
In-Context Time Series Classification with Random Convolutional Features
Joscha Cüppers, Jilles Vreeken · arXiv · Jul 21, 2026
Time series classification is central to domains like medical signal analysis, industrial monitoring, and sensor-based activity recognition, where class information manifests as localized shapes, specific frequencies, temporal shifts, or co…
AdaFlash: Adaptive Speculative Decoding via On-Policy Distilled Diffusion Drafters
Yu-Yang Qian, Hao-Cong Wu, Chen Chen, Jiacheng Sun et al. · arXiv · Jul 21, 2026
Speculative decoding, in which a lightweight draft model first generates a draft sequence that is then verified in parallel by the target model, has become a prevalent paradigm for accelerating large language model inference. Recent work su…
MeanFlowNFT: Bringing Forward-Process RL to Average-Velocity Generators
Yushi Huang, Xiangxin Zhou, Jun Zhang, Liefeng Bo et al. · arXiv · Jul 16, 2026
MeanFlow generators achieve fast few-step sampling by predicting average velocities over time intervals, making them attractive for efficient generation. Reinforcement learning (RL) has become a powerful way to align diffusion and flow mode…
On-Policy Delta Distillation
Byeongho Heo, Jaehui Hwang, Sangdoo Yun, Dongyoon Han · arXiv · Jul 16, 2026
On-policy distillation is an alternative post-training method in reinforcement learning that alleviates the constraints imposed by reward models by providing token-level supervision from a teacher model. Although on-policy distillation has …
Efficient Sequential Calibration with $O(T^{2/3-\varepsilon})$ Error Bound
Zihan Zhang · arXiv · Jul 14, 2026
We study the online binary sequential calibration problem. A recent breakthrough by Dagan et al. (2024) overcomes the classical $T^{2/3}$ barrier for calibration error. Building on this result, we present an efficient randomized for…
Accelerated Mixing Time of Randomized Hamiltonian Monte Carlo
Siddharth Mitra, Vishwak Srinivasan, Xiuyuan Wang, Andre Wibisono · arXiv · Jul 14, 2026
We show the Randomized Hamiltonian Monte Carlo (RHMC) algorithm has accelerated mixing time guarantees for sampling from log-concave probability distributions. RHMC proceeds by repeatedly simulating the continuous-time Hamiltonian dynamics …
Accelerating Masked Diffusion Large Language Models: A Survey of Efficient Inference Techniques
Daehoon Gwak, Minhyung Lee, Junwoo Park, Jaegul Choo · arXiv · Jul 14, 2026
Diffusion large language models (dLLMs) offer a theoretical advantage in parallel generation over standard autoregressive models. However, parallel generation alone does not guarantee practical speedups. Realizing this efficiency requires s…
AVQ-Attention: Adaptive Vector-Quantized Attention
Winfried van den dool, Patrick Forré, Amir Habibian, Yuki M. Asano et al. · arXiv · Jul 14, 2026
The $\mathcal{O}(N^2)$ complexity of attention over $N$ tokens remains a computational bottleneck in transformer models. Vector-Quantized (VQ) attention reduces this to $\mathcal{O}(MN)$ by representing keys with $M$ codewords, but applies …
Directional Constraints for Efficient Exploration in Safe Reinforcement Learning
Paolo Magliano, Puze Liu, Jan Peters, Davide Tateo et al. · arXiv · Jul 14, 2026
Reinforcement Learning has revolutionized the landscape of robotic research, allowing robust learning of complex robotic skills in simulation. However, real-world deployment in open-ended environments requires strong safety guarantees to pr…
Requential Coding: Pushing the Limits of Model Compression with Self-Generated Training Data
Shikai Qiu, Marc Finzi, Yujia Zheng, Kun Zhang et al. · arXiv · Jul 13, 2026
Compression is fundamental to intelligence. A model that can represent its training data as a short code has discovered regularities that enable generalization. Large neural networks may learn functions far simpler than their parameter coun…
HiFi-LLP: High-Fidelity, Low-Cost Latency Predictors with Confidence for Robust HW-NAS
Shambhavi Balamuthu Sampath, Behzad Shomali, Nael Fasfous, Moritz Thoma et al. · arXiv · Jul 13, 2026
With deep neural networks (DNNs) increasingly deployed on edge devices, hardware (HW)-aware optimization techniques, such as HW-aware compression and HW-aware neural architecture search (HW-NAS), have become essential. These methods rely on…
CatRetriever: Contrastive Representation Learning for Slab-to-Bulk Retrieval in Generative Catalyst Discovery
Jungho Oh, Woosung Kim, Dong Hyeon Mok, Jonggeol Na et al. · arXiv · Jul 13, 2026
Inverse design is an emerging data-driven paradigm for efficiently navigating vast chemical spaces to discover new materials with targeted properties, and in the context of heterogeneous catalysis, surface generative models have recently ad…
Entropy-Constrained Machine Learning with Residual Data Augmentation for Modeling Chemical Kinetics
Okezzi Ukorigho, Opeoluwa Owoyele · arXiv · Jul 10, 2026
We present a physics-constrained machine learning framework for accelerating the direct numerical simulation (DNS) of turbulent reacting flows. The model replaces the direct evaluation of detailed chemical source terms with a surrogate that…
Multimodal Scenario Similarity Search for Autonomous Driving
Tamás Matuszka, András Tamásy, Balázs Szolár · arXiv · Jul 10, 2026
Large-scale autonomous-driving datasets contain vast numbers of recorded scenarios, creating a need for efficient retrieval methods that can identify situations similar to a given query. Existing approaches typically rely on either visual r…
SLORR: Simple and Efficient In-Training Low-Rank Regularization
David González-Martínez, Shiwei Liu · arXiv · Jul 9, 2026
Low-rank factorization is widely used to compress neural networks, but modern models are often not naturally amenable to aggressive factorization without significant accuracy loss. Existing training-time low-rank regularizers can improve co…
Super Weights in LLMs and the Failure of Selective Training
Shreyas Subramanian, Adewale Akinfaderin, Akarsha Sehwag · arXiv · Jul 9, 2026
Recent work identified Super Weights, individual parameters whose removal degrades model performance by orders of magnitude. We show that this degradation due to pruning Super Weights does not universally apply to all LLMs. Furthermore, if …
Deep Learning for Joint Narrowband Interference Cancellation and Soft Demodulation in OFDM Systems
Emmanouil Kavvousanos, Francky Catthoor, Vassilis Paliouras · arXiv · Jul 9, 2026
Narrowband interference (NBI) severely degrades orthogonal frequency-division multiplexing (OFDM) systems by corrupting subcarriers and rendering classical soft demodulation ineffective. Conventional compressed-sensing (CS) mitigation exhib…
A Practical Investigation of Training-free Relaxed Speculative Decoding
Guoxuan Xia, Luka Ribar, Paul Balanca · arXiv · Jul 9, 2026
Speculative decoding accelerates sampling from an autoregressive LLM by using a faster auxiliary model to draft tokens which are then verified in parallel by the LLM. Standard speculative decoding is lossless: its rejection and resampling s…
BiSCo-LLM: Lookup-Free Binary Spherical Coding for Extreme Low-Bit Large Language Model Compression
Yuantian Shao, Peisong Wang, Zhilei Liu, Chuangyi Li et al. · arXiv · Jul 9, 2026
Large language models (LLMs) are increasingly constrained by memory capacity, weight bandwidth, and checkpoint storage during deployment. Existing low-bit compression methods mainly follow two directions. Scalar or group-wise quantization i…
Selective Timestep Weighting and Advantage-Based Replay for Sample-Efficient Diffusion RLHF
Eric Zhu, Abhinav Shrivastava, Soumik Mukhopadhyay · arXiv · Jul 8, 2026
Reinforcement learning from human feedback (RLHF) has emerged as a powerful paradigm for aligning generative models with human preferences. However, applying RLHF to diffusion models remains highly feedback inefficient, as existing approach…

