Latest Optimization Research Papers
The newest Optimization papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Optimization so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib · Abhijoy Sarkar, Aarchi Singh Thakur · arXiv · Jun 9, 2026
Resistance to first-line osimertinib in EGFR-mutant non-small-cell lung cancer (NSCLC) is the canonical example of predictable clonal evolution under therapeutic pressure, yet no public benchmark exists for training or evaluating computatio…
- TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning · Heming Zou, Qi Wang, Yun Qu, Yuhang Jiang et al. · arXiv · Jun 9, 2026
Reinforcement learning with verifiable rewards (RLVR) is a promising approach for enhancing reasoning and agentic behavior in large language models. However, rollout-intensive policy optimization is often limited by insufficient reward cont…
- Rethinking the Divergence Regularization in LLM RL · Jiarui Yao, Xiangxin Zhou, Penghui Qi, Wee Sun Lee et al. · arXiv · Jun 8, 2026
Reinforcement learning (RL) has become a key component of post-training large language models (LLMs). In practice, LLM RL is often off-policy because of training-inference mismatch and policy staleness, making trust-region control essential…
- Learning Dynamics Reveal a Hierarchy of Weight-Induced Layerwise Gram Metrics · Claudio Nordio · arXiv · Jun 8, 2026
We study feed-forward ReLU networks with fixed readout and quadratic loss. The aim is to rewrite gradient descent not primarily as a dynamics in weight space, but as a collective dynamics closed in terms of fields defined on the training-se…
- Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization · Ming Sun, Kun Yuan · arXiv · Jun 5, 2026
Decentralized stochastic optimization is a fundamental paradigm for large-scale learning over networks, where agents communicate only with their neighbors and no central coordinator is required. For strongly convex problems, communication e…
- Second-Order Path Kernel Interpolation Formulas in Machine Learning · Jin Guo, Roy Y. He, Jean-Michel Morel · arXiv · Jun 5, 2026
Understanding how training data shape neural network predictions is a central problem in modern learning theory. In 2020, Pedro Domingos proposed an interpolation formula valid for every model learned by deterministic gradient descent. It e…
- Drifting Models for Surrogate Flow Modeling · Chris R. Jung, Markus Dörr, Natalie Jüngling, Jennifer Niessner et al. · arXiv · Jun 5, 2026
While Computational Fluid Dynamics (CFD) provides high-fidelity flow fields for optimizing indoor environments, its computational cost limits rapid exploration. To solve this problem, generative surrogates offer better distribution modeling …
- Amortized Neural Optimization for Pre-Layout Signal Integrity Design Space Exploration using Differentiable Surrogates · Julian Withöft, Werner John, Emre Ecik, Ralf Brüning et al. · arXiv · Jun 5, 2026
Pre-layout design space exploration (DSE) for high-speed signal integrity (SI) analysis is often limited by the computational cost of simulations and iterative optimization algorithms within modern electronic design automation (EDA) workflo…
- The Proxy Benders Decomposition · Changkun Guan, El Mehdi Er Raqabi, Mathieu Tanneau, Pascal Van Hentenryck · arXiv · Jun 5, 2026
Benders decomposition is a fundamental framework for solving large-scale mixed-integer optimization problems with complicating variables that, when fixed, yield significantly easier subproblems. However, classical Benders decomposition repe…
- RREDCoT: Segment-Level Reward Redistribution for Reasoning Models · Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger, Sepp Hochreiter · arXiv · Jun 4, 2026
Recent advancements in reasoning language models have been driven by Reinforcement Learning (RL) fine-tuning. Most often, these rely on the Group Relative Policy Optimization (GRPO) algorithm or modifications thereof to steer the models to …
- Event Detection for Parameter-to-KPI Dependency Learning for AI-RAN · Christie Djidjev, Nicholas Kaminski · arXiv · Jun 4, 2026
Next-generation wireless networks are expected to rely on multiple concurrent AI-driven control functions that optimize different network objectives simultaneously, particularly in AI-integrated and open radio access network architectures s…
- Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss · Thomas T. Zhang, Alok Shah, Yifei Zhang, Vincent Zhang et al. · arXiv · Jun 4, 2026
Many modern applications of deep learning involve training a neural network via a one-step prediction loss (e.g., $L^2$ regression, cross-entropy), but deploy the network by rolling out along its own predictions. Key examples include autore…
- Drifting Preference Optimization for One-Step Generative Models · Zhou Jiang, Yandong Wen, Zhen Liu · arXiv · Jun 1, 2026
One-step text-to-image generators are attractive for deployment because they generate an image with a single forward pass, but preference finetuning them remains difficult: standard alignment methods often rely on policy likelihoods, denois…
- Wasserstein Contraction of Coordinate Ascent Variational Inference · Rocco Caprio, Adrien Corenflos, Sam Power · arXiv · May 28, 2026
We study the contraction in Wasserstein distance of the coordinate ascent variational inference algorithm. This is shown to hold under a transport-information inequality at the fixed points and a functional smoothness condition. The results…
- Affective Music Recommendation: A Rollout-Based World Model for Offline Preference Optimization · Audrey Chan, Aaron Labbé, Jacob Lavoie, Jordan Bannister et al. · arXiv · May 27, 2026
Functional music applications, from consumer focus and sleep aids to clinical interventions, share a distinctive recommendation problem: success is defined by the listener's affective state, but online experimentation on emotion is ethicall…
- Principled Algorithms for Optimizing Generalized Metrics in Multi-Label Learning · Mehryar Mohri, Yutao Zhong · arXiv · May 27, 2026
Many real-world classification tasks require predicting multiple labels per instance, necessitating the optimization of complex evaluation metrics such as the $F$-measure and Jaccard index. While the Empirical Utility Maximization (EUM) fra…
- Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases · Dongyoon Hahm, Dylan Hadfield-Menell, Kimin Lee · arXiv · May 26, 2026
Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignm…
- Probabilistic Smoothing with Ratio-Monotone Transforms for Global Optimization · Kukyoung Jang, Taehyun Cho, Junrui Zhang, Ping Xu et al. · arXiv · May 26, 2026
Probabilistic smoothing is a standard tool for global optimization, but existing methods rely on Gaussian kernels and specific transforms, often resulting in strong hyperparameter sensitivity and limited robustness. We propose a general smo…
- From Model Scaling to System Scaling: Scaling the Harness in Agentic AI · Shangding Gu · arXiv · May 25, 2026
This paper studies the next major bottleneck in agentic AI as system scaling, not only model scaling: the design of auditable, persistent, modular, and verifiable architectures around foundation models. We refer to this shift as scaling the…
- Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay · Martin Marek, Dongkyu Cho, Shikai Qiu, Rumi Chunara et al. · arXiv · May 25, 2026
Models trained on a new task typically degrade on prior tasks, a phenomenon known as forgetting. Traditionally, mitigating forgetting has required replaying stored exemplars from prior tasks, which is often impractical. By contrast, languag…
- Goal-driven Bayesian Optimal Experimental Design for Robust Decision-Making Under Model Uncertainty · Jinwoo Go, Xiaoning Qian, Byung-Jun Yoon · arXiv · May 25, 2026
Bayesian optimal experimental design (BOED) selects experiments to maximize information gain about model parameters. However, in decision-critical settings, reducing parameter uncertainty does not necessarily improve downstream decisions, a…
- Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning · Zhaoyu Zhu, Rui Gao, Shuang Li · arXiv · May 25, 2026
Wasserstein policy gradient (WPG) is a policy optimization method for reinforcement learning (RL) that exploits the optimal-transport geometry of action distributions. For the entropy-regularized RL objective, WPG evolves each state-conditi…
- When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges · Parth Darshan, Abhishek Divekar · arXiv · May 25, 2026
Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion; however, they produce natural…
- Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models · Krishnakumar Balasubramanian · arXiv · May 21, 2026
We propose and analyze a conservative drifting method for one-step generative modeling. The method replaces the original displacement-based drifting velocity by a kernel density estimator (KDE)-gradient velocity, namely the difference of th…
- Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation · Samson Gourevitch, Yazid Janati, Dario Shariatian, Umut Simsekli et al. · arXiv · May 21, 2026
Discrete diffusion models are often trained through clean-data prediction, but the prediction can be used in different ways to define the reverse dynamics. In Masked Diffusion Models (MDM) these choices largely coincide, whereas in Uniform …
- Cyber-Physical Anomaly Detection in IoT-Enabled Smart Grids Using Machine Learning and Metaheuristic Feature Optimization · Adis Alihodžić, Eva Tuba, Milan Tuba · arXiv · May 21, 2026
Modern smart grids rely on dense measurement infrastructures, communication links, and intelligent field devices. Although this improves supervision and control, it also increases vulnerability to cyber-physical disruptions. Operators must …
- PIXLRelight: Controllable Relighting via Intrinsic Conditioning · Miguel Farinha, Ronald Clark · arXiv · May 18, 2026
We present PIXLRelight, a feed-forward approach for physically controllable single-image relighting. Existing methods either provide limited lighting control (e.g. through text or environment maps), accumulate errors when chaining inverse a…
- General Preference Reinforcement Learning · Muhammad Umer, Muhammad Ahmed Mohsin, Ahsan Bilal, Arslan Chaudhry et al. · arXiv · May 18, 2026
Post-training has split large language model (LLM) alignment into two largely disconnected tracks. Online reinforcement learning (RL) with verifiable rewards drives emergent reasoning on math and code but depends on a programmatic verifier …
- Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad · Zijian Liu · arXiv · May 18, 2026
Many tasks in modern machine learning are observed to involve heavy-tailed gradient noise during the optimization process. To manage this realistic and challenging setting, new mechanisms, such as gradient clipping and gradient normalizatio…
- COOPO: Cyclic Offline-Online Policy Optimization Algorithm · Qisai Liu, Zhanhong Jiang, Joshua Russell Waite, Aditya Balu et al. · arXiv · May 18, 2026
Offline reinforcement learning struggles with distributional shift and constrained performance due to static dataset limitations, while online RL demands prohibitive environment interactions. The recent advent of hybrid offline-to-online me…