Latest Diffusion Models Research Papers

The newest Diffusion Models papers from across the field (arXiv, NeurIPS, CVPR, Nature, and more), refreshed daily and ranked by relevance. Distill AI tracks Diffusion Models so you don’t have to: get the standout work delivered to your inbox every morning, with two-sentence summaries and the option to chat with any paper.

Recent papers

Positivity and long-term behaviour of a diffusion model with measure-valued nonlocal reaction term : applications in bioscience and engineering
Xiao Yang, Qiyao Peng, SC Hille · Lancaster EPrints (Lancaste... · Sep 5, 2026
The behaviour is investigated of solutions to a diffusion equation on the real line with nonlocal and singular reaction term, i.e., given by a Dirac source or sink at the origin. It gives a simplified representation of for example a control…
Preserved diffusion MRI measures despite subtle behavioral and hippocampal CA3 alterations in a preclinical model of cerebral small vessel disease
Che Mohd Nasril Che Mohd Nassir, Hafizah Abdul Hamid, Nurfatihah Pahdin, Izzatul Farhanis Abdul Halim et al. · Universiti Putra Malaysia I... · Sep 1, 2026
First-principles study of helium migration in stishovite in the Earth's mantle
Yu Huang, Hang Ren · DOAJ (DOAJ: Directory of Op... · Sep 1, 2026
The migration mechanisms of helium (He) in anhydrous and hydrous stishovite under Earth's mantle conditions are studied using density functional theory (DFT) and climbing image nudged elastic band (CI-NEB) transition state calculations. In …
Streaming Multi-Agent Autoregressive Diffusion Model with World State Registers
Sicheng Mo, Yuheng Li, Ziyang Leng, Krishna Kumar Singh et al. · arXiv · Jul 23, 2026
Multi-agent interactive world models should not only generate consistent observations, but also maintain world states that persist across agents and evolve across views. Existing autoregressive video diffusion pipelines carry forward observ…
Inference-Time Scaling of Diffusion Models via Progressive Seed Pruning
Rogerio Guimaraes, Pietro Perona · arXiv · Jul 23, 2026
Diffusion and flow-matching models dominate conditional image generation, yet inference-time scaling for these models is far less developed than for autoregressive language models. Because final quality is highly sensitive to the initial no…
SANA-Video 2.0: Hybrid Linear Attention with Attention Residuals for Efficient Video Generation
Junsong Chen, Jincheng Yu, Yitong Li, Shuchen Xue et al. · arXiv · Jul 23, 2026
We introduce SANA-Video 2.0, a hybrid video diffusion transformer instantiated at 5B and 14B scales under a unified architecture. Designed to generate high-quality video up to 720p on a single GPU, SANA-Video 2.0 matches full-softmax video …
Towards Robust Iris Recognition Through Occlusion Identification and Conditional Diffusion-Based Reconstruction
Kamrul Hasan, Mylene C. Q. Farias, Oleg V. Komogortsev · arXiv · Jul 23, 2026
Iris recognition is a reliable biometric approach that identifies individuals using the distinctive and stable texture of the iris. However, recognition performance can degrade when discriminative iris texture is partially occluded by eyeli…
ElasticTTT: Prior-Preserving Test-Time Tuning for Video Editing
Yueyi Liu, Chi Zhang, Sen Cui, Miao Liu · arXiv · Jul 23, 2026
Test-Time Tuning (TTT) on pretrained diffusion models has emerged as a powerful paradigm for video editing. However, there exists a foundational mismatch between the distribution-mapping nature of generative models and the single-point opti…
Texture++: Elevating 3D Asset Texture Resolution with a Region-Aware Diffusion Model
Shuaiwei Wang, Shi Li, Jieting Xu, Yuchi Huo et al. · arXiv · Jul 23, 2026
Numerous 3D assets are discarded due to low texture resolution, while current super-resolution models ignore texture maps and focus on natural images. An efficient and generalizable texture super-resolution model can revitalize a large corp…
Self Gradient Forcing: Native Long Video Extrapolation
Junhao Zhuang, Shiyi Zhang, Yuxuan Bian, Yaowei Li et al. · arXiv · Jul 22, 2026
Recent autoregressive video diffusion methods are increasingly built upon Self Forcing, where the student is trained on histories produced by its own rollout rather than ground-truth video contexts. This reduces exposure bias, but the histo…
Evolving Cache Schedules for Fast Diffusion Policy Inference
Siying Wang, Kangye Ji, Di Wang, Fei Cheng · arXiv · Jul 22, 2026
Diffusion policies achieve strong visuomotor control by iteratively denoising action chunks, but repeated denoising makes real-time deployment computationally demanding. Cache-based methods reduce inference cost by reusing intermediate acti…
HeadCast: Casting Attention Heads for Efficient Autoregressive Video Generation
Jinliang Shen, Lianghao Su, Zheming Li, Kang He et al. · arXiv · Jul 22, 2026
Autoregressive (AR) video diffusion models have become a promising paradigm for long and streaming video synthesis, but the continuously growing Key-Value (KV) cache makes attention the dominant inference cost, especially at high resolution…
Importance-Aware OBS Pruning for Diffusion Models
Ba-Thinh Lam, Srijan Das, Hieu Le · arXiv · Jul 22, 2026
We propose importance-aware pruning for diffusion models, a training-free framework that prioritizes preserving parameters critical to semantically salient image regions. To do so, we incorporate spatial importance maps -- derived from cond…
SIINR: Structurally Informed Implicit Neural Representations for super-resolution with uncertainty quantification of clinical quality diffusion MRI datasets
Tom Hendriks, William Consagra, Anna Vilanova, Yogesh Rathi et al. · arXiv · Jul 22, 2026
Diffusion Magnetic Resonance Imaging (dMRI) is a powerful tool for probing brain microstructure, but clinical acquisitions are often limited by low out-of-plane resolution, resulting in degraded structural information and reduced utility fo…
WearWow: Native 2K Multi-Garment Virtual Try-On via Adaptive Token Packing and Preference Alignment
Xujie Zhang, Runyan Du, Song Chang, Jiang Li et al. · arXiv · Jul 22, 2026
Synthesizing native 2K multi-garment virtual try-on is a formidable frontier in digital fashion, critically bottlenecked by two fundamental limitations: the O(N^2) memory explosion induced by 2k conditions, and the spectral bias of diffusio…
Appearance Pointers -- Multimodal Region Control of Diffusion Transformers
Rahul Sajnani, Yulia Gryaditskaya, Radomír Měch, Srinath Sridhar et al. · arXiv · Jul 21, 2026
Controllable image generation remains challenging for creative professionals, who often require precise regional control over materials, object identities, and spatial arrangements that cannot be reliably achieved through text prompting alo…
ROMS-IMLE: A Minimalist Approach to Competitive Single-Step Generative Modelling
Chirag Vashist, Ke Li · arXiv · Jul 21, 2026
Generative models have undergone many generations of evolution, from VAEs/GANs to diffusion/flow matching. Along the way, the underlying techniques have become more complicated and various beliefs about what drives strong empirical performa…
Text Template Tokens Are Implicit Semantic Registers in Diffusion Transformers
Maohua Li, Qirui Li, Yanke Zhou, Yiduo Li et al. · arXiv · Jul 21, 2026
Text-to-image diffusion transformers (DiTs) jointly process text and image tokens, yet their internal computation during denoising remains poorly understood. We introduce a causal interpretability framework for modern large-scale DiTs that …
Hierarchical Denoising For Multi-Step Visual Reasoning
Zezhong Qian, Xiaowei Chi, Chak-Wing Mak, Tianze Zhou et al. · arXiv · Jul 16, 2026
Video models are evolving into vision foundation models, yet they still lack human-like multi-step reasoning. Streaming autoregressive diffusion models are efficient but limited in reasoning, while bidirectional diffusion enables global rev…
MeanFlowNFT: Bringing Forward-Process RL to Average-Velocity Generators
Yushi Huang, Xiangxin Zhou, Jun Zhang, Liefeng Bo et al. · arXiv · Jul 16, 2026
MeanFlow generators achieve fast few-step sampling by predicting average velocities over time intervals, making them attractive for efficient generation. Reinforcement learning (RL) has become a powerful way to align diffusion and flow mode…
DAPGNet: Dynamic Adaptive Physics-Guided Graph Diffusion Network for Hyperspectral Image Classification
Pengkun Wang, Weijia Cao, Ning Wang, Xiaofei Yang · arXiv · Jul 16, 2026
Hyperspectral image (HSI) classification requires reliable pixel-relation modeling under spectral variability, mixed pixels, and heterogeneous boundaries. Existing graph-based HSI classifiers usually construct graph topology from spatial pr…
DriftWorld: Fast World Modeling through Drifting
Susie Lu, Haonan Chen, Weirui Ye, Yilun Du · arXiv · Jul 16, 2026
Predictive world models enable robots to plan by imagining the outcomes of their actions, but their value for control hinges on generating many rollouts quickly. This creates a bottleneck for diffusion-based world models: multistep sampling…
Weakly-Supervised RGB-D Salient Object Detection via SAM-driven Pseudo Annotation and State Space Interaction-based Diffusion
Wenqi Si, Gongyang Li, Shixiang Shi, Weisi Lin · arXiv · Jul 16, 2026
Weakly-supervised RGB-D Salient Object Detection (SOD) is explored to reduce the heavy burden of pixel-level annotations. But scribble annotations lack the structure and details of objects, resulting in inaccurate saliency maps. In this pap…
From Draft to Draft-Free: One-Step Video Object Removal via Privileged Distillation and Fast Planting
Zizhao Chen, Ping Wei, Guang Dai, Jingdong Wang et al. · arXiv · Jul 16, 2026
Video object removal is a fundamental yet challenging task in video editing. Despite recent progress, existing methods typically fall into two categories. Traditional approaches based on optical flow or attention mechanisms often introduce …
The Seriality Gap in Video Diffusion Models
Jorge Diaz Chao, Konpat Preechakul, Yuxi Liu, Yutong Bai · arXiv · Jul 14, 2026
When one ball strikes another, then another, video models should predict the consequences of each bounce. In controlled experiments on multi-ball hard-sphere dynamics, we find that the performance of standard bidirectional video diffusion d…
Exact and Calibrated Diffusion Reconstruction for Digital Breast Tomosynthesis
Imade Bouftini · arXiv · Jul 14, 2026
Limited-angle digital breast tomosynthesis (DBT) reconstructs a volume from a few low-dose projections over a narrow arc. At a representative nine-view, $25^{\circ}$ protocol more than 98% of image space is unmeasured, so a learned prior mu…
Cycle-World: Mitigating Error Accumulation in Long-term Video World Models via Reverse-Prediction Cycle Consistency
Zihan Su, Teng Hu, Jiangning Zhang, Ruiyan Wang et al. · arXiv · Jul 13, 2026
Autoregressive diffusion models have enabled high-quality video generation, yet their sequential nature inherently suffers from error accumulation. In long-horizon video synthesis, minor prediction deviations compound over time, inevitably …
Feature-Space Guided Diffusion for Realistic Ultrasound Image Synthesis
Marina Domínguez, Nélida Mirabet-Herranz, Valery Naranjo · arXiv · Jul 13, 2026
Conditional diffusion models can generate anatomically plausible medical ultrasound (US) images, but anatomical plausibility alone does not ensure realistic B-mode appearance. Most US pipelines adapt standard generative architectures and co…
Wan-Dancer: A Hierarchical Framework for Minute-scale Coherent Music-to-Dance Generation
Mingyang Huang, Peng Zhang, Li Hu, Guangyuan Wang et al. · arXiv · Jul 10, 2026
Generating long-duration, high-definition, and rhythmically synchronized dance videos directly from music remains a significant challenge, primarily due to the temporal constraints of current diffusion models, which typically fail beyond 20…
LongE2V: Long-Horizon Event-based Video Reconstruction, Prediction, and Frame Interpolation with Video Diffusion Models
Cheng-De Fan, Chun-Wei Tuan Mu, Chen-Wei Chang, Chin-Yang Lin et al. · arXiv · Jul 9, 2026
Recovering high-quality video from sparse event streams is a challenging task. Regression methods often blur textures, while existing generative models struggle with long-term stability. We propose LongE2V, a novel approach that leverages p…
