
Latest Image Generation Research Papers

The newest Image Generation papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Image Generation so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.

Recent papers

From the old to the new generation of a product: unlearn, improve, and prosper
Nikolaos Kyriakopoulos, Paraskevas C. Argouslidis, Dionysis Skarmeas, Spiros Gounaris · Strathprints: The Universit... · Dec 14, 2026
Purpose Drawing on the theories of planned obsolescence and dynamic capabilities, this study aims to jointly address marketing and organizational aspects of the transition from the existing to the new generation of a product (i.e. a product…
The Relationship Between TikTok Usage Intensity and Self-Esteem with Social Comparison as a Mediator in Emerging Adult Women
Zalfa Ayu Shabira, Endah Mastuti · Universitas Airlangga Repos... · Dec 9, 2026
This study aims to determine the relationship between TikTok usage intensity and self-esteem, with social comparison as a mediator, in emerging adult women. TikTok usage intensity is defined as the level of…
Catalyst poisoning influences from various functional groups of energy carriers towards electrochemical oxidation reactions on non-noble high-entropy alloy anodes in acidic media
Tahawy Rafat, Muflihah Salma Aridha, Hara Kosuke, Ohto Tatsuhiko et al. · Institutional Repositories ... · Dec 1, 2026
Electrolytic synthesis of energy carriers using renewable energy and fuel cells that use energy carriers for regeneration are important technologies for achieving our carbon-neutral society. However, electrochemical reactions in electrolyte…
A Comprehensive Study on the Synthesis, Nonlinear Optical Properties, and Biological Applications of Substituted Piperazine Derivatives
Sahaya Infant Lasalle B, Senthil Pandian Muthu, P. Ramasamy · DOAJ (DOAJ: Directory of Op... · Dec 1, 2026
Piperazine is an important organic heterocyclic compound featuring a six-membered ring with two nitrogen atoms positioned opposite each other and four carbon atoms. This moiety is present in numerous widely recognized drugs with diverse the…
Inference-Time Scaling of Diffusion Models via Progressive Seed Pruning
Rogerio Guimaraes, Pietro Perona · arXiv · Jul 23, 2026
Diffusion and flow-matching models dominate conditional image generation, yet inference-time scaling for these models is far less developed than for autoregressive language models. Because final quality is highly sensitive to the initial no…
Appearance Pointers -- Multimodal Region Control of Diffusion Transformers
Rahul Sajnani, Yulia Gryaditskaya, Radomír Měch, Srinath Sridhar et al. · arXiv · Jul 21, 2026
Controllable image generation remains challenging for creative professionals, who often require precise regional control over materials, object identities, and spatial arrangements that cannot be reliably achieved through text prompting alo…
ExpertVerse: A General-Purpose Benchmark for Expert-Level Reasoning in Knowledge-Intensive Visual Synthesis
Yuan Wang, Yongchao Du, Mengting Chen, Jinsong Lan et al. · arXiv · Jul 21, 2026
Recent advances in multimodal generative models have enabled instruction-based image generation to move beyond semantic manipulation to knowledge-driven visual reasoning. However, these methods focus on explicit commonsense reasoning, shall…
ROMS-IMLE: A Minimalist Approach to Competitive Single-Step Generative Modelling
Chirag Vashist, Ke Li · arXiv · Jul 21, 2026
Generative models have undergone many generations of evolution, from VAEs/GANs to diffusion/flow matching. Along the way, the underlying techniques have become more complicated and various beliefs about what drives strong empirical performa…
Read It Back: Pretrained MLLMs Are Zero-Shot Reward Models for Text-to-Image Generation
Runhui Huang, Qihui Zhang, Zhe Liu, Yu Gao et al. · arXiv · Jul 13, 2026
In this paper, we propose SpectraReward, a training-free reward function that turns pretrained MLLMs into off-the-shelf reward models for image-generation reinforcement learning. Instead of asking the MLLM to judge a generated image or answ…
Feature-Space Guided Diffusion for Realistic Ultrasound Image Synthesis
Marina Domínguez, Nélida Mirabet-Herranz, Valery Naranjo · arXiv · Jul 13, 2026
Conditional diffusion models can generate anatomically plausible medical ultrasound (US) images, but anatomical plausibility alone does not ensure realistic B-mode appearance. Most US pipelines adapt standard generative architectures and co…
VCDP: Variation-Conditioned Distributional Proxy Learning for Semi-Supervised Medical Image Segmentation
Zimu Zhang, Yiheng Zhong, Zhuoru Zhang, Yingzhen Hu et al. · arXiv · Jul 8, 2026
Semi-supervised 3D medical image segmentation reduces the need for dense voxel-level annotations by exploiting unlabeled volumes. Although existing methods such as consistency regularization, pseudo-labeling, and co-training improve predict…
Vision as Unified Multimodal Generation
Xiaoyang Han, Jianhua Li, Kewang Deng, Zukai Chen et al. · arXiv · Jul 7, 2026
We formulate computer vision as unified multimodal generation, where heterogeneous visual tasks are expressed in the native text and image generation spaces of a unified multimodal model, without task-specific architectures. Under this form…
PIPBench: A Profile-Inclusive Framework for Personalized Image Generation Evaluation
Yuhang Wu, Shuxiang Zhang, Wee Hian Ching, Chi Zhang et al. · arXiv · Jul 7, 2026
Recent text-to-image models such as DALL-E 3 excel at following diverse prompts yet remain blind to individual aesthetic preferences. We study personalized image generation, where models must align outputs with a user's implicit visual prefe…
EquiSteer: Cross-Attention Steering Towards a Fairer Text-Guided Image Generation
Tatiana Gaintseva, Akshit Achara, Gregory Slabaugh, Jiankang Deng et al. · arXiv · Jul 1, 2026
Text-to-image diffusion models power everyday creative tasks, but they still reproduce the demographic biases in their training data. On common prompts such as "a photo of a nurse" or "a photo of a CEO," they skew their outputs toward one…
μFlow: Leveraging Average Images for Improving Generalisation of Deepfake Faces Detectors
Orazio Pontorno, Mattia Litrico, Luca Guarnera, Mario Valerio Giuffrida et al. · arXiv · Jun 29, 2026
Current generative models, including GANs and diffusion models, have reached an outstanding level of photorealism, posing significant risks to privacy and security. To ensure real-world applicability, deepfake detectors must generalise effe…
3D Scene-Adaptive Trajectory-Controllable Human Image Animation with Camera Movement
Deyin Liu, Jicheng Xu, Lin Yuanbo Wu, Xiaowei Zhao et al. · arXiv · Jun 29, 2026
Human image animation, which aims to generate a video of a reference subject following a provided action sequence, has received increasing research interest. With the development of diffusion-based/flow-based video foundation models, existi…
DanceOPD: On-Policy Generative Field Distillation
Wei Zhou, Xiongwei Zhu, Zelin Xu, Bo Dong et al. · arXiv · Jun 25, 2026
Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. However, these capabilities are rarely naturally aligned and often conflict. For instance, e…
Ask, Solve, Generate: Self-Evolving Unified Multimodal Understanding and Generation via Self-Consistency Rewards
Ritesh Thawkar, Shravan Venkatraman, Omkar Thawakar, Abdelrahman Shaker et al. · arXiv · Jun 25, 2026
Most unified large multimodal models (LMMs) that support both visual understanding and image generation still rely on curated post-training supervision, such as human annotations, preference labels, or external reward models. We ask whether…
Sculpting NeRF Geometry: Human-Preference Fine-Tuning of a 3D-Aware Face GAN
Archer Moore, Mingming Gong, Liam Hodgkinson · arXiv · Jun 25, 2026
Reinforcement learning from human feedback (RLHF) for 3D generation is now established across a number of works, but most existing pipelines optimise explicit surface representations, often by converting radiance fields into meshes and trai…
MIMFlow: Integrating Masked Image Modeling with Normalizing Flows for End-to-End Image Generation
Yang Chen, Xiaowei Xu, Shuai Wang, Xinwen Zhang et al. · arXiv · Jun 24, 2026
Normalizing Flows (NFs) are powerful generative models capable of exact density estimation and sampling. However, their strict invertibility often forces the model to exhaust its capacity on low-level pixel details, hindering the capture of…
FunPiQ: A New Benchmark for Pixel-Level Quality Assessment in Fundus Images
Pengwei Wang, José Morano, Virginia Mares, Hrvoje Bogunović · arXiv · Jun 24, 2026
Color fundus photography (CFP) is the most common ophthalmic imaging modality for large-scale screening. However, it is highly susceptible to degradations, making robust fundus image quality assessment (FIQA) crucial. The criteria for what …
DiffusionBench: On Holistic Evaluation of Diffusion Transformers
Xingjian Leng, Jaskirat Singh, Zhanhao Liang, Ethan Smith et al. · arXiv · Jun 23, 2026
Diffusion transformer (DiT) research on image generation has converged to a single evaluation setup: class-conditional generation on ImageNet. While methods improve the FID and related metrics, it is increasingly unclear whether they reflec…
IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation
Zixuan Li, Haokun Lin, Yicheng Xiao, Zhiwei Li et al. · arXiv · Jun 23, 2026
Unified multi-modal large language models (MLLMs) have achieved strong text-to-image generation quality, but still struggle with structure-aware prompt following, where object counts, spatial relations, attribute bindings, and coarse layout…
High-Fidelity Synthetic Transmission Electron Microscopy Image Generation Using Diffusion Probabilistic Models for Data-Limited Semiconductor Metrology
Johannes Boehm, Bappaditya Dey · arXiv · Jun 23, 2026
Advanced semiconductor nodes have drastically increased demand for Transmission Electron Microscopy (TEM), yet destructive sample preparation, slow imaging and high costs severely limit the availability of diverse datasets needed for downstream …
Compact Object-Level Representations with Open-Vocabulary Understanding for Indoor Visual Relocalization
Zhaopeng Cui, Jiarui Hu, Jingbo Liu, Boming Zhao et al. · arXiv · Jun 23, 2026
Indoor visual relocalization plays a critical role in emerging spatial and embodied AI applications. However, prior research was predominantly devoted to low-level vision schemes, struggling to perceive scene semantics and compositions, whi…
Keep The Essentials: Efficient Reference Conditioned Generation via Token Dropping
Rishubh Parihar, Ayush Raina, R. Venkatesh Babu, Or Patashnik · arXiv · Jun 22, 2026
Reference-based diffusion models enable highly controllable image generation by leveraging elements from input images to guide prompt-driven synthesis. However, these models are computationally expensive in runtime, and their cost scales se…
Semantic Browsing: Controllable Diversity for Image Generation
Sara Dorfman, Maya Vishnevsky, Omer Dahary, Or Patashnik et al. · arXiv · Jun 22, 2026
Modern text-to-image models excel in visual fidelity and prompt adherence. However, this strict adherence comes at the cost of diversity: generated samples tend to collapse into a single visual interpretation. Existing methods to improve di…
Hedgementation = Hedgerow Segmentation: A Remote Sensing Benchmark
Nathan Senyard, Salem Hamdani, Astrid Zhang, Derek Wang et al. · arXiv · Jun 22, 2026
We propose Hedgementation: a new benchmark to evaluate machine learning models for hedgerow mapping from remote sensing data at country scale and 10 m² spatial resolution. We combine and harmonize multiple remote sensing data products and…
SSD: Spatially Speculative Decoding Accelerates Autoregressive Image Generation
Shilong Xiang, Zirui Zhang, Lijun Yu, Chengzhi Mao · arXiv · Jun 18, 2026
Autoregressive models excel in visual generation by treating images as 1D sequences of discrete tokens, mirroring language modeling. However, this flattening discards the intrinsic 2D spatial locality of visual signals, creating severe comp…
The FID Lottery: Quantifying Hidden Randomness in Generative-Model Evaluation
Nicolas Dufour, Alexei A. Efros, Patrick Pérez · arXiv · Jun 18, 2026
The Fréchet Inception Distance (FID) is the de facto arbiter of image generation, yet most papers report just a single number from a single trained model using a single sampling seed. How reproducible is that number if we retrain the model,…
