Latest Object Detection Research Papers
The newest Object Detection papers from across the field — arXiv, NeurIPS, CVPR, Nature, and more — refreshed daily and ranked by relevance. Distill AI tracks Object Detection so you don’t have to: get the standout work delivered to your inbox every morning, with 2-sentence summaries and the option to chat with any paper.
Recent papers
- Democratising Camera Trap AI: An Open-Source Model for Detecting UK Mammals
Paul Fergus, Philip Stephens, Russell A. Hill, Lee Oliver et al. · arXiv · Jun 9, 2026
Camera traps have become a cornerstone of biodiversity monitoring, but the artificial intelligence that turns vast quantities of images into usable ecological data is often locked behind commercial platforms or trained on fauna that does no…
- Using the YOLOv12 Model for Verifying the Correct Color Sequence of Wires in Network Cables (Patch Cords) on the Production Line
Amin Doroodchi, Danial Soleimany · arXiv · Jun 9, 2026
In the production process of network cables, ensuring the correct color sequence of wire pairs inside the standard connector plays a critical role in the final performance of the cable, as any misplacement or color-ordering error can lead t…
- Analyzing Training-Free Corruption Detection for Object Detection Datasets
Christian Sieberichs, Simon Geerkens, Thomas Waschulzik, Viswanathan Ramesh et al. · arXiv · Jun 9, 2026
Annotation errors are widespread in computer vision datasets and can significantly degrade the performance of systems trained on them, particularly in complex tasks such as object detection. Several approaches exist to identify annotation e…
- ATN3D: Density-Aware LiDAR-Radar Early 3D Object Detection Under Extreme Sparsity
Debojyoti Biswas, Xianbiao Hu · arXiv · Jun 8, 2026
3D object detection is the backbone of perception for automated vehicles (AV) and broader intelligent transportation systems applications. Long-range detection is challenging because sensing evidence is sparse; yet this "long-range" scena…
- Adversarial Attack and Disturbance Detection by Hadamard-Coded Output Representations for Object Detection and Semantic Segmentation
Lucas Görnhardt, Timo Bartels, Niklas Schwarz, Tim Fingscheidt · arXiv · Jun 8, 2026
Conventional one-hot encodings often yield poorly calibrated models, being overconfident under attack, and letting entropy-based detection algorithms fail. Previous image classification works have demonstrated that Hadamard-coded output rep…
- ContextShift: A Controlled Benchmark for Context Dependence in Object Detection
Dan Zlotnikov, Alex Lazarovich, Ohad Ben-Shahar · arXiv · Jun 8, 2026
Modern object detectors achieve strong performance on standard benchmarks, yet their robustness to contextual variation remains insufficiently understood. Prior evaluations largely rely on aggregate metrics such as AP on uncontrolled distri…
- RT-SDGOD: Real-Time Single-Domain Generalized Object Detection
Yupeng Zhang, Fangzhuo Gao, Ruize Han, Wei Feng et al. · arXiv · Jun 8, 2026
In real-world deployment under strict real-time constraints, weather and imaging variations induce significant distribution shifts, severely degrading detectors. Single-Domain Generalized Object Detection aims to mitigate this issue, yet ex…
- Taming Perception Jitter: Uncertainty-Aware LiDAR Object Detection for Reliable Motion Classification
Cornelius Schröder, Žygimantas Marcinkus, Markus Lienkamp · arXiv · Jun 8, 2026
Reliable motion classification is critical for autonomous driving, as false dynamic predictions of static objects can cascade into unnecessary planner interventions. Unstable bounding box predictions can lead to spurious velocity estimates …
- Proposal Refinement for Few-Shot Object Detection
Yuan Zeng, Bin Song, Jie Guo, Yuwen Chen · arXiv · Jun 8, 2026
Few-shot object detection has gained wide attention in recent years. Some excellent algorithms have been proposed to handle this task. However, most of these algorithms rely on the performance of few-shot classification. Unlike previous a…
- Differences in Detection: Explainability Where it Matters
Johannes Theodoridis, Johannes Maucher, Andreas Schilling · arXiv · Jun 5, 2026
We propose Differences in Detection (DnD), an intuitive method to compare two object detection models. Based on the same matching algorithm, it complements the standard metrics of mean Average Precision (mAP) and TIDE error analysis with …
- CL-CLIP: CLIP-Based Continual Learning Framework with Cost-Volume Category Decoupling for Object Detection
Zihan Liu, Yuguang Yang, Shengjie Su, Jianing Pang et al. · arXiv · Jun 5, 2026
Continual Object Detection (COD) requires a detector to acquire new categories over time while preserving previously learned ones. This goal is closely related to open-vocabulary detection, since both settings require reasoning over categor…
- Unveiling the Unknown: Open Vocabulary Object Detection with Scene Graphs
Yi Chen, Yinghao Lu, Zhehao Li, Chenchen Yan et al. · arXiv · Jun 4, 2026
Open-vocabulary object detection seeks to identify novel object categories that were not part of the training data. Many knowledge distillation-based approaches have shown promising performance by transferring knowledge from pre-trained vis…
- Next-Generation Parallel Decoder for LPDR: Architectural Optimization and Class-Balanced GAN-Augmentation
Shawaiz Obaid, Nida Chandio, Neha Jamil, Muhammad Khuram Shahzad · arXiv · Jun 4, 2026
Real-Time License Plate Detection and Recognition (LPDR) forms the backbone of modern smart cities. Although the YOLOV5-PDLPR model substantially improved system efficiency through a parallel decoder approach, its performance is still affec…
- Cycle Consistency in Video Object-Centric Learning
Rongzhen Zhao, Zhiyuan Li, Ruonan Wei, Juho Kannala et al. · arXiv · May 28, 2026
Self-supervised video Object-Centric Learning (OCL) aims to discover distinct objects and associate them across time, whereas self-supervised Multi-Object Tracking (MOT) focuses on associating pre-defined object detections or segmentations.…
- LV-OSD: Language-Vision-Complementary Open-Set Object Detection
Yupeng Zhang, Ruize Han, Wei Feng, Song Wang et al. · arXiv · May 27, 2026
Object detection is an important task in computer vision, which aims to detect the objects of interest through the given category list or query images. In this work, we propose a new problem of language-visual-complementary open-set object…
- Pixel-Level Pavement Distress Assessment Using Instance Segmentation
Logan Dewick, Bibesh Pyakurel, Kong Pheng Yang, Nazim Choudhury et al. · arXiv · May 25, 2026
Automated pavement distress assessment requires more than image-level classification or coarse bounding box detection, demanding precise localization of thin, branching, and irregular cracks to achieve the geometric precision necessary for …
- SAM3-Assisted Training of Lightweight YOLO Models for Precision Pig Farming
Marcos Vinicius Mendes Faria, Thiago Borges Pereira, Isabella C. F. S. Condotta, Thiago Meireles Paixão et al. · arXiv · May 25, 2026
Deep learning-based object detection has revolutionized Precision Livestock Farming (PLF), yet a critical barrier remains: high-performance Foundation Models (such as SAM 3) are too computationally intensive for edge deployment, while light…
- Decoupling Ego-Motion from Target Dynamics via Dual-Interval Motion Cues for UAV Detection
Liuyang Wang, Feitian Zhang · arXiv · May 21, 2026
Object detection from Unmanned Aerial Vehicles (UAVs) is challenged by severe ego-motion, camera jitter, and large scale variations. While modern detectors perform well on static images, their direct application to UAV video often fails, pa…
- Detection of Virus and Small Cell Patches in Foci Images Using Switchable Convolution and Feature Pyramid Networks
Amrita Singh, Snehasis Mukherjee · arXiv · May 21, 2026
Accurate detection and counting of virus patches in focus-forming unit (FFU) images, also known as foci images, are important for quantifying viral infection and analyzing cellular structures. This task is challenging because biomedical tar…
- EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos
Ruiping Liu, Junwei Zheng, Yufan Chen, Di Wen et al. · arXiv · May 18, 2026
Egocentric memory is widely used in embodied intelligence, but it may be insufficient for comprehensive spatial-temporal reasoning. Inspired by human recall from both field and observer perspectives, we introduce EgoExoMem, the first benchm…
- CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark
Wei Wang, Yuqian Yuan, Tianwei Lin, Wenqiao Zhang et al. · arXiv · May 18, 2026
Spatial intelligence requires multimodal large language models (MLLMs) to move beyond single-view perception and reason consistently about objects, visibility, geometry, and interactions across multiple viewpoints. However, progress in cros…
- Characterizing the visual representation of objects from the child's view
Jane Yang, Tarun Sepuri, Alvin Wei Ming Tan, Khai Loong Aw et al. · arXiv · May 14, 2026
Children acquire object category representations from their everyday experiences in the first few years of life. What do the inputs to this learning process look like? We analyzed first-person videos of young children's visual experience at…
- 6D Pose Estimation via Keypoint Heatmap Regression with RGB-D Residual Neural Networks
Ismail Aljosevic, Amir Masoud Almasi, Ana Parovic, Ashkan Shafiei · arXiv · May 8, 2026
In this paper, we propose a modular framework for 6D pose estimation based on keypoint heatmap regression. Our approach combines YOLOv10m for object detection with a ResNet18-based network that predicts 2D heatmaps from RGB images. Keypoint…
- AMIEOD: Adaptive Multi-Experts Image Enhancement for Object Detection in Low-Illumination Scenes
Xiaochen Huang, Honggang Chen, Weicheng Zhang, Xiaobo Dai et al. · arXiv · May 7, 2026
In multimedia application scenarios, images captured under low-illumination conditions often lead to lower accuracy in visual perception tasks compared to those taken in well-lit environments. To tackle this challenge, we propose AMIEOD, an…
- A unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping
Maxim V. Shugaev, Md Reshad Ul Hoque, Bridget Kennedy, Joseph T. Riley et al. · arXiv · May 6, 2026
Video sequences captured through refractive dynamic media, such as turbulent air or a water surface, often suffer from severe geometric distortions and temporal instability. While recent advances address mild atmospheric turbulence, no exis…
- Physical Adversarial Clothing Evades Visible-Thermal Detectors via Non-Overlapping RGB-T Pattern
Xiaopei Zhu, Guanning Zeng, Zhanhao Hu, Jun Zhu et al. · arXiv · May 6, 2026
Visible-thermal (RGB-T) object detection is a crucial technology for applications such as autonomous driving, where multimodal fusion enhances performance in challenging conditions like low light. However, the security of RGB-T detectors, p…
- Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness
Yichen Li, Qiankun Liu, Ying Fu · arXiv · May 6, 2026
Traditional one-shot detection methods have addressed the closed-set problem in object detection, but the high cost of data annotation remains a critical challenge. General unsupervised methods generate pseudo boxes without category labels,…
- Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection
Lihua Zhou, Mao Ye, Xiatian Zhu, Nianxin Li et al. · arXiv · May 6, 2026
Open-vocabulary object detection with vision-language models (VLMs) such as Grounding DINO suffers from performance degradation under test-time distribution shifts, primarily due to semantic misalignment between text embeddings and shifted …
- StateVLM: A State-Aware Vision-Language Model for Robotic Affordance Reasoning
Xiaowen Sun, Matthias Kerzel, Mengdi Li, Xufeng Zhao et al. · arXiv · May 5, 2026
Vision-language models (VLMs) have shown remarkable performance in various robotic tasks, as they can perceive visual information and understand natural language instructions. However, when applied to robotics, VLMs remain subject to a fund…
- The Detector Teaches Itself: Lightweight Self-Supervised Adaptation for Open-Vocabulary Object Detection
Yazhe Wan, Changjae Oh · arXiv · May 5, 2026
Open-vocabulary object detection aims to recognize objects from an open set of categories, which leverages vision-language models (VLMs) pre-trained on large-scale image-text data. The cooperative paradigm combines an object detector with a…