Object Detection
Short Definition
Object detection is a computer vision task that both classifies the objects in an image and localizes each one with a bounding box.
Full Definition
Object detection is one of the most practically important tasks in computer vision, enabling machines not just to recognize what is in an image but also to precisely locate each object. Unlike image classification, which assigns a single label to an entire image, object detection must identify multiple objects of potentially different classes and draw a tight bounding box around each one.

The field has undergone dramatic progress since the deep learning revolution. The R-CNN family (R-CNN, Fast R-CNN, Faster R-CNN) introduced the two-stage approach: first proposing candidate regions, then classifying each region. YOLO (You Only Look Once) pioneered the one-stage approach, treating detection as a single regression problem and achieving real-time performance. SSD (Single Shot Detector) offered another efficient one-stage alternative. Modern detectors such as YOLOv8 and DETR (Detection Transformer) achieve impressive accuracy at real-time speeds.

Object detection powers autonomous vehicles (detecting pedestrians, vehicles, and traffic signs), surveillance systems, medical imaging (detecting tumors), retail analytics (tracking products on shelves), industrial automation (quality inspection), and augmented reality (placing virtual objects relative to real ones). The field continues to advance with 3D object detection, video object detection, and open-vocabulary detection using models like OWL-ViT that can detect any object described in text.
Technical Explanation
Two-stage detectors: Faster R-CNN uses a Region Proposal Network (RPN) to generate candidate boxes, then classifies and refines each one.

One-stage detectors: YOLO divides the image into a grid and predicts B bounding boxes per cell, each with (x, y, w, h, confidence, class_probabilities).

IoU (Intersection over Union) measures bounding box overlap: IoU = area(intersection) / area(union).

Non-Maximum Suppression (NMS) removes duplicate detections by keeping the highest-confidence box and discarding any overlapping box whose IoU with it exceeds a threshold.

Evaluation uses mean Average Precision (mAP): for each class, compute the precision-recall curve and its average precision (AP), then average across classes.

Anchor-free detectors such as FCOS predict offsets from each pixel to the box boundaries, eliminating predefined anchor boxes.
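Decoding one grid-cell prediction into image coordinates can be sketched in a YOLOv1-style convention; the exact offset and scale conventions vary across YOLO versions, so the parameterization below is an illustrative assumption rather than any specific model's format:

```python
def decode_yolo_box(col, row, tx, ty, tw, th, grid_size, img_w, img_h):
    """Decode one YOLOv1-style prediction (a sketch; conventions vary).

    (tx, ty) locate the box center within cell (col, row) as fractions
    of the cell; (tw, th) are width/height as fractions of the image.
    Returns corner coordinates (x1, y1, x2, y2) in pixels.
    """
    cx = (col + tx) / grid_size * img_w
    cy = (row + ty) / grid_size * img_h
    w = tw * img_w
    h = th * img_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

For example, with a 7x7 grid on a 448x448 image, a prediction centered in cell (3, 3) with width and height 0.25 decodes to a 112-pixel box centered at pixel (224, 224).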
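The IoU formula and the greedy NMS loop described above can be sketched in a few lines of plain Python. The (x1, y1, x2, y2) corner format and the function names are illustrative choices, not a particular library's API:

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring remaining box,
    then drop every box that overlaps it above the IoU threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

For instance, given two heavily overlapping boxes and one distant box, NMS keeps the higher-scoring member of the overlapping pair plus the distant box. Production detectors typically use a vectorized implementation (e.g. on GPU), but the logic is the same.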
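The mAP computation can be illustrated with a minimal AP sketch. It assumes matching has already happened upstream: each detection, sorted by descending confidence, carries a true/false flag for whether it matched an unclaimed ground-truth box at the chosen IoU threshold (benchmark suites like COCO add interpolation and multiple IoU thresholds on top of this basic idea):

```python
def average_precision(tp_flags, num_gt):
    """AP as the area under the precision-recall curve.

    tp_flags: per-detection True/False, sorted by descending confidence.
    num_gt: number of ground-truth boxes for this class.
    """
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for is_tp in tp_flags:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt
        if is_tp:
            # Accumulate precision over each step of recall gained.
            ap += precision * (recall - prev_recall)
            prev_recall = recall
    return ap

def mean_average_precision(per_class):
    """mAP: average the per-class APs.

    per_class maps class name to (tp_flags, num_gt).
    """
    aps = [average_precision(flags, n) for flags, n in per_class.values()]
    return sum(aps) / len(aps)
```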
Use Cases
Advantages
Disadvantages
Schema Type
Featured Snippet Candidate
Difficulty Level