On Friday, September 28th, Ultralytics held its YOLO Vision event, where the company announced the launch of its new model, YOLOv11, made available on September 30th, 2024. Ultralytics is known for creating YOLOv5 and YOLOv8, both of which have made significant contributions to advancing research in object detection.
What is Object Detection?
Object detection is a computer vision task that identifies and locates objects in images or videos. It is widely used in applications such as surveillance, self-driving cars, and robotics. Object detection algorithms are typically divided into two categories:
Single-Shot vs. Two-Stage Detectors
Single-Shot Detection:
Detects and classifies objects in one pass, making it faster but less accurate for small objects.
Two-Stage Detection:
First identifies potential object locations, then refines the predictions, offering higher accuracy but slower speed.
Metrics
- Intersection over Union (IoU)
Measures the overlap between predicted and ground-truth bounding boxes:
IoU = Area of Intersection / Area of Union
- Precision and Recall
Precision measures the proportion of correctly predicted positives to the total predicted positives:
Precision = TP / (TP + FP)
Recall measures the proportion of correctly predicted positives to all actual positives:
Recall = TP / (TP + FN)
- Average Precision (AP)
Average Precision (AP) combines precision and recall into a single metric by calculating the area under the precision-recall curve:
AP = ∫₀¹ p(r) dr
- Mean Average Precision (mAP)
Mean Average Precision (mAP) is the average of the AP values over all classes:
mAP = (1/N) Σᵢ APᵢ
where N is the number of classes and APᵢ is the average precision of class i.
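The metrics above can be made concrete with a short sketch. The functions below compute IoU for boxes in (x1, y1, x2, y2) format, and precision/recall from raw true-positive/false-positive/false-negative counts; the box coordinates and counts are made-up illustrative values, not results from any model.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    # Coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A prediction is typically counted as a true positive when IoU >= 0.5
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # intersection 25, union 175
print(precision_recall(tp=8, fp=2, fn=4))
```

In COCO-style evaluation, mAP is then obtained by averaging AP over all classes (and often over several IoU thresholds, e.g. 0.5 to 0.95).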
How Does YOLO Work?
YOLO (You Only Look Once) is a real-time object detection algorithm that uses a single deep convolutional neural network (CNN) to predict bounding boxes and class probabilities for objects in an image simultaneously. It divides the image into a grid, and each grid cell predicts a set of bounding boxes and confidence scores. The entire image is processed in one pass, making it faster than traditional methods like two-stage detectors, such as Faster R-CNN. YOLO effectively balances speed and accuracy for real-time applications.
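The grid-based prediction described above can be sketched in a few lines. This is a toy, YOLOv1-style decoding step with random numbers standing in for a network output; the grid size, box count, and class count are illustrative assumptions, not the parameters of any specific YOLO release.

```python
import numpy as np

# Toy YOLOv1-style output: a 7x7 grid where each cell predicts 2 boxes
# (x, y, w, h, conf) plus 20 class probabilities shared by the cell.
S, B, C = 7, 2, 20
rng = np.random.default_rng(0)
pred = rng.random((S, S, B * 5 + C))  # stand-in for a real network output

boxes = pred[..., :B * 5].reshape(S, S, B, 5)
class_probs = pred[..., B * 5:]                        # (S, S, C)
conf = boxes[..., 4]                                   # objectness per box
# Class-specific confidence = objectness * class probability
scores = conf[..., None] * class_probs[:, :, None, :]  # (S, S, B, C)

# Keep only detections above a confidence threshold
keep = scores > 0.5
print("candidate detections:", int(keep.sum()))
```

Because every cell is predicted in a single forward pass, this decoding is cheap; the expensive work is the one CNN evaluation over the whole image.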
Evolution of YOLO
YOLOv1: Introduced a single-stage detection pipeline for real-time object detection, prioritizing speed over accuracy. It struggled with small object detection and localization errors.
YOLOv2 (YOLO9000): Improved accuracy without sacrificing speed, introduced batch normalization, high-resolution classifiers, and anchor boxes for better object localization.
YOLOv3: Added residual connections and feature pyramid networks, improving detection for small and large objects. Still struggled with crowded scenes and occlusion.
YOLOv4: Optimized performance with Cross-Stage Partial Networks (CSPNet) and the Mish activation function. It outperformed previous models but had increased complexity.
YOLOv5: Developed by Ultralytics, YOLOv5 is optimized for ease of use and fast inference, featuring multiple model sizes for different hardware capabilities.
Anchor-Free Ultralytics Head: note that YOLOv5 itself is anchor-based; the anchor-free Ultralytics detection head, which eliminates reliance on predefined anchor boxes, was introduced later with YOLOv8.
YOLOv6: Introduced by Meituan, YOLOv6 achieved top-tier accuracy on the COCO dataset with innovations like the Bidirectional Concatenation (BiC) module and Anchor-Aided Training (AAT).
YOLOv7: Focused on real-time object detection with efficient layer aggregation networks, making it effective for mobile and embedded systems.
YOLOv8: Featured advanced architectures and transformer-based components, supporting multi-task learning and improving both speed and accuracy for various applications.
YOLO-NAS: Used Neural Architecture Search (NAS) to tailor the model architecture to specific tasks, datasets, and hardware, offering enhanced performance at the cost of increased computational power.
YOLOv9: Released in April 2024, YOLOv9 introduced innovations like Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN), excelling in real-time object detection.
YOLOv10: Launched in May 2024, YOLOv10 removed the dependency on Non-Maximum Suppression (NMS) and optimized computational efficiency with features like lightweight classification heads and large-kernel convolutions.
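To see what YOLOv10 removed, here is a minimal sketch of classic greedy Non-Maximum Suppression: keep the highest-scoring box, suppress overlapping boxes above an IoU threshold, and repeat. The example boxes and scores are made up for illustration.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy Non-Maximum Suppression on (x1, y1, x2, y2) boxes."""
    order = np.argsort(scores)[::-1]  # indices sorted by score, highest first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))           # keep the current best box
        if order.size == 1:
            break
        # IoU of the kept box against all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0])
                 * (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]  # drop heavily overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # the second box overlaps the first and is suppressed
```

This sequential filtering is a post-processing bottleneck, which is why eliminating the NMS dependency helps end-to-end latency.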
YOLOv11
YOLOv11, introduced at the YOLO Vision 2024 event, delivers higher accuracy, faster processing speeds, and improved feature extraction while using fewer parameters compared to earlier models like YOLOv8. It supports a range of tasks, including object detection, instance segmentation, pose estimation, and object tracking, making it ideal for real-time deployment in cloud and edge environments.
Key Features
- Improved Feature Extraction: Enhanced backbone and neck architecture for precise detection and complex tasks.
- Efficiency and Speed Optimization: Streamlined designs and training processes for faster processing.
- Higher Accuracy with Fewer Parameters: 22% fewer parameters than YOLOv8m while achieving superior mAP on the COCO dataset.
- Versatile Deployment Options: Suitable for edge devices and cloud platforms, compatible with NVIDIA GPUs.
- Support for Multiple Tasks: Capable of object detection, instance segmentation, image classification, pose estimation, and more.
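Trying YOLOv11 is straightforward with the Ultralytics Python package. The sketch below assumes `pip install ultralytics` and an internet connection on first run (the pretrained weights are downloaded automatically); `yolo11n.pt` is the smallest pretrained checkpoint, and `image.jpg` is a placeholder for your own image path.

```python
from ultralytics import YOLO

# Load the nano-sized pretrained YOLOv11 model (downloads weights on first use)
model = YOLO("yolo11n.pt")

# Run inference on a local image (placeholder path)
results = model("image.jpg")

# Inspect the detections: class IDs, confidences, and xyxy boxes
for r in results:
    print(r.boxes.cls, r.boxes.conf, r.boxes.xyxy)
```

Task-specific variants follow the same pattern, e.g. segmentation (`yolo11n-seg.pt`) or pose estimation (`yolo11n-pose.pt`) checkpoints.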
This demo video was recorded at the YOLO Vision 2024 event; for more information, you can watch the Ultralytics event here.