---
library_name: transformers
license: apache-2.0
language:
- en
pipeline_tag: object-detection
tags:
- object-detection
- vision
datasets:
- coco
widget:
- src: >-
    https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
  example_title: Savanna
- src: >-
    https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
  example_title: Football Match
- src: >-
    https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
  example_title: Airport
---
## RT-DETRv2

### **Overview**

The RT-DETRv2 model was proposed in [RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer](https://arxiv.org/abs/2407.17140) by Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, and Yi Liu. RT-DETRv2 refines RT-DETR by introducing selective multi-scale feature extraction, a discrete sampling operator for broader deployment compatibility, and improved training strategies such as dynamic data augmentation and scale-adaptive hyperparameters. These changes enhance flexibility and practicality while maintaining real-time performance.

This model was contributed by [@jadechoghari](https://x.com/jadechoghari) with the help of [@cyrilvallez](https://huggingface.co/cyrilvallez) and [@qubvel-hf](https://huggingface.co/qubvel-hf).
### **Performance**

RT-DETRv2 consistently outperforms its predecessor across all model sizes while maintaining the same real-time speeds.

![rt-detr-v2-graph.png](https://huggingface.co/datasets/jadechoghari/images/resolve/main/rt-detr-v2-graph.png)

### **How to use**

```python
import torch
import requests

from PIL import Image
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r18vd")
model = RTDetrV2ForObjectDetection.from_pretrained("PekingU/rtdetr_v2_r18vd")

inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

results = image_processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([(image.height, image.width)]), threshold=0.5
)

for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        score, label = score.item(), label_id.item()
        box = [round(i, 2) for i in box.tolist()]
        print(f"{model.config.id2label[label]}: {score:.2f} {box}")
```

```
cat: 0.97 [341.14, 25.11, 639.98, 372.89]
cat: 0.96 [12.78, 56.35, 317.67, 471.34]
remote: 0.95 [39.96, 73.12, 175.65, 117.44]
sofa: 0.86 [-0.11, 2.97, 639.89, 473.62]
sofa: 0.82 [-0.12, 1.78, 639.87, 473.52]
remote: 0.79 [333.65, 76.38, 370.69, 187.48]
```
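
The boxes above are `(x_min, y_min, x_max, y_max)` in pixel coordinates of the original image, so they can be drawn directly with PIL. Below is a minimal visualization sketch; the detections are hard-coded from the example output above so the snippet runs without downloading the model (in practice you would iterate over `results` as in the previous block):

```python
from PIL import Image, ImageDraw

# Detections copied from the example output above: (label, score, box)
detections = [
    ("cat", 0.97, [341.14, 25.11, 639.98, 372.89]),
    ("cat", 0.96, [12.78, 56.35, 317.67, 471.34]),
    ("remote", 0.95, [39.96, 73.12, 175.65, 117.44]),
]

# A blank 640x480 canvas stands in for the downloaded COCO image
image = Image.new("RGB", (640, 480), "white")
draw = ImageDraw.Draw(image)

for label, score, (x0, y0, x1, y1) in detections:
    # Draw the box and a small label above its top-left corner
    draw.rectangle((x0, y0, x1, y1), outline="red", width=2)
    draw.text((x0, max(0.0, y0 - 12)), f"{label}: {score:.2f}", fill="red")

image.save("detections.png")
```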

### **Training**

RT-DETRv2 is trained on the COCO (Lin et al., 2014) train2017 split and validated on COCO val2017. We report the standard AP metric (averaged over uniformly sampled IoU thresholds from 0.50 to 0.95 with a step size of 0.05), as well as AP<sup>val</sup><sub>50</sub>, which is commonly used in real-world scenarios.
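
To make the metric concrete: COCO-style AP averages the per-threshold AP over ten IoU thresholds (0.50, 0.55, ..., 0.95). The sketch below shows the IoU computation and that threshold grid; it is only an illustration of the matching criterion, not the full COCO evaluation (which also handles per-class matching and precision-recall interpolation):

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# The 10 IoU thresholds averaged in COCO-style AP: 0.50, 0.55, ..., 0.95
thresholds = [0.50 + 0.05 * i for i in range(10)]

# Toy prediction vs. ground truth: intersection 5x10=50, union 150 -> IoU = 1/3
pred = [0, 0, 10, 10]
gt = [5, 0, 15, 10]
iou = box_iou(pred, gt)
print(f"IoU = {iou:.3f}")
print("counts as a match at thresholds:", [t for t in thresholds if iou >= t])
```

Here the prediction would be a false positive at every threshold, since 1/3 is below even the loosest 0.50 criterion; AP<sup>val</sup><sub>50</sub> uses only that single 0.50 threshold.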

### **Applications**

RT-DETRv2 is well suited for real-time object detection in diverse applications such as **autonomous driving**, **surveillance systems**, **robotics**, and **retail analytics**. Its enhanced flexibility and deployment-friendly design make it suitable for both edge devices and large-scale systems, while ensuring high accuracy and speed in dynamic, real-world environments.