---
library_name: transformers
license: apache-2.0
language:
- en
pipeline_tag: object-detection
tags:
- object-detection
- vision
datasets:
- coco
widget:
- src: >-
    https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
  example_title: Savanna
- src: >-
    https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
  example_title: Football Match
- src: >-
    https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
  example_title: Airport
---
## RT-DETRv2

### **Overview**

The RT-DETRv2 model was proposed in [RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer](https://arxiv.org/abs/2407.17140) by Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, and Yi Liu. RT-DETRv2 refines RT-DETR by introducing selective multi-scale feature extraction, a discrete sampling operator for broader deployment compatibility, and improved training strategies such as dynamic data augmentation and scale-adaptive hyperparameters. These changes enhance flexibility and practicality while maintaining real-time performance.
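As a rough illustration of the discrete sampling idea (a simplified sketch, not the implementation in `transformers`), fractional sampling locations can be rounded to integer indices so that feature gathering becomes plain tensor indexing instead of `grid_sample`, an operator some inference backends do not support:

```python
# Simplified sketch of discrete sampling (illustrative only, not the
# library implementation): round fractional sampling locations to the
# nearest pixel and gather features by direct indexing.
import torch

def discrete_sample(feature_map: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
    """feature_map: (C, H, W); points: (N, 2) fractional (x, y) pixel coords."""
    _, h, w = feature_map.shape
    # Round fractional coordinates to integer indices, clamped to the map
    x = points[:, 0].round().long().clamp(0, w - 1)
    y = points[:, 1].round().long().clamp(0, h - 1)
    # Gather features with plain advanced indexing -- no grid_sample needed
    return feature_map[:, y, x]  # (C, N)

feats = torch.randn(256, 32, 32)
pts = torch.rand(8, 2) * 31
print(discrete_sample(feats, pts).shape)  # torch.Size([256, 8])
```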

This model was contributed by [@jadechoghari](https://x.com/jadechoghari) with the help of [@cyrilvallez](https://huggingface.co/cyrilvallez) and [@qubvel-hf](https://huggingface.co/qubvel-hf).

### **Performance**

RT-DETRv2 consistently outperforms its predecessor across all model sizes while maintaining the same real-time speeds.

![RT-DETRv2 performance overview](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/rt_detr_v2_overview.png)

### **How to use**

```python
import torch
import requests

from PIL import Image
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor

# Load an example image from the COCO dataset
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

image_processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r18vd")
model = RTDetrV2ForObjectDetection.from_pretrained("PekingU/rtdetr_v2_r18vd")

# Preprocess the image and run inference
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Convert raw outputs to labeled boxes in absolute pixel coordinates
results = image_processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([(image.height, image.width)]), threshold=0.5
)

for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        score, label = score.item(), label_id.item()
        box = [round(i, 2) for i in box.tolist()]
        print(f"{model.config.id2label[label]}: {score:.2f} {box}")
```

```
cat: 0.97 [341.14, 25.11, 639.98, 372.89]
cat: 0.96 [12.78, 56.35, 317.67, 471.34]
remote: 0.95 [39.96, 73.12, 175.65, 117.44]
sofa: 0.86 [-0.11, 2.97, 639.89, 473.62]
sofa: 0.82 [-0.12, 1.78, 639.87, 473.52]
remote: 0.79 [333.65, 76.38, 370.69, 187.48]
```

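The snippet above runs on the CPU. As a minimal variant (a sketch assuming a CUDA-capable GPU is available), the model and inputs can be moved to the GPU for faster inference:

```python
# Continuation of the snippet above; assumes `model`, `image_processor`,
# and `image` are already defined, and uses CUDA if it is available.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

inputs = image_processor(images=image, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
```
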
### **Training**

RT-DETRv2 is trained on COCO (Lin et al. [2014]) train2017 and validated on the COCO val2017 dataset. We report the standard COCO AP metric (averaged over uniformly sampled IoU thresholds from 0.50 to 0.95 with a step size of 0.05), along with AP<sup>val</sup><sub>50</sub>, which is commonly used in real scenarios.
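
To run a comparable evaluation on your own predictions, one option is `torchmetrics` (an illustrative sketch, not the authors' evaluation code); its `MeanAveragePrecision` metric defaults to the COCO-style IoU thresholds 0.50:0.05:0.95:

```python
# Illustrative sketch using torchmetrics; the boxes, scores, and labels
# below are toy values standing in for real predictions and annotations.
import torch
from torchmetrics.detection import MeanAveragePrecision

metric = MeanAveragePrecision(box_format="xyxy")

preds = [{
    "boxes": torch.tensor([[341.14, 25.11, 639.98, 372.89]]),
    "scores": torch.tensor([0.97]),
    "labels": torch.tensor([17]),  # example class id
}]
targets = [{
    "boxes": torch.tensor([[340.0, 25.0, 640.0, 373.0]]),
    "labels": torch.tensor([17]),
}]

metric.update(preds, targets)
scores = metric.compute()
print(scores["map"], scores["map_50"])  # AP@[0.50:0.95] and AP@0.50
```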

### **Applications**

RT-DETRv2 is ideal for real-time object detection in diverse applications such as **autonomous driving**, **surveillance systems**, **robotics**, and **retail analytics**. Its enhanced flexibility and deployment-friendly design make it suitable for both edge devices and large-scale systems, while ensuring high accuracy and speed in dynamic, real-world environments.