1	`---`
2	`license: other`
3	`tags:`
4	`- vision`
5	`- image-segmentation`
6	`datasets:`
7	`- coco`
8	`widget:`
9	`- src: http://images.cocodataset.org/val2017/000000039769.jpg`
10	`example_title: Cats`
11	`- src: http://images.cocodataset.org/val2017/000000039770.jpg`
12	`example_title: Castle`
13	`---`

# Mask2Former

Mask2Former model trained on ADE20k semantic segmentation (large-sized version, Swin backbone). It was introduced in the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) and first released in [this repository](https://github.com/facebookresearch/Mask2Former/).

Disclaimer: The team releasing Mask2Former did not write a model card for this model so this model card has been written by the Hugging Face team.

## Model description

Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: predicting a set of masks and corresponding labels. Hence, all three tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA, [MaskFormer](https://arxiv.org/abs/2107.06278), in terms of both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance without introducing additional computation and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
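
To make the "set of masks and corresponding labels" idea concrete, here is a minimal NumPy sketch of how per-query class scores and per-query masks can be combined into a per-pixel semantic map. It is an illustration under stated assumptions, not the library's code: the function name `combine_queries`, the shapes and the toy values are made up for this example, and the last class column is assumed to be a "no object" class that is dropped before combining.

```python
import numpy as np


def combine_queries(class_logits, mask_logits):
    """Turn per-query predictions into a per-pixel label map (illustrative sketch).

    class_logits: (num_queries, num_labels + 1), last column assumed to be "no object"
    mask_logits:  (num_queries, height, width)
    """
    # softmax over classes, then drop the assumed "no object" column
    shifted = class_logits - class_logits.max(axis=-1, keepdims=True)
    class_probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    class_probs = class_probs[:, :-1]

    # per-pixel probability that each query's mask is "on"
    mask_probs = 1.0 / (1.0 + np.exp(-mask_logits))

    # weight every mask by its class probabilities and sum over queries
    per_class_scores = np.einsum("qc,qhw->chw", class_probs, mask_probs)
    return per_class_scores.argmax(axis=0)


# toy input: 2 queries, 2 real classes (+ "no object"), a 1x2 image
class_logits = np.array([[5.0, 0.0, 0.0],   # query 0 is confident about class 0
                         [0.0, 5.0, 0.0]])  # query 1 is confident about class 1
mask_logits = np.array([[[4.0, -4.0]],      # query 0 covers the left pixel
                        [[-4.0, 4.0]]])     # query 1 covers the right pixel
semantic_map = combine_queries(class_logits, mask_logits)
print(semantic_map)  # [[0 1]]
```

Because every query contributes a soft mask weighted by its class probabilities, the same pair of outputs can serve semantic, instance or panoptic segmentation; only the final combination step differs.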

![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/mask2former_architecture.png)
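
The masked attention mentioned in point (ii) can be illustrated with a simplified single-head NumPy sketch. This is not the model's implementation: each query attends only to positions where its mask prediction from the previous decoder layer is foreground (probability above 0.5). The function name, the shapes, the scaling factor and the fallback to full attention when a query's mask is empty are all assumptions made for this sketch.

```python
import numpy as np


def masked_attention(queries, keys, values, prev_mask_logits):
    """Single-head cross-attention restricted to each query's predicted foreground.

    queries:          (num_queries, dim)
    keys, values:     (num_pixels, dim)
    prev_mask_logits: (num_queries, num_pixels), mask logits from the previous layer
    """
    # sigmoid(x) > 0.5 is equivalent to x > 0
    allowed = prev_mask_logits > 0
    # assumption: a query with an empty mask falls back to attending everywhere,
    # which avoids a softmax over a row that is entirely -inf
    empty = ~allowed.any(axis=1, keepdims=True)
    allowed = allowed | empty

    scores = queries @ keys.T / np.sqrt(queries.shape[1])
    scores = np.where(allowed, scores, -np.inf)

    # numerically stable softmax over pixels; blocked pixels get weight 0
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights


rng = np.random.default_rng(0)
queries = rng.normal(size=(2, 4))
keys = rng.normal(size=(3, 4))
values = rng.normal(size=(3, 4))
# query 0 may only look at pixel 0; query 1 has an empty mask
prev_mask_logits = np.array([[2.0, -2.0, -2.0],
                             [-1.0, -1.0, -1.0]])
attended, weights = masked_attention(queries, keys, values, prev_mask_logits)
print(weights[0])  # [1. 0. 0.]
```

The masking only changes which attention scores are kept, so it adds no extra matrix multiplications compared with plain cross-attention.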

## Intended uses & limitations

You can use this particular checkpoint for semantic segmentation. See the [model hub](https://huggingface.co/models?search=mask2former) to look for other fine-tuned versions on a task that interests you.

### How to use

Here is how to use this model:

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation


# load Mask2Former fine-tuned on ADE20k semantic segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-ade-semantic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-ade-semantic")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# and masks_queries_logits of shape `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
# target_sizes expects (height, width); PIL's image.size is (width, height), hence the reversal
predicted_semantic_map = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
```
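
As a follow-up to the snippet above, one simple way to inspect the result is to count how many pixels were assigned to each class. The sketch below is deliberately standalone and uses a tiny hand-written label map; the helper name `summarize_labels` and the toy `id2label` mapping are hypothetical. With the real model, the flat list of ids could be obtained with `predicted_semantic_map.flatten().tolist()` and the mapping taken from `model.config.id2label`.

```python
from collections import Counter


def summarize_labels(label_ids, id2label):
    """Return (label name, pixel count) pairs, most frequent first."""
    counts = Counter(label_ids)
    return [(id2label.get(i, f"unknown_{i}"), n) for i, n in counts.most_common()]


# toy stand-ins (hypothetical values, not real model output)
toy_map = [
    [0, 0, 1],
    [0, 2, 1],
]
toy_id2label = {0: "wall", 1: "floor", 2: "cat"}

flat_ids = [label_id for row in toy_map for label_id in row]
summary = summarize_labels(flat_ids, toy_id2label)
print(summary)  # [('wall', 3), ('floor', 2), ('cat', 1)]
```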

For more code examples, we refer to the [documentation](https://huggingface.co/docs/transformers/master/en/model_doc/mask2former).