---
license: other
tags:
- vision
- image-segmentation
datasets:
- scene_parse_150
widget:
- src: https://huggingface.co/datasets/hf-internal-testing/fixtures_ade20k/resolve/main/ADE_val_00000001.jpg
  example_title: House
- src: https://huggingface.co/datasets/hf-internal-testing/fixtures_ade20k/resolve/main/ADE_val_00000002.jpg
  example_title: Castle
---

# SegFormer (b1-sized) model fine-tuned on ADE20K

SegFormer model fine-tuned on ADE20K at resolution 512x512. It was introduced in the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Xie et al. and first released in [this repository](https://github.com/NVlabs/SegFormer).

Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team.

## Model description

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, and achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset.

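To make the hierarchical encoder more concrete, the sketch below computes the spatial size of the feature map produced at each encoder stage. It assumes the four-stage layout with downsampling strides of 4, 8, 16 and 32 described in the SegFormer paper; the helper name `stage_feature_sizes` is hypothetical and not part of any library.

```python
# Illustrative sketch only: the strides are an assumption taken from the
# SegFormer paper, not values read from this checkpoint's configuration.
ENCODER_STRIDES = (4, 8, 16, 32)


def stage_feature_sizes(height, width, strides=ENCODER_STRIDES):
    """Return the (height, width) of the feature map at each encoder stage."""
    return [(height // s, width // s) for s in strides]


sizes = stage_feature_sizes(512, 512)
for stage, (h, w) in enumerate(sizes, start=1):
    print(f"stage {stage}: {h}x{w}")
```

For a 512x512 input this gives 128x128, 64x64, 32x32 and 16x16. As described in the paper, the decode head fuses these maps at the resolution of the first stage, which is why the model's logits come out at a quarter of the input height and width.
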
## Intended uses & limitations

You can use the raw model for semantic segmentation. See the [model hub](https://huggingface.co/models?other=segformer) to look for fine-tuned versions on a task that interests you.

### How to use

Here is how to use this model to segment an image from the COCO 2017 dataset, predicting one of the 150 ADE20K classes for each pixel:

```python
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import requests

# Load the preprocessing pipeline and the fine-tuned model from the Hub.
# SegformerImageProcessor replaces the deprecated SegformerFeatureExtractor
# in recent transformers releases.
image_processor = SegformerImageProcessor.from_pretrained("nvidia/segformer-b1-finetuned-ade-512-512")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b1-finetuned-ade-512-512")

# Download a sample image from the COCO 2017 validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image and run a forward pass.
inputs = image_processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
```

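The logits are at a quarter of the input resolution, so a common follow-up step is to upsample them to the original image size and take the per-pixel argmax. The following is a minimal sketch, assuming PyTorch is installed. The helper `logits_to_segmentation_map` is a hypothetical name, and random logits stand in for `outputs.logits` so that the snippet runs without downloading the model. With the real model you would pass `outputs.logits` and `image.size[::-1]`, since PIL reports size as (width, height).

```python
import torch
import torch.nn.functional as F


def logits_to_segmentation_map(logits, target_size):
    """Upsample (batch, num_labels, h, w) logits and return per-pixel class ids.

    target_size is the (height, width) of the original image.
    """
    upsampled = F.interpolate(
        logits, size=target_size, mode="bilinear", align_corners=False
    )
    return upsampled.argmax(dim=1)  # shape (batch, height, width)


# Stand-in for outputs.logits: 150 ADE20K classes at 128x128 (an assumption
# for illustration; the real shape depends on the preprocessed input size).
dummy_logits = torch.randn(1, 150, 128, 128)
seg_map = logits_to_segmentation_map(dummy_logits, target_size=(480, 640))
print(seg_map.shape)  # torch.Size([1, 480, 640])
```
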
For more code examples, see the [documentation](https://huggingface.co/transformers/model_doc/segformer.html#).

### BibTeX entry and citation info

```bibtex
@article{DBLP:journals/corr/abs-2105-15203,
  author    = {Enze Xie and
               Wenhai Wang and
               Zhiding Yu and
               Anima Anandkumar and
               Jose M. Alvarez and
               Ping Luo},
  title     = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
               Transformers},
  journal   = {CoRR},
  volume    = {abs/2105.15203},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint    = {2105.15203},
  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```