---
language: en
library_name: transformers
tags:
- vision
- image-segmentation
- nvidia/mit-b5
- transformers.js
- onnx
datasets:
- celebamaskhq
---

# Face Parsing

![example image and output](https://images.unsplash.com/photo-1539571696357-5a69c17a67c6)

[Semantic segmentation](https://huggingface.co/docs/transformers/tasks/semantic_segmentation) model fine-tuned from [nvidia/mit-b5](https://huggingface.co/nvidia/mit-b5) on [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) for face parsing. For additional options, see the Transformers [SegFormer docs](https://huggingface.co/docs/transformers/model_doc/segformer).

> ONNX model for web inference contributed by [Xenova](https://huggingface.co/Xenova).

## Usage in Python

The exhaustive list of labels can be found in [config.json](https://huggingface.co/jonathandinu/face-parsing/blob/65972ac96180b397f86fda0980bbe68e6ee01b8f/config.json#L30).

| id  | label      | note              |
| :-: | :--------- | :---------------- |
|  0  | background |                   |
|  1  | skin       |                   |
|  2  | nose       |                   |
|  3  | eye_g      | eyeglasses        |
|  4  | l_eye      | left eye          |
|  5  | r_eye      | right eye         |
|  6  | l_brow     | left eyebrow      |
|  7  | r_brow     | right eyebrow     |
|  8  | l_ear      | left ear          |
|  9  | r_ear      | right ear         |
| 10  | mouth      | area between lips |
| 11  | u_lip      | upper lip         |
| 12  | l_lip      | lower lip         |
| 13  | hair       |                   |
| 14  | hat        |                   |
| 15  | ear_r      | earring           |
| 16  | neck_l     | necklace          |
| 17  | neck       |                   |
| 18  | cloth      | clothing          |

```python
import torch
from torch import nn
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

from PIL import Image
import matplotlib.pyplot as plt
import requests

# automatically select the best available device
device = (
    "cuda"  # NVIDIA (or AMD ROCm) GPUs
    if torch.cuda.is_available()
    else "mps"  # Apple Silicon (Metal Performance Shaders)
    if torch.backends.mps.is_available()
    else "cpu"
)

# load the image processor and model
image_processor = SegformerImageProcessor.from_pretrained("jonathandinu/face-parsing")
model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-parsing")
model.to(device)

# expects a PIL.Image or torch.Tensor
url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6"
image = Image.open(requests.get(url, stream=True).raw)

# run inference on the image
inputs = image_processor(images=image, return_tensors="pt").to(device)
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, ~height/4, ~width/4)

# resize output to match input image dimensions
upsampled_logits = nn.functional.interpolate(
    logits,
    size=image.size[::-1],  # PIL gives (W, H); interpolate expects (H, W)
    mode="bilinear",
    align_corners=False,
)

# get label masks
labels = upsampled_logits.argmax(dim=1)[0]

# move to CPU to visualize in matplotlib
labels_viz = labels.cpu().numpy()
plt.imshow(labels_viz)
plt.show()
```
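With the label map in hand, individual facial regions can be isolated by comparing it against an id from the table above. A minimal sketch, using a small synthetic label map in place of the model's `labels_viz` output so it runs without downloading the model:

```python
import numpy as np

# synthetic 4x4 label map standing in for the model's `labels_viz` output
labels_viz = np.array([
    [13, 13, 13, 13],
    [ 0,  1,  1,  0],
    [ 0,  1,  1,  0],
    [ 0, 18, 18,  0],
])

# boolean mask for a single class, e.g. hair (id 13 in the table above)
hair_mask = labels_viz == 13

# convert to a 0/255 uint8 image-style mask (e.g. for saving with PIL)
mask_img = (hair_mask * 255).astype(np.uint8)

print(hair_mask.sum())  # number of hair pixels: 4
```

The same comparison works directly on the real `labels_viz` array, since it is just a 2D array of the integer ids listed in the table.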

## Usage in the browser (Transformers.js)

```js
import {
  pipeline,
  env,
} from "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0";

// important to prevent errors since the model files are likely remote on the HF hub
env.allowLocalModels = false;

// instantiate the image segmentation pipeline with the pretrained face parsing model
const model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

// async inference since it could take a few seconds
const url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6";
const output = await model(url);

// each label is a separate mask object
// [
//   { score: null, label: 'background', mask: transformers.js RawImage { ... } },
//   { score: null, label: 'hair', mask: transformers.js RawImage { ... } },
//   ...
// ]
for (const m of output) {
  console.log(`Found ${m.label}`);
  m.mask.save(`${m.label}.png`);
}
```

### p5.js

Since [p5.js](https://p5js.org/) uses an animation loop abstraction, we need to take care when loading the model and making predictions.

```js
// ...

// asynchronously load transformers.js and instantiate the model
async function preload() {
  // load the transformers.js library with a dynamic import
  const { pipeline, env } = await import(
    "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0"
  );

  // important to prevent errors since the model files are remote on the HF hub
  env.allowLocalModels = false;

  // instantiate the image segmentation pipeline with the pretrained face parsing model
  model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

  print("face-parsing model loaded");
}

// ...
```

[full p5.js example](https://editor.p5js.org/jonathan.ai/sketches/wZn15Dvgh)

### Model Description

- **Developed by:** [Jonathan Dinu](https://twitter.com/jonathandinu)
- **Model type:** Transformer-based semantic segmentation model
- **License:** non-commercial research and educational purposes
- **Resources for more information:** Transformers docs on [SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer) and the [original research paper](https://arxiv.org/abs/2105.15203).

## Limitations and Bias

### Bias

While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate social biases. The [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) dataset used for fine-tuning is large but not necessarily diverse or representative of the general population. Moreover, the images are exclusively of celebrities.
| 166 | |