---
language: en
library_name: transformers
tags:
- vision
- image-segmentation
- nvidia/mit-b5
- transformers.js
- onnx
datasets:
- celebamaskhq
---

# Face Parsing

![example image and output](demo.png)

[Semantic segmentation](https://huggingface.co/docs/transformers/tasks/semantic_segmentation) model fine-tuned from [nvidia/mit-b5](https://huggingface.co/nvidia/mit-b5) with [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) for face parsing. For additional options, see the Transformers [Segformer docs](https://huggingface.co/docs/transformers/model_doc/segformer).

> ONNX model for web inference contributed by [Xenova](https://huggingface.co/Xenova).

## Usage in Python

An exhaustive list of labels can be extracted from [config.json](https://huggingface.co/jonathandinu/face-parsing/blob/65972ac96180b397f86fda0980bbe68e6ee01b8f/config.json#L30); a short snippet for reading them programmatically follows the table.

| id  | label      | note              |
| :-: | :--------- | :---------------- |
| 0   | background |                   |
| 1   | skin       |                   |
| 2   | nose       |                   |
| 3   | eye_g      | eyeglasses        |
| 4   | l_eye      | left eye          |
| 5   | r_eye      | right eye         |
| 6   | l_brow     | left eyebrow      |
| 7   | r_brow     | right eyebrow     |
| 8   | l_ear      | left ear          |
| 9   | r_ear      | right ear         |
| 10  | mouth      | area between lips |
| 11  | u_lip      | upper lip         |
| 12  | l_lip      | lower lip         |
| 13  | hair       |                   |
| 14  | hat        |                   |
| 15  | ear_r      | earring           |
| 16  | neck_l     | necklace          |
| 17  | neck       |                   |
| 18  | cloth      | clothing          |
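
These labels are also stored in the model config as `id2label`, so they can be read programmatically instead of hard-coded. A minimal sketch using `AutoConfig`, which downloads only the config, not the model weights:

```python
from transformers import AutoConfig

# fetch just the model config and inspect the label mapping
config = AutoConfig.from_pretrained("jonathandinu/face-parsing")
print(config.id2label)  # {0: 'background', 1: 'skin', ..., 18: 'cloth'}
```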

```python
import torch
from torch import nn
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

from PIL import Image
import matplotlib.pyplot as plt
import requests

# convenience expression for automatically determining device
device = (
    "cuda"
    # Device for NVIDIA or AMD GPUs
    if torch.cuda.is_available()
    else "mps"
    # Device for Apple Silicon (Metal Performance Shaders)
    if torch.backends.mps.is_available()
    else "cpu"
)

# load models
image_processor = SegformerImageProcessor.from_pretrained("jonathandinu/face-parsing")
model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-parsing")
model.to(device)

# expects a PIL.Image or torch.Tensor
url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6"
image = Image.open(requests.get(url, stream=True).raw)

# run inference on image
inputs = image_processor(images=image, return_tensors="pt").to(device)
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, ~height/4, ~width/4)

# resize output to match input image dimensions
upsampled_logits = nn.functional.interpolate(
    logits,
    size=image.size[::-1],  # PIL size is (W, H); interpolate expects (H, W)
    mode="bilinear",
    align_corners=False,
)

# get label masks
labels = upsampled_logits.argmax(dim=1)[0]

# move to CPU to visualize in matplotlib
labels_viz = labels.cpu().numpy()
plt.imshow(labels_viz)
plt.show()
```
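
The integer ids in `labels` correspond to the table above, so individual regions can be isolated with a simple comparison. A minimal sketch continuing from the example above (id 13 is `hair`; any other id from the table works the same way):

```python
import numpy as np

# boolean mask for a single class (13 = hair, per the label table)
hair_mask = labels_viz == 13

# black out everything except the selected region
masked = np.array(image).copy()
masked[~hair_mask] = 0

plt.imshow(masked)
plt.show()
```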

## Usage in the browser (Transformers.js)

```js
import {
  pipeline,
  env,
} from "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0";

// important to prevent errors since the model files are likely remote on HF hub
env.allowLocalModels = false;

// instantiate image segmentation pipeline with pretrained face parsing model
const model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

// async inference since it could take a few seconds
const url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6";
const output = await model(url);

// each label is a separate mask object
// [
//   { score: null, label: 'background', mask: transformers.js RawImage { ... } }
//   { score: null, label: 'hair', mask: transformers.js RawImage { ... } }
//   ...
// ]
for (const m of output) {
  console.log(`Found ${m.label}`);
  m.mask.save(`${m.label}.png`);
}
```

### p5.js

Since [p5.js](https://p5js.org/) uses an animation loop abstraction, we need to take care when loading the model and making predictions.

```js
// ...

// asynchronously load transformers.js and instantiate model
async function preload() {
  // load transformers.js library with a dynamic import
  const { pipeline, env } = await import(
    "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0"
  );

  // important to prevent errors since the model files are remote on HF hub
  env.allowLocalModels = false;

  // instantiate image segmentation pipeline with pretrained face parsing model
  model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

  print("face-parsing model loaded");
}

// ...
```

[Full p5.js example](https://editor.p5js.org/jonathan.ai/sketches/wZn15Dvgh)

## Model Description

- **Developed by:** [Jonathan Dinu](https://twitter.com/jonathandinu)
- **Model type:** Transformer-based semantic segmentation image model
- **License:** non-commercial research and educational purposes
- **Resources for more information:** Transformers docs on [Segformer](https://huggingface.co/docs/transformers/model_doc/segformer) and the [original research paper](https://arxiv.org/abs/2105.15203).

## Limitations and Bias

### Bias

While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate social biases. The [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) dataset used for fine-tuning is large but not necessarily diverse or representative; notably, it consists entirely of images of celebrities.