---
language: en
library_name: transformers
tags:
- vision
- image-segmentation
- nvidia/mit-b5
- transformers.js
- onnx
datasets:
- celebamaskhq
---

# Face Parsing

![example image and output](https://images.unsplash.com/photo-1539571696357-5a69c17a67c6)

[Semantic segmentation](https://huggingface.co/docs/transformers/tasks/semantic_segmentation) model fine-tuned from [nvidia/mit-b5](https://huggingface.co/nvidia/mit-b5) on [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) for face parsing. For additional options, see the Transformers [SegFormer docs](https://huggingface.co/docs/transformers/model_doc/segformer).

> ONNX model for web inference contributed by [Xenova](https://huggingface.co/Xenova).

## Usage in Python

The exhaustive list of labels can be found in [config.json](https://huggingface.co/jonathandinu/face-parsing/blob/65972ac96180b397f86fda0980bbe68e6ee01b8f/config.json#L30).

| id  | label      | note              |
| :-: | :--------- | :---------------- |
|  0  | background |                   |
|  1  | skin       |                   |
|  2  | nose       |                   |
|  3  | eye_g      | eyeglasses        |
|  4  | l_eye      | left eye          |
|  5  | r_eye      | right eye         |
|  6  | l_brow     | left eyebrow      |
|  7  | r_brow     | right eyebrow     |
|  8  | l_ear      | left ear          |
|  9  | r_ear      | right ear         |
| 10  | mouth      | area between lips |
| 11  | u_lip      | upper lip         |
| 12  | l_lip      | lower lip         |
| 13  | hair       |                   |
| 14  | hat        |                   |
| 15  | ear_r      | earring           |
| 16  | neck_l     | necklace          |
| 17  | neck       |                   |
| 18  | cloth      | clothing          |

```python
import torch
from torch import nn
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

from PIL import Image
import matplotlib.pyplot as plt
import requests

# automatically select the best available device
device = (
    "cuda"  # NVIDIA (or AMD ROCm) GPUs
    if torch.cuda.is_available()
    else "mps"  # Apple Silicon (Metal Performance Shaders)
    if torch.backends.mps.is_available()
    else "cpu"
)

# load the image processor and model
image_processor = SegformerImageProcessor.from_pretrained("jonathandinu/face-parsing")
model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-parsing")
model.to(device)

# expects a PIL.Image or torch.Tensor
url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6"
image = Image.open(requests.get(url, stream=True).raw)

# run inference on the image
inputs = image_processor(images=image, return_tensors="pt").to(device)
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, ~height/4, ~width/4)

# resize output to match input image dimensions
upsampled_logits = nn.functional.interpolate(
    logits,
    size=image.size[::-1],  # PIL gives (W, H); interpolate expects (H, W)
    mode="bilinear",
    align_corners=False,
)

# get label masks
labels = upsampled_logits.argmax(dim=1)[0]

# move to CPU to visualize in matplotlib
labels_viz = labels.cpu().numpy()
plt.imshow(labels_viz)
plt.show()
```
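With the label map in hand, individual facial regions can be isolated by comparing it against an id from the table above. A minimal sketch, using a small synthetic label map in place of the model's `labels_viz` output so it runs without downloading the model:

```python
import numpy as np

# synthetic 4x4 label map standing in for the model's `labels_viz` output
labels_viz = np.array([
    [13, 13, 13, 13],
    [ 0,  1,  1,  0],
    [ 0,  1,  1,  0],
    [ 0, 18, 18,  0],
])

# boolean mask for a single class, e.g. hair (id 13 in the table above)
hair_mask = labels_viz == 13

# convert to a 0/255 uint8 image-style mask (e.g. for saving with PIL)
mask_img = (hair_mask * 255).astype(np.uint8)

print(hair_mask.sum())  # number of hair pixels: 4
```

The same comparison works directly on the real `labels_viz` array, since it is just a 2D array of the integer ids listed in the table.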

## Usage in the browser (Transformers.js)

```js
import {
  pipeline,
  env,
} from "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0";

// important to prevent errors since the model files are likely remote on the HF hub
env.allowLocalModels = false;

// instantiate the image segmentation pipeline with the pretrained face parsing model
const model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

// async inference since it could take a few seconds
const url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6";
const output = await model(url);

// each label is a separate mask object
// [
//   { score: null, label: 'background', mask: transformers.js RawImage { ... } },
//   { score: null, label: 'hair', mask: transformers.js RawImage { ... } },
//   ...
// ]
for (const m of output) {
  console.log(`Found ${m.label}`);
  m.mask.save(`${m.label}.png`);
}
```

### p5.js

Since [p5.js](https://p5js.org/) uses an animation loop abstraction, we need to take care when loading the model and making predictions.

```js
// ...

// asynchronously load transformers.js and instantiate the model
async function preload() {
  // load the transformers.js library with a dynamic import
  const { pipeline, env } = await import(
    "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0"
  );

  // important to prevent errors since the model files are remote on the HF hub
  env.allowLocalModels = false;

  // instantiate the image segmentation pipeline with the pretrained face parsing model
  model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

  print("face-parsing model loaded");
}

// ...
```

[full p5.js example](https://editor.p5js.org/jonathan.ai/sketches/wZn15Dvgh)

### Model Description

- **Developed by:** [Jonathan Dinu](https://twitter.com/jonathandinu)
- **Model type:** Transformer-based semantic segmentation model
- **License:** non-commercial research and educational purposes
- **Resources for more information:** Transformers docs on [SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer) and the [original research paper](https://arxiv.org/abs/2105.15203).

## Limitations and Bias

### Bias

While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate social biases. The [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) dataset used for fine-tuning is large but not necessarily diverse or representative of the general population. Moreover, the images are exclusively of celebrities.
| 166 | |