# Model Card for PickScore v1

This model is a scoring function for images generated from text. It takes a prompt and a generated image as input and outputs a scalar score; a higher score means the image is more likely to be preferred by people for that prompt.
It can be used as a general scoring function and for tasks such as human preference prediction, model evaluation, and image ranking.
See our paper [Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation](https://arxiv.org/abs/2305.01569) for more details.
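
Reading off the usage code in this card, the score for a prompt $x$ and an image $y$ is a temperature-scaled cosine similarity between the CLIP text and image embeddings, and preference probabilities over $n$ candidate images are a softmax over those scores. The notation is ours, not the paper's: $E_{\text{txt}}$ and $E_{\text{img}}$ stand for the text and image encoders (`get_text_features` and `get_image_features`), and $\tau$ for the model's learned `logit_scale`, which is stored in log space.

$$
s(x, y) = e^{\tau} \, \frac{E_{\text{txt}}(x)^{\top} E_{\text{img}}(y)}{\lVert E_{\text{txt}}(x) \rVert \, \lVert E_{\text{img}}(y) \rVert}
$$

$$
\hat{p}_i = \frac{\exp s(x, y_i)}{\sum_{j=1}^{n} \exp s(x, y_j)}
$$

With two candidate images this reduces to a logistic function of the score difference, so only differences between scores for the same prompt affect the predicted preference:

$$
\hat{p}_1 = \sigma\big(s(x, y_1) - s(x, y_2)\big)
$$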

## Model Details

### Model Description

This model was fine-tuned from CLIP-H on the [Pick-a-Pic dataset](https://huggingface.co/datasets/yuvalkirstain/pickapic_v1).

### Model Sources

- Repository: [PickScore on GitHub](https://github.com/yuvalkirstain/PickScore)
- Paper: [Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation](https://arxiv.org/abs/2305.01569)
- Demo: [Hugging Face Spaces demo for PickScore](https://huggingface.co/spaces/yuvalkirstain/PickScore)

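## How to Get Started with the Model

Use the code below to get started with the model. Given one prompt and several candidate images, it returns a preference probability for each image. It requires `torch`, `transformers`, and `Pillow`.
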
```python
# import
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModel

# load model (falls back to CPU if no GPU is available)
device = "cuda" if torch.cuda.is_available() else "cpu"
processor_name_or_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
model_pretrained_name_or_path = "yuvalkirstain/PickScore_v1"

processor = AutoProcessor.from_pretrained(processor_name_or_path)
model = AutoModel.from_pretrained(model_pretrained_name_or_path).eval().to(device)

def calc_probs(prompt, images):
    # preprocess: pixel values for the images, token ids for the prompt
    image_inputs = processor(
        images=images,
        return_tensors="pt",
    ).to(device)

    text_inputs = processor(
        text=prompt,
        padding=True,
        truncation=True,
        max_length=77,
        return_tensors="pt",
    ).to(device)

    with torch.no_grad():
        # embed and L2-normalize both modalities
        image_embs = model.get_image_features(**image_inputs)
        image_embs = image_embs / torch.norm(image_embs, dim=-1, keepdim=True)

        text_embs = model.get_text_features(**text_inputs)
        text_embs = text_embs / torch.norm(text_embs, dim=-1, keepdim=True)

        # score: temperature-scaled cosine similarity, one value per image
        scores = model.logit_scale.exp() * (text_embs @ image_embs.T)[0]

        # get probabilities if you have multiple images to choose from
        probs = torch.softmax(scores, dim=-1)

    return probs.cpu().tolist()

pil_images = [Image.open("my_amazing_images/1.jpg"), Image.open("my_amazing_images/2.jpg")]
prompt = "fantastic, incredible prompt"
print(calc_probs(prompt, pil_images))
```

## Training Details

### Training Data

This model was trained on the [Pick-a-Pic dataset](https://huggingface.co/datasets/yuvalkirstain/pickapic_v1).

### Training Procedure

See the [paper](https://arxiv.org/abs/2305.01569) for details of the training procedure.
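
As a hedged illustration only (this is not the repository's training code), a scoring model can be fit to pairwise human choices by minimizing the KL divergence between the human label and the softmax over the two images' scores for the same prompt, with a tie encoded as a 50/50 label. The sketch below assumes that objective and that label encoding; `pair_probs`, `preference_loss`, and `batch_loss` are hypothetical names. It uses plain floats, whereas in actual training the scores would be differentiable model outputs.

```python
import math

def pair_probs(score_0, score_1):
    # Softmax over the scores of two images generated for the same prompt.
    m = max(score_0, score_1)  # subtract the max for numerical stability
    e0, e1 = math.exp(score_0 - m), math.exp(score_1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

def preference_loss(score_0, score_1, label):
    # KL(label || predicted) for one comparison. Assumed label encoding:
    # (1.0, 0.0) = image 0 preferred, (0.0, 1.0) = image 1 preferred, (0.5, 0.5) = tie.
    pred = pair_probs(score_0, score_1)
    # Terms with a zero label contribute nothing (0 * log 0 is taken as 0).
    return sum(p * (math.log(p) - math.log(q)) for p, q in zip(label, pred) if p > 0)

def batch_loss(batch):
    # Mean loss over (score_0, score_1, label) triples.
    return sum(preference_loss(*example) for example in batch) / len(batch)

# Hypothetical scores standing in for model outputs.
batch = [
    (21.3, 19.8, (1.0, 0.0)),  # image 0 preferred
    (20.1, 20.4, (0.5, 0.5)),  # tie
]
print(batch_loss(batch))
```

For a tie, the loss is minimized when both images receive equal scores; for a strict preference, it keeps decreasing as the preferred image's score pulls further ahead.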

## Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{Kirstain2023PickaPicAO,
  title={Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation},
  author={Yuval Kirstain and Adam Polyak and Uriel Singer and Shahbuland Matiana and Joe Penna and Omer Levy},
  year={2023}
}
```