# Model Card for PickScore v1

This model is a scoring function for images generated from text. It takes a prompt and a generated image as input and outputs a scalar score.
It can be used as a general-purpose scoring function and for tasks such as human preference prediction, model evaluation, image ranking, and more.
See our paper [Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation](https://arxiv.org/abs/2305.01569) for more details.

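The uses listed above all build on the same per-image score. Here is a minimal sketch. It assumes you already have one PickScore score per candidate image for a single prompt, for example the scaled similarities computed inside `calc_probs` before the softmax. The numbers below are made up.

With scores in hand:

- Image ranking is a sort by score.
- Preference between two images follows from a two-way softmax, which reduces to a logistic function of the score difference.

`rank_images` and `preference_prob` are hypothetical helpers, not part of the PickScore repository.

```python
import math


def rank_images(scores):
    """Return image indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def preference_prob(score_a, score_b):
    """Probability that image A is preferred over image B.

    A two-way softmax over (score_a, score_b) equals
    exp(a) / (exp(a) + exp(b)) = 1 / (1 + exp(b - a)).
    """
    return 1.0 / (1.0 + math.exp(score_b - score_a))


# Hypothetical scores for three candidate images of the same prompt.
scores = [21.3, 22.8, 19.9]
print(rank_images(scores))  # [1, 0, 2]
print(round(preference_prob(scores[1], scores[0]), 3))  # 0.818
```

With more than two candidates, the same idea generalizes to the softmax over all images that `calc_probs` applies.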
## Model Details

### Model Description

This model was finetuned from CLIP-H using the [Pick-a-Pic dataset](https://huggingface.co/datasets/yuvalkirstain/pickapic_v1).

### Model Sources

- **Repository:** [See the PickScore repo](https://github.com/yuvalkirstain/PickScore)
- **Paper:** [Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation](https://arxiv.org/abs/2305.01569)
- **Demo:** [Hugging Face Spaces demo for PickScore](https://huggingface.co/spaces/yuvalkirstain/PickScore)

## How to Get Started with the Model

The code below loads PickScore and, given one prompt and a list of candidate images, returns the probability that each image is the preferred one.

```python
# import
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModel

# load model
device = "cuda" if torch.cuda.is_available() else "cpu"
processor_name_or_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
model_pretrained_name_or_path = "yuvalkirstain/PickScore_v1"

processor = AutoProcessor.from_pretrained(processor_name_or_path)
model = AutoModel.from_pretrained(model_pretrained_name_or_path).eval().to(device)


def calc_probs(prompt, images):
    # preprocess: images and text go through the processor separately
    image_inputs = processor(
        images=images,
        return_tensors="pt",
    ).to(device)

    text_inputs = processor(
        text=prompt,
        padding=True,
        truncation=True,
        max_length=77,  # CLIP's maximum text length in tokens
        return_tensors="pt",
    ).to(device)

    with torch.no_grad():
        # embed and L2-normalize
        image_embs = model.get_image_features(**image_inputs)
        image_embs = image_embs / torch.norm(image_embs, dim=-1, keepdim=True)

        text_embs = model.get_text_features(**text_inputs)
        text_embs = text_embs / torch.norm(text_embs, dim=-1, keepdim=True)

        # score: scaled cosine similarity between the prompt and each image
        scores = model.logit_scale.exp() * (text_embs @ image_embs.T)[0]

        # get probabilities if you have multiple images to choose from
        probs = torch.softmax(scores, dim=-1)

    return probs.cpu().tolist()


pil_images = [Image.open("my_amazing_images/1.jpg"), Image.open("my_amazing_images/2.jpg")]
prompt = "fantastic, incredible prompt"
print(calc_probs(prompt, pil_images))
```

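This sketch makes the scoring step in `calc_probs` concrete without downloading the model. It repeats the same arithmetic in plain Python on toy vectors:

1. L2-normalize the text and image embeddings.
2. Take their dot product, which gives a cosine similarity.
3. Multiply by `exp(logit_scale)`.
4. Apply a softmax across the candidate images.

The embeddings and the `logit_scale` value of `log(100)` are made-up assumptions for illustration; the real values come from the model. The helper names (`l2_normalize`, `softmax`, `pick_scores`) are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (mirrors emb / torch.norm(emb, ...))."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(xs):
    """Numerically stable softmax (mirrors torch.softmax(..., dim=-1))."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pick_scores(text_emb, image_embs, logit_scale):
    """Scaled cosine similarity between one prompt and each image.

    logit_scale is stored in log space, so it is exponentiated before use
    (mirrors model.logit_scale.exp()).
    """
    t = l2_normalize(text_emb)
    scale = math.exp(logit_scale)
    return [scale * sum(a * b for a, b in zip(t, l2_normalize(img))) for img in image_embs]


# Toy 3-dimensional embeddings; real CLIP-H embeddings are much larger.
text_emb = [1.0, 0.0, 0.0]
image_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
scores = pick_scores(text_emb, image_embs, logit_scale=math.log(100.0))
print(scores)  # close to [100.0, 0.0]
print(softmax(scores))
```

The softmax runs over the images passed in one call. As a result, the probabilities returned by `calc_probs` depend on which other candidates are included. Each pre-softmax score, by contrast, depends only on the prompt and that one image.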
## Training Details

### Training Data

This model was trained on the [Pick-a-Pic dataset](https://huggingface.co/datasets/yuvalkirstain/pickapic_v1).

### Training Procedure

See the [Pick-a-Pic paper](https://arxiv.org/abs/2305.01569) for details of the training procedure.

## Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{Kirstain2023PickaPicAO,
  title={Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation},
  author={Yuval Kirstain and Adam Polyak and Uriel Singer and Shahbuland Matiana and Joe Penna and Omer Levy},
  year={2023}
}
```