---
pipeline_tag: text-to-image
inference: false
---

# SD-Turbo Model Card

<!-- Provide a quick summary of what the model is/does. -->
![row01](output_tile.jpg)

SD-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation.
We release SD-Turbo as a research artifact and to enable the study of small, distilled text-to-image models. For increased quality and prompt understanding,
we recommend [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo/).

Please note: For commercial use, please refer to https://stability.ai/license.

## Model Details

### Model Description
SD-Turbo is a distilled version of [Stable Diffusion 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1), trained for real-time synthesis.
SD-Turbo is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the [technical report](https://stability.ai/research/adversarial-diffusion-distillation)), which allows sampling large-scale foundational
image diffusion models in 1 to 4 steps at high image quality.
This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal and combines this with an
adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps.
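
Schematically, the combination described above can be read as a weighted sum of the two terms. The symbols below are illustrative shorthand and an assumption of this sketch, not notation from this card; the technical report gives the exact objective.

$$
\mathcal{L} = \mathcal{L}_{\text{adv}} + \lambda \, \mathcal{L}_{\text{distill}}
$$

Here $\mathcal{L}_{\text{adv}}$ stands for the adversarial loss, $\mathcal{L}_{\text{distill}}$ for the score-distillation loss computed against the teacher model, and $\lambda$ for a weighting factor between the two.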

- **Developed by:** Stability AI
- **Funded by:** Stability AI
- **Model type:** Generative text-to-image model
- **Finetuned from model:** [Stable Diffusion 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1)

### Model Sources

For research purposes, we recommend our `generative-models` GitHub repository (https://github.com/Stability-AI/generative-models),
which implements the most popular diffusion frameworks (both training and inference).

- **Repository:** https://github.com/Stability-AI/generative-models
- **Paper:** https://stability.ai/research/adversarial-diffusion-distillation
- **Demo [for the bigger SDXL-Turbo]:** http://clipdrop.co/stable-diffusion-turbo

## Evaluation
![comparison1](image_quality_one_step.png)
![comparison2](prompt_alignment_one_step.png)

The charts above evaluate user preference for SD-Turbo over other single- and multi-step models.
Evaluated at a single step, SD-Turbo is preferred by human voters over LCM-LoRA XL and LCM-LoRA 1.5 in terms of image quality and prompt following.

**Note:** For increased quality, we recommend the bigger version [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo/).
For details on the user study, we refer to the [research paper](https://stability.ai/research/adversarial-diffusion-distillation).

## Uses

### Direct Use

The model is intended for both non-commercial and commercial usage. Possible research areas and tasks include:

- Research on generative models.
- Research on real-time applications of generative models.
- Research on the impact of real-time generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.

For commercial use, please refer to https://stability.ai/membership.

Excluded uses are described below.

### Diffusers

```
pip install diffusers transformers accelerate --upgrade
```

- **Text-to-image**:

SD-Turbo does not use `guidance_scale` or `negative_prompt`; we disable guidance with `guidance_scale=0.0`.
The model preferably generates images of size 512x512, but larger image sizes work as well.
A **single step** is enough to generate high-quality images.

```py
from diffusers import AutoPipelineForText2Image
import torch

# Load the fp16 weights and move the pipeline to the GPU.
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."
# A single sampling step, with guidance disabled.
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
```
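
The settings described above (no guidance, 512x512 output, a handful of steps) can be gathered in one place before calling the pipeline. The following is a minimal sketch: `turbo_call_kwargs` is a hypothetical helper, not part of `diffusers`, and the 1 to 4 step range it enforces is a conservative choice based on the range quoted in the model description, not a limit imposed by the pipeline.

```python
from typing import Any, Dict


def turbo_call_kwargs(prompt: str, steps: int = 1, size: int = 512) -> Dict[str, Any]:
    """Build keyword arguments for a text-to-image call (hypothetical helper)."""
    # Assumption of this sketch: stay within the 1 to 4 step range
    # mentioned in the model description.
    if not 1 <= steps <= 4:
        raise ValueError("steps must be between 1 and 4")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": 0.0,  # guidance is disabled for SD-Turbo
        "height": size,
        "width": size,
    }


kwargs = turbo_call_kwargs("a photo of a lighthouse at dusk", steps=2)
print(kwargs["num_inference_steps"], kwargs["guidance_scale"])
```

With a pipeline loaded as in the example above, the result could be passed along as `image = pipe(**kwargs).images[0]`.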

- **Image-to-image**:

When using SD-Turbo for image-to-image generation, make sure that `num_inference_steps` * `strength` is greater than or equal
to 1. The image-to-image pipeline will run for `int(num_inference_steps * strength)` steps, *e.g.* 2 * 0.5 = 1 step in our example
below.

```py
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

# Load the fp16 weights and move the pipeline to the GPU.
pipe = AutoPipelineForImage2Image.from_pretrained("stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

# Fetch the input image and resize it to the model's preferred resolution.
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png").resize((512, 512))
prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

# int(2 * 0.5) = 1 effective step, with guidance disabled.
image = pipe(prompt, image=init_image, num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
```
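
The step-count rule stated above can be checked before the pipeline is called. This is a minimal sketch under stated assumptions: `effective_img2img_steps` is a hypothetical helper, not a `diffusers` function, and it simply applies the `int(num_inference_steps * strength)` formula from this card, assuming `strength` lies between 0 and 1.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Return int(num_inference_steps * strength), raising if it is below 1."""
    # Assumption of this sketch: strength is a fraction between 0 and 1.
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    steps = int(num_inference_steps * strength)
    if steps < 1:
        raise ValueError(
            "num_inference_steps * strength must be at least 1, "
            f"got {num_inference_steps} * {strength}"
        )
    return steps


print(effective_img2img_steps(2, 0.5))  # prints 1
```

A combination such as `num_inference_steps=1, strength=0.5` would raise here, because it truncates to zero steps.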

### Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events,
and therefore using the model to generate such content is out-of-scope for the abilities of this model.
The model should not be used in any way that violates Stability AI's [Acceptable Use Policy](https://stability.ai/use-policy).

## Limitations and Bias

### Limitations
- The quality and prompt alignment are lower than those of [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo/).
- The generated images are of a fixed resolution (512x512 pixels), and the model does not achieve perfect photorealism.
- The model cannot render legible text.
- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.

### Recommendations

The model is intended for both non-commercial and commercial usage.

## How to Get Started with the Model

Check out https://github.com/Stability-AI/generative-models