---
pipeline_tag: text-to-image
inference: false
license: other
license_name: sai-nc-community
license_link: https://huggingface.co/stabilityai/sdxl-turbo/blob/main/LICENSE.md
---

# SDXL-Turbo Model Card

<!-- Provide a quick summary of what the model is/does. -->

SDXL-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation.
A real-time demo is available here: http://clipdrop.co/stable-diffusion-turbo

Please note: for commercial use, refer to https://stability.ai/license.
## Model Details

### Model Description
SDXL-Turbo is a distilled version of [SDXL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), trained for real-time synthesis.
SDXL-Turbo is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the [technical report](https://stability.ai/research/adversarial-diffusion-distillation)), which allows sampling large-scale foundational
image diffusion models in 1 to 4 steps at high image quality.
This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal and combines this with an
adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps.
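
The combined objective can be sketched as follows. This is only an illustration of the idea described above, not the official training code; the function names and the weighting constant `lam` are assumptions for the sketch (the technical report uses a distillation weight of 2.5):

```python
# Illustrative sketch of the ADD student objective (not official code).
def mean(xs):
    return sum(xs) / len(xs)

def add_student_loss(student_pred, teacher_pred, disc_logits_fake, lam=2.5):
    # Score distillation: pull the student's one-step prediction toward
    # the teacher diffusion model's output (mean squared error).
    distill = mean([(s - t) ** 2 for s, t in zip(student_pred, teacher_pred)])
    # Adversarial term: the student (generator) wants the discriminator
    # to assign high logits to its single-step samples.
    adversarial = -mean(disc_logits_fake)
    return adversarial + lam * distill
```

The adversarial term is what keeps one- and two-step samples sharp where pure distillation alone tends to blur.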

- **Developed by:** Stability AI
- **Funded by:** Stability AI
- **Model type:** Generative text-to-image model
- **Finetuned from model:** [SDXL 1.0 Base](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)

### Model Sources

For research purposes, we recommend our `generative-models` GitHub repository (https://github.com/Stability-AI/generative-models),
which implements the most popular diffusion frameworks (both training and inference).

- **Repository:** https://github.com/Stability-AI/generative-models
- **Paper:** https://stability.ai/research/adversarial-diffusion-distillation
- **Demo:** http://clipdrop.co/stable-diffusion-turbo

## Evaluation

<!-- User-preference comparison charts (image links not recovered) -->

The charts above evaluate user preference for SDXL-Turbo over other single- and multi-step models.
SDXL-Turbo evaluated at a single step is preferred by human voters in terms of image quality and prompt following over LCM-XL evaluated at four (or fewer) steps.
In addition, we see that using four steps for SDXL-Turbo further improves performance.
For details on the user study, we refer to the [research paper](https://stability.ai/research/adversarial-diffusion-distillation).

## Uses

### Direct Use

The model is intended for both non-commercial and commercial usage. You can use this model for non-commercial or research purposes under this [license](https://huggingface.co/stabilityai/sdxl-turbo/blob/main/LICENSE.md). Possible research areas and tasks include:

- Research on generative models.
- Research on real-time applications of generative models.
- Research on the impact of real-time generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.

For commercial use, please refer to https://stability.ai/membership.

Excluded uses are described below.

### Diffusers

```shell
pip install diffusers transformers accelerate --upgrade
```

- **Text-to-image**:

SDXL-Turbo does not use `guidance_scale` or `negative_prompt`, so we disable classifier-free guidance with `guidance_scale=0.0`.
The model works best at a resolution of 512x512, but higher image sizes work as well.
A **single step** is enough to generate high-quality images.
```py
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

prompt = "A cinematic shot of a baby raccoon wearing an intricate Italian priest robe."

image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
```

- **Image-to-image**:

When using SDXL-Turbo for image-to-image generation, make sure that `num_inference_steps * strength` is greater than or equal
to 1. The image-to-image pipeline will run for `int(num_inference_steps * strength)` steps, *e.g.* `int(2 * 0.5) = 1` step in the example
below.
```py
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

pipe = AutoPipelineForImage2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png").resize((512, 512))

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

image = pipe(prompt, image=init_image, num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
```
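
The step rule above can be sketched numerically. `effective_steps` is a hypothetical helper for illustration, not part of the diffusers API:

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps the image-to-image pipeline actually runs."""
    steps = int(num_inference_steps * strength)
    if steps < 1:
        # e.g. num_inference_steps=1, strength=0.5 would truncate to 0 steps
        raise ValueError("num_inference_steps * strength must be >= 1")
    return steps
```

For the example above, `effective_steps(2, 0.5)` is 1: two nominal steps at strength 0.5 run a single denoising step.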

### Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events;
using it to generate such content is therefore out of scope for this model.
The model should not be used in any way that violates Stability AI's [Acceptable Use Policy](https://stability.ai/use-policy).

## Limitations and Bias

### Limitations
- The generated images are of a fixed resolution (512x512 pixels), and the model does not achieve perfect photorealism.
- The model cannot render legible text.
- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.


### Recommendations

The model is intended for both non-commercial and commercial usage.

## How to Get Started with the Model

Check out https://github.com/Stability-AI/generative-models