---
license: apache-2.0
language:
- en
pipeline_tag: text-to-image
library_name: diffusers
---

<h1 align="center">⚡️- Image<br><sub><sup>An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer</sup></sub></h1>

<div align="center">

[![Official Site](https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage)](https://tongyi-mai.github.io/Z-Image-blog/)&#160;
[![GitHub](https://img.shields.io/badge/GitHub-Z--Image-181717?logo=github&logoColor=white)](https://github.com/Tongyi-MAI/Z-Image)&#160;
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint-Z--Image--Turbo-yellow)](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)&#160;
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Online_Demo-Z--Image--Turbo-blue)](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo)&#160;
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Mobile_Demo-Z--Image--Turbo-red)](https://huggingface.co/spaces/akhaliq/Z-Image-Turbo)&#160;
[![ModelScope Model](https://img.shields.io/badge/🤖%20Checkpoint-Z--Image--Turbo-624aff)](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo)&#160;
[![ModelScope Space](https://img.shields.io/badge/🤖%20Online_Demo-Z--Image--Turbo-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=469191&modelType=Checkpoint&sdVersion=Z_IMAGE_TURBO&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image-Turbo%3Frevision%3Dmaster)&#160;
[![Art Gallery PDF](https://img.shields.io/badge/%F0%9F%96%BC%20Art_Gallery-PDF-ff69b4)](assets/Z-Image-Gallery.pdf)&#160;
[![Web Art Gallery](https://img.shields.io/badge/%F0%9F%8C%90%20Web_Art_Gallery-online-00bfff)](https://modelscope.cn/studios/Tongyi-MAI/Z-Image-Gallery/summary)&#160;
<a href="https://arxiv.org/abs/2511.22699" target="_blank"><img src="https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv" height="21px"></a>

Welcome to the official repository for the Z-Image（造相）project!

</div>

## ✨ Z-Image

Z-Image is a powerful and highly efficient image generation model family with **6B** parameters. Currently there are four variants:

- 🚀 **Z-Image-Turbo** – A distilled version of Z-Image that matches or exceeds leading competitors with only **8 NFEs** (Number of Function Evaluations). It offers **⚡️sub-second inference latency⚡️** on enterprise-grade H800 GPUs and fits comfortably on **consumer devices with 16 GB of VRAM**. It excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.

- 🎨 **Z-Image** – The foundation model behind Z-Image-Turbo. Z-Image focuses on **high-quality generation**, **rich aesthetics**, **strong diversity**, and **controllability**, well-suited for creative generation, **fine-tuning**, and downstream development. It supports a wide range of artistic styles, effective negative prompting, and high diversity across identities, poses, compositions, and layouts.

- 🧱 **Z-Image-Omni-Base** – The versatile foundation model capable of both **generation and editing tasks**. By releasing this checkpoint, we aim to unlock the full potential for community-driven fine-tuning and custom development, providing the most "raw" and diverse starting point for the open-source community.

- ✍️ **Z-Image-Edit** – A variant fine-tuned on Z-Image specifically for image editing tasks. It supports creative image-to-image generation with impressive instruction-following capabilities, allowing for precise edits based on natural language prompts.
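
As a rough sanity check on the 16 GB figure, the weight footprint of a 6B-parameter model can be estimated from the parameter count and the bytes per parameter. The sketch below is back-of-the-envelope arithmetic only: it assumes the 6B count covers the weights held on the GPU and ignores activations, the text encoder, and the VAE. `weight_footprint_gib` is a hypothetical helper, not part of any library.

```python
def weight_footprint_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 6e9  # "6B" from the description above

# bfloat16 stores each parameter in 2 bytes, float32 in 4 bytes.
bf16_gib = weight_footprint_gib(NUM_PARAMS, 2)
fp32_gib = weight_footprint_gib(NUM_PARAMS, 4)

print(f"bfloat16 weights: {bf16_gib:.1f} GiB")
print(f"float32 weights:  {fp32_gib:.1f} GiB")
```

Under these assumptions, the bfloat16 weights take about 11.2 GiB, versus about 22.4 GiB in float32.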

### 📥 Model Zoo

| Model | Pre-Training | SFT | RL | Step | CFG | Task | Visual Quality | Diversity | Fine-Tunability | Hugging Face | ModelScope |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Z-Image-Omni-Base** | ✅ | ❌ | ❌ | 50 | ✅ | Gen. / Editing | Medium | High | Easy | *To be released* | *To be released* |
| **Z-Image** | ✅ | ✅ | ❌ | 50 | ✅ | Gen. | High | Medium | Easy | [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint%20-Z--Image-yellow)](https://huggingface.co/Tongyi-MAI/Z-Image) <br> [![Hugging Face Space](https://img.shields.io/badge/%F0%9F%A4%97%20Demo-Z--Image-blue)](https://huggingface.co/spaces/Tongyi-MAI/Z-Image) | [![ModelScope Model](https://img.shields.io/badge/🤖%20%20Checkpoint-Z--Image-624aff)](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image) <br> [![ModelScope Space](https://img.shields.io/badge/%F0%9F%A4%96%20Demo-Z--Image-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=569345&modelType=Checkpoint&sdVersion=Z_IMAGE&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image%3Frevision%3Dmaster) |
| **Z-Image-Turbo** | ✅ | ✅ | ✅ | 8 | ❌ | Gen. | Very High | Low | N/A | [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint%20-Z--Image--Turbo-yellow)](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) <br> [![Hugging Face Space](https://img.shields.io/badge/%F0%9F%A4%97%20Demo-Z--Image--Turbo-blue)](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo) | [![ModelScope Model](https://img.shields.io/badge/🤖%20%20Checkpoint-Z--Image--Turbo-624aff)](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo) <br> [![ModelScope Space](https://img.shields.io/badge/%F0%9F%A4%96%20Demo-Z--Image--Turbo-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=469191&modelType=Checkpoint&sdVersion=Z_IMAGE_TURBO&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image-Turbo%3Frevision%3Dmaster) |
| **Z-Image-Edit** | ✅ | ✅ | ❌ | 50 | ✅ | Editing | High | Medium | Easy | *To be released* | *To be released* |

### 🖼️ Showcase

📸 **Photorealistic Quality**: **Z-Image-Turbo** delivers strong photorealistic image generation while maintaining excellent aesthetic quality.

![Showcase of Z-Image on Photo-realistic image Generation](assets/showcase_realistic.png)

📖 **Accurate Bilingual Text Rendering**: **Z-Image-Turbo** excels at accurately rendering complex Chinese and English text.

![Showcase of Z-Image on Bilingual Text Rendering](assets/showcase_rendering.png)

💡 **Prompt Enhancing & Reasoning**: The Prompt Enhancer gives the model reasoning capabilities, enabling it to go beyond surface-level descriptions and draw on underlying world knowledge.

![Showcase of Z-Image on Prompt Enhancing and Reasoning](assets/reasoning.png)

🧠 **Creative Image Editing**: **Z-Image-Edit** shows a strong understanding of bilingual editing instructions, enabling imaginative and flexible image transformations.

![Showcase of Z-Image-Edit on Image Editing](assets/showcase_editing.png)

### 🏗️ Model Architecture

We adopt a **Scalable Single-Stream DiT** (S3-DiT) architecture. In this setup, text, visual semantic tokens, and image VAE tokens are concatenated at the sequence level to serve as a unified input stream, maximizing parameter efficiency compared to dual-stream approaches.

![Architecture of Z-Image and Z-Image-Edit](assets/architecture.webp)
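
The single-stream input described above can be sketched in a few lines. This is a toy illustration of sequence-level concatenation only, not the model's actual implementation: the `Token` class, the `build_single_stream` helper, the token counts, and the hidden size are all hypothetical, and the transformer that would consume the stream is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    modality: str           # "text", "semantic", or "vae"
    embedding: List[float]  # every modality is projected to one hidden size


def build_single_stream(text, semantic, vae):
    """Concatenate the three token groups along the sequence axis."""
    stream = list(text) + list(semantic) + list(vae)
    hidden_sizes = {len(tok.embedding) for tok in stream}
    if len(hidden_sizes) > 1:
        raise ValueError("all tokens must share one hidden size")
    return stream


HIDDEN = 4  # toy hidden size (assumption)
text_tokens = [Token("text", [0.1] * HIDDEN) for _ in range(3)]
semantic_tokens = [Token("semantic", [0.2] * HIDDEN) for _ in range(2)]
vae_tokens = [Token("vae", [0.3] * HIDDEN) for _ in range(5)]

stream = build_single_stream(text_tokens, semantic_tokens, vae_tokens)
print(len(stream))  # 10 tokens in one unified sequence
print([tok.modality for tok in stream])
```

Because every token sits in the same sequence, a single stack of transformer weights can process all modalities, whereas a dual-stream design keeps separate weights per modality.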

### 📈 Performance

According to the Elo-based Human Preference Evaluation (on [*Alibaba AI Arena*](https://aiarena.alibaba-inc.com/corpora/arena/leaderboard?arenaType=T2I)), Z-Image-Turbo shows highly competitive performance against other leading models, while achieving state-of-the-art results among open-source models.

<p align="center">
  <a href="https://aiarena.alibaba-inc.com/corpora/arena/leaderboard?arenaType=T2I">
    <img src="assets/leaderboard.png" alt="Z-Image Elo Rating on AI Arena"/><br />
    <span style="font-size:1.05em; cursor:pointer; text-decoration:underline;"> Click to view the full leaderboard</span>
  </a>
</p>
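
For readers unfamiliar with Elo-based preference ratings, the sketch below shows how a single pairwise human vote moves two models' ratings under the textbook Elo formula. The arena's exact rating procedure is not described here, so the K-factor of 32, the 400-point scale, and the starting ratings are standard Elo conventions used as assumptions, and the function names are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise vote."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - exp_a)
    return rating_a + delta, rating_b - delta


# Two models start at the same rating; a voter prefers model A's image.
new_a, new_b = elo_update(1000.0, 1000.0, a_wins=True)
print(new_a, new_b)  # 1016.0 984.0
```

A win against an equally rated opponent moves both ratings by half the K-factor. A win against a much stronger opponent moves them by more, which is how repeated votes converge toward a ranking.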

### 🚀 Quick Start

Install the latest version of diffusers from source with the following command:

<details>
<summary><sup>Click here for details on why you need to install diffusers from source</sup></summary>

We have submitted two pull requests ([#12703](https://github.com/huggingface/diffusers/pull/12703) and [#12715](https://github.com/huggingface/diffusers/pull/12715)) to the 🤗 diffusers repository to add support for Z-Image. Both PRs have been merged. Installing diffusers from source ensures you get these changes and the latest features, even if your installed release predates them.

</details>

```bash
pip install git+https://github.com/huggingface/diffusers
```

```python
import torch
from diffusers import ZImagePipeline

# 1. Load the pipeline
# Use bfloat16 for optimal performance on supported GPUs
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

# [Optional] Attention Backend
# Diffusers uses SDPA by default. Switch to Flash Attention for better efficiency if supported:
# pipe.transformer.set_attention_backend("flash")    # Enable Flash-Attention-2
# pipe.transformer.set_attention_backend("_flash_3") # Enable Flash-Attention-3

# [Optional] Model Compilation
# Compiling the DiT model accelerates inference, but the first run will take longer to compile.
# pipe.transformer.compile()

# [Optional] CPU Offloading
# Enable CPU offloading for memory-constrained devices.
# When enabling this, skip the pipe.to("cuda") call above so offloading can manage device placement.
# pipe.enable_model_cpu_offload()

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

# 2. Generate Image
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("example.png")
```

## 🔬 Decoupled-DMD: The Acceleration Magic Behind Z-Image

[![arXiv](https://img.shields.io/badge/arXiv-2511.22677-b31b1b.svg)](https://arxiv.org/abs/2511.22677)

Decoupled-DMD is the core few-step distillation algorithm that empowers the 8-step Z-Image model.

Our core insight in Decoupled-DMD is that the success of existing DMD (Distribution Matching Distillation) methods is the result of two independent, collaborating mechanisms:

-   **CFG Augmentation (CA)**: The primary **engine** 🚀 driving the distillation process, a factor largely overlooked in previous work.
-   **Distribution Matching (DM)**: Acts more as a **regularizer** ⚖️, ensuring the stability and quality of the generated output.

By recognizing and decoupling these two mechanisms, we were able to study and optimize them in isolation. This ultimately motivated us to develop an improved distillation process that significantly enhances the performance of few-step generation.

![Diagram of Decoupled-DMD](assets/decoupled-dmd.webp)
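
One way to see how the two mechanisms can be separated is the algebra of classifier-free guidance itself. The sketch below assumes the distillation direction is the difference between a CFG-combined "real" score and a "fake" score, and checks numerically that it splits into a guidance-free matching term plus a guidance-only term. Labeling these terms DM and CA is an assumption of this sketch, the score vectors are made-up numbers, and the function names are hypothetical. The exact formulation and training procedure are in the linked paper.

```python
def cfg_score(cond, uncond, w):
    """Standard classifier-free guidance: uncond + w * (cond - uncond)."""
    return [u + w * (c - u) for c, u in zip(cond, uncond)]


def split_direction(cond, uncond, fake, w):
    """Split (cfg_score - fake) into a matching term and a guidance term."""
    dm_term = [c - f for c, f in zip(cond, fake)]                   # cond - fake
    ca_term = [(w - 1.0) * (c - u) for c, u in zip(cond, uncond)]   # (w - 1)(cond - uncond)
    return dm_term, ca_term


# Toy score vectors (hypothetical values, not model outputs).
cond, uncond, fake, w = [0.8, -0.2], [0.5, 0.1], [0.6, 0.0], 4.0

combined = [g - f for g, f in zip(cfg_score(cond, uncond, w), fake)]
dm_term, ca_term = split_direction(cond, uncond, fake, w)
recombined = [d + a for d, a in zip(dm_term, ca_term)]

print(combined)
print(recombined)  # matches `combined` up to floating-point rounding
```

Since the two terms add up to the original direction, each can be weighted, scheduled, or studied on its own, which is the kind of isolation the section describes.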

## 🤖 DMDR: Fusing DMD with Reinforcement Learning

[![arXiv](https://img.shields.io/badge/arXiv-2511.13649-b31b1b.svg)](https://arxiv.org/abs/2511.13649)

Building upon the strong foundation of Decoupled-DMD, our 8-step Z-Image model has already demonstrated exceptional capabilities. To further improve semantic alignment, aesthetic quality, and structural coherence, while also producing images with richer high-frequency details, we present **DMDR**.

Our core insight behind DMDR is that Reinforcement Learning (RL) and Distribution Matching Distillation (DMD) can be synergistically integrated during the post-training of few-step models. We demonstrate that:

- **RL Unlocks the Performance of DMD** 🚀
- **DMD Effectively Regularizes RL** ⚖️

![Diagram of DMDR](assets/DMDR.webp)

## ⏬ Download

```bash
pip install -U huggingface_hub
HF_XET_HIGH_PERFORMANCE=1 hf download Tongyi-MAI/Z-Image-Turbo
```
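
The same download can also be scripted from Python. This is a minimal sketch, assuming `huggingface_hub` is installed. `snapshot_download` is the library's function for fetching a whole repository, while the wrapper name `download_z_image_turbo` and its `local_dir` argument are illustrative choices.

```python
def download_z_image_turbo(local_dir=None):
    """Download the Z-Image-Turbo repository and return its local path."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="Tongyi-MAI/Z-Image-Turbo",
        local_dir=local_dir,  # None keeps the files in the default Hugging Face cache
    )


# Example call (downloads the full checkpoint, so it needs network access and disk space):
# path = download_z_image_turbo("./Z-Image-Turbo")
```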

## 📜 Citation

If you find our work useful in your research, please consider citing:

```bibtex
@article{team2025zimage,
  title={Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer},
  author={Z-Image Team},
  journal={arXiv preprint arXiv:2511.22699},
  year={2025}
}

@article{liu2025decoupled,
  title={Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield},
  author={Dongyang Liu and Peng Gao and David Liu and Ruoyi Du and Zhen Li and Qilong Wu and Xin Jin and Sihan Cao and Shifeng Zhang and Hongsheng Li and Steven Hoi},
  journal={arXiv preprint arXiv:2511.22677},
  year={2025}
}

@article{jiang2025distribution,
  title={Distribution Matching Distillation Meets Reinforcement Learning},
  author={Jiang, Dengyang and Liu, Dongyang and Wang, Zanyi and Wu, Qilong and Jin, Xin and Liu, David and Li, Zhen and Wang, Mengmeng and Gao, Peng and Yang, Harry},
  journal={arXiv preprint arXiv:2511.13649},
  year={2025}
}
```