---
license: other
license_name: tencent-hunyuan-community
license_link: LICENSE
pipeline_tag: text-to-image
library_name: transformers
---


[中文文档](./README_zh_CN.md)

<div align="center">

<img src="./assets/logo.png" alt="HunyuanImage-3.0 Logo" width="600">

# 🎨 HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

</div>


<div align="center">
<img src="./assets/banner.png" alt="HunyuanImage-3.0 Banner" width="800">

</div>

<div align="center">
  <a href="https://hunyuan.tencent.com/image" target="_blank"><img src="https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage" height="22px"></a>
  <a href="https://huggingface.co/tencent/HunyuanImage-3.0" target="_blank"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg" height="22px"></a>
  <a href="https://github.com/Tencent-Hunyuan/HunyuanImage-3.0" target="_blank"><img src="https://img.shields.io/badge/Page-bb8a2e.svg?logo=github" height="22px"></a>
  <a href="https://arxiv.org/pdf/2509.23951" target="_blank"><img src="https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv" height="22px"></a>
  <a href="https://x.com/TencentHunyuan" target="_blank"><img src="https://img.shields.io/badge/Hunyuan-black.svg?logo=x" height="22px"></a>
  <a href="https://docs.qq.com/doc/DUVVadmhCdG9qRXBU" target="_blank"><img src="https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book" height="22px"></a>
</div>


<p align="center">
    👏 Join our <a href="./assets/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/ehjWMqF5wY">Discord</a> |
💻 <a href="https://hunyuan.tencent.com/modelSquare/home/play?modelId=289&from=/visual">Official website: try our model!</a>&nbsp;&nbsp;
</p>

## 🔥🔥🔥 News

- **January 26, 2026**: 🚀 **[HunyuanImage-3.0-Instruct-Distil](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil)** - Distilled checkpoint for efficient deployment (8 sampling steps recommended).
- **January 26, 2026**: 🎉 **[HunyuanImage-3.0-Instruct](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct)** - Release of **Instruct (with reasoning)** for intelligent prompt enhancement and **Image-to-Image** generation for creative editing.
- **October 30, 2025**: 🚀 **[HunyuanImage-3.0 vLLM Acceleration](./vllm_infer/README.md)** - Significantly faster inference with vLLM support.
- **September 28, 2025**: 📖 **[HunyuanImage-3.0 Technical Report](https://arxiv.org/pdf/2509.23951)** - Comprehensive technical documentation now available.
- **September 28, 2025**: 🎉 **[HunyuanImage-3.0 Open Source](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)** - Inference code and model weights publicly available.


## 🧩 Community Contributions

If you use or build on HunyuanImage-3.0 in your projects, please let us know.

## 📑 Open-source Plan

- HunyuanImage-3.0 (Image Generation Model)
  - [x] Inference
  - [x] HunyuanImage-3.0 Checkpoints
  - [x] HunyuanImage-3.0-Instruct Checkpoints (with reasoning)
  - [x] vLLM Support
  - [x] Distilled Checkpoints
  - [x] Image-to-Image Generation
  - [ ] Multi-turn Interaction


## 🗂️ Contents
- [🔥🔥🔥 News](#-news)
- [🧩 Community Contributions](#-community-contributions)
- [📑 Open-source Plan](#-open-source-plan)
- [📖 Introduction](#-introduction)
- [✨ Key Features](#-key-features)
- [🚀 Usage](#-usage)
  - [📦 Environment Setup](#-environment-setup)
    - [📥 Install Dependencies](#-install-dependencies)
  - [HunyuanImage-3.0 (Text-to-image)](#hunyuanimage-30-text-to-image)
    - [🔥 Quick Start with Transformers](#-quick-start-with-transformers)
      - [1️⃣ Download model weights](#1-download-model-weights)
      - [2️⃣ Run with Transformers](#2-run-with-transformers)
    - [🏠 Local Installation & Usage](#-local-installation--usage)
      - [1️⃣ Clone the Repository](#1-clone-the-repository)
      - [2️⃣ Download Model Weights](#2-download-model-weights)
      - [3️⃣ Run the Demo](#3-run-the-demo)
      - [4️⃣ Command Line Arguments](#4-command-line-arguments)
    - [🎨 Interactive Gradio Demo](#-interactive-gradio-demo)
      - [1️⃣ Install Gradio](#1-install-gradio)
      - [2️⃣ Configure Environment](#2-configure-environment)
      - [3️⃣ Launch the Web Interface](#3-launch-the-web-interface)
      - [4️⃣ Access the Interface](#4-access-the-interface)
  - [HunyuanImage-3.0-Instruct](#hunyuanimage-30-instruct-instruction-reasoning-and-image-to-image-generation-including-editing-and-multi-image-fusion)
    - [🔥 Quick Start with Transformers](#-quick-start-with-transformers-1)
      - [1️⃣ Download model weights](#1-download-model-weights-1)
      - [2️⃣ Run with Transformers](#2-run-with-transformers-1)
    - [🏠 Local Installation & Usage](#-local-installation--usage-1)
      - [1️⃣ Clone the Repository](#1-clone-the-repository-1)
      - [2️⃣ Download Model Weights](#2-download-model-weights-1)
      - [3️⃣ Run the Demo](#3-run-the-demo-1)
      - [4️⃣ Command Line Arguments](#4-command-line-arguments-1)
      - [5️⃣ For fewer Sampling Steps](#5-for-fewer-sampling-steps)
- [🧱 Models Cards](#-models-cards)
- [📊 Evaluation](#-evaluation)
  - [Evaluation of HunyuanImage-3.0-Instruct](#evaluation-of-hunyuanimage-30-instruct)
  - [Evaluation of HunyuanImage-3.0 (Text-to-Image)](#evaluation-of-hunyuanimage-30-text-to-image)
- [🖼️ Showcase](#-showcase)
  - [Showcases of HunyuanImage-3.0-Instruct](#showcases-of-hunyuanimage-30-instruct)
- [📚 Citation](#-citation)
- [🙏 Acknowledgements](#-acknowledgements)
- [🌟🚀 Github Star History](#-github-star-history)

---

## 📖 Introduction

**HunyuanImage-3.0** is a groundbreaking native multimodal model that unifies multimodal understanding and generation within an autoregressive framework. Our text-to-image and image-to-image model achieves performance **comparable to or surpassing** leading closed-source models.


<div align="center">
  <img src="./assets/framework.png" alt="HunyuanImage-3.0 Framework" width="90%">
</div>

## ✨ Key Features

* 🧠 **Unified Multimodal Architecture:** Moving beyond the prevalent DiT-based architectures, HunyuanImage-3.0 employs a unified autoregressive framework. This design enables a more direct and integrated modeling of text and image modalities, leading to surprisingly effective and contextually rich image generation.

* 🏆 **The Largest Image Generation MoE Model:** This is the largest open-source image generation Mixture of Experts (MoE) model to date. It features 64 experts and a total of 80 billion parameters, with 13 billion activated per token, significantly enhancing its capacity and performance.

* 🎨 **Superior Image Generation Performance:** Through rigorous dataset curation and advanced reinforcement learning post-training, we've achieved an optimal balance between semantic accuracy and visual excellence. The model demonstrates exceptional prompt adherence while delivering photorealistic imagery with stunning aesthetic quality and fine-grained details.

* 💭 **Intelligent Image Understanding and World-Knowledge Reasoning:** The unified multimodal architecture endows HunyuanImage-3.0 with powerful reasoning capabilities. It understands the user's input image and leverages its extensive world knowledge to interpret user intent, automatically elaborating on sparse prompts with contextually appropriate details to produce more complete visual outputs.


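
The parameter counts quoted under Key Features (80 billion total, 13 billion activated per token) translate into some rough numbers. The sketch below is back-of-the-envelope arithmetic only: the 2 bytes per parameter is an assumption (bf16 or fp16 weights), and the result ignores activations, caches, and framework overhead.

```python
# Rough arithmetic from the figures quoted in this README.
TOTAL_PARAMS = 80e9    # total parameters (from the README)
ACTIVE_PARAMS = 13e9   # parameters activated per token (from the README)
BYTES_PER_PARAM = 2    # ASSUMPTION: bf16/fp16 weights


def active_fraction(active: float, total: float) -> float:
    """Share of the parameters that take part in processing one token."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.2%}")
print(f"weights only: ~{weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM):.0f} GB")
```

Under these assumptions, the weights alone are too large for a single consumer GPU, which is consistent with the multi-GPU defaults used later in this README.
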
## 🚀 Usage

### 📦 Environment Setup

* 🐍 **Python:** 3.12+ (recommended and tested)
* ⚡ **CUDA:** 12.8

#### 📥 Install Dependencies

```bash
# 1. Install PyTorch first (CUDA 12.8 build)
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# 2. Install tencentcloud-sdk for Prompt Enhancement (PE).
#    Needed only for HunyuanImage-3.0, not for HunyuanImage-3.0-Instruct.
pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-sdk-python

# 3. Install the remaining dependencies
pip install -r requirements.txt
```

For **up to 3x faster inference**, install the following optimization:

```bash
# FlashInfer for optimized MoE inference. v0.5.0 is tested.
pip install flashinfer-python==0.5.0
```
> 💡 **Installation Tips:** The CUDA version used by PyTorch must match the system's CUDA version.
> FlashInfer relies on this compatibility when compiling kernels at runtime.
> GCC version >=9 is recommended for compiling FlashAttention and FlashInfer.

> ⚡ **Performance Tips:** These optimizations can significantly speed up your inference!

> 💡 **Note:** When FlashInfer is enabled, the first inference may be slower (about 10 minutes) due to kernel compilation. Subsequent inferences on the same machine will be much faster.

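
One way to check the CUDA match described in the installation tips is to compare the version PyTorch was built against with the one reported by `nvcc`. The helper names below are hypothetical, and the sketch assumes `nvcc` is on `PATH` and reflects the system toolkit that FlashInfer will use, which may not hold on every machine.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract a version such as '12.8' from the text printed by `nvcc --version`."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_versions_match(torch_cuda: Optional[str], system_cuda: Optional[str]) -> bool:
    """True only when both versions are known and their major.minor parts agree."""
    if not torch_cuda or not system_cuda:
        return False
    return torch_cuda.split(".")[:2] == system_cuda.split(".")[:2]


if __name__ == "__main__":
    try:
        import torch
        torch_cuda = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        torch_cuda = None

    system_cuda = None
    nvcc = shutil.which("nvcc")
    if nvcc:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        system_cuda = parse_nvcc_release(result.stdout)

    print(f"PyTorch CUDA: {torch_cuda}, system CUDA (nvcc): {system_cuda}")
    print("match" if cuda_versions_match(torch_cuda, system_cuda) else "mismatch or unknown")
```
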
### HunyuanImage-3.0 (Text-to-image)

#### 🔥 Quick Start with Transformers

##### 1️⃣ Download model weights

```bash
# Download from HuggingFace into a local directory with a dot-free name.
# The directory name must not contain dots, which may cause issues when loading with Transformers.
hf download tencent/HunyuanImage-3.0 --local-dir ./HunyuanImage-3
```

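
The dot restriction on the directory name is easy to check before loading. A minimal sketch, with a hypothetical helper name; it inspects only the final path component, on the assumption that this is the part Transformers uses when importing the remote code.

```python
from pathlib import Path


def has_dot_in_dir_name(local_dir: str) -> bool:
    """Return True if the final component of the path contains a dot."""
    return "." in Path(local_dir).name


for candidate in ("./HunyuanImage-3", "./HunyuanImage-3.0"):
    status = "contains a dot, rename it" if has_dot_in_dir_name(candidate) else "ok"
    print(f"{candidate}: {status}")
```
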
##### 2️⃣ Run with Transformers

```python
from transformers import AutoModelForCausalLM

# Load the model from the local directory.
# The HF model_id `tencent/HunyuanImage-3.0` cannot currently be loaded directly
# because of the dot in the name.
model_id = "./HunyuanImage-3"

kwargs = dict(
    attn_implementation="sdpa",  # Use "flash_attention_2" if FlashAttention is installed
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
    moe_impl="eager",  # Use "flashinfer" if FlashInfer is installed
)

model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
model.load_tokenizer(model_id)

# Generate the image
prompt = "A brown and white dog is running on the grass"
image = model.generate_image(prompt=prompt, stream=True)
image.save("image.png")
```


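
The comments in the loading code suggest switching backends when FlashAttention or FlashInfer is installed. The sketch below picks those values automatically. The function name is hypothetical, and it assumes the packages are importable as `flash_attn` and `flashinfer`; being importable does not guarantee the kernels work on your GPU.

```python
import importlib.util
from typing import Callable, Dict, Optional


def pick_backends(
    find_spec: Callable[[str], Optional[object]] = importlib.util.find_spec,
) -> Dict[str, str]:
    """Choose attention and MoE implementations based on installed packages."""
    has_flash_attn = find_spec("flash_attn") is not None
    has_flashinfer = find_spec("flashinfer") is not None
    return {
        "attn_implementation": "flash_attention_2" if has_flash_attn else "sdpa",
        "moe_impl": "flashinfer" if has_flashinfer else "eager",
    }


print(pick_backends())
```

The returned dictionary can be merged into the `kwargs` passed to `from_pretrained`. For HunyuanImage-3.0-Instruct, keep `attn_implementation="sdpa"`, since its argument table lists `sdpa` as the only supported option.
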
#### 🏠 Local Installation & Usage

##### 1️⃣ Clone the Repository

```bash
git clone https://github.com/Tencent-Hunyuan/HunyuanImage-3.0.git
cd HunyuanImage-3.0/
```

##### 2️⃣ Download Model Weights

```bash
# Download from HuggingFace
hf download tencent/HunyuanImage-3.0 --local-dir ./HunyuanImage-3
```

##### 3️⃣ Run the Demo
The pretrained checkpoint does not automatically rewrite or enhance input prompts. For the best results, we currently recommend using DeepSeek to rewrite prompts. You can apply for an API key on [Tencent Cloud](https://cloud.tencent.com/document/product/1772/115963#.E5.BF.AB.E9.80.9F.E6.8E.A5.E5.85.A5).

```bash
# Without PE
export MODEL_PATH="./HunyuanImage-3"
python3 run_image_gen.py \
    --model-id "$MODEL_PATH" \
    --verbose 1 \
    --prompt "A brown and white dog is running on the grass" \
    --bot-task image \
    --image-size "1024x1024" \
    --save ./image.png \
    --moe-impl flashinfer

# With PE
export DEEPSEEK_KEY_ID="your_deepseek_key_id"
export DEEPSEEK_KEY_SECRET="your_deepseek_key_secret"
export MODEL_PATH="./HunyuanImage-3"
python3 run_image_gen.py \
    --model-id "$MODEL_PATH" \
    --verbose 1 \
    --prompt "A brown and white dog is running on the grass" \
    --bot-task image \
    --image-size "1024x1024" \
    --save ./image.png \
    --moe-impl flashinfer \
    --rewrite 1
```

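
When the demo is launched from a script, it can help to fail early if prompt enhancement is requested without credentials. The sketch below assembles the same command line as the shell examples. `build_command` is a hypothetical helper, not part of the repository, and the check on `DEEPSEEK_KEY_ID` and `DEEPSEEK_KEY_SECRET` assumes both are required whenever `--rewrite 1` is passed.

```python
import os
from typing import List, Mapping


def build_command(
    model_path: str,
    prompt: str,
    env: Mapping[str, str] = os.environ,
    rewrite: bool = False,
    moe_impl: str = "eager",
) -> List[str]:
    """Assemble the argv list for run_image_gen.py using flags from this README."""
    cmd = [
        "python3", "run_image_gen.py",
        "--model-id", model_path,
        "--verbose", "1",
        "--prompt", prompt,
        "--bot-task", "image",
        "--image-size", "1024x1024",
        "--save", "./image.png",
        "--moe-impl", moe_impl,
    ]
    if rewrite:
        # ASSUMPTION: both variables must be set for prompt enhancement to work.
        missing = [k for k in ("DEEPSEEK_KEY_ID", "DEEPSEEK_KEY_SECRET") if not env.get(k)]
        if missing:
            raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
        cmd += ["--rewrite", "1"]
    return cmd


print(" ".join(build_command("./HunyuanImage-3", "A brown and white dog is running on the grass")))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the repository root.
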
##### 4️⃣ Command Line Arguments

| Arguments | Description | Recommended |
| --- | --- | --- |
| `--prompt` | Input prompt | (Required) |
| `--model-id` | Model path | (Required) |
| `--attn-impl` | Attention implementation. Either `sdpa` or `flash_attention_2`. | `sdpa` |
| `--moe-impl` | MoE implementation. Either `eager` or `flashinfer`. | `flashinfer` |
| `--seed` | Random seed for image generation | `None` |
| `--diff-infer-steps` | Number of diffusion inference steps | `50` |
| `--image-size` | Image resolution. Can be `auto`, an explicit size like `1280x768`, or an aspect ratio like `16:9`. | `auto` |
| `--save` | Image save path | `image.png` |
| `--verbose` | Verbosity level. 0: no log; 1: log inference information. | `0` |
| `--rewrite` | Whether to enable prompt rewriting | `1` |

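
The `--image-size` argument accepts three shapes of value: `auto`, a form like `1280x768`, and a form like `16:9`. The sketch below shows one way to tell them apart before launching a run. It is not the repository's parser, and because this README does not say which number is the width and which is the height, it returns the two numbers in the order given.

```python
import re
from typing import Tuple, Union


def parse_image_size(value: str) -> Union[str, Tuple[str, int, int]]:
    """Classify an --image-size value as 'auto', an explicit size, or a ratio."""
    value = value.strip().lower()
    if value == "auto":
        return "auto"
    size = re.fullmatch(r"(\d+)x(\d+)", value)
    if size:
        return ("size", int(size.group(1)), int(size.group(2)))
    ratio = re.fullmatch(r"(\d+):(\d+)", value)
    if ratio:
        return ("ratio", int(ratio.group(1)), int(ratio.group(2)))
    raise ValueError(f"unrecognized image size: {value!r}")


for example in ("auto", "1280x768", "16:9"):
    print(example, "->", parse_image_size(example))
```
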
#### 🎨 Interactive Gradio Demo

Launch an interactive web interface for easy text-to-image generation.

##### 1️⃣ Install Gradio

```bash
# Quote the requirement so the shell does not treat ">=" as a redirect
pip install "gradio>=4.21.0"
```

##### 2️⃣ Configure Environment

```bash
# Set your model path
export MODEL_ID="path/to/your/model"

# Optional: Configure GPU usage (default: 0,1,2,3)
export GPUS="0,1,2,3"

# Optional: Configure host and port (default: 0.0.0.0:443)
export HOST="0.0.0.0"
export PORT="443"
```

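
The environment variables above can be validated before the app starts. The sketch below reads them with the defaults stated in the comments (`0,1,2,3` and `0.0.0.0:443`). `read_app_config` is a hypothetical helper for illustration, not what `run_app.sh` actually does.

```python
import os
from typing import Dict, Mapping

DEFAULTS = {"GPUS": "0,1,2,3", "HOST": "0.0.0.0", "PORT": "443"}


def read_app_config(env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Read MODEL_ID, GPUS, HOST and PORT, applying the README defaults."""
    model_id = env.get("MODEL_ID")
    if not model_id:
        raise KeyError("MODEL_ID must be set to the model path")
    gpus = [int(g) for g in env.get("GPUS", DEFAULTS["GPUS"]).split(",") if g.strip()]
    port = int(env.get("PORT", DEFAULTS["PORT"]))
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return {
        "model_id": model_id,
        "gpus": gpus,
        "host": env.get("HOST", DEFAULTS["HOST"]),
        "port": port,
    }


print(read_app_config({"MODEL_ID": "./HunyuanImage-3"}))
```

On many Linux systems, binding to ports below 1024 (including the default 443) requires elevated privileges, so setting `PORT` to a higher value may be more convenient for local testing.
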
##### 3️⃣ Launch the Web Interface

**Basic Launch:**
```bash
sh run_app.sh
```

**With Performance Optimizations:**
```bash
# Use both optimizations for maximum performance
sh run_app.sh --moe-impl flashinfer --attn-impl flash_attention_2
```

##### 4️⃣ Access the Interface

> 🌐 **Web Interface:** Open your browser and navigate to `http://localhost:443` (or your configured port)



<details>
<summary> Latest Version (Image-to-image & Text-image-to-image) </summary>

### HunyuanImage-3.0-Instruct (Instruction reasoning and Image-to-image generation, including editing and multi-image fusion)

#### 🔥 Quick Start with Transformers

##### 1️⃣ Download model weights

```bash
# Download from HuggingFace into a local directory with a dot-free name.
# The directory name must not contain dots, which may cause issues when loading with Transformers.
hf download tencent/HunyuanImage-3.0-Instruct --local-dir ./HunyuanImage-3-Instruct
```

##### 2️⃣ Run with Transformers

```python
from transformers import AutoModelForCausalLM

# Load the model from the local directory.
# The HF model_id `tencent/HunyuanImage-3.0-Instruct` cannot currently be loaded directly
# because of the dot in the name.
model_id = "./HunyuanImage-3-Instruct"

kwargs = dict(
    attn_implementation="sdpa",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
    moe_impl="eager",  # Use "flashinfer" if FlashInfer is installed
    moe_drop_tokens=True,
)

model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
model.load_tokenizer(model_id)

# Image-to-Image generation (TI2I)
# English meaning of the prompt: "Based on the logo in image 1, and referring to the
# material of the fridge magnet in image 2, make a new fridge magnet."
prompt = "基于图一的logo,参考图二中冰箱贴的材质,制作一个新的冰箱贴"

input_img1 = "./assets/demo_instruct_imgs/input_1_0.png"
input_img2 = "./assets/demo_instruct_imgs/input_1_1.png"
imgs_input = [input_img1, input_img2]

cot_text, samples = model.generate_image(
    prompt=prompt,
    image=imgs_input,
    seed=42,
    image_size="auto",
    use_system_prompt="en_unified",
    bot_task="think_recaption",  # Use "think_recaption" for reasoning and enhancement
    infer_align_image_size=True,  # Align output image size to input image size
    diff_infer_steps=50,
    verbose=2,
)

# Save the generated image
samples[0].save("image_edit.png")
```

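
The Instruct call returns both the reasoning text and a list of samples, while the example saves only the first image. The sketch below keeps everything from one run. `save_outputs` is a hypothetical helper, and it assumes `cot_text` is a string (or `None`) and that each sample exposes a PIL-style `save(path)` method, as the `samples[0].save(...)` call suggests.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def save_outputs(cot_text: Optional[str], samples: Iterable, out_dir: str = "outputs") -> List[Path]:
    """Write the reasoning text and every generated image into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Keep the reasoning trace next to the images for later inspection.
    (out / "cot.txt").write_text("" if cot_text is None else str(cot_text), encoding="utf-8")
    paths = []
    for index, image in enumerate(samples):
        path = out / f"image_{index}.png"
        image.save(str(path))  # ASSUMPTION: PIL-like object with .save(path)
        paths.append(path)
    return paths
```

With the variables from the example, `save_outputs(cot_text, samples, "outputs/fridge_magnet")` would write `cot.txt` plus one PNG per sample.
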
#### 🏠 Local Installation & Usage

##### 1️⃣ Clone the Repository

```bash
git clone https://github.com/Tencent-Hunyuan/HunyuanImage-3.0.git
cd HunyuanImage-3.0/
```

##### 2️⃣ Download Model Weights

```bash
# Download from HuggingFace
hf download tencent/HunyuanImage-3.0-Instruct --local-dir ./HunyuanImage-3-Instruct
```

##### 3️⃣ Run the Demo

More demos are available in `run_demo_instruct.sh`.

```bash
export MODEL_PATH="./HunyuanImage-3-Instruct"
bash run_demo_instruct.sh
```

##### 4️⃣ Command Line Arguments

| Arguments | Description | Recommended |
| --- | --- | --- |
| `--prompt` | Input prompt | (Required) |
| `--image` | Input image. For multiple images, use comma-separated paths (e.g., `img1.png,img2.png`). | (Required) |
| `--model-id` | Model path | (Required) |
| `--attn-impl` | Attention implementation. Currently only `sdpa` is supported. | `sdpa` |
| `--moe-impl` | MoE implementation. Either `eager` or `flashinfer`. | `flashinfer` |
| `--seed` | Random seed for image generation. Use `None` for a random seed. | `None` |
| `--diff-infer-steps` | Number of inference steps | `50` |
| `--image-size` | Image resolution. Can be `auto`, an explicit size like `1280x768`, or an aspect ratio like `16:9`. | `auto` |
| `--use-system-prompt` | System prompt type. Options: `None`, `dynamic`, `en_vanilla`, `en_recaption`, `en_think_recaption`, `en_unified`, `custom` | `en_unified` |
| `--system-prompt` | Custom system prompt. Used when `--use-system-prompt` is `custom`. | `None` |
| `--bot-task` | Task type. `image` for direct generation; `auto` for text; `recaption` for rewrite -> image; `think_recaption` for think -> rewrite -> image | `think_recaption` |
| `--save` | Image save path | `image.png` |
| `--verbose` | Verbosity level | `2` |
| `--reproduce` | Whether to reproduce the results | `True` |
| `--infer-align-image-size` | Whether to align the target image size to the source image size | `True` |
| `--max_new_tokens` | Maximum number of new tokens to generate | `2048` |
| `--use-taylor-cache` | Use Taylor Cache when sampling | `False` |

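
The `--image` argument takes comma-separated paths, and the Multi-Image Fusion description in this README mentions up to 3 inputs. The sketch below splits and checks such a value before a run. The helper name is hypothetical, and treating 3 as a hard limit is an assumption drawn from that description, not from the script itself.

```python
from typing import List

MAX_INPUT_IMAGES = 3  # ASSUMPTION: taken from "up to 3 inputs" in this README


def split_image_arg(value: str) -> List[str]:
    """Split a comma-separated --image value into a validated list of paths."""
    paths = [part.strip() for part in value.split(",") if part.strip()]
    if not paths:
        raise ValueError("at least one image path is required")
    if len(paths) > MAX_INPUT_IMAGES:
        raise ValueError(f"expected at most {MAX_INPUT_IMAGES} images, got {len(paths)}")
    return paths


print(split_image_arg("img1.png, img2.png"))
```
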
##### 5️⃣ For fewer Sampling Steps

We recommend using the model [HunyuanImage-3.0-Instruct-Distil](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil) with `--diff-infer-steps 8`, while keeping all other recommended parameter values **unchanged**.

```bash
# Download HunyuanImage-3.0-Instruct-Distil from HuggingFace
hf download tencent/HunyuanImage-3.0-Instruct-Distil --local-dir ./HunyuanImage-3-Instruct-Distil

# Run the demo with 8 sampling steps
export MODEL_PATH="./HunyuanImage-3-Instruct-Distil"
bash run_demo_instruct_Distil.sh
```

</details>

## 🧱 Models Cards

## 📊 Evaluation

### Evaluation of HunyuanImage-3.0-Instruct
* 👥 **GSB (Human Evaluation)**
We adopted the GSB (Good/Same/Bad) evaluation method, which is commonly used to assess the relative performance of two models from an overall image perception perspective. In total, we used 1,000+ single- and multi-image editing cases, generating an equal number of image samples for all compared models in a single run. For a fair comparison, we ran inference only once per prompt, avoiding any cherry-picking of results. When comparing with the baseline methods, we kept the default settings for all selected models. The evaluation was performed by more than 100 professional evaluators.

<p align="center">
  <img src="./assets/gsb_instruct.png" width=60% alt="Human Evaluation with Other Models">
</p>


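
GSB votes are often condensed into a single relative score, (Good - Bad) / (Good + Same + Bad). The sketch below computes that figure. Whether the charts in this README use exactly this formula is an assumption, and the vote counts are made up for illustration.

```python
def gsb_relative_score(good: int, same: int, bad: int) -> float:
    """Net preference for model A over model B as a fraction of all votes."""
    total = good + same + bad
    if total == 0:
        raise ValueError("no votes to score")
    return (good - bad) / total


# Hypothetical counts, not results from this evaluation.
print(f"{gsb_relative_score(good=500, same=300, bad=200):+.1%}")
```

A positive value means evaluators preferred the first model more often than the second; zero means the two were judged equal overall.
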
### Evaluation of HunyuanImage-3.0 (Text-to-Image)

* 🤖 **SSAE (Machine Evaluation)**
SSAE (Structured Semantic Alignment Evaluation) is an intelligent evaluation metric for image-text alignment based on advanced multimodal large language models (MLLMs). We extracted 3500 key points across 12 categories, then used multimodal large language models to automatically score the generated images against these key points based on their visual content. Mean Image Accuracy is the image-wise average score across all key points, while Global Accuracy directly averages the score across all key points.

<p align="center">
  <img src="./assets/ssae_side_by_side_comparison.png" width=98% alt="SSAE Comparison with Other Models">
</p>

<p align="center">
  <img src="./assets/ssae_side_by_side_heatmap.png" width=98% alt="SSAE Heatmap Comparison with Other Models">
</p>


* 👥 **GSB (Human Evaluation)**

We adopted the GSB (Good/Same/Bad) evaluation method, which is commonly used to assess the relative performance of two models from an overall image perception perspective. In total, we used 1,000 text prompts, generating an equal number of image samples for all compared models in a single run. For a fair comparison, we ran inference only once per prompt, avoiding any cherry-picking of results. When comparing with the baseline methods, we kept the default settings for all selected models. The evaluation was performed by more than 100 professional evaluators.

<p align="center">
  <img src="./assets/gsb.png" width=98% alt="Human Evaluation with Other Models">
</p>

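
The two SSAE aggregates described above differ only in where the averaging happens. The sketch below shows the distinction on a toy example, assuming each image has a list of per-key-point scores in [0, 1]. The data is invented, and the actual SSAE scoring pipeline is not shown in this README.

```python
from typing import List


def mean_image_accuracy(scores_per_image: List[List[float]]) -> float:
    """Average each image's key-point scores first, then average over images."""
    per_image = [sum(scores) / len(scores) for scores in scores_per_image if scores]
    return sum(per_image) / len(per_image)


def global_accuracy(scores_per_image: List[List[float]]) -> float:
    """Average over all key points directly, regardless of which image they belong to."""
    flat = [score for scores in scores_per_image for score in scores]
    return sum(flat) / len(flat)


# Toy data: image 1 has four key points, image 2 has two.
scores = [[1, 1, 0, 1], [0, 1]]
print(f"Mean Image Accuracy: {mean_image_accuracy(scores):.3f}")  # 0.625
print(f"Global Accuracy:     {global_accuracy(scores):.3f}")      # 0.667
```

The two numbers diverge whenever images have different numbers of key points, because Global Accuracy gives more weight to images with more key points.
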
## 🖼️ Showcase

Our model can follow complex instructions to generate high‑quality, creative images.

<div align="center">
  <img src="./assets/banner_all.jpg" width=100% alt="HunyuanImage 3.0 Demo">
</div>

For text-to-image showcases in HunyuanImage-3.0, click the following link:

- [HunyuanImage-3.0](./Hunyuan-Image3.md)

### Showcases of HunyuanImage-3.0-Instruct

HunyuanImage-3.0-Instruct demonstrates powerful capabilities in intelligent image generation and editing. The following showcases highlight its core features:

* 🧠 **Intelligent Visual Understanding and Reasoning (CoT Think)**: The model performs structured thinking to analyze the user's input image and prompt. It expands the user's intent and editing tasks into structured, comprehensive instructions by breaking down complex prompts and editing tasks into detailed visual components, including subject, composition, lighting, color palette, and style. This leads to better image generation and editing performance.

* ✏️ **Prompt Self-Rewrite**: Automatically enhances sparse or vague prompts into professional-grade, detail-rich descriptions that capture the user's intent more accurately.

* 🎨 **Text-to-Image (T2I)**: Generates high-quality images from text prompts with exceptional prompt adherence and photorealistic quality.

* 🖼️ **Image-to-Image (TI2I)**: Supports creative image editing, including adding elements, removing objects, modifying styles, and seamless background replacement while preserving key visual elements.

* 🔀 **Multi-Image Fusion**: Intelligently combines multiple reference images (up to 3 inputs) to create coherent composite images that integrate visual elements from different sources.


**Showcase 1: Detailed Thought and Reasoning Process**

<div align="center">
  <img src="./assets/pg_instruct_imgs/cot_ti2i.gif" alt="HunyuanImage-3.0-Instruct Showcase 1" width="90%">
</div>

**Showcase 2: Creative T2I Generation with Complex Scene Understanding**

> Prompt (translated from Chinese): A 3D anthropomorphic horse with a plush texture in warm and light brown tones, wearing a navy blue suit, a white shirt, and dark brown gloves; tired yet expectant, it sits at a computer next to a mug printed with "HAPPY AGAIN". Orange-red gradient background with oversized bold navy text "马上下班" ("Off work soon"), overlaid with cream-yellow "Happy New Year" and the label "(2026)". Orange-red dominates, with contrasting navy and cream yellow; the plush look is warm and soft.

<div align="center">
  <img src="./assets/pg_instruct_imgs/image0.png" alt="HunyuanImage-3.0-Instruct Showcase 2" width="75%">
</div>

**Showcase 3: Precise Image Editing with Element Preservation**

<div align="center">
  <img src="./assets/pg_instruct_imgs/image1.png" alt="HunyuanImage-3.0-Instruct Showcase 3" width="85%">
</div>

**Showcase 4: Style Transformation with Thematic Enhancement**

<div align="center">
  <img src="./assets/pg_instruct_imgs/image2.png" alt="HunyuanImage-3.0-Instruct Showcase 4" width="85%">
</div>


**Showcase 5: Advanced Style Transfer and Product Mockup Generation**

<div align="center">
  <img src="./assets/pg_instruct_imgs/image3.png" alt="HunyuanImage-3.0-Instruct Showcase 5" width="85%">
</div>


**Showcase 6: Multi-Image Fusion and Creative Composition**

<div align="center">
  <img src="./assets/pg_instruct_imgs/image4.png" alt="HunyuanImage-3.0-Instruct Showcase 6" width="85%">
</div>


## 📚 Citation

If you find HunyuanImage-3.0 useful in your research, please cite our work:

```bibtex
@article{cao2025hunyuanimage,
  title={HunyuanImage 3.0 Technical Report},
  author={Cao, Siyu and Chen, Hangting and Chen, Peng and Cheng, Yiji and Cui, Yutao and Deng, Xinchi and Dong, Ying and Gong, Kipper and Gu, Tianpeng and Gu, Xiusen and others},
  journal={arXiv preprint arXiv:2509.23951},
  year={2025}
}
```

## 🙏 Acknowledgements

We extend our heartfelt gratitude to the following open-source projects and communities for their invaluable contributions:

* 🤗 [Transformers](https://github.com/huggingface/transformers) - State-of-the-art NLP library
* 🎨 [Diffusers](https://github.com/huggingface/diffusers) - Diffusion models library
* 🌐 [HuggingFace](https://huggingface.co/) - AI model hub and community
* ⚡ [FlashAttention](https://github.com/Dao-AILab/flash-attention) - Memory-efficient attention
* 🚀 [FlashInfer](https://github.com/flashinfer-ai/flashinfer) - Optimized inference engine

## 🌟🚀 Github Star History

[![GitHub stars](https://img.shields.io/github/stars/Tencent-Hunyuan/HunyuanImage-3.0?style=social)](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)
[![GitHub forks](https://img.shields.io/github/forks/Tencent-Hunyuan/HunyuanImage-3.0?style=social)](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)


[![Star History Chart](https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanImage-3.0&type=Date)](https://www.star-history.com/#Tencent-Hunyuan/HunyuanImage-3.0&Date)