---
language:
- en
tags:
- text-to-image
- stable-diffusion
- safetensors
- stable-diffusion-xl
widget:
- text: >-
    1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors,
    night, turtleneck, masterpiece, high score, great score, absurdres
  parameter:
    negative_prompt: >-
      lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit,
      fewer digits, cropped, worst quality, low quality, low score, bad score,
      average score, signature, watermark, username, blurry
  example_title: 1girl
- text: >-
    1boy, male focus, green hair, sweater, looking at viewer, upper body,
    beanie, outdoors, night, turtleneck, masterpiece, high score, great score,
    absurdres
  parameter:
    negative_prompt: >-
      lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit,
      fewer digits, cropped, worst quality, low quality, low score, bad score,
      average score, signature, watermark, username, blurry
  example_title: 1boy
license: openrail++
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
---

# Animagine XL 4.0

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/_tsxjwf3VPu94xh9wJSbo.png)

## Overview

**Animagine XL 4.0**, also stylized as **Anim4gine**, is the ultimate anime-themed finetuned SDXL model and the latest installment of the [Animagine XL series](https://huggingface.co/collections/Linaqruf/animagine-xl-669888c0add5adaf09754aca). Despite being a continuation, the model was retrained from [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) on a massive dataset of 8.4M diverse anime-style images from various sources, with a knowledge cut-off of January 7th, 2025, and finetuned for approximately 2650 GPU hours. As in the previous version, the model was trained using the tag ordering method for identity and style training.

With the release of **Animagine XL 4.0 Opt (Optimized)**, the model has been further refined with an additional dataset, improving **stability**, **anatomy accuracy**, **noise reduction**, **color saturation**, and **overall color accuracy**. These enhancements make **Animagine XL 4.0 Opt** more consistent and visually appealing while maintaining the signature quality of the series.

## Changelog
- 2025-02-13 – Added Animagine XL 4.0 Opt
  - Better stability for more consistent outputs
  - Enhanced anatomy with more accurate proportions
  - Reduced noise and artifacts in generations
  - Fixed low saturation issues, resulting in richer colors
  - Improved color accuracy for more visually appealing results
- 2025-01-24 – Initial release

## Model Details

- **Developed by**: [Cagliostro Research Lab](https://github.com/cagliostrolab)
- **Model type**: Diffusion-based text-to-image generative model
- **License**: [CreativeML Open RAIL++-M](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
- **Model Description**: A model for generating and modifying anime-themed images from text prompts
- **Fine-tuned from**: [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)

## Downstream Use

1. Use this model in our [`Hugging Face Spaces`](https://huggingface.co/spaces/cagliostrolab/animagine-xl-4.0)
2. Use it in [`ComfyUI`](https://github.com/comfyanonymous/ComfyUI) or [`Stable Diffusion WebUI`](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
3. Use it with 🧨 `diffusers`

## 🧨 Diffusers Installation

### 1. Install Required Libraries

```bash
pip install diffusers transformers accelerate safetensors --upgrade
```

### 2. Example Code
The example below uses the `lpw_stable_diffusion_xl` community pipeline, which handles long, weighted, and detailed prompts better than the default pipeline. The model is already uploaded in FP16 format, so there is no need to specify `variant="fp16"` in the `from_pretrained` call.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the model with the long-prompt-weighting community pipeline.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-4.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    custom_pipeline="lpw_stable_diffusion_xl",
    add_watermarker=False,
)
pipe.to("cuda")

# Raw string so the escaped parentheses \( \) reach the pipeline unchanged.
prompt = r"1girl, arima kana, oshi no ko, hoshimachi suisei, hoshimachi suisei \(1st costume\), cosplay, looking at viewer, smile, outdoors, night, v, masterpiece, high score, great score, absurdres"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing finger, extra digits, fewer digits, cropped, worst quality, low quality, low score, bad score, average score, signature, watermark, username, blurry"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,
    guidance_scale=5,
    num_inference_steps=28,
).images[0]

image.save("./arima_kana.png")
```

## Usage Guidelines

The image below summarizes the prompt guidelines.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c088660a4d02f37a965f6c/YPe3MCnQAHM7nCZ1vQ7vI.png)

### 1. Prompt Structure
The model was trained with tag-based captions and the tag-ordering method. Use this structured template:

```
1girl/1boy/1other, character name, from which series, rating, everything else in any order and end with quality enhancement
```

### 2. Quality Enhancement Tags
Add these tags at the end of your prompt:

```
masterpiece, high score, great score, absurdres
```

### 3. Recommended Negative Prompt
```
lowres, bad anatomy, bad hands, text, error, missing finger, extra digits, fewer digits, cropped, worst quality, low quality, low score, bad score, average score, signature, watermark, username, blurry
```

### 4. Optimal Settings
- **CFG Scale**: 4-7 (5 Recommended)
- **Sampling Steps**: 25-28 (28 Recommended)
- **Preferred Sampler**: Euler Ancestral (Euler a)
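
As a rough illustration, the recommended ranges above can be wrapped in a small helper that builds keyword arguments for a `diffusers` pipeline call. The function name `generation_kwargs` and the decision to raise an error on out-of-range values are assumptions made for this sketch; they are not part of the model or of `diffusers`.

```python
# Recommended ranges taken from the settings list above.
CFG_RANGE = (4.0, 7.0)
STEP_RANGE = (25, 28)


def generation_kwargs(cfg_scale: float = 5.0, steps: int = 28) -> dict:
    """Build pipeline kwargs, rejecting values outside the recommended ranges.

    Hypothetical helper: the strict validation is a choice made for this sketch.
    """
    if not CFG_RANGE[0] <= cfg_scale <= CFG_RANGE[1]:
        raise ValueError(f"cfg_scale {cfg_scale} is outside {CFG_RANGE}")
    if not STEP_RANGE[0] <= steps <= STEP_RANGE[1]:
        raise ValueError(f"steps {steps} is outside {STEP_RANGE}")
    return {"guidance_scale": cfg_scale, "num_inference_steps": steps}


print(generation_kwargs())
```

The result can be passed to the pipeline as `pipe(prompt, **generation_kwargs())`. In `diffusers`, the Euler Ancestral sampler corresponds to `EulerAncestralDiscreteScheduler`, which is typically selected with `pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)`.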

### 5. Recommended Resolutions

| Orientation | Dimensions | Aspect Ratio |
|------------|------------|--------------|
| Square | 1024 x 1024| 1:1 |
| Landscape | 1152 x 896 | 9:7 |
| | 1216 x 832 | 3:2 |
| | 1344 x 768 | 7:4 |
| | 1536 x 640 | 12:5 |
| Portrait | 896 x 1152 | 7:9 |
| | 832 x 1216 | 2:3 |
| | 768 x 1344 | 4:7 |
| | 640 x 1536 | 5:12 |
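
When a target size is not in the table, one option is to snap it to the recommended resolution with the closest aspect ratio. The sketch below does this by comparing log aspect ratios; the helper name `closest_resolution` and the log-ratio distance are illustrative choices, not something prescribed by the model.

```python
import math

# (width, height) pairs copied from the table above.
RECOMMENDED_RESOLUTIONS = [
    (1024, 1024),
    (1152, 896), (1216, 832), (1344, 768), (1536, 640),
    (896, 1152), (832, 1216), (768, 1344), (640, 1536),
]


def closest_resolution(width: int, height: int) -> tuple[int, int]:
    """Return the recommended resolution whose aspect ratio is nearest to width/height.

    Hypothetical helper: distances are measured on log ratios so that
    landscape and portrait deviations are treated symmetrically.
    """
    target = math.log(width / height)
    return min(
        RECOMMENDED_RESOLUTIONS,
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - target),
    )


print(closest_resolution(1920, 1080))  # 16:9 request
print(closest_resolution(1080, 1920))  # 9:16 request
```

The returned pair can be passed directly as `width` and `height` in the pipeline call.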

### 6. Final Prompt Structure Example
```
1girl, firefly \(honkai: star rail\), honkai \(series\), honkai: star rail, safe, casual, solo, looking at viewer, outdoors, smile, reaching towards viewer, night, masterpiece, high score, great score, absurdres
```
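
The template and the example above can be assembled programmatically. The following is a minimal sketch, assuming tags are supplied unescaped and that parentheses need a backslash so that weighted-prompt parsers (such as the `lpw_stable_diffusion_xl` pipeline) read them literally. The names `build_prompt` and `escape_parens` are hypothetical helpers introduced for illustration.

```python
# Quality enhancement tags recommended for the end of the prompt.
QUALITY_TAGS = ["masterpiece", "high score", "great score", "absurdres"]


def escape_parens(tag: str) -> str:
    """Backslash-escape parentheses so they are not parsed as prompt weighting."""
    return tag.replace("(", r"\(").replace(")", r"\)")


def build_prompt(subject, character, series, rating, extra_tags):
    """Join tags in the order: subject, character, series, rating, everything else, quality.

    Hypothetical helper; empty entries are skipped.
    """
    parts = [subject, character, *series, rating, *extra_tags, *QUALITY_TAGS]
    return ", ".join(escape_parens(part) for part in parts if part)


prompt = build_prompt(
    subject="1girl",
    character="firefly (honkai: star rail)",
    series=["honkai (series)", "honkai: star rail"],
    rating="safe",
    extra_tags=[
        "casual", "solo", "looking at viewer", "outdoors",
        "smile", "reaching towards viewer", "night",
    ],
)
print(prompt)
```

With these inputs the helper reproduces the example prompt shown above.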

## Special Tags

The model supports various special tags that can be used to control different aspects of the image generation process. These tags are carefully weighted and tested to provide consistent results across different prompts.

### Quality Tags
Quality tags are fundamental controls that directly influence the overall image quality and detail level. Available quality tags:
- `masterpiece`
- `best quality`
- `low quality`
- `worst quality`

| <img src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/bDdKraYxjiReKknlYJepR.png" width="100%" style="max-height: 400px; object-fit: contain;"> | <img src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/mAgMMKL2tBj8oBuWHTYUz.png" width="100%" style="max-height: 400px; object-fit: contain;"> |
|---|---|
| Sample image using `"masterpiece, best quality"` quality tags with negative prompt left empty. | Sample image using `"low quality, worst quality"` quality tags with negative prompt left empty. |

### Score Tags
Score tags provide a more nuanced control over image quality compared to basic quality tags. They have a stronger impact on steering output quality in this model. Available score tags:
- `high score`
- `great score`
- `good score`
- `average score`
- `bad score`
- `low score`

| <img src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/PXK6D1yhD8SND-VHFQOXD.png" width="100%" style="max-height: 400px; object-fit: contain;"> | <img src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/0uUw7DQ9IMiSNE_MZ9Uyf.png" width="100%" style="max-height: 400px; object-fit: contain;"> |
|---|---|
| Sample image using `"high score, great score"` score tags with negative prompt left empty. | Sample image using `"bad score, low score"` score tags with negative prompt left empty. |

### Temporal Tags
Temporal tags allow you to influence the artistic style based on specific time periods or years. This can be useful for generating images with era-specific artistic characteristics. Supported year tags:
- `year 2005`
- `year {n}`
- `year 2025`

| <img src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/weRv0BmfkZrBhcW5NxXAI.png" width="100%" style="max-height: 400px; object-fit: contain;"> | <img src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/WwFoeLrbN2MkXuGHh91Ky.png" width="100%" style="max-height: 400px; object-fit: contain;"> |
|---|---|
| Sample image of Hatsune Miku with `"year 2007"` temporal tag. | Sample image of Hatsune Miku with `"year 2023"` temporal tag. |
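
A temporal tag can be generated from an integer year. This sketch assumes that `year {n}` in the list above stands for every year from 2005 through 2025 inclusive; that reading, and the helper name `year_tag`, are assumptions rather than documented behavior.

```python
# Assumed supported range, read from the first and last entries of the list above.
MIN_YEAR, MAX_YEAR = 2005, 2025


def year_tag(year: int) -> str:
    """Return a temporal tag such as 'year 2007', rejecting years outside the assumed range."""
    if not MIN_YEAR <= year <= MAX_YEAR:
        raise ValueError(f"year {year} is outside {MIN_YEAR}-{MAX_YEAR}")
    return f"year {year}"


print(year_tag(2007))
```

The returned tag can be placed among the free-order tags of the prompt, before the quality enhancement tags.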

### Rating Tags
Rating tags help control the content safety level of generated images. These tags should be used responsibly and in accordance with applicable laws and platform policies. Supported ratings:
- `safe`
- `sensitive`
- `nsfw`
- `explicit`

## Training Information

The table below lists the hardware and hyperparameters used during training:

| Parameter | Value |
|-----------|--------|
| Hardware | 7 x H100 80GB SXM5 |
| Num Images | 8,401,464 |
| UNet Learning Rate | 2.5e-6 |
| Text Encoder Learning Rate | 1.25e-6 |
| Scheduler | Constant With Warmup |
| Warmup Steps | 5% |
| Batch Size | 32 |
| Gradient Accumulation Steps | 2 |
| Training Resolution | 1024x1024 |
| Optimizer | Adafactor |
| Input Perturbation Noise | 0.1 |
| Debiased Estimation Loss | Enabled |
| Mixed Precision | fp16 |
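
For a sense of scale, the figures above can be combined with the approximately 2650 GPU hours mentioned in the Overview. This back-of-the-envelope sketch assumes all 7 GPUs were in use for the whole run, and it shows both readings of "Batch Size" because the table does not say whether the value is global or per GPU. None of the derived numbers are reported by the authors.

```python
GPU_HOURS = 2650        # "approximately 2650 GPU hours" from the Overview
NUM_GPUS = 7            # Hardware row
BATCH_SIZE = 32         # Batch Size row
GRAD_ACCUM_STEPS = 2    # Gradient Accumulation Steps row

# Assumption: every GPU was busy for the full duration of training.
wall_clock_hours = GPU_HOURS / NUM_GPUS
wall_clock_days = wall_clock_hours / 24

# Samples per optimizer step under the two possible readings of "Batch Size".
samples_if_global = BATCH_SIZE * GRAD_ACCUM_STEPS
samples_if_per_gpu = BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_GPUS

print(f"wall clock: ~{wall_clock_hours:.0f} h (~{wall_clock_days:.1f} days)")
print(f"samples per optimizer step: {samples_if_global} (global) or {samples_if_per_gpu} (per GPU)")
```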

## Acknowledgement

This long-term project would not have been possible without the groundbreaking work, innovative contributions, and comprehensive documentation provided by **Stability AI**, **Novel AI**, and **Waifu Diffusion Team**. We are especially grateful for the kickstarter grant from **Main** that enabled us to progress beyond V2. For this iteration, we would like to express our sincere gratitude to everyone in the community for their continuous support, particularly:

1. [**Moescape AI**](https://moescape.ai/): Our invaluable collaboration partner in model distribution and testing
2. **Lesser Rabbit**: For providing essential computing and research grants
3. [**Kohya SS**](https://github.com/kohya-ss): For developing the comprehensive open-source training framework
4. [**discus0434**](https://github.com/discus0434): For creating the industry-leading open-source Aesthetic Predictor 2.5
5. **Early testers**: For their dedication in providing critical feedback and thorough quality assurance

## Contributors

We extend our heartfelt appreciation to our dedicated team members who have contributed significantly to this project, including but not limited to:

### Model
- [**KayfaHaarukku**](https://huggingface.co/kayfahaarukku)
- [**Raelina**](https://huggingface.co/Raelina)
- [**Linaqruf**](https://huggingface.co/Linaqruf)

### Gradio
- [**Damar Jati**](https://huggingface.co/DamarJati)

### Relations, finance, and quality assurance
- [**Scipius**](https://huggingface.co/Scipius2121)
- [**Asahina**](https://huggingface.co/Asahina2K)
- [**Bell**](https://huggingface.co/ItsMeBell)
- [**BoboiAzumi**](https://huggingface.co/Boboiazumi)

### Data
- [**Pomegranata**](https://huggingface.co/paripi)
- [**Kr1SsSzz**](https://huggingface.co/Kr1SsSzz)
- [**Fiqi**](https://huggingface.co/saikanov)
- [**William Adams Soeherman**](https://huggingface.co/williamsoeherman)

## Fundraising Has New Methods!

We're excited to introduce new fundraising methods through GitHub Sponsors to support training, research, and model development. Your support helps us push the boundaries of what's possible with AI.

**You can help us with:**

* **Donate**: Contribute via ETH, USDT, or USDC to the address below, or sponsor us on GitHub.

* **Share**: Spread the word about our models and share your creations!

* **Feedback**: Let us know how we can improve.

**Donation Address**:

ETH/USDT/USDC(e): ```0xd8A1dA94BA7E6feCe8CfEacc1327f498fCcBFC0C```

**GitHub Sponsor**: [https://github.com/sponsors/cagliostrolab/](https://github.com/sponsors/cagliostrolab/)


<details>
<summary>Why do we use Cryptocurrency?</summary>
When we initially opened fundraising through Ko-fi with PayPal as the withdrawal method, our PayPal account was flagged and eventually banned, despite our efforts to explain the purpose of our project. This forced us to refund all donations and left us without a reliable way to receive support. To avoid such issues and ensure transparency, we have switched to cryptocurrency to raise funds.
</details>

<details>
<summary>Want to Donate in Non-Crypto Currency?</summary>
Despite our bad experience with PayPal, if you'd like to support us but prefer not to use cryptocurrency, feel free to contact us via our [Discord Server](https://discord.gg/cqh9tZgbGc) for alternative donation methods.
</details>

## Join Our Discord Server
Feel free to join our Discord server:
<div style="text-align: center;">
<a href="https://discord.gg/cqh9tZgbGc">
<img src="https://discord.com/api/guilds/1115542847395987519/widget.png?style=banner2" alt="Discord Banner 2"/>
</a>
</div>

## Limitations

- **Prompt Format**: Limited to tag-based text prompts; natural language input may not be effective
- **Anatomy**: May struggle with complex anatomical details, particularly hand poses and finger counting
- **Text Generation**: Text rendering in images is currently not supported and not recommended
- **New Characters**: Recent characters may have lower accuracy due to limited training data availability
- **Multiple Characters**: Scenes with multiple characters may require careful prompt engineering
- **Resolution**: Higher resolutions (e.g., 1536x1536) may show degradation as training used original SDXL resolution
- **Style Consistency**: May require specific style tags as training focused more on identity preservation than style consistency

## License

This model adopts the original [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md) from Stability AI without any modifications or additional restrictions. The license terms remain exactly as specified in the original SDXL license, which includes:

- ✅ **Permitted**: Commercial use, modifications, distributions, private use
- ❌ **Prohibited**: Illegal activities, harmful content generation, discrimination, exploitation
- ⚠️ **Requirements**: Include license copy, state changes, preserve notices
- 📝 **Warranty**: Provided "AS IS" without warranties

Please refer to the [original SDXL license](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md) for the complete and authoritative terms and conditions.