---
license: openrail++
language:
- en
tags:
- text-to-image
- stable-diffusion
- safetensors
- stable-diffusion-xl
base_model: cagliostrolab/animagine-xl-3.0
widget:
- text: >-
    1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors,
    night, turtleneck, masterpiece, best quality, very aesthetic, absurdres
  parameters:
    negative_prompt: >-
      nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality,
      jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest,
      early, chromatic aberration, signature, extra digits, artistic error,
      username, scan, [abstract]
  example_title: 1girl
- text: >-
    1boy, male focus, green hair, sweater, looking at viewer, upper body,
    beanie, outdoors, night, turtleneck, masterpiece, best quality, very
    aesthetic, absurdres
  parameters:
    negative_prompt: >-
      nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality,
      jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest,
      early, chromatic aberration, signature, extra digits, artistic error,
      username, scan, [abstract]
  example_title: 1boy
---

<style>
.title-container {
  display: flex;
  justify-content: center;
  align-items: center;
  height: 100vh;
}

.title {
  font-size: 2.5em;
  text-align: center;
  color: #333;
  font-family: 'Helvetica Neue', sans-serif;
  text-transform: uppercase;
  letter-spacing: 0.1em;
  padding: 0.5em 0;
  background: transparent;
}

.title span {
  background: -webkit-linear-gradient(45deg, #7ed56f, #28b485);
  -webkit-background-clip: text;
  -webkit-text-fill-color: transparent;
}

.custom-table {
  table-layout: fixed;
  width: 100%;
  border-collapse: collapse;
  margin-top: 2em;
}

.custom-table td {
  width: 33.33%; /* three columns */
  vertical-align: top;
  padding: 10px;
  box-shadow: 0px 0px 0px 0px rgba(0, 0, 0, 0.15);
}

.custom-image-container {
  position: relative;
  width: 100%;
  margin-bottom: 1em; /* spacing between stacked images */
  overflow: hidden;
  border-radius: 10px;
  transition: transform .7s;
}

.custom-image-container:hover {
  transform: scale(1.05);
}

.custom-image {
  width: 100%;
  height: auto;
  object-fit: cover;
  border-radius: 10px;
  transition: transform .7s;
  margin-bottom: 0;
  display: block; /* avoids extra space below the image */
}

.nsfw-filter {
  filter: blur(8px);
  transition: filter 0.3s ease;
}

.custom-image-container:hover .nsfw-filter {
  filter: none;
}

.overlay {
  position: absolute;
  bottom: 0;
  left: 0;
  right: 0;
  color: white;
  width: 100%;
  height: 40%;
  display: flex;
  flex-direction: column;
  justify-content: center;
  align-items: center;
  font-size: 1vw;
  font-weight: bold;
  text-align: center;
  opacity: 0;
  background: linear-gradient(0deg, rgba(0, 0, 0, 0.8) 60%, rgba(0, 0, 0, 0) 100%);
  transition: opacity .5s;
}

.custom-image-container:hover .overlay {
  opacity: 1;
}

.overlay-text {
  background: linear-gradient(45deg, #7ed56f, #28b485);
  -webkit-background-clip: text;
  color: transparent;
  text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.7);
}

.overlay-subtext {
  font-size: 0.75em;
  margin-top: 0.5em;
  font-style: italic;
}

.overlay,
.overlay-subtext {
  text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.5);
}
</style>

<h1 class="title">
  <span>Animagine XL 3.1</span>
</h1>

<table class="custom-table">
  <tr>
    <td>
      <div class="custom-image-container">
        <img class="custom-image" src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/yq_5AWegnLsGyCYyqJ-1G.png" alt="sample1">
      </div>
      <div class="custom-image-container">
        <img class="custom-image" src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/sp6w1elvXVTbckkU74v3o.png" alt="sample2">
      </div>
    </td>
    <td>
      <div class="custom-image-container">
        <img class="custom-image" src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/OYBuX1XzffN7Pxi4c75JV.png" alt="sample3">
      </div>
      <div class="custom-image-container">
        <img class="custom-image" src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/ytT3Oaf-atbqrnPIqz_dq.png" alt="sample4">
      </div>
    </td>
    <td>
      <div class="custom-image-container">
        <img class="custom-image" src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/0oRq204okFxRGECmrIK6d.png" alt="sample5">
      </div>
      <div class="custom-image-container">
        <img class="custom-image" src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/DW51m0HlDuAlXwu8H8bIS.png" alt="sample6">
      </div>
    </td>
  </tr>
</table>

**Animagine XL 3.1** is an update in the Animagine XL V3 series, building on the previous version, Animagine XL 3.0. This open-source, anime-themed text-to-image model has been improved to generate anime-style images of higher quality. It adds a broader range of characters from well-known anime series, an optimized dataset, and new aesthetic tags for better image creation. Built on Stable Diffusion XL, Animagine XL 3.1 aims to be a valuable resource for anime fans, artists, and content creators by producing accurate and detailed representations of anime characters.

## Model Details
- **Developed by**: [Cagliostro Research Lab](https://huggingface.co/cagliostrolab)
- **Model type**: Diffusion-based text-to-image generative model
- **Model Description**: Animagine XL 3.1 generates high-quality anime images from textual prompts. It features enhanced hand anatomy, improved concept understanding, and advanced prompt interpretation.
- **License**: [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
- **Fine-tuned from**: [Animagine XL 3.0](https://huggingface.co/cagliostrolab/animagine-xl-3.0)

## Gradio & Colab Integration

Try the demo powered by Gradio in Hugging Face Spaces: [![Open In Spaces](https://img.shields.io/badge/🤗-Open%20In%20Spaces-blue.svg)](https://huggingface.co/spaces/cagliostrolab/animagine-xl-3.1)

Or open the demo in Google Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https%3A//huggingface.co/spaces/cagliostrolab/animagine-xl-3.1/blob/main/demo.ipynb)

## 🧨 Diffusers Installation

First install the required libraries:

```bash
pip install diffusers transformers accelerate safetensors --upgrade
```

Then run image generation with the following example code:

```python
import os

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1",
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe.to("cuda")

prompt = "1girl, souryuu asuka langley, neon genesis evangelion, solo, upper body, v, smile, looking at viewer, outdoors, night"
negative_prompt = "nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract]"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,
    guidance_scale=7,
    num_inference_steps=28,
).images[0]

# Create the output directory first so the save call doesn't fail.
os.makedirs("./output", exist_ok=True)
image.save("./output/asuka_test.png")
```

## Usage Guidelines

### Tag Ordering

For optimal results, follow the structured prompt template below, since the model was trained on prompts in this order:

```
1girl/1boy, character name, from what series, everything else in any order.
```

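As an illustration, the template above can be applied programmatically. `build_prompt` is a hypothetical helper sketched here, not part of the model or any library:

```python
def build_prompt(subject: str, character: str, series: str, *extra_tags: str) -> str:
    """Assemble a Danbooru-style prompt in the trained tag order:
    1girl/1boy, character name, series, then everything else."""
    return ", ".join([subject, character, series, *extra_tags])

prompt = build_prompt(
    "1girl", "souryuu asuka langley", "neon genesis evangelion",
    "solo", "upper body", "smile",
    "masterpiece", "best quality", "very aesthetic", "absurdres",
)
# -> "1girl, souryuu asuka langley, neon genesis evangelion, solo, upper body,
#     smile, masterpiece, best quality, very aesthetic, absurdres"
```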
## Special Tags

Animagine XL 3.1 uses special tags to steer results toward quality, rating, creation date, and aesthetics. While the model can generate images without these tags, using them helps achieve better results.

### Quality Modifiers

Quality tags now consider both scores and post ratings to ensure a balanced quality distribution. We've refined the labels for greater clarity, such as changing 'high quality' to 'great quality'.

| Quality Modifier | Score Criterion |
|------------------|-----------------|
| `masterpiece`    | > 95%           |
| `best quality`   | > 85% & ≤ 95%   |
| `great quality`  | > 75% & ≤ 85%   |
| `good quality`   | > 50% & ≤ 75%   |
| `normal quality` | > 25% & ≤ 50%   |
| `low quality`    | > 10% & ≤ 25%   |
| `worst quality`  | ≤ 10%           |

### Rating Modifiers

We've also streamlined our rating tags for simplicity and clarity, aiming to establish global rules that can be applied across different models. For example, the tag 'rating: general' is now simply 'general', and 'rating: sensitive' has been condensed to 'sensitive'.

| Rating Modifier  | Rating Criterion |
|------------------|------------------|
| `safe`           | General          |
| `sensitive`      | Sensitive        |
| `nsfw`           | Questionable     |
| `explicit, nsfw` | Explicit         |

### Year Modifier

We've also redefined the year range to steer results towards specific modern or vintage anime art styles more accurately. This update simplifies the range, focusing on relevance to current and past eras.

| Year Tag | Year Range   |
|----------|--------------|
| `newest` | 2021 to 2024 |
| `recent` | 2018 to 2020 |
| `mid`    | 2015 to 2017 |
| `early`  | 2011 to 2014 |
| `oldest` | 2005 to 2010 |

### Aesthetic Tags

We've enhanced our tagging system with aesthetic tags to refine content categorization based on visual appeal. These tags are derived from evaluations made by a specialized ViT (Vision Transformer) image classification model trained on anime data. For this purpose, we used [shadowlilac/aesthetic-shadow-v2](https://huggingface.co/shadowlilac/aesthetic-shadow-v2), which scores the aesthetic value of each image before it is used in training. This ensures that each piece of content is not only relevant and accurate but also visually appealing.

| Aesthetic Tag      | Score Range     |
|--------------------|-----------------|
| `very aesthetic`   | > 0.71          |
| `aesthetic`        | > 0.45 & ≤ 0.71 |
| `displeasing`      | > 0.27 & ≤ 0.45 |
| `very displeasing` | ≤ 0.27          |

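The score-to-tag mapping in the table can be written as a small function. This is a sketch for illustration; `aesthetic_tag` is a hypothetical name, and boundary scores are assigned to the lower tier:

```python
def aesthetic_tag(score: float) -> str:
    """Map an aesthetic-shadow-v2 score to the aesthetic tag from the table."""
    if score > 0.71:
        return "very aesthetic"
    if score > 0.45:
        return "aesthetic"
    if score > 0.27:
        return "displeasing"
    return "very displeasing"
```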
## Recommended Settings

To guide the model towards generating high-aesthetic images, use negative prompts like:

```
nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract]
```

For higher quality outcomes, prepend prompts with:

```
masterpiece, best quality, very aesthetic, absurdres
```

It's recommended to use a lower classifier-free guidance (CFG) scale of around 5-7, fewer than 30 sampling steps, and Euler Ancestral (Euler a) as the sampler.

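These settings can be applied with 🧨 Diffusers by swapping the pipeline's scheduler. A minimal sketch (a fragment, not a standalone script) assuming the `pipe` object from the installation example above is already loaded; the exact guidance scale within the 5-7 range is a placeholder choice:

```python
from diffusers import EulerAncestralDiscreteScheduler

# Use Euler Ancestral (Euler a), reusing the existing scheduler's config.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "1girl, souryuu asuka langley, neon genesis evangelion, solo, smile, masterpiece, best quality, very aesthetic, absurdres",
    negative_prompt="nsfw, lowres, (bad), text, error, worst quality, jpeg artifacts, low quality, watermark, [abstract]",
    guidance_scale=6.0,      # within the recommended 5-7 range
    num_inference_steps=28,  # below 30, as recommended
).images[0]
```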
### Multi Aspect Resolution

This model supports generating images at the following dimensions:

| Dimensions    | Aspect Ratio    |
|---------------|-----------------|
| `1024 x 1024` | 1:1 Square      |
| `1152 x 896`  | 9:7             |
| `896 x 1152`  | 7:9             |
| `1216 x 832`  | 19:13           |
| `832 x 1216`  | 13:19           |
| `1344 x 768`  | 7:4 Horizontal  |
| `768 x 1344`  | 4:7 Vertical    |
| `1536 x 640`  | 12:5 Horizontal |
| `640 x 1536`  | 5:12 Vertical   |

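To keep generation calls on these trained resolution buckets, the table can be checked programmatically. `SUPPORTED_RESOLUTIONS` and `is_supported` are hypothetical names used for illustration:

```python
# (width, height) pairs from the resolution table above.
SUPPORTED_RESOLUTIONS = {
    (1024, 1024),
    (1152, 896), (896, 1152),
    (1216, 832), (832, 1216),
    (1344, 768), (768, 1344),
    (1536, 640), (640, 1536),
}

def is_supported(width: int, height: int) -> bool:
    """Return True if (width, height) is one of the trained aspect buckets."""
    return (width, height) in SUPPORTED_RESOLUTIONS
```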
## Training and Hyperparameters

**Animagine XL 3.1** was trained on 2x A100 80GB GPUs for approximately 15 days, totaling over 350 GPU hours. The training process consisted of three stages:
- **Pretraining**: Used a data-rich collection of 870k ordered and tagged images to expand Animagine XL 3.0's knowledge.
- **Finetuning - First Stage**: Employed labeled and curated aesthetic datasets to refine the U-Net, which was degraded by pretraining.
- **Finetuning - Second Stage**: Used labeled and curated aesthetic datasets to refine the model's art style and improve hand and anatomy rendering.

### Hyperparameters

| Stage                    | Epochs | UNet lr | Train Text Encoder | Batch Size | Noise Offset | Optimizer | LR Scheduler                  | Grad Acc Steps | GPUs |
|--------------------------|--------|---------|--------------------|------------|--------------|-----------|-------------------------------|----------------|------|
| **Pretraining**          | 10     | 1e-5    | True               | 16         | N/A          | AdamW     | Cosine Annealing Warm Restart | 3              | 2    |
| **Finetuning 1st Stage** | 10     | 2e-6    | False              | 48         | 0.0357       | Adafactor | Constant with Warmup          | 1              | 1    |
| **Finetuning 2nd Stage** | 15     | 1e-6    | False              | 48         | 0.0357       | Adafactor | Constant with Warmup          | 1              | 1    |

## Model Comparison (Pretraining only)

### Training Config

| Configuration Item             | Animagine XL 3.0                                                 | Animagine XL 3.1                                                       |
|--------------------------------|------------------------------------------------------------------|------------------------------------------------------------------------|
| **GPU**                        | 2 x A100 80G                                                     | 2 x A100 80G                                                           |
| **Dataset**                    | 1,271,990 images                                                 | 873,504 images                                                         |
| **Shuffle Separator**          | True                                                             | True                                                                   |
| **Num Epochs**                 | 10                                                               | 10                                                                     |
| **Learning Rate**              | 7.5e-6                                                           | 1e-5                                                                   |
| **Text Encoder Learning Rate** | 3.75e-6                                                          | 1e-5                                                                   |
| **Effective Batch Size**       | 48 x 1 x 2                                                       | 16 x 3 x 2                                                             |
| **Optimizer**                  | Adafactor                                                        | AdamW                                                                  |
| **Optimizer Args**             | Scale Parameter: False, Relative Step: False, Warmup Init: False | Weight Decay: 0.1, Betas: (0.9, 0.99)                                  |
| **LR Scheduler**               | Constant with Warmup                                             | Cosine Annealing Warm Restart                                          |
| **LR Scheduler Args**          | Warmup Steps: 100                                                | Num Cycles: 10, Min LR: 1e-6, LR Decay: 0.9, First Cycle Steps: 9,099  |

Source code and training config are available here: [cagliostrolab/sd-scripts](https://github.com/cagliostrolab/sd-scripts/tree/main/notebook)

## Acknowledgements

The development and release of Animagine XL 3.1 would not have been possible without the invaluable contributions and support from the following individuals and organizations:

- **[SeaArt.ai](https://www.seaart.ai/)**: For funding and supporting this project.
- **[Shadow Lilac](https://huggingface.co/shadowlilac)**: For providing the aesthetic classification model, [aesthetic-shadow-v2](https://huggingface.co/shadowlilac/aesthetic-shadow-v2).
- **[Derrian Distro](https://github.com/derrian-distro)**: For their custom learning rate scheduler, adapted from [LoRA Easy Training Scripts](https://github.com/derrian-distro/LoRA_Easy_Training_Scripts/blob/main/custom_scheduler/LoraEasyCustomOptimizer/CustomOptimizers.py).
- **[Kohya SS](https://github.com/kohya-ss)**: For their comprehensive training scripts.
- **Cagliostrolab Collaborators**: For their dedication to model training, project management, and data curation.
- **Early Testers**: For their valuable feedback and quality assurance efforts.
- **NovelAI**: For their innovative approach to aesthetic tagging, which served as an inspiration for our implementation.
- **KBlueLeaf**: For providing inspiration in balancing quality tag distribution and managing tags based on [Hakubooru Metainfo](https://github.com/KohakuBlueleaf/HakuBooru/blob/main/hakubooru/metainfo.py).

Thank you all for your support and expertise in pushing the boundaries of anime-style image generation.

## Collaborators

- [Linaqruf](https://huggingface.co/Linaqruf)
- [ItsMeBell](https://huggingface.co/ItsMeBell)
- [Asahina2K](https://huggingface.co/Asahina2K)
- [DamarJati](https://huggingface.co/DamarJati)
- [Zwicky18](https://huggingface.co/Zwicky18)
- [Scipius2121](https://huggingface.co/Scipius2121)
- [Raelina](https://huggingface.co/Raelina)
- [Kayfahaarukku](https://huggingface.co/kayfahaarukku)
- [Kriz](https://huggingface.co/Kr1SsSzz)

## Limitations

While Animagine XL 3.1 represents a significant advancement in anime-style image generation, it is important to acknowledge its limitations:

1. **Anime-Focused**: This model is specifically designed for generating anime-style images and is not suitable for creating realistic photos.
2. **Prompt Complexity**: This model may not suit users who expect high-quality results from short or simple prompts. Training focused on concept understanding rather than aesthetic refinement, so more detailed and specific prompts may be required to achieve the desired output.
3. **Prompt Format**: Animagine XL 3.1 is optimized for Danbooru-style tags rather than natural language prompts. For best results, format prompts using the appropriate tags and syntax.
4. **Anatomy and Hand Rendering**: Despite the improvements made in anatomy and hand rendering, the model may still produce suboptimal results in these areas.
5. **Dataset Size**: The dataset used for training Animagine XL 3.1 consists of approximately 870,000 images. Combined with the previous iteration's 1.2 million images, the total training data amounts to around 2.1 million images. While substantial, this dataset size may still be considered limited in scope for an "ultimate" anime model.
6. **NSFW Content**: Animagine XL 3.1 has been designed to generate more balanced NSFW content. However, the model may still produce NSFW results even when not explicitly prompted.

By acknowledging these limitations, we aim to provide transparency and set realistic expectations for users of Animagine XL 3.1. Despite these constraints, we believe the model represents a significant step forward in anime-style image generation and offers a powerful tool for artists, designers, and enthusiasts alike.

## License

This model is licensed under the [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md).

To ensure full compatibility with the upstream SDXL ecosystem and standard usage rights, this model adheres strictly to the original SDXL terms, which include:

- ✅ **Permitted**: Commercial use, modifications, distribution, private use
- ❌ **Prohibited**: Illegal activities, harmful content generation, discrimination, exploitation

*Note: This license supersedes any previous community license tags (e.g., FAIPL) applied to earlier versions of this repository, ensuring full compatibility with the standard SDXL ecosystem.*

Please refer to the [full license agreement](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md) for complete details.