---
base_model: Lightricks/LTX-2.3
language:
- en
- de
- es
- fr
- ja
- ko
- zh
- it
- pt
library_name: ggml
license: other
license_name: ltx-2-community-license-agreement
license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
pipeline_tag: image-to-video
arxiv: 2601.03233
tags:
- image-to-video
- gguf
- unsloth
- text-to-video
- video-to-video
- image-text-to-video
- audio-to-video
- text-to-audio
- video-to-audio
- audio-to-audio
- text-to-audio-video
- image-to-audio-video
- image-text-to-audio-video
- ltx-2
- ltx-2-3
- ltx-video
- ltxv
- lightricks
pinned: true
demo: https://app.ltx.studio/ltx-2-playground/i2v
widget:
- text: florist
  output:
    url: unsloth_flowers.mp4
---

This is a GGUF quantized version of [LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3). <br>
unsloth/LTX-2.3-GGUF uses the [Unsloth Dynamic 2.0](https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs) methodology for SOTA performance.
- Important layers are upcast to higher precision.
- Uses tooling from [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) by city96.

Two sets of GGUFs are published: one for the dev model and one for the distilled model. The distilled model is optimized for few-step generation (roughly 4-8 steps); dev needs more steps (at least 20) but produces better outputs. The distilled variant is useful as a drafting or refining model.

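If you want a different quantization level than the Q4_K_M used in the workflow below, you can list everything available in the repo first. A minimal sketch using the `huggingface_hub` Python API (installed as part of the setup below):

```python
from huggingface_hub import list_repo_files

# List every GGUF file (dev and distilled, all quant levels) in the repo
files = [f for f in list_repo_files("unsloth/LTX-2.3-GGUF") if f.endswith(".gguf")]
print("\n".join(sorted(files)))
```
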
In fact, the workflow published below uses the distilled LoRA on top of the dev model to refine the initial output.
<div>
<div style="display: flex; gap: 5px; align-items: center; ">
  <a href="https://github.com/unslothai/unsloth/">
    <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
  </a>
  <a href="https://discord.gg/unsloth">
    <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
  </a>
  <a href="https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs">
    <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
  </a>
</div>
</div>

# Workflow

Download the mp4 in the repo and open it with ComfyUI. The workflow used to produce the video is embedded in the file.
<Gallery />

To install ComfyUI:
```bash
# Create and activate an isolated environment
python3 -m venv .diffusion
source .diffusion/bin/activate

# ComfyUI itself, plus the Hugging Face hub client for downloads
git clone https://github.com/Comfy-Org/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
pip install huggingface_hub

# Custom nodes: the GGUF loader and KJNodes
cd custom_nodes/
git clone https://github.com/city96/ComfyUI-GGUF.git
cd ComfyUI-GGUF/
pip install -r requirements.txt
cd ..
git clone https://github.com/kijai/ComfyUI-KJNodes.git
cd ComfyUI-KJNodes/
pip install -r requirements.txt
cd ../../models
```

To download the model files used:
```bash
# Quantized DiT, video/audio VAEs, and text-encoder connectors
ln -s "$(hf download unsloth/LTX-2.3-GGUF ltx-2.3-22b-dev-Q4_K_M.gguf --quiet)" unet/.
ln -s "$(hf download unsloth/LTX-2.3-GGUF vae/ltx-2.3-22b-dev_video_vae.safetensors --quiet)" vae/.
ln -s "$(hf download unsloth/LTX-2.3-GGUF vae/ltx-2.3-22b-dev_audio_vae.safetensors --quiet)" vae/.
ln -s "$(hf download unsloth/LTX-2.3-GGUF text_encoders/ltx-2.3-22b-dev_embeddings_connectors.safetensors --quiet)" text_encoders/.

# Distilled LoRA, spatial upscaler, and the Gemma text encoder
ln -s "$(hf download Lightricks/LTX-2.3 ltx-2.3-22b-distilled-lora-384.safetensors --quiet)" loras/.
ln -s "$(hf download Lightricks/LTX-2.3 ltx-2.3-spatial-upscaler-x2-1.0.safetensors --quiet)" latent_upscale_models/.
ln -s "$(hf download unsloth/gemma-3-12b-it-qat-GGUF gemma-3-12b-it-qat-UD-Q4_K_XL.gguf --quiet)" text_encoders/.
ln -s "$(hf download unsloth/gemma-3-12b-it-qat-GGUF mmproj-BF16.gguf --quiet)" text_encoders/.
```
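
The symlink commands above assume the target subdirectories already exist under `models/`. Stock ComfyUI creates most of them, but if any are missing (for example `latent_upscale_models`), create them first:

```bash
# Run from inside ComfyUI/models; creates any missing target directories
mkdir -p unet vae text_encoders loras latent_upscale_models
```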

Then launch ComfyUI. Make sure you're using an up-to-date version of ComfyUI and all custom nodes.
```bash
cd ..
python main.py
```
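
By default the server binds to 127.0.0.1 on port 8188. If ComfyUI runs on a remote machine, its standard `--listen` and `--port` flags let you expose it, for example:

```bash
# Bind to all interfaces on the default port (remote access)
python main.py --listen 0.0.0.0 --port 8188
```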

---
# LTX-2.3 Model Card

This model card focuses on the LTX-2.3 model, a significant update to the [LTX-2 model](https://huggingface.co/Lightricks/LTX-2) with improved audio and visual quality as well as enhanced prompt adherence.
LTX-2 was presented in the paper [LTX-2: Efficient Joint Audio-Visual Foundation Model](https://huggingface.co/papers/2601.03233).

💻💻 **If you want to dive right into the code, it is available [here](https://github.com/Lightricks/LTX-2).** 💾💾

LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.

[![LTX-2 Open Source](ltx2.3-open.png)](https://youtu.be/o-7us-BR_gQ)

# Model Checkpoints

| Name | Notes |
|------|-------|
| ltx-2.3-22b-dev | The full model, flexible and trainable, in bf16 |
| ltx-2.3-22b-distilled | The distilled version of the full model; 8 steps, CFG=1 |
| ltx-2.3-22b-distilled-lora-384 | A LoRA version of the distilled model, applicable to the full model |
| ltx-2.3-spatial-upscaler-x2-1.0 | An x2 spatial upscaler for the ltx-2.3 latents, used in multi-stage (multiscale) pipelines for higher resolution |
| ltx-2.3-spatial-upscaler-x1.5-1.0 | An x1.5 spatial upscaler for the ltx-2.3 latents, used in multi-stage (multiscale) pipelines for higher resolution |
| ltx-2.3-temporal-upscaler-x2-1.0 | An x2 temporal upscaler for the ltx-2.3 latents, used in multi-stage (multiscale) pipelines for higher FPS |

## Model Details
- **Developed by:** Lightricks
- **Model type:** Diffusion-based audio-video foundation model
- **Language(s):** English

# Online demo
LTX-2.3 is accessible right away via the [API Playground](https://console.ltx.video/playground/).

# Run locally

## Direct use license
You can use the models (full, distilled, upscalers, and any derivatives of the models) for purposes permitted under the [license](./LICENSE).

## ComfyUI
We recommend using the built-in LTXVideo nodes, which can be found in the ComfyUI Manager.
For manual installation information, please refer to our [documentation site](https://docs.ltx.video/open-source-model/integration-tools/comfy-ui).

## PyTorch codebase

The [LTX-2 codebase](https://github.com/Lightricks/LTX-2) is a monorepo with several packages, from the model definition in `ltx-core` to pipelines in `ltx-pipelines` and training capabilities in `ltx-trainer`.
The codebase was tested with Python >= 3.12, CUDA > 12.7, and PyTorch ~= 2.7.

### Installation

```bash
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# From the repository root
uv sync
source .venv/bin/activate
```

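To confirm the environment matches the tested versions above, a quick sanity check:

```python
# Print the interpreter, PyTorch, and CUDA versions for comparison
# against the tested configuration (Python >= 3.12, PyTorch ~= 2.7, CUDA > 12.7)
import sys
import torch

print("python:", sys.version.split()[0])
print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda)
```
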
### Inference

To use our model, please follow the instructions in our [ltx-pipelines](https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-pipelines/README.md) package.

## Diffusers 🧨

LTX-2.3 support in the [Diffusers Python library](https://huggingface.co/docs/diffusers/main/en/index) is coming soon!

## General tips
* Width and height must be divisible by 32, and the frame count must be a multiple of 8 plus 1 (8k + 1, e.g. 121); see the sketch after this list.
* If the resolution or frame count doesn't satisfy these constraints, pad the input with -1 and then crop to the desired resolution and frame count.
* For tips on writing effective prompts, please visit our [Prompting guide](https://ltx.video/blog/how-to-prompt-for-ltx-2).

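A small helper, as an illustrative sketch (the function names here are ours, not part of any LTX-2 API), for snapping a requested size and frame count to the nearest valid values:

```python
# Width/height must be multiples of 32; frame count must be 8*k + 1.
def valid_resolution(width: int, height: int) -> tuple[int, int]:
    return round(width / 32) * 32, round(height / 32) * 32

def valid_frame_count(frames: int) -> int:
    return round((frames - 1) / 8) * 8 + 1

print(valid_resolution(1280, 720))  # (1280, 704)
print(valid_frame_count(120))       # 121
```
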
### Limitations
- This model is not intended or able to provide factual information.
- As a statistical model, this checkpoint might amplify existing societal biases.
- The model may fail to generate videos that match the prompts perfectly.
- Prompt following is heavily influenced by the prompting style.
- The model may generate content that is inappropriate or offensive.
- When generating audio without speech, the audio may be of lower quality.

# Train the model

The base (dev) model is fully trainable.

It's extremely easy to reproduce the LoRAs and IC-LoRAs we publish with the model by following the instructions in the [LTX-2 Trainer Readme](https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-trainer/README.md).

Training for motion, style, or likeness (sound + appearance) can take less than an hour in many settings.

## Citation

```bibtex
@article{hacohen2025ltx2,
  title={LTX-2: Efficient Joint Audio-Visual Foundation Model},
  author={HaCohen, Yoav and Brazowski, Benny and Chiprut, Nisan and Bitterman, Yaki and Kvochko, Andrew and Berkowitz, Avishai and Shalem, Daniel and Lifschitz, Daphna and Moshe, Dudu and Porat, Eitan and Richardson, Eitan and Shiran, Guy and Chachy, Itay and Chetboun, Jonathan and Finkelson, Michael and Kupchick, Michael and Zabari, Nir and Guetta, Nitzan and Kotler, Noa and Bibi, Ofir and Gordon, Ori and Panet, Poriya and Benita, Roi and Armon, Shahar and Kulikov, Victor and Inger, Yaron and Shiftan, Yonatan and Melumian, Zeev and Farbman, Zeev},
  journal={arXiv preprint arXiv:2601.03233},
  year={2025}
}
```