---
language:
- en
- de
- es
- fr
- ja
- ko
- zh
- it
- pt
library_name: diffusers
license: other
license_name: ltx-2-community-license-agreement
license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
pipeline_tag: image-to-video
arxiv: 2601.03233
tags:
- image-to-video
- text-to-video
- video-to-video
- image-text-to-video
- audio-to-video
- text-to-audio
- video-to-audio
- audio-to-audio
- text-to-audio-video
- image-to-audio-video
- image-text-to-audio-video
- ltx-2
- ltx-video
- ltxv
- lightricks
pinned: true
demo: https://app.ltx.studio/ltx-2-playground/i2v
---

# LTX-2 Model Card

This model card focuses on the LTX-2 model, as presented in the paper [LTX-2: Efficient Joint Audio-Visual Foundation Model](https://huggingface.co/papers/2601.03233). The codebase is available [here](https://github.com/Lightricks/LTX-2).

LTX-2 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.

[![LTX-2 Open Source](https://img.youtube.com/vi/8fWAJXZJbRA/maxresdefault.jpg)](https://www.youtube.com/watch?v=8fWAJXZJbRA)

# Model Checkpoints

| Name | Notes |
|--------------------------------|----------------------------------------------------------------------------------------------------------------|
| ltx-2-19b-dev | The full model, flexible and trainable, in bf16 |
| ltx-2-19b-dev-fp8 | The full model in fp8 quantization |
| ltx-2-19b-dev-fp4 | The full model in nvfp4 quantization |
| ltx-2-19b-distilled | The distilled version of the full model, 8 steps, CFG=1 |
| ltx-2-19b-distilled-lora-384 | A LoRA version of the distilled model, applicable to the full model |
| ltx-2-spatial-upscaler-x2-1.0 | A 2x spatial upscaler for the LTX-2 latents, used in multi-stage (multiscale) pipelines for higher resolution |
| ltx-2-temporal-upscaler-x2-1.0 | A 2x temporal upscaler for the LTX-2 latents, used in multi-stage (multiscale) pipelines for higher FPS |

## Model Details
- **Developed by:** Lightricks
- **Model type:** Diffusion-based audio-video foundation model
- **Language(s):** English

# Online demo
LTX-2 is accessible right away via the following links:
- [LTX-Studio text-to-video](https://app.ltx.studio/ltx-2-playground/t2v)
- [LTX-Studio image-to-video](https://app.ltx.studio/ltx-2-playground/i2v)

# Run locally

## Direct use license
You can use the models (full, distilled, upscalers, and any derivatives of them) for the purposes permitted under the [license](./LICENSE).

## ComfyUI
We recommend using the built-in LTXVideo nodes, available through the ComfyUI Manager.
For manual installation instructions, please refer to our [documentation site](https://docs.ltx.video/open-source-model/integration-tools/comfy-ui).

## PyTorch codebase

The [LTX-2 codebase](https://github.com/Lightricks/LTX-2) is a monorepo with several packages, from the model definition in `ltx-core` to pipelines in `ltx-pipelines` and training capabilities in `ltx-trainer`.
The codebase was tested with Python >= 3.12 and CUDA > 12.7, and supports PyTorch ~= 2.7.

### Installation

```bash
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# From the repository root
uv sync
source .venv/bin/activate
```

### Inference

To use our model, please follow the instructions in our [ltx-pipelines](https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-pipelines/README.md) package.

## Diffusers 🧨

LTX-2 is supported in the [Diffusers Python library](https://huggingface.co/docs/diffusers/main/en/index) for text- and image-to-video generation.
Read more about LTX-2 in diffusers [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2#diffusers.LTX2Pipeline.__call__.example).
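
Support for the LTX-2 pipelines in diffusers is relatively recent; the snippet below is an illustrative check (not an official one) that your installed diffusers build already ships them. If the import fails, upgrade diffusers, e.g. to the latest release or the main branch.

```python
# Illustrative check: verify that the installed diffusers build includes the
# LTX-2 pipelines used in the example below.
import diffusers
from diffusers.pipelines.ltx2 import LTX2Pipeline  # raises ImportError on older versions

print(diffusers.__version__)
```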

### Use with diffusers
To achieve production-quality generation, it is recommended to use the two-stage generation pipeline: Stage 1 generates video and audio latents at the base resolution, the latent upsampler then doubles the spatial resolution, and Stage 2 refines the upscaled latents in a few steps using the distilled LoRA.
Example of two-stage text-to-video inference:
```python
import torch
from diffusers import FlowMatchEulerDiscreteScheduler
from diffusers.pipelines.ltx2 import LTX2Pipeline, LTX2LatentUpsamplePipeline
from diffusers.pipelines.ltx2.latent_upsampler import LTX2LatentUpsamplerModel
from diffusers.pipelines.ltx2.utils import STAGE_2_DISTILLED_SIGMA_VALUES
from diffusers.pipelines.ltx2.export_utils import encode_video

device = "cuda:0"
width = 768
height = 512

pipe = LTX2Pipeline.from_pretrained(
    "Lightricks/LTX-2", torch_dtype=torch.bfloat16
)
pipe.enable_sequential_cpu_offload(device=device)
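# Note: sequential CPU offload minimizes VRAM usage by keeping only the submodule
# currently being executed on the GPU, at the cost of slower inference. With enough
# GPU memory, pipe.enable_model_cpu_offload(device=device) or pipe.to(device) is faster.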

prompt = "A beautiful sunset over the ocean"
negative_prompt = "shaky, glitchy, low quality, worst quality, deformed, distorted, disfigured, motion smear, motion artifacts, fused fingers, bad anatomy, weird hand, ugly, transition, static."

# Stage 1 default (non-distilled) inference
frame_rate = 24.0
video_latent, audio_latent = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    num_frames=121,
    frame_rate=frame_rate,
    num_inference_steps=40,
    sigmas=None,
    guidance_scale=4.0,
    output_type="latent",
    return_dict=False,
)

latent_upsampler = LTX2LatentUpsamplerModel.from_pretrained(
    "Lightricks/LTX-2",
    subfolder="latent_upsampler",
    torch_dtype=torch.bfloat16,
)
upsample_pipe = LTX2LatentUpsamplePipeline(vae=pipe.vae, latent_upsampler=latent_upsampler)
upsample_pipe.enable_model_cpu_offload(device=device)
upscaled_video_latent = upsample_pipe(
    latents=video_latent,
    output_type="latent",
    return_dict=False,
)[0]

# Load the Stage 2 distilled LoRA
pipe.load_lora_weights(
    "Lightricks/LTX-2", adapter_name="stage_2_distilled", weight_name="ltx-2-19b-distilled-lora-384.safetensors"
)
pipe.set_adapters("stage_2_distilled", 1.0)
# VAE tiling is usually necessary to avoid OOM errors during VAE decoding
pipe.vae.enable_tiling()
# Change the scheduler to use the Stage 2 distilled sigmas as-is
new_scheduler = FlowMatchEulerDiscreteScheduler.from_config(
    pipe.scheduler.config, use_dynamic_shifting=False, shift_terminal=None
)
pipe.scheduler = new_scheduler
# Stage 2 inference with the distilled LoRA and sigmas
video, audio = pipe(
    latents=upscaled_video_latent,
    audio_latents=audio_latent,
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=3,
    noise_scale=STAGE_2_DISTILLED_SIGMA_VALUES[0],  # renoise with the first sigma value, see https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-pipelines/src/ltx_pipelines/ti2vid_two_stages.py#L218
    sigmas=STAGE_2_DISTILLED_SIGMA_VALUES,
    guidance_scale=1.0,
    output_type="np",
    return_dict=False,
)

encode_video(
    video[0],
    fps=frame_rate,
    audio=audio[0].float().cpu(),
    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
    output_path="ltx2_lora_distilled_sample.mp4",
)
```
For more inference examples, including generation with the distilled checkpoint, see the [diffusers LTX-2 documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2#diffusers.LTX2Pipeline.__call__.example).

## General tips
* Width and height must be divisible by 32, and the number of frames must be a multiple of 8 plus 1 (e.g., 121 or 161); see the sketch after this list.
* If the resolution or number of frames does not satisfy these constraints, the input should be padded with -1 up to the next valid size and then cropped back to the desired resolution and number of frames.
* For tips on writing effective prompts, please visit our [prompting guide](https://ltx.video/blog/how-to-prompt-for-ltx-2).

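The helper below is an illustrative sketch (the function names are ours, not part of the LTX-2 or diffusers packages) showing one way to snap a requested size and frame count to the nearest supported values before calling the pipeline:

```python
# Illustrative helpers, not part of the LTX-2 packages: snap requested
# dimensions and frame counts to values the model accepts.
def snap_dimension(value: int, multiple: int = 32) -> int:
    """Round down to the nearest multiple of `multiple` (for width and height)."""
    return max(multiple, (value // multiple) * multiple)

def snap_num_frames(num_frames: int) -> int:
    """Round down to the nearest valid frame count of the form 8 * k + 1."""
    return max(9, ((num_frames - 1) // 8) * 8 + 1)

print(snap_dimension(720))    # 704
print(snap_num_frames(120))   # 113
print(snap_num_frames(121))   # 121
```
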
### Limitations
- This model is not intended or able to provide factual information.
- As a statistical model, this checkpoint might amplify existing societal biases.
- The model may fail to generate videos that match the prompt perfectly.
- Prompt following is heavily influenced by the prompting style.
- The model may generate content that is inappropriate or offensive.
- When generating audio without speech, the audio may be of lower quality.

# Train the model

The base (dev) model is fully trainable.

It's straightforward to reproduce the LoRAs and IC-LoRAs we publish with the model by following the instructions in the [LTX-2 Trainer README](https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-trainer/README.md).

Training for motion, style, or likeness (sound and appearance) can take less than an hour in many settings.

## Citation

```bibtex
@article{hacohen2025ltx2,
  title={LTX-2: Efficient Joint Audio-Visual Foundation Model},
  author={HaCohen, Yoav and Brazowski, Benny and Chiprut, Nisan and Bitterman, Yaki and Kvochko, Andrew and Berkowitz, Avishai and Shalem, Daniel and Lifschitz, Daphna and Moshe, Dudu and Porat, Eitan and Richardson, Eitan and Shiran, Guy and Chachy, Itay and Chetboun, Jonathan and Finkelson, Michael and Kupchick, Michael and Zabari, Nir and Guetta, Nitzan and Kotler, Noa and Bibi, Ofir and Gordon, Ori and Panet, Poriya and Benita, Roi and Armon, Shahar and Kulikov, Victor and Inger, Yaron and Shiftan, Yonatan and Melumian, Zeev and Farbman, Zeev},
  journal={arXiv preprint arXiv:2601.03233},
  year={2025}
}
```