---
language:
- en
- de
- es
- fr
- ja
- ko
- zh
- it
- pt
library_name: diffusers
license: other
license_name: ltx-2-community-license-agreement
license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
pipeline_tag: image-to-video
arxiv: 2601.03233
tags:
- image-to-video
- text-to-video
- video-to-video
- image-text-to-video
- audio-to-video
- text-to-audio
- video-to-audio
- audio-to-audio
- text-to-audio-video
- image-to-audio-video
- image-text-to-audio-video
- ltx-2
- ltx-2-3
- ltx-video
- ltxv
- lightricks
pinned: true
demo: https://console.ltx.video/playground/
---

# LTX-2.3 FP8 Model Card

**These are the FP8 versions of the LTX-2.3 model. All information below is derived from the base model.**

This model card focuses on the LTX-2.3 model, which is a significant update to the [LTX-2 model](https://huggingface.co/Lightricks/LTX-2) with improved audio and visual quality as well as enhanced prompt adherence.

LTX-2 was presented in the paper [LTX-2: Efficient Joint Audio-Visual Foundation Model](https://huggingface.co/papers/2601.03233).

💻💻 **If you want to dive right into the code, it is available [here](https://github.com/Lightricks/LTX-2).** 💾💾

LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.

[![LTX-2.3 Open Source](https://img.youtube.com/vi/o-7us-BR_gQ/maxresdefault.jpg)](https://youtu.be/o-7us-BR_gQ)

# Model Checkpoints

| Name                      | Notes                                                           |
|---------------------------|-----------------------------------------------------------------|
| ltx-2.3-22b-dev-fp8       | The full model, flexible and trainable, in fp8                  |
| ltx-2.3-22b-distilled-fp8 | The distilled version of the full model, 8 steps, CFG=1, in fp8 |
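
As a rough guide to what fp8 storage saves, the sketch below estimates the weights-only footprint of a checkpoint. It assumes that the `22b` in the checkpoint names means about 22 billion parameters and that every parameter is stored at the stated precision; neither is confirmed here, and activations, text encoders, and VAEs are not counted. The helper name `weights_gib` is hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weights_gib(num_params: float, dtype: str) -> float:
    """Return the weights-only size in GiB for a given parameter count and dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Assumption: "22b" in the checkpoint name means ~22 billion parameters.
ASSUMED_PARAMS = 22e9

for dtype in ("fp8", "bf16"):
    print(f"{dtype}: ~{weights_gib(ASSUMED_PARAMS, dtype):.1f} GiB of weights")
```

Under these assumptions the fp8 weights take about half the space of bf16 weights; actual memory use during inference will be higher.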

## Model Details

- **Developed by:** Lightricks
- **Model type:** Diffusion-based audio-video foundation model
- **Language(s):** English

# Online demo

LTX-2.3 is accessible right away via the [API Playground](https://console.ltx.video/playground/).

# Run locally

## Direct use license

You can use the models (full, distilled, upscalers, and any derivatives of the models) for purposes permitted under the [license](./LICENSE).

## ComfyUI

We recommend you use the built-in LTXVideo nodes that can be found in the ComfyUI Manager.

For manual installation information, please refer to our [documentation site](https://docs.ltx.video/open-source-model/integration-tools/comfy-ui).

## PyTorch codebase

The [LTX-2 codebase](https://github.com/Lightricks/LTX-2) is a monorepo with several packages, from the model definition in `ltx-core` to pipelines in `ltx-pipelines` and training capabilities in `ltx-trainer`.
The codebase was tested with Python >= 3.12 and CUDA > 12.7, and it supports PyTorch ~= 2.7.
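
Before installing, it can help to check the local environment against these versions. The following is a minimal sketch, not part of the LTX-2 codebase: `check_environment` is a hypothetical helper, and it reads `~= 2.7` as the PEP 440 compatible-release rule (`>= 2.7`, `== 2.*`). It does not check the CUDA version.

```python
import sys
from importlib import metadata


def check_environment(min_python=(3, 12), torch_major=2, torch_min_minor=7):
    """Report whether the interpreter and installed torch match the stated versions."""
    report = {"python_ok": sys.version_info[:2] >= min_python}

    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None  # torch is not installed in this environment

    report["torch_version"] = torch_version
    report["torch_ok"] = False
    if torch_version is not None:
        try:
            # "2.7.0+cu128" -> major=2, minor=7
            major, minor = (int(part) for part in torch_version.split(".")[:2])
            report["torch_ok"] = major == torch_major and minor >= torch_min_minor
        except ValueError:
            pass  # unexpected version string; leave torch_ok as False

    return report


if __name__ == "__main__":
    print(check_environment())
```

Running the script before `uv sync` shows only the Python result; running it inside the activated `.venv` afterwards also reports the installed PyTorch version.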

### Installation

```bash
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# From the repository root
uv sync
source .venv/bin/activate
```

### Inference

To use our model, please follow the instructions in our [ltx-pipelines](https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-pipelines/README.md) package.

## Diffusers 🧨

LTX-2.3 support in the [Diffusers Python library](https://huggingface.co/docs/diffusers/main/en/index) is coming soon!

## General tips:

* Width and height must be divisible by 32. The frame count must be a multiple of 8 plus 1 (for example, 9, 17, or 121).
* If the resolution is not divisible by 32, or the frame count is not a multiple of 8 plus 1, pad the input with -1 up to the next valid size, then crop the result to the desired resolution and number of frames.

* For tips on writing effective prompts, please visit our [Prompting guide](https://ltx.video/blog/how-to-prompt-for-ltx-2)
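
The size rules above can be sketched as a small helper that rounds a requested shape up to the next valid one. This is an illustration only: the function names are hypothetical and not part of the LTX-2 codebase, and the actual padding (with the value -1) and cropping are left to the pipeline.

```python
def round_up_to_multiple(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple (ceiling division, then scale back)."""
    return -(-value // multiple) * multiple


def next_valid_shape(width: int, height: int, num_frames: int) -> tuple[int, int, int]:
    """Return the smallest (width, height, num_frames) at or above the request
    where width and height are divisible by 32 and num_frames is 8 * k + 1."""
    padded_width = round_up_to_multiple(width, 32)
    padded_height = round_up_to_multiple(height, 32)
    padded_frames = round_up_to_multiple(max(num_frames - 1, 0), 8) + 1
    return padded_width, padded_height, padded_frames


# A 1280x720 request with 120 frames needs padding in height and frame count.
requested = (1280, 720, 120)
padded = next_valid_shape(*requested)
print(f"requested={requested} padded={padded}")
```

After generation at the padded size, the result would be cropped back to the requested width, height, and frame count.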

### Limitations

- This model is not intended or able to provide factual information.
- As a statistical model, this checkpoint might amplify existing societal biases.
- The model may fail to generate videos that match the prompts perfectly.
- Prompt following is heavily influenced by the prompting style.
- The model may generate content that is inappropriate or offensive.
- When generating audio without speech, the audio may be of lower quality.

# Train the model

Currently it is recommended to train the bf16 model. Recipes for training the fp8 model are welcome as community contributions.

## Citation

```bibtex
@article{hacohen2025ltx2,
  title={LTX-2: Efficient Joint Audio-Visual Foundation Model},
  author={HaCohen, Yoav and Brazowski, Benny and Chiprut, Nisan and Bitterman, Yaki and Kvochko, Andrew and Berkowitz, Avishai and Shalem, Daniel and Lifschitz, Daphna and Moshe, Dudu and Porat, Eitan and Richardson, Eitan and Guy Shiran and Itay Chachy and Jonathan Chetboun and Michael Finkelson and Michael Kupchick and Nir Zabari and Nitzan Guetta and Noa Kotler and Ofir Bibi and Ori Gordon and Poriya Panet and Roi Benita and Shahar Armon and Victor Kulikov and Yaron Inger and Yonatan Shiftan and Zeev Melumian and Zeev Farbman},
  journal={arXiv preprint arXiv:2601.03233},
  year={2025}
}
```