---
language:
- en
- de
- es
- fr
- ja
- ko
- zh
- it
- pt
library_name: diffusers
license: other
license_name: ltx-2-community-license-agreement
license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
pipeline_tag: image-to-video
arxiv: 2601.03233
tags:
- image-to-video
- text-to-video
- video-to-video
- image-text-to-video
- audio-to-video
- text-to-audio
- video-to-audio
- audio-to-audio
- text-to-audio-video
- image-to-audio-video
- image-text-to-audio-video
- ltx-2
- ltx-2-3
- ltx-video
- ltxv
- lightricks
pinned: true
demo: https://app.ltx.studio/ltx-2-playground/i2v
---

# LTX-2.3 Model Card

This model card focuses on the LTX-2.3 model, a significant update to the [LTX-2 model](https://huggingface.co/Lightricks/LTX-2) with improved audio and visual quality as well as enhanced prompt adherence.
LTX-2 was presented in the paper [LTX-2: Efficient Joint Audio-Visual Foundation Model](https://huggingface.co/papers/2601.03233).

💻💻 **If you want to dive right into the code, it is available [here](https://github.com/Lightricks/LTX-2).** 💾💾

LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.

[![LTX-2 Open Source](ltx2.3-open.png)](https://youtu.be/o-7us-BR_gQ)

# Model Checkpoints

| Name | Notes |
|------------------------------------|--------------------------------------------------------------------------------------------------------------------|
| ltx-2.3-22b-dev | The full model, flexible and trainable in bf16 |
| ltx-2.3-22b-distilled | The distilled version of the full model, 8 steps, CFG=1 |
| ltx-2.3-22b-distilled-1.1 | The distilled v1.1 version of the full model, 8 steps, CFG=1; a different aesthetic and improved audio compared to v1.0 |
| ltx-2.3-22b-distilled-lora-384 | A LoRA version of the distilled model, applicable to the full model |
| ltx-2.3-22b-distilled-lora-384-1.1 | A LoRA version of the v1.1 distilled model, applicable to the full model |
| ltx-2.3-spatial-upscaler-x2-1.1 | An x2 spatial upscaler for the LTX-2.3 latents, used in multi-stage (multiscale) pipelines for higher resolution |
| ltx-2.3-spatial-upscaler-x1.5-1.0 | An x1.5 spatial upscaler for the LTX-2.3 latents, used in multi-stage (multiscale) pipelines for higher resolution |
| ltx-2.3-temporal-upscaler-x2-1.0 | An x2 temporal upscaler for the LTX-2.3 latents, used in multi-stage (multiscale) pipelines for higher FPS |

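The upscaler checkpoints are designed to be chained after the base model in a multi-stage (multiscale) pipeline. The sketch below illustrates that flow with placeholder stubs - these are not the real ltx-pipelines API, and the latent shapes are arbitrary:

```python
# Schematic multiscale flow (illustrative only: every function below is a
# placeholder stub, and the latent shape/channel count is arbitrary).
import numpy as np

def base_model_generate(prompt: str, width: int, height: int, num_frames: int) -> np.ndarray:
    # Stand-in for stage 1: the base or distilled model (e.g. the 8-step,
    # CFG=1 distilled checkpoint) producing low-resolution latents.
    return np.zeros((num_frames, height // 32, width // 32, 128), dtype=np.float32)

def spatial_upscale_x2(latents: np.ndarray) -> np.ndarray:
    # Stand-in for ltx-2.3-spatial-upscaler-x2-1.1: doubles latent width and height.
    return latents.repeat(2, axis=1).repeat(2, axis=2)

def temporal_upscale_x2(latents: np.ndarray) -> np.ndarray:
    # Stand-in for ltx-2.3-temporal-upscaler-x2-1.0: doubles the frame rate.
    return latents.repeat(2, axis=0)

latents = base_model_generate("a dog surfing at sunset", width=640, height=352, num_frames=121)
latents = spatial_upscale_x2(latents)   # stage 2: higher resolution
latents = temporal_upscale_x2(latents)  # stage 3: higher FPS
print(latents.shape)  # the upscaled latents would then be decoded to video + audio
```
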
## Model Details
- **Developed by:** Lightricks
- **Model type:** Diffusion-based audio-video foundation model
- **Language(s):** English

# Online demo
LTX-2.3 is accessible right away via the [API Playground](https://console.ltx.video/playground/).

# Run locally

## Direct use license
You can use the models - full, distilled, upscalers, and any derivatives of the models - for purposes permitted under the [license](./LICENSE).

## ComfyUI
We recommend using the built-in LTXVideo nodes, which can be found in the ComfyUI Manager.
For manual installation instructions, please refer to our [documentation site](https://docs.ltx.video/open-source-model/integration-tools/comfy-ui).

## PyTorch codebase

The [LTX-2 codebase](https://github.com/Lightricks/LTX-2) is a monorepo with several packages, from the model definition in `ltx-core` to pipelines in `ltx-pipelines` and training capabilities in `ltx-trainer`.
The codebase was tested with Python >= 3.12 and CUDA > 12.7, and supports PyTorch ~= 2.7.

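Before installing, you can sanity-check your environment with a minimal sketch like the one below (illustrative, not part of the official codebase), based on the versions stated above:

```python
# Environment sanity check (illustrative): Python >= 3.12 and PyTorch ~= 2.7.
import sys

import torch

assert sys.version_info >= (3, 12), f"Python >= 3.12 required, found {sys.version_info[:2]}"

major, minor = (int(part) for part in torch.__version__.split(".")[:2])
assert major == 2 and minor >= 7, f"PyTorch ~= 2.7 expected, found {torch.__version__}"

print("CUDA available:", torch.cuda.is_available())
```
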
### Installation

The repository uses [uv](https://github.com/astral-sh/uv) for dependency management:

```bash
# Clone the repository
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# From the repository root: install dependencies and activate the virtual environment
uv sync
source .venv/bin/activate
```

### Inference

To use our model, please follow the instructions in our [ltx-pipelines](https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-pipelines/README.md) package.

## Diffusers 🧨

LTX-2.3 support in the [Diffusers Python library](https://huggingface.co/docs/diffusers/main/en/index) is coming soon!

## General tips:
* Width and height must be divisible by 32, and the frame count must be of the form 8k + 1 (a multiple of 8, plus 1).
* If the desired resolution or frame count does not satisfy these constraints, pad the input with -1 up to the next valid size, then crop the output back to the desired resolution and frame count; see the sketch below.
* For tips on writing effective prompts, please visit our [Prompting guide](https://ltx.video/blog/how-to-prompt-for-ltx-2).

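To make the size constraints concrete, here is a minimal sketch (not part of the official codebase) that rounds a requested size up to the nearest valid one:

```python
# Minimal helper (illustrative, not from the LTX-2 repo): round width/height
# up to a multiple of 32 and the frame count up to the nearest 8k + 1.

def round_up(value: int, multiple: int) -> int:
    return ((value + multiple - 1) // multiple) * multiple

def valid_dimensions(width: int, height: int, num_frames: int) -> tuple[int, int, int]:
    """Smallest valid (width, height, num_frames) at least as large as the inputs."""
    return (
        round_up(width, 32),
        round_up(height, 32),
        round_up(num_frames - 1, 8) + 1,  # frame count must be 8k + 1
    )

print(valid_dimensions(1270, 715, 120))  # -> (1280, 736, 121)
```
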
### Limitations
- This model is not intended or able to provide factual information.
- As a statistical model, this checkpoint might amplify existing societal biases.
- The model may fail to generate videos that match the prompts perfectly.
- Prompt following is heavily influenced by the prompting style.
- The model may generate content that is inappropriate or offensive.
- When generating audio without speech, the audio may be of lower quality.

# Train the model

The base (dev) model is fully trainable.

Reproducing the LoRAs and IC-LoRAs we publish with the model is straightforward: follow the instructions in the [LTX-2 Trainer README](https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-trainer/README.md).

Training for motion, style, or likeness (sound + appearance) can take less than an hour in many settings.

## Citation

```bibtex
@article{hacohen2025ltx2,
  title={LTX-2: Efficient Joint Audio-Visual Foundation Model},
  author={HaCohen, Yoav and Brazowski, Benny and Chiprut, Nisan and Bitterman, Yaki and Kvochko, Andrew and Berkowitz, Avishai and Shalem, Daniel and Lifschitz, Daphna and Moshe, Dudu and Porat, Eitan and Richardson, Eitan and Shiran, Guy and Chachy, Itay and Chetboun, Jonathan and Finkelson, Michael and Kupchick, Michael and Zabari, Nir and Guetta, Nitzan and Kotler, Noa and Bibi, Ofir and Gordon, Ori and Panet, Poriya and Benita, Roi and Armon, Shahar and Kulikov, Victor and Inger, Yaron and Shiftan, Yonatan and Melumian, Zeev and Farbman, Zeev},
  journal={arXiv preprint arXiv:2601.03233},
  year={2025}
}
```