---
license: apache-2.0
base_model:
- Qwen/Qwen3-4B-Base
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
- text-embeddings-inference
---
# Qwen3-Embedding-4B

<p align="center">
    <img src="https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/logo_qwen3.png" width="400"/>
</p>

## Highlights

The Qwen3 Embedding model series is the latest model series of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational models. The Qwen3 Embedding series achieves significant advances across multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

**Exceptional Versatility**: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B embedding model ranks **No.1** on the MTEB multilingual leaderboard (as of June 5, 2025, score **70.58**), while the reranking model excels in various text retrieval scenarios.

**Comprehensive Flexibility**: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model supports flexible, user-defined output dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.

**Multilingual Capability**: The Qwen3 Embedding series supports over 100 languages, thanks to the multilingual capabilities of the Qwen3 models. This includes various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.

## Model Overview

**Qwen3-Embedding-4B** has the following features:

- Model Type: Text Embedding
- Supported Languages: 100+ languages
- Number of Parameters: 4B
- Context Length: 32K
- Embedding Dimension: Up to 2560, with user-defined output dimensions ranging from 32 to 2560 (see the sketch below)
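
As a quick illustration of the user-defined output dimension (MRL), here is a minimal Sentence Transformers sketch; `truncate_dim` is a standard `SentenceTransformer` argument, and 256 is just an example value:

```python
from sentence_transformers import SentenceTransformer

# Minimal MRL sketch: truncate embeddings to 256 dimensions at encode time.
# This model supports any output dimension from 32 to 2560.
model = SentenceTransformer("Qwen/Qwen3-Embedding-4B", truncate_dim=256)

embedding = model.encode("What is the capital of China?", prompt_name="query")
print(embedding.shape)  # (256,)
```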

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3-embedding/) and [GitHub](https://github.com/QwenLM/Qwen3-Embedding) repository.

## Qwen3 Embedding Series Model List

| Model Type | Models | Size | Layers | Sequence Length | Embedding Dimension | MRL Support | Instruction Aware |
|------------------|----------------------|------|--------|-----------------|---------------------|-------------|----------------|
| Text Embedding | [Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) | 0.6B | 28 | 32K | 1024 | Yes | Yes |
| Text Embedding | [Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B) | 4B | 36 | 32K | 2560 | Yes | Yes |
| Text Embedding | [Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B) | 8B | 36 | 32K | 4096 | Yes | Yes |
| Text Reranking | [Qwen3-Reranker-0.6B](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B) | 0.6B | 28 | 32K | - | - | Yes |
| Text Reranking | [Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B) | 4B | 36 | 32K | - | - | Yes |
| Text Reranking | [Qwen3-Reranker-8B](https://huggingface.co/Qwen/Qwen3-Reranker-8B) | 8B | 36 | 32K | - | - | Yes |

> **Note**:
> - `MRL Support` indicates whether the embedding model supports custom dimensions for the final embedding.
> - `Instruction Aware` indicates whether the embedding or reranking model supports customizing the input instruction for different tasks.
> - Our evaluation indicates that, for most downstream tasks, using instructions (instruct) typically yields an improvement of 1% to 5% compared to not using them. We therefore recommend that developers create tailored instructions specific to their tasks and scenarios. In multilingual contexts, we also advise users to write their instructions in English, as most instructions used during model training were originally written in English.
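
For reference, an instruction-aware query in this series is simply the task description and the query joined by a fixed template (the same one used in the usage examples below); documents are embedded without an instruction:

```
Instruct: Given a web search query, retrieve relevant passages that answer the query
Query: What is the capital of China?
```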

## Usage

With `transformers` versions earlier than 4.51.0, you may encounter the following error, because the `qwen3` model type is not yet registered:
```
KeyError: 'qwen3'
```

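Upgrading `transformers` resolves it:

```bash
pip install "transformers>=4.51.0"
```
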
### Sentence Transformers Usage

```python
# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-4B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-4B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt.
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7534, 0.1147],
#         [0.0320, 0.6258]])
```
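
If you want to embed queries with your own task instruction instead of the built-in `query` prompt, you can pass it through `encode`'s `prompt` argument. A minimal sketch, using the same `Instruct: ...\nQuery:` template as the examples below (the task text is just an example):

```python
# Hypothetical task description; tailor it to your scenario.
task = "Given a web search query, retrieve relevant passages that answer the query"
query_embeddings = model.encode(queries, prompt=f"Instruct: {task}\nQuery:")
```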

### Transformers Usage

```python
# Requires transformers>=4.51.0
import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instructions for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-4B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-4B')

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-4B', attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# Normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7534257769584656, 0.1146894246339798], [0.03198453038930893, 0.6258305311203003]]
```
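
Since the model supports MRL, these embeddings can also be truncated to a smaller user-defined dimension. A minimal sketch (256 is an example value; note that you must re-normalize after truncating):

```python
# Keep only the first 256 of the 2560 dimensions, then re-normalize.
dim = 256
truncated = F.normalize(embeddings[:, :dim], p=2, dim=1)
truncated_scores = truncated[:2] @ truncated[2:].T
```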

### vLLM Usage

```python
# Requires vllm>=0.8.5
import torch
from vllm import LLM

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instructions for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

model = LLM(model="Qwen/Qwen3-Embedding-4B", task="embed")

outputs = model.embed(input_texts)
embeddings = torch.tensor([o.outputs.embedding for o in outputs])
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7525103688240051, 0.1143278032541275], [0.030893627554178238, 0.6239761114120483]]
```

📌 **Tip**: We recommend that developers customize the `instruct` according to their specific scenarios, tasks, and languages. Our tests show that in most retrieval scenarios, omitting the `instruct` on the query side leads to a drop in retrieval performance of approximately 1% to 5%.

### Text Embeddings Inference (TEI) Usage

You can run/deploy TEI on NVIDIA GPUs as follows:

```bash
docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7.2 --model-id Qwen/Qwen3-Embedding-4B --dtype float16
```

Or on CPU devices:

```bash
docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.2 --model-id Qwen/Qwen3-Embedding-4B --dtype float16
```

Then generate embeddings by sending an HTTP POST request:

```bash
curl http://localhost:8080/embed \
  -X POST \
  -d '{"inputs": ["Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?", "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"]}' \
  -H "Content-Type: application/json"
```
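
For programmatic access, here is a minimal Python sketch (assuming the server above is listening on `localhost:8080`; TEI's `/embed` route returns a JSON array with one embedding vector per input):

```python
import requests

# Queries follow the same "Instruct: ...\nQuery: ..." template used above.
payload = {
    "inputs": [
        "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?"
    ]
}
response = requests.post("http://localhost:8080/embed", json=payload)
embeddings = response.json()  # one embedding vector per input string
print(len(embeddings), len(embeddings[0]))
```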

## Evaluation

### MTEB (Multilingual)

| Model | Size | Mean (Task) | Mean (Type) | Bitext Mining | Class. | Clust. | Inst. Retri. | Multi. Class. | Pair. Class. | Rerank | Retri. | STS |
|----------------------------------|:-------:|:-------------:|:-------------:|:--------------:|:--------:|:--------:|:--------------:|:---------------:|:--------------:|:--------:|:--------:|:------:|
| NV-Embed-v2 | 7B | 56.29 | 49.58 | 57.84 | 57.29 | 40.80 | 1.04 | 18.63 | 78.94 | 63.82 | 56.72 | 71.10 |
| GritLM-7B | 7B | 60.92 | 53.74 | 70.53 | 61.83 | 49.75 | 3.45 | 22.77 | 79.94 | 63.78 | 58.31 | 73.33 |
| BGE-M3 | 0.6B | 59.56 | 52.18 | 79.11 | 60.35 | 40.88 | -3.11 | 20.10 | 80.76 | 62.79 | 54.60 | 74.12 |
| multilingual-e5-large-instruct | 0.6B | 63.22 | 55.08 | 80.13 | 64.94 | 50.75 | -0.40 | 22.91 | 80.86 | 62.61 | 57.12 | 76.81 |
| gte-Qwen2-1.5B-instruct | 1.5B | 59.45 | 52.69 | 62.51 | 58.32 | 52.05 | 0.74 | 24.02 | 81.58 | 62.58 | 60.78 | 71.61 |
| gte-Qwen2-7B-instruct | 7B | 62.51 | 55.93 | 73.92 | 61.55 | 52.77 | 4.94 | 25.48 | 85.13 | 65.55 | 60.08 | 73.98 |
| text-embedding-3-large | - | 58.93 | 51.41 | 62.17 | 60.27 | 46.89 | -2.68 | 22.03 | 79.17 | 63.89 | 59.27 | 71.68 |
| Cohere-embed-multilingual-v3.0 | - | 61.12 | 53.23 | 70.50 | 62.95 | 46.89 | -1.89 | 22.74 | 79.88 | 64.07 | 59.16 | 74.80 |
| gemini-embedding-exp-03-07 | - | 68.37 | 59.59 | 79.28 | 71.82 | 54.59 | 5.18 | **29.16** | 83.63 | 65.58 | 67.71 | 79.40 |
| **Qwen3-Embedding-0.6B** | 0.6B | 64.33 | 56.00 | 72.22 | 66.83 | 52.33 | 5.09 | 24.59 | 80.83 | 61.41 | 64.64 | 76.17 |
| **Qwen3-Embedding-4B** | 4B | 69.45 | 60.86 | 79.36 | 72.33 | 57.15 | **11.56** | 26.77 | 85.05 | 65.08 | 69.60 | 80.86 |
| **Qwen3-Embedding-8B** | 8B | **70.58** | **61.69** | **80.89** | **74.00** | **57.65** | 10.06 | 28.66 | **86.40** | **65.63** | **70.88** | **81.08** |

> **Note**: For the compared models, scores are taken from the MTEB online [leaderboard](https://huggingface.co/spaces/mteb/leaderboard) as of May 24, 2025.

### MTEB (Eng v2)

| MTEB English / Models | Param. | Mean (Task) | Mean (Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS | Summ. |
|--------------------------------|:--------:|:------------:|:------------:|:--------:|:--------:|:-------------:|:---------:|:--------:|:-------:|:-------:|
| multilingual-e5-large-instruct | 0.6B | 65.53 | 61.21 | 75.54 | 49.89 | 86.24 | 48.74 | 53.47 | 84.72 | 29.89 |
| NV-Embed-v2 | 7.8B | 69.81 | 65.00 | 87.19 | 47.66 | 88.69 | 49.61 | 62.84 | 83.82 | 35.21 |
| GritLM-7B | 7.2B | 67.07 | 63.22 | 81.25 | 50.82 | 87.29 | 49.59 | 54.95 | 83.03 | 35.65 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.20 | 63.26 | 85.84 | 53.54 | 87.52 | 49.25 | 50.25 | 82.51 | 33.94 |
| stella_en_1.5B_v5 | 1.5B | 69.43 | 65.32 | 89.38 | 57.06 | 88.02 | 50.19 | 52.42 | 83.27 | 36.91 |
| gte-Qwen2-7B-instruct | 7.6B | 70.72 | 65.77 | 88.52 | 58.97 | 85.90 | 50.47 | 58.09 | 82.69 | 35.74 |
| gemini-embedding-exp-03-07 | - | 73.30 | 67.67 | 90.05 | **59.39** | **87.70** | 48.59 | 64.35 | 85.29 | **38.28** |
| **Qwen3-Embedding-0.6B** | 0.6B | 70.70 | 64.88 | 85.76 | 54.05 | 84.37 | 48.18 | 61.83 | 86.57 | 33.43 |
| **Qwen3-Embedding-4B** | 4B | 74.60 | 68.10 | 89.84 | 57.51 | 87.01 | 50.76 | 68.46 | **88.72** | 34.39 |
| **Qwen3-Embedding-8B** | 8B | **75.22** | **68.71** | **90.43** | 58.57 | 87.52 | **51.56** | **69.44** | 88.58 | 34.83 |

### C-MTEB (MTEB Chinese)

| C-MTEB | Param. | Mean (Task) | Mean (Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS |
|------------------|--------|------------|------------|--------|--------|-------------|---------|-------|-------|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | 68.52 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.50 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.50 | **85.98** | **72.86** | 76.97 | **63.92** |
| **Qwen3-Embedding-0.6B** | 0.6B | 66.33 | 67.45 | 71.40 | 68.74 | 76.42 | 62.58 | 71.03 | 54.52 |
| **Qwen3-Embedding-4B** | 4B | 72.27 | 73.51 | 75.46 | 77.89 | 83.34 | 66.05 | 77.03 | 61.26 |
| **Qwen3-Embedding-8B** | 8B | **73.84** | **75.00** | **76.97** | **80.08** | 84.23 | 66.99 | **78.21** | 63.53 |

## Citation

If you find our work helpful, please consider citing:

```bibtex
@article{qwen3embedding,
  title={Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models},
  author={Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren},
  journal={arXiv preprint arXiv:2506.05176},
  year={2025}
}
```