---
license: apache-2.0
base_model:
- Qwen/Qwen3-0.6B-Base
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
- text-embeddings-inference
---
# Qwen3-Embedding-0.6B

<p align="center">
    <img src="https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/logo_qwen3.png" width="400"/>
</p>

## Highlights

The Qwen3 Embedding model series is the latest addition to the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

**Exceptional Versatility**: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B embedding model ranks **No.1** on the MTEB multilingual leaderboard (as of June 5, 2025, score **70.58**), while the reranking model excels in various text retrieval scenarios.

**Comprehensive Flexibility**: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding models support user-defined output dimensions, and both embedding and reranking models accept user-defined instructions to enhance performance for specific tasks, languages, or scenarios.

**Multilingual Capability**: The Qwen3 Embedding series offers support for over 100 languages, thanks to the multilingual capabilities of the Qwen3 models. This includes various programming languages, enabling robust multilingual, cross-lingual, and code retrieval.

## Model Overview

**Qwen3-Embedding-0.6B** has the following features:

- Model Type: Text Embedding
- Supported Languages: 100+ Languages
- Number of Parameters: 0.6B
- Context Length: 32k
- Embedding Dimension: Up to 1024, supports user-defined output dimensions ranging from 32 to 1024

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3-embedding/) and [GitHub](https://github.com/QwenLM/Qwen3-Embedding).

## Qwen3 Embedding Series Model List

| Model Type | Models | Size | Layers | Sequence Length | Embedding Dimension | MRL Support | Instruction Aware |
|------------------|----------------------|------|--------|-----------------|---------------------|-------------|-------------------|
| Text Embedding | [Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) | 0.6B | 28 | 32K | 1024 | Yes | Yes |
| Text Embedding | [Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B) | 4B | 36 | 32K | 2560 | Yes | Yes |
| Text Embedding | [Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B) | 8B | 36 | 32K | 4096 | Yes | Yes |
| Text Reranking | [Qwen3-Reranker-0.6B](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B) | 0.6B | 28 | 32K | - | - | Yes |
| Text Reranking | [Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B) | 4B | 36 | 32K | - | - | Yes |
| Text Reranking | [Qwen3-Reranker-8B](https://huggingface.co/Qwen/Qwen3-Reranker-8B) | 8B | 36 | 32K | - | - | Yes |

> **Note**:
> - `MRL Support` indicates whether the embedding model supports custom dimensions for the final embedding.
> - `Instruction Aware` notes whether the embedding or reranking model supports customizing the input instruction according to different tasks.
> - Our evaluation indicates that, for most downstream tasks, using instructions (instruct) typically yields an improvement of 1% to 5% compared to not using them. Therefore, we recommend that developers create tailored instructions specific to their tasks and scenarios. In multilingual contexts, we also advise users to write their instructions in English, as most instructions utilized during the model training process were originally written in English.

## Usage

With Transformers versions earlier than 4.51.0, you may encounter the following error:
```
KeyError: 'qwen3'
```
Upgrading Transformers (for example, `pip install -U transformers`) resolves this.

### Sentence Transformers Usage

```python
# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-0.6B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt.
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7646, 0.1414],
#         [0.1355, 0.6000]])
```
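The model is instruction-aware and supports MRL (see the model list above), so with Sentence Transformers you can also pass your own task instruction via the `prompt` argument and request a smaller output dimension at load time via `truncate_dim`. A minimal sketch; the task wording and the 256-dimension choice are illustrative, not values prescribed by the model:

```python
from sentence_transformers import SentenceTransformer

# Load with a reduced output dimension (the model card states 32-1024 is supported).
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", truncate_dim=256)

# A custom task description (illustrative); the "Instruct: ...\nQuery:" format matches
# the prompt layout used in the Transformers example below.
task = "Given a question about physics, retrieve passages that answer the question"
query_embeddings = model.encode(["Explain gravity"], prompt=f"Instruct: {task}\nQuery:")
document_embeddings = model.encode(["Gravity is a force that attracts two bodies towards each other."])

print(query_embeddings.shape)  # (1, 256)
print(model.similarity(query_embeddings, document_embeddings))
```

Smaller dimensions reduce storage and similarity-search cost at some loss in accuracy; the full 1024-dimensional output remains the default.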

### Transformers Usage

```python
# Requires transformers>=4.51.0

import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-0.6B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B')

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B', attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7645568251609802, 0.14142508804798126], [0.13549736142158508, 0.5999549627304077]]
```
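Because the model supports MRL (see `MRL Support` in the model list above), the full 1024-dimensional vectors can also be shortened after the fact with the usual MRL-style recipe of keeping a leading slice and re-normalizing, similar in spirit to the `truncate_dim` option shown earlier. A short continuation of the snippet above; 256 is again an arbitrary illustrative choice within the supported 32 to 1024 range:

```python
# Continuing from the code above: keep the first 256 dimensions of each embedding,
# re-normalize, and recompute the query-document scores.
truncated = F.normalize(embeddings[:, :256], p=2, dim=1)
truncated_scores = truncated[:2] @ truncated[2:].T
print(truncated_scores.tolist())
```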

### vLLM Usage

```python
# Requires vllm>=0.8.5
import torch
import vllm
from vllm import LLM

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

model = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")

outputs = model.embed(input_texts)
embeddings = torch.tensor([o.outputs.embedding for o in outputs])
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7620252966880798, 0.14078938961029053], [0.1358368694782257, 0.6013815999031067]]
```

📌 **Tip**: We recommend that developers customize the `instruct` according to their specific scenarios, tasks, and languages. Our tests have shown that in most retrieval scenarios, not using an `instruct` on the query side can lead to a drop in retrieval performance by approximately 1% to 5%.

### Text Embeddings Inference (TEI) Usage

You can run/deploy TEI on NVIDIA GPUs as:

```bash
docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7.2 --model-id Qwen/Qwen3-Embedding-0.6B --dtype float16
```

Or on CPU devices as:

```bash
docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.2 --model-id Qwen/Qwen3-Embedding-0.6B
```

Then, generate the embeddings by sending an HTTP POST request:

```bash
curl http://localhost:8080/embed \
    -X POST \
    -d '{"inputs": ["Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?", "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"]}' \
    -H "Content-Type: application/json"
```
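If you prefer to call the endpoint from Python, a minimal client sketch is shown below. It assumes the container above is listening on `localhost:8080` and that `/embed` returns one embedding vector per input as a JSON array; the `requests`/`numpy` usage is illustrative and not part of TEI itself:

```python
import numpy as np
import requests

task = "Given a web search query, retrieve relevant passages that answer the query"
queries = [f"Instruct: {task}\nQuery: What is the capital of China?"]
documents = ["The capital of China is Beijing."]

response = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": queries + documents},
    timeout=60,
)
response.raise_for_status()

# One vector per input; re-normalize defensively and score with a dot product.
embeddings = np.asarray(response.json())
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
print(embeddings[: len(queries)] @ embeddings[len(queries):].T)
```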

## Evaluation

### MTEB (Multilingual)

| Model | Size | Mean (Task) | Mean (Type) | Bitext Mining | Class. | Clust. | Inst. Retri. | Multi. Class. | Pair. Class. | Rerank | Retri. | STS |
|----------------------------------|:-------:|:-------------:|:-------------:|:--------------:|:--------:|:--------:|:--------------:|:---------------:|:--------------:|:--------:|:--------:|:------:|
| NV-Embed-v2 | 7B | 56.29 | 49.58 | 57.84 | 57.29 | 40.80 | 1.04 | 18.63 | 78.94 | 63.82 | 56.72 | 71.10 |
| GritLM-7B | 7B | 60.92 | 53.74 | 70.53 | 61.83 | 49.75 | 3.45 | 22.77 | 79.94 | 63.78 | 58.31 | 73.33 |
| BGE-M3 | 0.6B | 59.56 | 52.18 | 79.11 | 60.35 | 40.88 | -3.11 | 20.1 | 80.76 | 62.79 | 54.60 | 74.12 |
| multilingual-e5-large-instruct | 0.6B | 63.22 | 55.08 | 80.13 | 64.94 | 50.75 | -0.40 | 22.91 | 80.86 | 62.61 | 57.12 | 76.81 |
| gte-Qwen2-1.5B-instruct | 1.5B | 59.45 | 52.69 | 62.51 | 58.32 | 52.05 | 0.74 | 24.02 | 81.58 | 62.58 | 60.78 | 71.61 |
| gte-Qwen2-7B-instruct | 7B | 62.51 | 55.93 | 73.92 | 61.55 | 52.77 | 4.94 | 25.48 | 85.13 | 65.55 | 60.08 | 73.98 |
| text-embedding-3-large | - | 58.93 | 51.41 | 62.17 | 60.27 | 46.89 | -2.68 | 22.03 | 79.17 | 63.89 | 59.27 | 71.68 |
| Cohere-embed-multilingual-v3.0 | - | 61.12 | 53.23 | 70.50 | 62.95 | 46.89 | -1.89 | 22.74 | 79.88 | 64.07 | 59.16 | 74.80 |
| Gemini Embedding | - | 68.37 | 59.59 | 79.28 | 71.82 | 54.59 | 5.18 | **29.16** | 83.63 | 65.58 | 67.71 | 79.40 |
| **Qwen3-Embedding-0.6B** | 0.6B | 64.33 | 56.00 | 72.22 | 66.83 | 52.33 | 5.09 | 24.59 | 80.83 | 61.41 | 64.64 | 76.17 |
| **Qwen3-Embedding-4B** | 4B | 69.45 | 60.86 | 79.36 | 72.33 | 57.15 | **11.56** | 26.77 | 85.05 | 65.08 | 69.60 | 80.86 |
| **Qwen3-Embedding-8B** | 8B | **70.58** | **61.69** | **80.89** | **74.00** | **57.65** | 10.06 | 28.66 | **86.40** | **65.63** | **70.88** | **81.08** |

> **Note**: For compared models, the scores are retrieved from the MTEB online [leaderboard](https://huggingface.co/spaces/mteb/leaderboard) on May 24th, 2025.

### MTEB (Eng v2)

| MTEB English / Models | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS | Summ. |
|--------------------------------|:--------:|:------------:|:------------:|:--------:|:--------:|:-------------:|:---------:|:--------:|:-------:|:-------:|
| multilingual-e5-large-instruct | 0.6B | 65.53 | 61.21 | 75.54 | 49.89 | 86.24 | 48.74 | 53.47 | 84.72 | 29.89 |
| NV-Embed-v2 | 7.8B | 69.81 | 65.00 | 87.19 | 47.66 | 88.69 | 49.61 | 62.84 | 83.82 | 35.21 |
| GritLM-7B | 7.2B | 67.07 | 63.22 | 81.25 | 50.82 | 87.29 | 49.59 | 54.95 | 83.03 | 35.65 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.20 | 63.26 | 85.84 | 53.54 | 87.52 | 49.25 | 50.25 | 82.51 | 33.94 |
| stella_en_1.5B_v5 | 1.5B | 69.43 | 65.32 | 89.38 | 57.06 | 88.02 | 50.19 | 52.42 | 83.27 | 36.91 |
| gte-Qwen2-7B-instruct | 7.6B | 70.72 | 65.77 | 88.52 | 58.97 | 85.9 | 50.47 | 58.09 | 82.69 | 35.74 |
| gemini-embedding-exp-03-07 | - | 73.3 | 67.67 | 90.05 | 59.39 | 87.7 | 48.59 | 64.35 | 85.29 | 38.28 |
| **Qwen3-Embedding-0.6B** | 0.6B | 70.70 | 64.88 | 85.76 | 54.05 | 84.37 | 48.18 | 61.83 | 86.57 | 33.43 |
| **Qwen3-Embedding-4B** | 4B | 74.60 | 68.10 | 89.84 | 57.51 | 87.01 | 50.76 | 68.46 | 88.72 | 34.39 |
| **Qwen3-Embedding-8B** | 8B | 75.22 | 68.71 | 90.43 | 58.57 | 87.52 | 51.56 | 69.44 | 88.58 | 34.83 |

### C-MTEB (MTEB Chinese)

| C-MTEB | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
|------------------|--------|------------|------------|--------|--------|-------------|---------|-------|-------|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | - | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.5 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.5 | 85.98 | 72.86 | 76.97 | 63.92 |
| **Qwen3-Embedding-0.6B** | 0.6B | 66.33 | 67.45 | 71.40 | 68.74 | 76.42 | 62.58 | 71.03 | 54.52 |
| **Qwen3-Embedding-4B** | 4B | 72.27 | 73.51 | 75.46 | 77.89 | 83.34 | 66.05 | 77.03 | 61.26 |
| **Qwen3-Embedding-8B** | 8B | 73.84 | 75.00 | 76.97 | 80.08 | 84.23 | 66.99 | 78.21 | 63.53 |

## Citation

If you find our work helpful, please consider citing it:

```bibtex
@article{qwen3embedding,
  title={Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models},
  author={Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren},
  journal={arXiv preprint arXiv:2506.05176},
  year={2025}
}
```