---
license: apache-2.0
base_model:
- Qwen/Qwen3-8B-Base
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
- text-embeddings-inference
---
# Qwen3-Embedding-8B

<p align="center">
    <img src="https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/logo_qwen3.png" width="400"/>
</p>

## Highlights

The Qwen3 Embedding model series is the latest generation of embedding models in the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational models. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

**Exceptional Versatility**: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B embedding model ranks **No.1** on the MTEB multilingual leaderboard (as of June 5, 2025, score **70.58**), while the reranking model excels in various text retrieval scenarios.

**Comprehensive Flexibility**: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model allows flexible, user-defined output dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.

**Multilingual Capability**: The Qwen3 Embedding series supports over 100 languages, thanks to the multilingual capabilities of the Qwen3 models. This includes various programming languages, providing robust multilingual, cross-lingual, and code retrieval capabilities.

**Qwen3-Embedding-8B** has the following features:

- Model Type: Text Embedding
- Supported Languages: 100+ Languages
- Number of Parameters: 8B
- Context Length: 32K
- Embedding Dimension: Up to 4096, supports user-defined output dimensions ranging from 32 to 4096

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3-embedding/) and [GitHub](https://github.com/QwenLM/Qwen3-Embedding) repository.

## Qwen3 Embedding Series Model List

| Model Type       | Models               | Size | Layers | Sequence Length | Embedding Dimension | MRL Support | Instruction Aware |
|------------------|----------------------|------|--------|-----------------|---------------------|-------------|-------------------|
| Text Embedding   | [Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) | 0.6B | 28 | 32K | 1024 | Yes | Yes |
| Text Embedding   | [Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B) | 4B | 36 | 32K | 2560 | Yes | Yes |
| Text Embedding   | [Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B) | 8B | 36 | 32K | 4096 | Yes | Yes |
| Text Reranking   | [Qwen3-Reranker-0.6B](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B) | 0.6B | 28 | 32K | - | - | Yes |
| Text Reranking   | [Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B) | 4B | 36 | 32K | - | - | Yes |
| Text Reranking   | [Qwen3-Reranker-8B](https://huggingface.co/Qwen/Qwen3-Reranker-8B) | 8B | 36 | 32K | - | - | Yes |

> **Note**:
> - `MRL Support` indicates whether the embedding model supports custom dimensions for the final embedding.
> - `Instruction Aware` indicates whether the embedding or reranking model supports customizing the input instruction according to different tasks.
> - Our evaluation indicates that, for most downstream tasks, using instructions (instruct) typically yields an improvement of 1% to 5% compared to not using them. Therefore, we recommend that developers create tailored instructions specific to their tasks and scenarios. In multilingual contexts, we also advise users to write their instructions in English, as most instructions used during model training were originally written in English.

## Usage

With Transformers versions earlier than 4.51.0, you may encounter the following error:
```
KeyError: 'qwen3'
```

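This means the installed `transformers` release does not yet recognize the `qwen3` model type; upgrading (for example, `pip install -U transformers`) resolves it. A quick way to check the installed version:

```python
import transformers

# Qwen3 architectures require transformers 4.51.0 or later
print(transformers.__version__)
```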

### Sentence Transformers Usage

```python
# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-8B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-8B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt.
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7493, 0.0751],
#         [0.0880, 0.6318]])
```
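
Since the model supports MRL (see the model list above), you can also request a smaller output dimension directly from Sentence Transformers. A minimal sketch, assuming a `sentence-transformers` version that supports the `truncate_dim` argument; 256 is just an example value within the supported 32 to 4096 range:

```python
from sentence_transformers import SentenceTransformer

# Truncate the 4096-dimensional embeddings down to 256 dimensions (MRL)
model = SentenceTransformer("Qwen/Qwen3-Embedding-8B", truncate_dim=256)

query_embeddings = model.encode(["What is the capital of China?"], prompt_name="query")
print(query_embeddings.shape)
# (1, 256)
```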

### Transformers Usage

```python
# Requires transformers>=4.51.0

import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'


# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-8B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-8B')

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-8B', attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7493016123771667, 0.0750647559762001], [0.08795969933271408, 0.6318399906158447]]
```
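
If you need a smaller embedding dimension in this plain `transformers` setup, one common approach (a sketch of the usual MRL recipe, not an official API) is to keep the first `k` components of each pooled embedding and re-normalize before computing similarities:

```python
import torch
import torch.nn.functional as F

def truncate_and_normalize(embeddings: torch.Tensor, dim: int = 256) -> torch.Tensor:
    # Keep the first `dim` components, then L2-normalize again so that
    # dot products between truncated embeddings remain cosine similarities
    return F.normalize(embeddings[:, :dim], p=2, dim=1)

# Continuing from the snippet above:
# embeddings_256 = truncate_and_normalize(embeddings, dim=256)
# scores = embeddings_256[:2] @ embeddings_256[2:].T
```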

### vLLM Usage

```python
# Requires vllm>=0.8.5
import torch
import vllm
from vllm import LLM


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'


# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

model = LLM(model="Qwen/Qwen3-Embedding-8B", task="embed")

outputs = model.embed(input_texts)
embeddings = torch.tensor([o.outputs.embedding for o in outputs])
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7482624650001526, 0.07556197047233582], [0.08875375241041183, 0.6300010681152344]]
```
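
vLLM can also expose the model behind an OpenAI-compatible endpoint instead of being used in-process. A minimal sketch, assuming the server was started separately with something like `vllm serve Qwen/Qwen3-Embedding-8B --task embed` and that the `openai` Python client is installed; adjust the host, port, and model name to your deployment:

```python
from openai import OpenAI

# Point the client at the local vLLM server (the api_key is a placeholder)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.embeddings.create(
    model="Qwen/Qwen3-Embedding-8B",
    input=[
        "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:What is the capital of China?",
        "The capital of China is Beijing.",
    ],
)
print(len(response.data[0].embedding))
# 4096
```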

📌 **Tip**: We recommend that developers customize the `instruct` according to their specific scenarios, tasks, and languages. Our tests have shown that in most retrieval scenarios, not using an `instruct` on the query side can lead to a drop in retrieval performance by approximately 1% to 5%.
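
For example, a task-specific instruction for a customer-support search scenario might look like the following (the instruction wording is a hypothetical illustration; only the `Instruct: ...\nQuery:...` format comes from the usage examples above):

```python
def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Hypothetical instruction tailored to a specific retrieval scenario
task = 'Given a customer support question, retrieve FAQ entries that answer the question'
query_text = get_detailed_instruct(task, 'How do I reset my password?')
# Documents are embedded as-is, without any instruction
```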

### Text Embeddings Inference (TEI) Usage

You can run or deploy TEI on NVIDIA GPUs with:

```bash
docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7.2 --model-id Qwen/Qwen3-Embedding-8B --dtype float16
```

Or on CPU devices with:

```bash
docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.2 --model-id Qwen/Qwen3-Embedding-8B --dtype float16
```

Then generate the embeddings by sending an HTTP POST request:

```bash
curl http://localhost:8080/embed \
    -X POST \
    -d '{"inputs": ["Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?", "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"]}' \
    -H "Content-Type: application/json"
```
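
The same endpoint can be called from Python as well; a minimal sketch using `requests` against the TEI container started above (localhost:8080 assumed):

```python
import requests

payload = {
    "inputs": [
        "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?",
        "The capital of China is Beijing.",
    ]
}
response = requests.post("http://localhost:8080/embed", json=payload)
embeddings = response.json()  # a list with one embedding vector per input
print(len(embeddings), len(embeddings[0]))
```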

## Evaluation

### MTEB (Multilingual)

| Model | Size | Mean (Task) | Mean (Type) | Bitext Mining | Class. | Clust. | Inst. Retri. | Multi. Class. | Pair. Class. | Rerank | Retri. | STS |
|----------------------------------|:-------:|:-------------:|:-------------:|:--------------:|:--------:|:--------:|:--------------:|:---------------:|:--------------:|:--------:|:--------:|:------:|
| NV-Embed-v2 | 7B | 56.29 | 49.58 | 57.84 | 57.29 | 40.80 | 1.04 | 18.63 | 78.94 | 63.82 | 56.72 | 71.10|
| GritLM-7B | 7B | 60.92 | 53.74 | 70.53 | 61.83 | 49.75 | 3.45 | 22.77 | 79.94 | 63.78 | 58.31 | 73.33|
| BGE-M3 | 0.6B | 59.56 | 52.18 | 79.11 | 60.35 | 40.88 | -3.11 | 20.1 | 80.76 | 62.79 | 54.60 | 74.12|
| multilingual-e5-large-instruct | 0.6B | 63.22 | 55.08 | 80.13 | 64.94 | 50.75 | -0.40 | 22.91 | 80.86 | 62.61 | 57.12 | 76.81|
| gte-Qwen2-1.5B-instruct | 1.5B | 59.45 | 52.69 | 62.51 | 58.32 | 52.05 | 0.74 | 24.02 | 81.58 | 62.58 | 60.78 | 71.61|
| gte-Qwen2-7B-instruct | 7B | 62.51 | 55.93 | 73.92 | 61.55 | 52.77 | 4.94 | 25.48 | 85.13 | 65.55 | 60.08 | 73.98|
| text-embedding-3-large | - | 58.93 | 51.41 | 62.17 | 60.27 | 46.89 | -2.68 | 22.03 | 79.17 | 63.89 | 59.27 | 71.68|
| Cohere-embed-multilingual-v3.0 | - | 61.12 | 53.23 | 70.50 | 62.95 | 46.89 | -1.89 | 22.74 | 79.88 | 64.07 | 59.16 | 74.80|
| gemini-embedding-exp-03-07 | - | 68.37 | 59.59 | 79.28 | 71.82 | 54.59 | 5.18 | **29.16** | 83.63 | 65.58 | 67.71 | 79.40|
| **Qwen3-Embedding-0.6B** | 0.6B | 64.33 | 56.00 | 72.22 | 66.83 | 52.33 | 5.09 | 24.59 | 80.83 | 61.41 | 64.64 | 76.17|
| **Qwen3-Embedding-4B** | 4B | 69.45 | 60.86 | 79.36 | 72.33 | 57.15 | **11.56** | 26.77 | 85.05 | 65.08 | 69.60 | 80.86|
| **Qwen3-Embedding-8B** | 8B | **70.58** | **61.69** | **80.89** | **74.00** | **57.65** | 10.06 | 28.66 | **86.40** | **65.63** | **70.88** | **81.08** |

> **Note**: For compared models, the scores are retrieved from the MTEB online [leaderboard](https://huggingface.co/spaces/mteb/leaderboard) on May 24th, 2025.

### MTEB (Eng v2)

| MTEB English / Models | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS | Summ. |
|--------------------------------|:--------:|:------------:|:------------:|:--------:|:--------:|:-------------:|:---------:|:--------:|:-------:|:-------:|
| multilingual-e5-large-instruct | 0.6B | 65.53 | 61.21 | 75.54 | 49.89 | 86.24 | 48.74 | 53.47 | 84.72 | 29.89 |
| NV-Embed-v2 | 7.8B | 69.81 | 65.00 | 87.19 | 47.66 | 88.69 | 49.61 | 62.84 | 83.82 | 35.21 |
| GritLM-7B | 7.2B | 67.07 | 63.22 | 81.25 | 50.82 | 87.29 | 49.59 | 54.95 | 83.03 | 35.65 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.20 | 63.26 | 85.84 | 53.54 | 87.52 | 49.25 | 50.25 | 82.51 | 33.94 |
| stella_en_1.5B_v5 | 1.5B | 69.43 | 65.32 | 89.38 | 57.06 | 88.02 | 50.19 | 52.42 | 83.27 | 36.91 |
| gte-Qwen2-7B-instruct | 7.6B | 70.72 | 65.77 | 88.52 | 58.97 | 85.9 | 50.47 | 58.09 | 82.69 | 35.74 |
| gemini-embedding-exp-03-07 | - | 73.3 | 67.67 | 90.05 | **59.39** | **87.7** | 48.59 | 64.35 | 85.29 | **38.28** |
| **Qwen3-Embedding-0.6B** | 0.6B | 70.70 | 64.88 | 85.76 | 54.05 | 84.37 | 48.18 | 61.83 | 86.57 | 33.43 |
| **Qwen3-Embedding-4B** | 4B | 74.60 | 68.10 | 89.84 | 57.51 | 87.01 | 50.76 | 68.46 | **88.72** | 34.39 |
| **Qwen3-Embedding-8B** | 8B | **75.22** | **68.71** | **90.43** | 58.57 | 87.52 | **51.56** | **69.44** | 88.58 | 34.83 |

### C-MTEB (MTEB Chinese)

| C-MTEB | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
|------------------|--------|------------|------------|--------|--------|-------------|---------|-------|-------|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | 68.52 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.5 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.5 | **85.98** | **72.86** | 76.97 | **63.92** |
| **Qwen3-Embedding-0.6B** | 0.6B | 66.33 | 67.45 | 71.40 | 68.74 | 76.42 | 62.58 | 71.03 | 54.52 |
| **Qwen3-Embedding-4B** | 4B | 72.27 | 73.51 | 75.46 | 77.89 | 83.34 | 66.05 | 77.03 | 61.26 |
| **Qwen3-Embedding-8B** | 8B | **73.84** | **75.00** | **76.97** | **80.08** | 84.23 | 66.99 | **78.21** | 63.53 |


## Citation

If you find our work helpful, feel free to cite it.

```
@article{qwen3embedding,
  title={Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models},
  author={Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren},
  journal={arXiv preprint arXiv:2506.05176},
  year={2025}
}
```