---
license: apache-2.0
base_model:
- Qwen/Qwen3-0.6B-Base
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
- text-embeddings-inference
---
# Qwen3-Embedding-0.6B

<p align="center">
    <img src="https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/logo_qwen3.png" width="400"/>
</p>

## Highlights

The Qwen3 Embedding model series is the latest addition to the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

**Exceptional Versatility**: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B embedding model ranks **No.1** on the MTEB multilingual leaderboard (as of June 5, 2025, score **70.58**), while the reranking model excels in various text retrieval scenarios.

**Comprehensive Flexibility**: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding models support user-defined output dimensions, and both embedding and reranking models accept user-defined instructions to enhance performance for specific tasks, languages, or scenarios.

**Multilingual Capability**: The Qwen3 Embedding series offers support for over 100 languages, thanks to the multilingual capabilities of the Qwen3 models. This includes various programming languages, enabling robust multilingual, cross-lingual, and code retrieval.

## Model Overview

**Qwen3-Embedding-0.6B** has the following features:

- Model Type: Text Embedding
- Supported Languages: 100+ Languages
- Number of Parameters: 0.6B
- Context Length: 32k
- Embedding Dimension: Up to 1024, supports user-defined output dimensions ranging from 32 to 1024

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3-embedding/) and [GitHub](https://github.com/QwenLM/Qwen3-Embedding).

## Qwen3 Embedding Series Model List

| Model Type | Models | Size | Layers | Sequence Length | Embedding Dimension | MRL Support | Instruction Aware |
|------------------|----------------------|------|--------|-----------------|---------------------|-------------|-------------------|
| Text Embedding | [Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) | 0.6B | 28 | 32K | 1024 | Yes | Yes |
| Text Embedding | [Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B) | 4B | 36 | 32K | 2560 | Yes | Yes |
| Text Embedding | [Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B) | 8B | 36 | 32K | 4096 | Yes | Yes |
| Text Reranking | [Qwen3-Reranker-0.6B](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B) | 0.6B | 28 | 32K | - | - | Yes |
| Text Reranking | [Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B) | 4B | 36 | 32K | - | - | Yes |
| Text Reranking | [Qwen3-Reranker-8B](https://huggingface.co/Qwen/Qwen3-Reranker-8B) | 8B | 36 | 32K | - | - | Yes |

> **Note**:
> - `MRL Support` indicates whether the embedding model supports custom dimensions for the final embedding.
> - `Instruction Aware` notes whether the embedding or reranking model supports customizing the input instruction according to different tasks.
> - Our evaluation indicates that, for most downstream tasks, using instructions (instruct) typically yields an improvement of 1% to 5% compared to not using them. Therefore, we recommend that developers create tailored instructions specific to their tasks and scenarios. In multilingual contexts, we also advise users to write their instructions in English, as most instructions utilized during the model training process were originally written in English.

## Usage

With Transformers versions earlier than 4.51.0, you may encounter the following error:
```
KeyError: 'qwen3'
```
Upgrading Transformers (for example, `pip install -U transformers`) resolves this.

### Sentence Transformers Usage

```python
# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-0.6B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt.
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7646, 0.1414],
#         [0.1355, 0.6000]])
```
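The model is instruction-aware and supports MRL (see the model list above), so with Sentence Transformers you can also pass your own task instruction via the `prompt` argument and request a smaller output dimension at load time via `truncate_dim`. A minimal sketch; the task wording and the 256-dimension choice are illustrative, not values prescribed by the model:

```python
from sentence_transformers import SentenceTransformer

# Load with a reduced output dimension (the model card states 32-1024 is supported).
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", truncate_dim=256)

# A custom task description (illustrative); the "Instruct: ...\nQuery:" format matches
# the prompt layout used in the Transformers example below.
task = "Given a question about physics, retrieve passages that answer the question"
query_embeddings = model.encode(["Explain gravity"], prompt=f"Instruct: {task}\nQuery:")
document_embeddings = model.encode(["Gravity is a force that attracts two bodies towards each other."])

print(query_embeddings.shape)  # (1, 256)
print(model.similarity(query_embeddings, document_embeddings))
```

Smaller dimensions reduce storage and similarity-search cost at some loss in accuracy; the full 1024-dimensional output remains the default.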

### Transformers Usage

```python
# Requires transformers>=4.51.0

import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-0.6B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B')

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B', attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7645568251609802, 0.14142508804798126], [0.13549736142158508, 0.5999549627304077]]
```
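Because the model supports MRL (see `MRL Support` in the model list above), the full 1024-dimensional vectors can also be shortened after the fact with the usual MRL-style recipe of keeping a leading slice and re-normalizing, similar in spirit to the `truncate_dim` option shown earlier. A short continuation of the snippet above; 256 is again an arbitrary illustrative choice within the supported 32 to 1024 range:

```python
# Continuing from the code above: keep the first 256 dimensions of each embedding,
# re-normalize, and recompute the query-document scores.
truncated = F.normalize(embeddings[:, :256], p=2, dim=1)
truncated_scores = truncated[:2] @ truncated[2:].T
print(truncated_scores.tolist())
```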

### vLLM Usage

```python
# Requires vllm>=0.8.5
import torch
import vllm
from vllm import LLM

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

model = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")

outputs = model.embed(input_texts)
embeddings = torch.tensor([o.outputs.embedding for o in outputs])
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7620252966880798, 0.14078938961029053], [0.1358368694782257, 0.6013815999031067]]
```

📌 **Tip**: We recommend that developers customize the `instruct` according to their specific scenarios, tasks, and languages. Our tests have shown that in most retrieval scenarios, not using an `instruct` on the query side can lead to a drop in retrieval performance by approximately 1% to 5%.

### Text Embeddings Inference (TEI) Usage

You can run/deploy TEI on NVIDIA GPUs as:

```bash
docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7.2 --model-id Qwen/Qwen3-Embedding-0.6B --dtype float16
```

Or on CPU devices as:

```bash
docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.2 --model-id Qwen/Qwen3-Embedding-0.6B
```

Then, generate the embeddings by sending an HTTP POST request:

```bash
curl http://localhost:8080/embed \
    -X POST \
    -d '{"inputs": ["Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?", "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"]}' \
    -H "Content-Type: application/json"
```
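If you prefer to call the endpoint from Python, a minimal client sketch is shown below. It assumes the container above is listening on `localhost:8080` and that `/embed` returns one embedding vector per input as a JSON array; the `requests`/`numpy` usage is illustrative and not part of TEI itself:

```python
import numpy as np
import requests

task = "Given a web search query, retrieve relevant passages that answer the query"
queries = [f"Instruct: {task}\nQuery: What is the capital of China?"]
documents = ["The capital of China is Beijing."]

response = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": queries + documents},
    timeout=60,
)
response.raise_for_status()

# One vector per input; re-normalize defensively and score with a dot product.
embeddings = np.asarray(response.json())
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
print(embeddings[: len(queries)] @ embeddings[len(queries):].T)
```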

## Evaluation

### MTEB (Multilingual)

| Model | Size | Mean (Task) | Mean (Type) | Bitext Mining | Class. | Clust. | Inst. Retri. | Multi. Class. | Pair. Class. | Rerank | Retri. | STS |
|----------------------------------|:-------:|:-------------:|:-------------:|:--------------:|:--------:|:--------:|:--------------:|:---------------:|:--------------:|:--------:|:--------:|:------:|
| NV-Embed-v2 | 7B | 56.29 | 49.58 | 57.84 | 57.29 | 40.80 | 1.04 | 18.63 | 78.94 | 63.82 | 56.72 | 71.10 |
| GritLM-7B | 7B | 60.92 | 53.74 | 70.53 | 61.83 | 49.75 | 3.45 | 22.77 | 79.94 | 63.78 | 58.31 | 73.33 |
| BGE-M3 | 0.6B | 59.56 | 52.18 | 79.11 | 60.35 | 40.88 | -3.11 | 20.1 | 80.76 | 62.79 | 54.60 | 74.12 |
| multilingual-e5-large-instruct | 0.6B | 63.22 | 55.08 | 80.13 | 64.94 | 50.75 | -0.40 | 22.91 | 80.86 | 62.61 | 57.12 | 76.81 |
| gte-Qwen2-1.5B-instruct | 1.5B | 59.45 | 52.69 | 62.51 | 58.32 | 52.05 | 0.74 | 24.02 | 81.58 | 62.58 | 60.78 | 71.61 |
| gte-Qwen2-7B-instruct | 7B | 62.51 | 55.93 | 73.92 | 61.55 | 52.77 | 4.94 | 25.48 | 85.13 | 65.55 | 60.08 | 73.98 |
| text-embedding-3-large | - | 58.93 | 51.41 | 62.17 | 60.27 | 46.89 | -2.68 | 22.03 | 79.17 | 63.89 | 59.27 | 71.68 |
| Cohere-embed-multilingual-v3.0 | - | 61.12 | 53.23 | 70.50 | 62.95 | 46.89 | -1.89 | 22.74 | 79.88 | 64.07 | 59.16 | 74.80 |
| Gemini Embedding | - | 68.37 | 59.59 | 79.28 | 71.82 | 54.59 | 5.18 | **29.16** | 83.63 | 65.58 | 67.71 | 79.40 |
| **Qwen3-Embedding-0.6B** | 0.6B | 64.33 | 56.00 | 72.22 | 66.83 | 52.33 | 5.09 | 24.59 | 80.83 | 61.41 | 64.64 | 76.17 |
| **Qwen3-Embedding-4B** | 4B | 69.45 | 60.86 | 79.36 | 72.33 | 57.15 | **11.56** | 26.77 | 85.05 | 65.08 | 69.60 | 80.86 |
| **Qwen3-Embedding-8B** | 8B | **70.58** | **61.69** | **80.89** | **74.00** | **57.65** | 10.06 | 28.66 | **86.40** | **65.63** | **70.88** | **81.08** |

> **Note**: For compared models, the scores are retrieved from the MTEB online [leaderboard](https://huggingface.co/spaces/mteb/leaderboard) on May 24th, 2025.

### MTEB (Eng v2)

| MTEB English / Models | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS | Summ. |
|--------------------------------|:--------:|:------------:|:------------:|:--------:|:--------:|:-------------:|:---------:|:--------:|:-------:|:-------:|
| multilingual-e5-large-instruct | 0.6B | 65.53 | 61.21 | 75.54 | 49.89 | 86.24 | 48.74 | 53.47 | 84.72 | 29.89 |
| NV-Embed-v2 | 7.8B | 69.81 | 65.00 | 87.19 | 47.66 | 88.69 | 49.61 | 62.84 | 83.82 | 35.21 |
| GritLM-7B | 7.2B | 67.07 | 63.22 | 81.25 | 50.82 | 87.29 | 49.59 | 54.95 | 83.03 | 35.65 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.20 | 63.26 | 85.84 | 53.54 | 87.52 | 49.25 | 50.25 | 82.51 | 33.94 |
| stella_en_1.5B_v5 | 1.5B | 69.43 | 65.32 | 89.38 | 57.06 | 88.02 | 50.19 | 52.42 | 83.27 | 36.91 |
| gte-Qwen2-7B-instruct | 7.6B | 70.72 | 65.77 | 88.52 | 58.97 | 85.9 | 50.47 | 58.09 | 82.69 | 35.74 |
| gemini-embedding-exp-03-07 | - | 73.3 | 67.67 | 90.05 | 59.39 | 87.7 | 48.59 | 64.35 | 85.29 | 38.28 |
| **Qwen3-Embedding-0.6B** | 0.6B | 70.70 | 64.88 | 85.76 | 54.05 | 84.37 | 48.18 | 61.83 | 86.57 | 33.43 |
| **Qwen3-Embedding-4B** | 4B | 74.60 | 68.10 | 89.84 | 57.51 | 87.01 | 50.76 | 68.46 | 88.72 | 34.39 |
| **Qwen3-Embedding-8B** | 8B | 75.22 | 68.71 | 90.43 | 58.57 | 87.52 | 51.56 | 69.44 | 88.58 | 34.83 |

### C-MTEB (MTEB Chinese)

| C-MTEB | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
|------------------|--------|------------|------------|--------|--------|-------------|---------|-------|-------|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | - | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.5 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.5 | 85.98 | 72.86 | 76.97 | 63.92 |
| **Qwen3-Embedding-0.6B** | 0.6B | 66.33 | 67.45 | 71.40 | 68.74 | 76.42 | 62.58 | 71.03 | 54.52 |
| **Qwen3-Embedding-4B** | 4B | 72.27 | 73.51 | 75.46 | 77.89 | 83.34 | 66.05 | 77.03 | 61.26 |
| **Qwen3-Embedding-8B** | 8B | 73.84 | 75.00 | 76.97 | 80.08 | 84.23 | 66.99 | 78.21 | 63.53 |

## Citation

If you find our work helpful, please consider citing it:

```bibtex
@article{qwen3embedding,
  title={Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models},
  author={Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren},
  journal={arXiv preprint arXiv:2506.05176},
  year={2025}
}
```