---
license: apache-2.0
base_model:
- Qwen/Qwen3-4B-Base
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
- text-embeddings-inference
---
# Qwen3-Embedding-4B

<p align="center">
    <img src="https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/logo_qwen3.png" width="400"/>
</p>

## Highlights

The Qwen3 Embedding model series is the latest model series of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational models. The Qwen3 Embedding series achieves significant advances across multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

**Exceptional Versatility**: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B embedding model ranks **No.1** on the MTEB multilingual leaderboard (as of June 5, 2025, score **70.58**), while the reranking model excels in various text retrieval scenarios.

**Comprehensive Flexibility**: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model supports flexible, user-defined output dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.

**Multilingual Capability**: The Qwen3 Embedding series supports over 100 languages, thanks to the multilingual capabilities of the Qwen3 models. This includes various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.

## Model Overview

**Qwen3-Embedding-4B** has the following features:

- Model Type: Text Embedding
- Supported Languages: 100+ languages
- Number of Parameters: 4B
- Context Length: 32K
- Embedding Dimension: Up to 2560, with user-defined output dimensions ranging from 32 to 2560 (see the sketch below)
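
As a quick illustration of the user-defined output dimension (MRL), here is a minimal Sentence Transformers sketch; `truncate_dim` is a standard `SentenceTransformer` argument, and 256 is just an example value:

```python
from sentence_transformers import SentenceTransformer

# Minimal MRL sketch: truncate embeddings to 256 dimensions at encode time.
# This model supports any output dimension from 32 to 2560.
model = SentenceTransformer("Qwen/Qwen3-Embedding-4B", truncate_dim=256)

embedding = model.encode("What is the capital of China?", prompt_name="query")
print(embedding.shape)  # (256,)
```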

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3-embedding/) and [GitHub](https://github.com/QwenLM/Qwen3-Embedding) repository.

## Qwen3 Embedding Series Model List

| Model Type | Models | Size | Layers | Sequence Length | Embedding Dimension | MRL Support | Instruction Aware |
|------------------|----------------------|------|--------|-----------------|---------------------|-------------|----------------|
| Text Embedding | [Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) | 0.6B | 28 | 32K | 1024 | Yes | Yes |
| Text Embedding | [Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B) | 4B | 36 | 32K | 2560 | Yes | Yes |
| Text Embedding | [Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B) | 8B | 36 | 32K | 4096 | Yes | Yes |
| Text Reranking | [Qwen3-Reranker-0.6B](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B) | 0.6B | 28 | 32K | - | - | Yes |
| Text Reranking | [Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B) | 4B | 36 | 32K | - | - | Yes |
| Text Reranking | [Qwen3-Reranker-8B](https://huggingface.co/Qwen/Qwen3-Reranker-8B) | 8B | 36 | 32K | - | - | Yes |

> **Note**:
> - `MRL Support` indicates whether the embedding model supports custom dimensions for the final embedding.
> - `Instruction Aware` indicates whether the embedding or reranking model supports customizing the input instruction for different tasks.
> - Our evaluation indicates that, for most downstream tasks, using instructions (instruct) typically yields an improvement of 1% to 5% compared to not using them. We therefore recommend that developers create tailored instructions specific to their tasks and scenarios. In multilingual contexts, we also advise users to write their instructions in English, as most instructions used during model training were originally written in English.
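
For reference, an instruction-aware query in this series is simply the task description and the query joined by a fixed template (the same one used in the usage examples below); documents are embedded without an instruction:

```
Instruct: Given a web search query, retrieve relevant passages that answer the query
Query: What is the capital of China?
```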

## Usage

With `transformers` versions earlier than 4.51.0, you may encounter the following error, because the `qwen3` model type is not yet registered:
```
KeyError: 'qwen3'
```

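Upgrading `transformers` resolves it:

```bash
pip install "transformers>=4.51.0"
```
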
### Sentence Transformers Usage

```python
# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-4B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-4B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt.
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7534, 0.1147],
#         [0.0320, 0.6258]])
```
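
If you want to embed queries with your own task instruction instead of the built-in `query` prompt, you can pass it through `encode`'s `prompt` argument. A minimal sketch, using the same `Instruct: ...\nQuery:` template as the examples below (the task text is just an example):

```python
# Hypothetical task description; tailor it to your scenario.
task = "Given a web search query, retrieve relevant passages that answer the query"
query_embeddings = model.encode(queries, prompt=f"Instruct: {task}\nQuery:")
```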

### Transformers Usage

```python
# Requires transformers>=4.51.0
import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instructions for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-4B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-4B')

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-4B', attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# Normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7534257769584656, 0.1146894246339798], [0.03198453038930893, 0.6258305311203003]]
```
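
Since the model supports MRL, these embeddings can also be truncated to a smaller user-defined dimension. A minimal sketch (256 is an example value; note that you must re-normalize after truncating):

```python
# Keep only the first 256 of the 2560 dimensions, then re-normalize.
dim = 256
truncated = F.normalize(embeddings[:, :dim], p=2, dim=1)
truncated_scores = truncated[:2] @ truncated[2:].T
```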

### vLLM Usage

```python
# Requires vllm>=0.8.5
import torch
from vllm import LLM

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instructions for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

model = LLM(model="Qwen/Qwen3-Embedding-4B", task="embed")

outputs = model.embed(input_texts)
embeddings = torch.tensor([o.outputs.embedding for o in outputs])
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7525103688240051, 0.1143278032541275], [0.030893627554178238, 0.6239761114120483]]
```

📌 **Tip**: We recommend that developers customize the `instruct` according to their specific scenarios, tasks, and languages. Our tests show that in most retrieval scenarios, omitting the `instruct` on the query side leads to a drop in retrieval performance of approximately 1% to 5%.

### Text Embeddings Inference (TEI) Usage

You can run/deploy TEI on NVIDIA GPUs as follows:

```bash
docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7.2 --model-id Qwen/Qwen3-Embedding-4B --dtype float16
```

Or on CPU devices:

```bash
docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.2 --model-id Qwen/Qwen3-Embedding-4B --dtype float16
```

Then generate embeddings by sending an HTTP POST request:

```bash
curl http://localhost:8080/embed \
  -X POST \
  -d '{"inputs": ["Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?", "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"]}' \
  -H "Content-Type: application/json"
```
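
For programmatic access, here is a minimal Python sketch (assuming the server above is listening on `localhost:8080`; TEI's `/embed` route returns a JSON array with one embedding vector per input):

```python
import requests

# Queries follow the same "Instruct: ...\nQuery: ..." template used above.
payload = {
    "inputs": [
        "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?"
    ]
}
response = requests.post("http://localhost:8080/embed", json=payload)
embeddings = response.json()  # one embedding vector per input string
print(len(embeddings), len(embeddings[0]))
```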

## Evaluation

### MTEB (Multilingual)

| Model | Size | Mean (Task) | Mean (Type) | Bitext Mining | Class. | Clust. | Inst. Retri. | Multi. Class. | Pair. Class. | Rerank | Retri. | STS |
|----------------------------------|:-------:|:-------------:|:-------------:|:--------------:|:--------:|:--------:|:--------------:|:---------------:|:--------------:|:--------:|:--------:|:------:|
| NV-Embed-v2 | 7B | 56.29 | 49.58 | 57.84 | 57.29 | 40.80 | 1.04 | 18.63 | 78.94 | 63.82 | 56.72 | 71.10 |
| GritLM-7B | 7B | 60.92 | 53.74 | 70.53 | 61.83 | 49.75 | 3.45 | 22.77 | 79.94 | 63.78 | 58.31 | 73.33 |
| BGE-M3 | 0.6B | 59.56 | 52.18 | 79.11 | 60.35 | 40.88 | -3.11 | 20.10 | 80.76 | 62.79 | 54.60 | 74.12 |
| multilingual-e5-large-instruct | 0.6B | 63.22 | 55.08 | 80.13 | 64.94 | 50.75 | -0.40 | 22.91 | 80.86 | 62.61 | 57.12 | 76.81 |
| gte-Qwen2-1.5B-instruct | 1.5B | 59.45 | 52.69 | 62.51 | 58.32 | 52.05 | 0.74 | 24.02 | 81.58 | 62.58 | 60.78 | 71.61 |
| gte-Qwen2-7B-instruct | 7B | 62.51 | 55.93 | 73.92 | 61.55 | 52.77 | 4.94 | 25.48 | 85.13 | 65.55 | 60.08 | 73.98 |
| text-embedding-3-large | - | 58.93 | 51.41 | 62.17 | 60.27 | 46.89 | -2.68 | 22.03 | 79.17 | 63.89 | 59.27 | 71.68 |
| Cohere-embed-multilingual-v3.0 | - | 61.12 | 53.23 | 70.50 | 62.95 | 46.89 | -1.89 | 22.74 | 79.88 | 64.07 | 59.16 | 74.80 |
| gemini-embedding-exp-03-07 | - | 68.37 | 59.59 | 79.28 | 71.82 | 54.59 | 5.18 | **29.16** | 83.63 | 65.58 | 67.71 | 79.40 |
| **Qwen3-Embedding-0.6B** | 0.6B | 64.33 | 56.00 | 72.22 | 66.83 | 52.33 | 5.09 | 24.59 | 80.83 | 61.41 | 64.64 | 76.17 |
| **Qwen3-Embedding-4B** | 4B | 69.45 | 60.86 | 79.36 | 72.33 | 57.15 | **11.56** | 26.77 | 85.05 | 65.08 | 69.60 | 80.86 |
| **Qwen3-Embedding-8B** | 8B | **70.58** | **61.69** | **80.89** | **74.00** | **57.65** | 10.06 | 28.66 | **86.40** | **65.63** | **70.88** | **81.08** |

> **Note**: For the compared models, scores are taken from the MTEB online [leaderboard](https://huggingface.co/spaces/mteb/leaderboard) as of May 24, 2025.

### MTEB (Eng v2)

| MTEB English / Models | Param. | Mean (Task) | Mean (Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS | Summ. |
|--------------------------------|:--------:|:------------:|:------------:|:--------:|:--------:|:-------------:|:---------:|:--------:|:-------:|:-------:|
| multilingual-e5-large-instruct | 0.6B | 65.53 | 61.21 | 75.54 | 49.89 | 86.24 | 48.74 | 53.47 | 84.72 | 29.89 |
| NV-Embed-v2 | 7.8B | 69.81 | 65.00 | 87.19 | 47.66 | 88.69 | 49.61 | 62.84 | 83.82 | 35.21 |
| GritLM-7B | 7.2B | 67.07 | 63.22 | 81.25 | 50.82 | 87.29 | 49.59 | 54.95 | 83.03 | 35.65 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.20 | 63.26 | 85.84 | 53.54 | 87.52 | 49.25 | 50.25 | 82.51 | 33.94 |
| stella_en_1.5B_v5 | 1.5B | 69.43 | 65.32 | 89.38 | 57.06 | 88.02 | 50.19 | 52.42 | 83.27 | 36.91 |
| gte-Qwen2-7B-instruct | 7.6B | 70.72 | 65.77 | 88.52 | 58.97 | 85.90 | 50.47 | 58.09 | 82.69 | 35.74 |
| gemini-embedding-exp-03-07 | - | 73.30 | 67.67 | 90.05 | **59.39** | **87.70** | 48.59 | 64.35 | 85.29 | **38.28** |
| **Qwen3-Embedding-0.6B** | 0.6B | 70.70 | 64.88 | 85.76 | 54.05 | 84.37 | 48.18 | 61.83 | 86.57 | 33.43 |
| **Qwen3-Embedding-4B** | 4B | 74.60 | 68.10 | 89.84 | 57.51 | 87.01 | 50.76 | 68.46 | **88.72** | 34.39 |
| **Qwen3-Embedding-8B** | 8B | **75.22** | **68.71** | **90.43** | 58.57 | 87.52 | **51.56** | **69.44** | 88.58 | 34.83 |

### C-MTEB (MTEB Chinese)

| C-MTEB | Param. | Mean (Task) | Mean (Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS |
|------------------|--------|------------|------------|--------|--------|-------------|---------|-------|-------|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | 68.52 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.50 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.50 | **85.98** | **72.86** | 76.97 | **63.92** |
| **Qwen3-Embedding-0.6B** | 0.6B | 66.33 | 67.45 | 71.40 | 68.74 | 76.42 | 62.58 | 71.03 | 54.52 |
| **Qwen3-Embedding-4B** | 4B | 72.27 | 73.51 | 75.46 | 77.89 | 83.34 | 66.05 | 77.03 | 61.26 |
| **Qwen3-Embedding-8B** | 8B | **73.84** | **75.00** | **76.97** | **80.08** | 84.23 | 66.99 | **78.21** | 63.53 |

## Citation

If you find our work helpful, please consider citing:

```bibtex
@article{qwen3embedding,
  title={Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models},
  author={Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren},
  journal={arXiv preprint arXiv:2506.05176},
  year={2025}
}
```