---
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- mteb
model-index:
- name: bge-small-en-v1.5
  results:
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_counterfactual
      name: MTEB AmazonCounterfactualClassification (en)
      config: en
      split: test
      revision: e8379541af4e31359cca9fbcf4b00f2671dba205
    metrics:
    - type: accuracy
      value: 73.79104477611939
    - type: ap
      value: 37.21923821573361
    - type: f1
      value: 68.0914945617093
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_polarity
      name: MTEB AmazonPolarityClassification
      config: default
      split: test
      revision: e2d317d38cd51312af73b3d32a06d1a08b442046
    metrics:
    - type: accuracy
      value: 92.75377499999999
    - type: ap
      value: 89.46766124546022
    - type: f1
      value: 92.73884001331487
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (en)
      config: en
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 46.986
    - type: f1
      value: 46.55936786727896
  - task:
      type: Retrieval
    dataset:
      type: arguana
      name: MTEB ArguAna
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 35.846000000000004
    - type: map_at_10
      value: 51.388
    - type: map_at_100
      value: 52.132999999999996
    - type: map_at_1000
      value: 52.141000000000005
    - type: map_at_3
      value: 47.037
    - type: map_at_5
      value: 49.579
    - type: mrr_at_1
      value: 36.558
    - type: mrr_at_10
      value: 51.658
    - type: mrr_at_100
      value: 52.402
    - type: mrr_at_1000
      value: 52.410000000000004
    - type: mrr_at_3
      value: 47.345
    - type: mrr_at_5
      value: 49.797999999999995
    - type: ndcg_at_1
      value: 35.846000000000004
    - type: ndcg_at_10
      value: 59.550000000000004
    - type: ndcg_at_100
      value: 62.596
    - type: ndcg_at_1000
      value: 62.759
    - type: ndcg_at_3
      value: 50.666999999999994
    - type: ndcg_at_5
      value: 55.228
    - type: precision_at_1
      value: 35.846000000000004
    - type: precision_at_10
      value: 8.542
    - type: precision_at_100
      value: 0.984
    - type: precision_at_1000
      value: 0.1
    - type: precision_at_3
      value: 20.389
    - type: precision_at_5
      value: 14.438
    - type: recall_at_1
      value: 35.846000000000004
    - type: recall_at_10
      value: 85.42
    - type: recall_at_100
      value: 98.43499999999999
    - type: recall_at_1000
      value: 99.644
    - type: recall_at_3
      value: 61.166
    - type: recall_at_5
      value: 72.191
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-p2p
      name: MTEB ArxivClusteringP2P
      config: default
      split: test
      revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
    metrics:
    - type: v_measure
      value: 47.402770198163594
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-s2s
      name: MTEB ArxivClusteringS2S
      config: default
      split: test
      revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
    metrics:
    - type: v_measure
      value: 40.01545436974177
  - task:
      type: Reranking
    dataset:
      type: mteb/askubuntudupquestions-reranking
      name: MTEB AskUbuntuDupQuestions
      config: default
      split: test
      revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
    metrics:
    - type: map
      value: 62.586465273207196
    - type: mrr
      value: 74.42169019038825
  - task:
      type: STS
    dataset:
      type: mteb/biosses-sts
      name: MTEB BIOSSES
      config: default
      split: test
      revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
    metrics:
    - type: cos_sim_pearson
      value: 85.1891186537969
    - type: cos_sim_spearman
      value: 83.75492046087288
    - type: euclidean_pearson
      value: 84.11766204805357
    - type: euclidean_spearman
      value: 84.01456493126516
    - type: manhattan_pearson
      value: 84.2132950502772
    - type: manhattan_spearman
      value: 83.89227298813377
  - task:
      type: Classification
    dataset:
      type: mteb/banking77
      name: MTEB Banking77Classification
      config: default
      split: test
      revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
    metrics:
    - type: accuracy
      value: 85.74025974025975
    - type: f1
      value: 85.71493566466381
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-p2p
      name: MTEB BiorxivClusteringP2P
      config: default
      split: test
      revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
    metrics:
    - type: v_measure
      value: 38.467181385006434
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-s2s
      name: MTEB BiorxivClusteringS2S
      config: default
      split: test
      revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
    metrics:
    - type: v_measure
      value: 34.719496037339056
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackAndroidRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 29.587000000000003
    - type: map_at_10
      value: 41.114
    - type: map_at_100
      value: 42.532
    - type: map_at_1000
      value: 42.661
    - type: map_at_3
      value: 37.483
    - type: map_at_5
      value: 39.652
    - type: mrr_at_1
      value: 36.338
    - type: mrr_at_10
      value: 46.763
    - type: mrr_at_100
      value: 47.393
    - type: mrr_at_1000
      value: 47.445
    - type: mrr_at_3
      value: 43.538
    - type: mrr_at_5
      value: 45.556000000000004
    - type: ndcg_at_1
      value: 36.338
    - type: ndcg_at_10
      value: 47.658
    - type: ndcg_at_100
      value: 52.824000000000005
    - type: ndcg_at_1000
      value: 54.913999999999994
    - type: ndcg_at_3
      value: 41.989
    - type: ndcg_at_5
      value: 44.944
    - type: precision_at_1
      value: 36.338
    - type: precision_at_10
      value: 9.156
    - type: precision_at_100
      value: 1.4789999999999999
    - type: precision_at_1000
      value: 0.196
    - type: precision_at_3
      value: 20.076
    - type: precision_at_5
      value: 14.85
    - type: recall_at_1
      value: 29.587000000000003
    - type: recall_at_10
      value: 60.746
    - type: recall_at_100
      value: 82.157
    - type: recall_at_1000
      value: 95.645
    - type: recall_at_3
      value: 44.821
    - type: recall_at_5
      value: 52.819
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackEnglishRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 30.239
    - type: map_at_10
      value: 39.989000000000004
    - type: map_at_100
      value: 41.196
    - type: map_at_1000
      value: 41.325
    - type: map_at_3
      value: 37.261
    - type: map_at_5
      value: 38.833
    - type: mrr_at_1
      value: 37.516
    - type: mrr_at_10
      value: 46.177
    - type: mrr_at_100
      value: 46.806
    - type: mrr_at_1000
      value: 46.849000000000004
    - type: mrr_at_3
      value: 44.002
    - type: mrr_at_5
      value: 45.34
    - type: ndcg_at_1
      value: 37.516
    - type: ndcg_at_10
      value: 45.586
    - type: ndcg_at_100
      value: 49.897000000000006
    - type: ndcg_at_1000
      value: 51.955
    - type: ndcg_at_3
      value: 41.684
    - type: ndcg_at_5
      value: 43.617
    - type: precision_at_1
      value: 37.516
    - type: precision_at_10
      value: 8.522
    - type: precision_at_100
      value: 1.374
    - type: precision_at_1000
      value: 0.184
    - type: precision_at_3
      value: 20.105999999999998
    - type: precision_at_5
      value: 14.152999999999999
    - type: recall_at_1
      value: 30.239
    - type: recall_at_10
      value: 55.03
    - type: recall_at_100
      value: 73.375
    - type: recall_at_1000
      value: 86.29599999999999
    - type: recall_at_3
      value: 43.269000000000005
    - type: recall_at_5
      value: 48.878
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackGamingRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 38.338
    - type: map_at_10
      value: 50.468999999999994
    - type: map_at_100
      value: 51.553000000000004
    - type: map_at_1000
      value: 51.608
    - type: map_at_3
      value: 47.107
    - type: map_at_5
      value: 49.101
    - type: mrr_at_1
      value: 44.201
    - type: mrr_at_10
      value: 54.057
    - type: mrr_at_100
      value: 54.764
    - type: mrr_at_1000
      value: 54.791000000000004
    - type: mrr_at_3
      value: 51.56699999999999
    - type: mrr_at_5
      value: 53.05
    - type: ndcg_at_1
      value: 44.201
    - type: ndcg_at_10
      value: 56.379000000000005
    - type: ndcg_at_100
      value: 60.645
    - type: ndcg_at_1000
      value: 61.73499999999999
    - type: ndcg_at_3
      value: 50.726000000000006
    - type: ndcg_at_5
      value: 53.58500000000001
    - type: precision_at_1
      value: 44.201
    - type: precision_at_10
      value: 9.141
    - type: precision_at_100
      value: 1.216
    - type: precision_at_1000
      value: 0.135
    - type: precision_at_3
      value: 22.654
    - type: precision_at_5
      value: 15.723999999999998
    - type: recall_at_1
      value: 38.338
    - type: recall_at_10
      value: 70.30499999999999
    - type: recall_at_100
      value: 88.77199999999999
    - type: recall_at_1000
      value: 96.49799999999999
    - type: recall_at_3
      value: 55.218
    - type: recall_at_5
      value: 62.104000000000006
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackGisRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 25.682
    - type: map_at_10
      value: 33.498
    - type: map_at_100
      value: 34.461000000000006
    - type: map_at_1000
      value: 34.544000000000004
    - type: map_at_3
      value: 30.503999999999998
    - type: map_at_5
      value: 32.216
    - type: mrr_at_1
      value: 27.683999999999997
    - type: mrr_at_10
      value: 35.467999999999996
    - type: mrr_at_100
      value: 36.32
    - type: mrr_at_1000
      value: 36.386
    - type: mrr_at_3
      value: 32.618
    - type: mrr_at_5
      value: 34.262
    - type: ndcg_at_1
      value: 27.683999999999997
    - type: ndcg_at_10
      value: 38.378
    - type: ndcg_at_100
      value: 43.288
    - type: ndcg_at_1000
      value: 45.413
    - type: ndcg_at_3
      value: 32.586
    - type: ndcg_at_5
      value: 35.499
    - type: precision_at_1
      value: 27.683999999999997
    - type: precision_at_10
      value: 5.864
    - type: precision_at_100
      value: 0.882
    - type: precision_at_1000
      value: 0.11
    - type: precision_at_3
      value: 13.446
    - type: precision_at_5
      value: 9.718
    - type: recall_at_1
      value: 25.682
    - type: recall_at_10
      value: 51.712
    - type: recall_at_100
      value: 74.446
    - type: recall_at_1000
      value: 90.472
    - type: recall_at_3
      value: 36.236000000000004
    - type: recall_at_5
      value: 43.234
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackMathematicaRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 16.073999999999998
    - type: map_at_10
      value: 24.352999999999998
    - type: map_at_100
      value: 25.438
    - type: map_at_1000
      value: 25.545
    - type: map_at_3
      value: 21.614
    - type: map_at_5
      value: 23.104
    - type: mrr_at_1
      value: 19.776
    - type: mrr_at_10
      value: 28.837000000000003
    - type: mrr_at_100
      value: 29.755
    - type: mrr_at_1000
      value: 29.817
    - type: mrr_at_3
      value: 26.201999999999998
    - type: mrr_at_5
      value: 27.714
    - type: ndcg_at_1
      value: 19.776
    - type: ndcg_at_10
      value: 29.701
    - type: ndcg_at_100
      value: 35.307
    - type: ndcg_at_1000
      value: 37.942
    - type: ndcg_at_3
      value: 24.764
    - type: ndcg_at_5
      value: 27.025
    - type: precision_at_1
      value: 19.776
    - type: precision_at_10
      value: 5.659
    - type: precision_at_100
      value: 0.971
    - type: precision_at_1000
      value: 0.133
    - type: precision_at_3
      value: 12.065
    - type: precision_at_5
      value: 8.905000000000001
    - type: recall_at_1
      value: 16.073999999999998
    - type: recall_at_10
      value: 41.647
    - type: recall_at_100
      value: 66.884
    - type: recall_at_1000
      value: 85.91499999999999
    - type: recall_at_3
      value: 27.916
    - type: recall_at_5
      value: 33.729
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackPhysicsRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 28.444999999999997
    - type: map_at_10
      value: 38.218999999999994
    - type: map_at_100
      value: 39.595
    - type: map_at_1000
      value: 39.709
    - type: map_at_3
      value: 35.586
    - type: map_at_5
      value: 36.895
    - type: mrr_at_1
      value: 34.841
    - type: mrr_at_10
      value: 44.106
    - type: mrr_at_100
      value: 44.98
    - type: mrr_at_1000
      value: 45.03
    - type: mrr_at_3
      value: 41.979
    - type: mrr_at_5
      value: 43.047999999999995
    - type: ndcg_at_1
      value: 34.841
    - type: ndcg_at_10
      value: 43.922
    - type: ndcg_at_100
      value: 49.504999999999995
    - type: ndcg_at_1000
      value: 51.675000000000004
    - type: ndcg_at_3
      value: 39.858
    - type: ndcg_at_5
      value: 41.408
    - type: precision_at_1
      value: 34.841
    - type: precision_at_10
      value: 7.872999999999999
    - type: precision_at_100
      value: 1.2449999999999999
    - type: precision_at_1000
      value: 0.161
    - type: precision_at_3
      value: 18.993
    - type: precision_at_5
      value: 13.032
    - type: recall_at_1
      value: 28.444999999999997
    - type: recall_at_10
      value: 54.984
    - type: recall_at_100
      value: 78.342
    - type: recall_at_1000
      value: 92.77
    - type: recall_at_3
      value: 42.842999999999996
    - type: recall_at_5
      value: 47.247
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackProgrammersRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 23.072
    - type: map_at_10
      value: 32.354
    - type: map_at_100
      value: 33.800000000000004
    - type: map_at_1000
      value: 33.908
    - type: map_at_3
      value: 29.232000000000003
    - type: map_at_5
      value: 31.049
    - type: mrr_at_1
      value: 29.110000000000003
    - type: mrr_at_10
      value: 38.03
    - type: mrr_at_100
      value: 39.032
    - type: mrr_at_1000
      value: 39.086999999999996
    - type: mrr_at_3
      value: 35.407
    - type: mrr_at_5
      value: 36.76
    - type: ndcg_at_1
      value: 29.110000000000003
    - type: ndcg_at_10
      value: 38.231
    - type: ndcg_at_100
      value: 44.425
    - type: ndcg_at_1000
      value: 46.771
    - type: ndcg_at_3
      value: 33.095
    - type: ndcg_at_5
      value: 35.459
    - type: precision_at_1
      value: 29.110000000000003
    - type: precision_at_10
      value: 7.215000000000001
    - type: precision_at_100
      value: 1.2109999999999999
    - type: precision_at_1000
      value: 0.157
    - type: precision_at_3
      value: 16.058
    - type: precision_at_5
      value: 11.644
    - type: recall_at_1
      value: 23.072
    - type: recall_at_10
      value: 50.285999999999994
    - type: recall_at_100
      value: 76.596
    - type: recall_at_1000
      value: 92.861
    - type: recall_at_3
      value: 35.702
    - type: recall_at_5
      value: 42.152
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 24.937916666666666
    - type: map_at_10
      value: 33.755250000000004
    - type: map_at_100
      value: 34.955999999999996
    - type: map_at_1000
      value: 35.070499999999996
    - type: map_at_3
      value: 30.98708333333333
    - type: map_at_5
      value: 32.51491666666666
    - type: mrr_at_1
      value: 29.48708333333333
    - type: mrr_at_10
      value: 37.92183333333334
    - type: mrr_at_100
      value: 38.76583333333333
    - type: mrr_at_1000
      value: 38.82466666666667
    - type: mrr_at_3
      value: 35.45125
    - type: mrr_at_5
      value: 36.827000000000005
    - type: ndcg_at_1
      value: 29.48708333333333
    - type: ndcg_at_10
      value: 39.05225
    - type: ndcg_at_100
      value: 44.25983333333334
    - type: ndcg_at_1000
      value: 46.568333333333335
    - type: ndcg_at_3
      value: 34.271583333333325
    - type: ndcg_at_5
      value: 36.483916666666666
    - type: precision_at_1
      value: 29.48708333333333
    - type: precision_at_10
      value: 6.865749999999999
    - type: precision_at_100
      value: 1.1195833333333332
    - type: precision_at_1000
      value: 0.15058333333333335
    - type: precision_at_3
      value: 15.742083333333333
    - type: precision_at_5
      value: 11.221916666666667
    - type: recall_at_1
      value: 24.937916666666666
    - type: recall_at_10
      value: 50.650416666666665
    - type: recall_at_100
      value: 73.55383333333334
    - type: recall_at_1000
      value: 89.61691666666667
    - type: recall_at_3
      value: 37.27808333333334
    - type: recall_at_5
      value: 42.99475
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackStatsRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 23.947
    - type: map_at_10
      value: 30.575000000000003
    - type: map_at_100
      value: 31.465
    - type: map_at_1000
      value: 31.558000000000003
    - type: map_at_3
      value: 28.814
    - type: map_at_5
      value: 29.738999999999997
    - type: mrr_at_1
      value: 26.994
    - type: mrr_at_10
      value: 33.415
    - type: mrr_at_100
      value: 34.18
    - type: mrr_at_1000
      value: 34.245
    - type: mrr_at_3
      value: 31.621
    - type: mrr_at_5
      value: 32.549
    - type: ndcg_at_1
      value: 26.994
    - type: ndcg_at_10
      value: 34.482
    - type: ndcg_at_100
      value: 38.915
    - type: ndcg_at_1000
      value: 41.355
    - type: ndcg_at_3
      value: 31.139
    - type: ndcg_at_5
      value: 32.589
    - type: precision_at_1
      value: 26.994
    - type: precision_at_10
      value: 5.322
    - type: precision_at_100
      value: 0.8160000000000001
    - type: precision_at_1000
      value: 0.11100000000000002
    - type: precision_at_3
      value: 13.344000000000001
    - type: precision_at_5
      value: 8.988
    - type: recall_at_1
      value: 23.947
    - type: recall_at_10
      value: 43.647999999999996
    - type: recall_at_100
      value: 63.851
    - type: recall_at_1000
      value: 82.0
    - type: recall_at_3
      value: 34.288000000000004
    - type: recall_at_5
      value: 38.117000000000004
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackTexRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 16.197
    - type: map_at_10
      value: 22.968
    - type: map_at_100
      value: 24.095
    - type: map_at_1000
      value: 24.217
    - type: map_at_3
      value: 20.771
    - type: map_at_5
      value: 21.995
    - type: mrr_at_1
      value: 19.511
    - type: mrr_at_10
      value: 26.55
    - type: mrr_at_100
      value: 27.500999999999998
    - type: mrr_at_1000
      value: 27.578999999999997
    - type: mrr_at_3
      value: 24.421
    - type: mrr_at_5
      value: 25.604
    - type: ndcg_at_1
      value: 19.511
    - type: ndcg_at_10
      value: 27.386
    - type: ndcg_at_100
      value: 32.828
    - type: ndcg_at_1000
      value: 35.739
    - type: ndcg_at_3
      value: 23.405
    - type: ndcg_at_5
      value: 25.255
    - type: precision_at_1
      value: 19.511
    - type: precision_at_10
      value: 5.017
    - type: precision_at_100
      value: 0.91
    - type: precision_at_1000
      value: 0.133
    - type: precision_at_3
      value: 11.023
    - type: precision_at_5
      value: 8.025
    - type: recall_at_1
      value: 16.197
    - type: recall_at_10
      value: 37.09
    - type: recall_at_100
      value: 61.778
    - type: recall_at_1000
      value: 82.56599999999999
    - type: recall_at_3
      value: 26.034000000000002
    - type: recall_at_5
      value: 30.762
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackUnixRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 25.41
    - type: map_at_10
      value: 33.655
    - type: map_at_100
      value: 34.892
    - type: map_at_1000
      value: 34.995
    - type: map_at_3
      value: 30.94
    - type: map_at_5
      value: 32.303
    - type: mrr_at_1
      value: 29.477999999999998
    - type: mrr_at_10
      value: 37.443
    - type: mrr_at_100
      value: 38.383
    - type: mrr_at_1000
      value: 38.440000000000005
    - type: mrr_at_3
      value: 34.949999999999996
    - type: mrr_at_5
      value: 36.228
    - type: ndcg_at_1
      value: 29.477999999999998
    - type: ndcg_at_10
      value: 38.769
    - type: ndcg_at_100
      value: 44.245000000000005
    - type: ndcg_at_1000
      value: 46.593
    - type: ndcg_at_3
      value: 33.623
    - type: ndcg_at_5
      value: 35.766
    - type: precision_at_1
      value: 29.477999999999998
    - type: precision_at_10
      value: 6.455
    - type: precision_at_100
      value: 1.032
    - type: precision_at_1000
      value: 0.135
    - type: precision_at_3
      value: 14.893999999999998
    - type: precision_at_5
      value: 10.485
    - type: recall_at_1
      value: 25.41
    - type: recall_at_10
      value: 50.669
    - type: recall_at_100
      value: 74.084
    - type: recall_at_1000
      value: 90.435
    - type: recall_at_3
      value: 36.679
    - type: recall_at_5
      value: 41.94
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackWebmastersRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 23.339
    - type: map_at_10
      value: 31.852000000000004
    - type: map_at_100
      value: 33.411
    - type: map_at_1000
      value: 33.62
    - type: map_at_3
      value: 28.929
    - type: map_at_5
      value: 30.542
    - type: mrr_at_1
      value: 28.063
    - type: mrr_at_10
      value: 36.301
    - type: mrr_at_100
      value: 37.288
    - type: mrr_at_1000
      value: 37.349
    - type: mrr_at_3
      value: 33.663
    - type: mrr_at_5
      value: 35.165
    - type: ndcg_at_1
      value: 28.063
    - type: ndcg_at_10
      value: 37.462
    - type: ndcg_at_100
      value: 43.620999999999995
    - type: ndcg_at_1000
      value: 46.211
    - type: ndcg_at_3
      value: 32.68
    - type: ndcg_at_5
      value: 34.981
    - type: precision_at_1
      value: 28.063
    - type: precision_at_10
      value: 7.1739999999999995
    - type: precision_at_100
      value: 1.486
    - type: precision_at_1000
      value: 0.23500000000000001
    - type: precision_at_3
      value: 15.217
    - type: precision_at_5
      value: 11.265
    - type: recall_at_1
      value: 23.339
    - type: recall_at_10
      value: 48.376999999999995
    - type: recall_at_100
      value: 76.053
    - type: recall_at_1000
      value: 92.455
    - type: recall_at_3
      value: 34.735
    - type: recall_at_5
      value: 40.71
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackWordpressRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 18.925
    - type: map_at_10
      value: 26.017000000000003
    - type: map_at_100
      value: 27.034000000000002
    - type: map_at_1000
      value: 27.156000000000002
    - type: map_at_3
      value: 23.604
    - type: map_at_5
      value: 24.75
    - type: mrr_at_1
      value: 20.333000000000002
    - type: mrr_at_10
      value: 27.915
    - type: mrr_at_100
      value: 28.788000000000004
    - type: mrr_at_1000
      value: 28.877999999999997
    - type: mrr_at_3
      value: 25.446999999999996
    - type: mrr_at_5
      value: 26.648
    - type: ndcg_at_1
      value: 20.333000000000002
    - type: ndcg_at_10
      value: 30.673000000000002
    - type: ndcg_at_100
      value: 35.618
    - type: ndcg_at_1000
      value: 38.517
    - type: ndcg_at_3
      value: 25.71
    - type: ndcg_at_5
      value: 27.679
    - type: precision_at_1
      value: 20.333000000000002
    - type: precision_at_10
      value: 4.9910000000000005
    - type: precision_at_100
      value: 0.8130000000000001
    - type: precision_at_1000
      value: 0.117
    - type: precision_at_3
      value: 11.029
    - type: precision_at_5
      value: 7.8740000000000006
    - type: recall_at_1
      value: 18.925
    - type: recall_at_10
      value: 43.311
    - type: recall_at_100
      value: 66.308
    - type: recall_at_1000
      value: 87.49
    - type: recall_at_3
      value: 29.596
    - type: recall_at_5
      value: 34.245
  - task:
      type: Retrieval
    dataset:
      type: climate-fever
      name: MTEB ClimateFEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 13.714
    - type: map_at_10
      value: 23.194
    - type: map_at_100
      value: 24.976000000000003
    - type: map_at_1000
      value: 25.166
    - type: map_at_3
      value: 19.709
    - type: map_at_5
      value: 21.523999999999997
    - type: mrr_at_1
      value: 30.619000000000003
    - type: mrr_at_10
      value: 42.563
    - type: mrr_at_100
      value: 43.386
    - type: mrr_at_1000
      value: 43.423
    - type: mrr_at_3
      value: 39.555
    - type: mrr_at_5
      value: 41.268
    - type: ndcg_at_1
      value: 30.619000000000003
    - type: ndcg_at_10
      value: 31.836
    - type: ndcg_at_100
      value: 38.652
    - type: ndcg_at_1000
      value: 42.088
    - type: ndcg_at_3
      value: 26.733
    - type: ndcg_at_5
      value: 28.435
    - type: precision_at_1
      value: 30.619000000000003
    - type: precision_at_10
      value: 9.751999999999999
    - type: precision_at_100
      value: 1.71
    - type: precision_at_1000
      value: 0.23500000000000001
    - type: precision_at_3
      value: 19.935
    - type: precision_at_5
      value: 14.984
    - type: recall_at_1
      value: 13.714
    - type: recall_at_10
      value: 37.26
    - type: recall_at_100
      value: 60.546
    - type: recall_at_1000
      value: 79.899
    - type: recall_at_3
      value: 24.325
    - type: recall_at_5
      value: 29.725
  - task:
      type: Retrieval
    dataset:
      type: dbpedia-entity
      name: MTEB DBPedia
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 8.462
    - type: map_at_10
      value: 18.637
    - type: map_at_100
      value: 26.131999999999998
    - type: map_at_1000
      value: 27.607
    - type: map_at_3
      value: 13.333
    - type: map_at_5
      value: 15.654000000000002
    - type: mrr_at_1
      value: 66.25
    - type: mrr_at_10
      value: 74.32600000000001
    - type: mrr_at_100
      value: 74.60900000000001
    - type: mrr_at_1000
      value: 74.62
    - type: mrr_at_3
      value: 72.667
    - type: mrr_at_5
      value: 73.817
    - type: ndcg_at_1
      value: 53.87499999999999
    - type: ndcg_at_10
      value: 40.028999999999996
    - type: ndcg_at_100
      value: 44.199
    - type: ndcg_at_1000
      value: 51.629999999999995
    - type: ndcg_at_3
      value: 44.113
    - type: ndcg_at_5
      value: 41.731
    - type: precision_at_1
      value: 66.25
    - type: precision_at_10
      value: 31.900000000000002
    - type: precision_at_100
      value: 10.043000000000001
    - type: precision_at_1000
      value: 1.926
    - type: precision_at_3
      value: 47.417
    - type: precision_at_5
      value: 40.65
    - type: recall_at_1
      value: 8.462
    - type: recall_at_10
      value: 24.293
    - type: recall_at_100
      value: 50.146
    - type: recall_at_1000
      value: 74.034
    - type: recall_at_3
      value: 14.967
    - type: recall_at_5
      value: 18.682000000000002
  - task:
      type: Classification
    dataset:
      type: mteb/emotion
      name: MTEB EmotionClassification
      config: default
      split: test
      revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
    metrics:
    - type: accuracy
      value: 47.84499999999999
    - type: f1
      value: 42.48106691979349
  - task:
      type: Retrieval
    dataset:
      type: fever
      name: MTEB FEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 74.034
    - type: map_at_10
      value: 82.76
    - type: map_at_100
      value: 82.968
    - type: map_at_1000
      value: 82.98299999999999
    - type: map_at_3
      value: 81.768
    - type: map_at_5
      value: 82.418
    - type: mrr_at_1
      value: 80.048
    - type: mrr_at_10
      value: 87.64999999999999
    - type: mrr_at_100
      value: 87.712
    - type: mrr_at_1000
      value: 87.713
    - type: mrr_at_3
      value: 87.01100000000001
    - type: mrr_at_5
      value: 87.466
    - type: ndcg_at_1
      value: 80.048
    - type: ndcg_at_10
      value: 86.643
    - type: ndcg_at_100
      value: 87.361
    - type: ndcg_at_1000
      value: 87.606
    - type: ndcg_at_3
      value: 85.137
    - type: ndcg_at_5
      value: 86.016
    - type: precision_at_1
      value: 80.048
    - type: precision_at_10
      value: 10.372
    - type: precision_at_100
      value: 1.093
    - type: precision_at_1000
      value: 0.11299999999999999
    - type: precision_at_3
      value: 32.638
    - type: precision_at_5
      value: 20.177
    - type: recall_at_1
      value: 74.034
    - type: recall_at_10
      value: 93.769
    - type: recall_at_100
      value: 96.569
    - type: recall_at_1000
      value: 98.039
    - type: recall_at_3
      value: 89.581
    - type: recall_at_5
      value: 91.906
  - task:
      type: Retrieval
    dataset:
      type: fiqa
      name: MTEB FiQA2018
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 20.5
    - type: map_at_10
      value: 32.857
    - type: map_at_100
      value: 34.589
    - type: map_at_1000
      value: 34.778
    - type: map_at_3
      value: 29.160999999999998
    - type: map_at_5
      value: 31.033
    - type: mrr_at_1
      value: 40.123
    - type: mrr_at_10
      value: 48.776
    - type: mrr_at_100
      value: 49.495
    - type: mrr_at_1000
      value: 49.539
    - type: mrr_at_3
      value: 46.605000000000004
    - type: mrr_at_5
      value: 47.654
    - type: ndcg_at_1
      value: 40.123
    - type: ndcg_at_10
      value: 40.343
    - type: ndcg_at_100
      value: 46.56
    - type: ndcg_at_1000
      value: 49.777
    - type: ndcg_at_3
      value: 37.322
    - type: ndcg_at_5
      value: 37.791000000000004
1370	`- type: ndcg_at_1000`
1371	`value: 49.777`
1372	`- type: ndcg_at_3`
1373	`value: 37.322`
1374	`- type: ndcg_at_5`
1375	`value: 37.791000000000004`
1376	`- type: precision_at_1`
1377	`value: 40.123`
1378	`- type: precision_at_10`
1379	`value: 11.08`
1380	`- type: precision_at_100`
1381	`value: 1.752`
1382	`- type: precision_at_1000`
1383	`value: 0.232`
1384	`- type: precision_at_3`
1385	`value: 24.897`
1386	`- type: precision_at_5`
1387	`value: 17.809`
1388	`- type: recall_at_1`
1389	`value: 20.5`
1390	`- type: recall_at_10`
1391	`value: 46.388`
1392	`- type: recall_at_100`
1393	`value: 69.552`
1394	`- type: recall_at_1000`
1395	`value: 89.011`
1396	`- type: recall_at_3`
1397	`value: 33.617999999999995`
1398	`- type: recall_at_5`
1399	`value: 38.211`
1400	`- task:`
1401	`type: Retrieval`
1402	`dataset:`
1403	`type: hotpotqa`
1404	`name: MTEB HotpotQA`
1405	`config: default`
1406	`split: test`
1407	`revision: None`
1408	`metrics:`
1409	`- type: map_at_1`
1410	`value: 39.135999999999996`
1411	`- type: map_at_10`
1412	`value: 61.673`
1413	`- type: map_at_100`
1414	`value: 62.562`
1415	`- type: map_at_1000`
1416	`value: 62.62`
1417	`- type: map_at_3`
1418	`value: 58.467999999999996`
1419	`- type: map_at_5`
1420	`value: 60.463`
1421	`- type: mrr_at_1`
1422	`value: 78.271`
1423	`- type: mrr_at_10`
1424	`value: 84.119`
1425	`- type: mrr_at_100`
1426	`value: 84.29299999999999`
1427	`- type: mrr_at_1000`
1428	`value: 84.299`
1429	`- type: mrr_at_3`
1430	`value: 83.18900000000001`
1431	`- type: mrr_at_5`
1432	`value: 83.786`
1433	`- type: ndcg_at_1`
1434	`value: 78.271`
1435	`- type: ndcg_at_10`
1436	`value: 69.935`
1437	`- type: ndcg_at_100`
1438	`value: 73.01299999999999`
1439	`- type: ndcg_at_1000`
1440	`value: 74.126`
1441	`- type: ndcg_at_3`
1442	`value: 65.388`
1443	`- type: ndcg_at_5`
1444	`value: 67.906`
1445	`- type: precision_at_1`
1446	`value: 78.271`
1447	`- type: precision_at_10`
1448	`value: 14.562`
1449	`- type: precision_at_100`
1450	`value: 1.6969999999999998`
1451	`- type: precision_at_1000`
1452	`value: 0.184`
1453	`- type: precision_at_3`
1454	`value: 41.841`
1455	`- type: precision_at_5`
1456	`value: 27.087`
1457	`- type: recall_at_1`
1458	`value: 39.135999999999996`
1459	`- type: recall_at_10`
1460	`value: 72.809`
1461	`- type: recall_at_100`
1462	`value: 84.86200000000001`
1463	`- type: recall_at_1000`
1464	`value: 92.208`
1465	`- type: recall_at_3`
1466	`value: 62.76199999999999`
1467	`- type: recall_at_5`
1468	`value: 67.718`
1469	`- task:`
1470	`type: Classification`
1471	`dataset:`
1472	`type: mteb/imdb`
1473	`name: MTEB ImdbClassification`
1474	`config: default`
1475	`split: test`
1476	`revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7`
1477	`metrics:`
1478	`- type: accuracy`
1479	`value: 90.60600000000001`
1480	`- type: ap`
1481	`value: 86.6579587804335`
1482	`- type: f1`
1483	`value: 90.5938853929307`
1484	`- task:`
1485	`type: Retrieval`
1486	`dataset:`
1487	`type: msmarco`
1488	`name: MTEB MSMARCO`
1489	`config: default`
1490	`split: dev`
1491	`revision: None`
1492	`metrics:`
1493	`- type: map_at_1`
1494	`value: 21.852`
1495	`- type: map_at_10`
1496	`value: 33.982`
1497	`- type: map_at_100`
1498	`value: 35.116`
1499	`- type: map_at_1000`
1500	`value: 35.167`
1501	`- type: map_at_3`
1502	`value: 30.134`
1503	`- type: map_at_5`
1504	`value: 32.340999999999994`
1505	`- type: mrr_at_1`
1506	`value: 22.479`
1507	`- type: mrr_at_10`
1508	`value: 34.594`
1509	`- type: mrr_at_100`
1510	`value: 35.672`
1511	`- type: mrr_at_1000`
1512	`value: 35.716`
1513	`- type: mrr_at_3`
1514	`value: 30.84`
1515	`- type: mrr_at_5`
1516	`value: 32.998`
1517	`- type: ndcg_at_1`
1518	`value: 22.493`
1519	`- type: ndcg_at_10`
1520	`value: 40.833000000000006`
1521	`- type: ndcg_at_100`
1522	`value: 46.357`
1523	`- type: ndcg_at_1000`
1524	`value: 47.637`
1525	`- type: ndcg_at_3`
1526	`value: 32.995999999999995`
1527	`- type: ndcg_at_5`
1528	`value: 36.919000000000004`
1529	`- type: precision_at_1`
1530	`value: 22.493`
1531	`- type: precision_at_10`
1532	`value: 6.465999999999999`
1533	`- type: precision_at_100`
1534	`value: 0.9249999999999999`
1535	`- type: precision_at_1000`
1536	`value: 0.104`
1537	`- type: precision_at_3`
1538	`value: 14.030999999999999`
1539	`- type: precision_at_5`
1540	`value: 10.413`
1541	`- type: recall_at_1`
1542	`value: 21.852`
1543	`- type: recall_at_10`
1544	`value: 61.934999999999995`
1545	`- type: recall_at_100`
1546	`value: 87.611`
1547	`- type: recall_at_1000`
1548	`value: 97.441`
1549	`- type: recall_at_3`
1550	`value: 40.583999999999996`
1551	`- type: recall_at_5`
1552	`value: 49.992999999999995`
1553	`- task:`
1554	`type: Classification`
1555	`dataset:`
1556	`type: mteb/mtop_domain`
1557	`name: MTEB MTOPDomainClassification (en)`
1558	`config: en`
1559	`split: test`
1560	`revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf`
1561	`metrics:`
1562	`- type: accuracy`
1563	`value: 93.36069311445507`
1564	`- type: f1`
1565	`value: 93.16456330371453`
1566	`- task:`
1567	`type: Classification`
1568	`dataset:`
1569	`type: mteb/mtop_intent`
1570	`name: MTEB MTOPIntentClassification (en)`
1571	`config: en`
1572	`split: test`
1573	`revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba`
1574	`metrics:`
1575	`- type: accuracy`
1576	`value: 74.74692202462381`
1577	`- type: f1`
1578	`value: 58.17903579421599`
1579	`- task:`
1580	`type: Classification`
1581	`dataset:`
1582	`type: mteb/amazon_massive_intent`
1583	`name: MTEB MassiveIntentClassification (en)`
1584	`config: en`
1585	`split: test`
1586	`revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7`
1587	`metrics:`
1588	`- type: accuracy`
1589	`value: 74.80833893745796`
1590	`- type: f1`
1591	`value: 72.70786592684664`
1592	`- task:`
1593	`type: Classification`
1594	`dataset:`
1595	`type: mteb/amazon_massive_scenario`
1596	`name: MTEB MassiveScenarioClassification (en)`
1597	`config: en`
1598	`split: test`
1599	`revision: 7d571f92784cd94a019292a1f45445077d0ef634`
1600	`metrics:`
1601	`- type: accuracy`
1602	`value: 78.69872225958305`
1603	`- type: f1`
1604	`value: 78.61626934504731`
1605	`- task:`
1606	`type: Clustering`
1607	`dataset:`
1608	`type: mteb/medrxiv-clustering-p2p`
1609	`name: MTEB MedrxivClusteringP2P`
1610	`config: default`
1611	`split: test`
1612	`revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73`
1613	`metrics:`
1614	`- type: v_measure`
1615	`value: 33.058658628717694`
1616	`- task:`
1617	`type: Clustering`
1618	`dataset:`
1619	`type: mteb/medrxiv-clustering-s2s`
1620	`name: MTEB MedrxivClusteringS2S`
1621	`config: default`
1622	`split: test`
1623	`revision: 35191c8c0dca72d8ff3efcd72aa802307d469663`
1624	`metrics:`
1625	`- type: v_measure`
1626	`value: 30.85561739360599`
1627	`- task:`
1628	`type: Reranking`
1629	`dataset:`
1630	`type: mteb/mind_small`
1631	`name: MTEB MindSmallReranking`
1632	`config: default`
1633	`split: test`
1634	`revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69`
1635	`metrics:`
1636	`- type: map`
1637	`value: 31.290259910144385`
1638	`- type: mrr`
1639	`value: 32.44223046102856`
1640	`- task:`
1641	`type: Retrieval`
1642	`dataset:`
1643	`type: nfcorpus`
1644	`name: MTEB NFCorpus`
1645	`config: default`
1646	`split: test`
1647	`revision: None`
1648	`metrics:`
1649	`- type: map_at_1`
1650	`value: 5.288`
1651	`- type: map_at_10`
1652	`value: 12.267999999999999`
1653	`- type: map_at_100`
1654	`value: 15.557000000000002`
1655	`- type: map_at_1000`
1656	`value: 16.98`
1657	`- type: map_at_3`
1658	`value: 8.866`
1659	`- type: map_at_5`
1660	`value: 10.418`
1661	`- type: mrr_at_1`
1662	`value: 43.653`
1663	`- type: mrr_at_10`
1664	`value: 52.681`
1665	`- type: mrr_at_100`
1666	`value: 53.315999999999995`
1667	`- type: mrr_at_1000`
1668	`value: 53.357`
1669	`- type: mrr_at_3`
1670	`value: 51.393`
1671	`- type: mrr_at_5`
1672	`value: 51.903999999999996`
1673	`- type: ndcg_at_1`
1674	`value: 42.415000000000006`
1675	`- type: ndcg_at_10`
1676	`value: 34.305`
1677	`- type: ndcg_at_100`
1678	`value: 30.825999999999997`
1679	`- type: ndcg_at_1000`
1680	`value: 39.393`
1681	`- type: ndcg_at_3`
1682	`value: 39.931`
1683	`- type: ndcg_at_5`
1684	`value: 37.519999999999996`
1685	`- type: precision_at_1`
1686	`value: 43.653`
1687	`- type: precision_at_10`
1688	`value: 25.728`
1689	`- type: precision_at_100`
1690	`value: 7.932`
1691	`- type: precision_at_1000`
1692	`value: 2.07`
1693	`- type: precision_at_3`
1694	`value: 38.184000000000005`
1695	`- type: precision_at_5`
1696	`value: 32.879000000000005`
1697	`- type: recall_at_1`
1698	`value: 5.288`
1699	`- type: recall_at_10`
1700	`value: 16.195`
1701	`- type: recall_at_100`
1702	`value: 31.135`
1703	`- type: recall_at_1000`
1704	`value: 61.531000000000006`
1705	`- type: recall_at_3`
1706	`value: 10.313`
1707	`- type: recall_at_5`
1708	`value: 12.754999999999999`
1709	`- task:`
1710	`type: Retrieval`
1711	`dataset:`
1712	`type: nq`
1713	`name: MTEB NQ`
1714	`config: default`
1715	`split: test`
1716	`revision: None`
1717	`metrics:`
1718	`- type: map_at_1`
1719	`value: 28.216`
1720	`- type: map_at_10`
1721	`value: 42.588`
1722	`- type: map_at_100`
1723	`value: 43.702999999999996`
1724	`- type: map_at_1000`
1725	`value: 43.739`
1726	`- type: map_at_3`
1727	`value: 38.177`
1728	`- type: map_at_5`
1729	`value: 40.754000000000005`
1730	`- type: mrr_at_1`
1731	`value: 31.866`
1732	`- type: mrr_at_10`
1733	`value: 45.189`
1734	`- type: mrr_at_100`
1735	`value: 46.056000000000004`
1736	`- type: mrr_at_1000`
1737	`value: 46.081`
1738	`- type: mrr_at_3`
1739	`value: 41.526999999999994`
1740	`- type: mrr_at_5`
1741	`value: 43.704`
1742	`- type: ndcg_at_1`
1743	`value: 31.837`
1744	`- type: ndcg_at_10`
1745	`value: 50.178`
1746	`- type: ndcg_at_100`
1747	`value: 54.98800000000001`
1748	`- type: ndcg_at_1000`
1749	`value: 55.812`
1750	`- type: ndcg_at_3`
1751	`value: 41.853`
1752	`- type: ndcg_at_5`
1753	`value: 46.153`
1754	`- type: precision_at_1`
1755	`value: 31.837`
1756	`- type: precision_at_10`
1757	`value: 8.43`
1758	`- type: precision_at_100`
1759	`value: 1.1119999999999999`
1760	`- type: precision_at_1000`
1761	`value: 0.11900000000000001`
1762	`- type: precision_at_3`
1763	`value: 19.023`
1764	`- type: precision_at_5`
1765	`value: 13.911000000000001`
1766	`- type: recall_at_1`
1767	`value: 28.216`
1768	`- type: recall_at_10`
1769	`value: 70.8`
1770	`- type: recall_at_100`
1771	`value: 91.857`
1772	`- type: recall_at_1000`
1773	`value: 97.941`
1774	`- type: recall_at_3`
1775	`value: 49.196`
1776	`- type: recall_at_5`
1777	`value: 59.072`
1778	`- task:`
1779	`type: Retrieval`
1780	`dataset:`
1781	`type: quora`
1782	`name: MTEB QuoraRetrieval`
1783	`config: default`
1784	`split: test`
1785	`revision: None`
1786	`metrics:`
1787	`- type: map_at_1`
1788	`value: 71.22800000000001`
1789	`- type: map_at_10`
1790	`value: 85.115`
1791	`- type: map_at_100`
1792	`value: 85.72`
1793	`- type: map_at_1000`
1794	`value: 85.737`
1795	`- type: map_at_3`
1796	`value: 82.149`
1797	`- type: map_at_5`
1798	`value: 84.029`
1799	`- type: mrr_at_1`
1800	`value: 81.96`
1801	`- type: mrr_at_10`
1802	`value: 88.00200000000001`
1803	`- type: mrr_at_100`
1804	`value: 88.088`
1805	`- type: mrr_at_1000`
1806	`value: 88.089`
1807	`- type: mrr_at_3`
1808	`value: 87.055`
1809	`- type: mrr_at_5`
1810	`value: 87.715`
1811	`- type: ndcg_at_1`
1812	`value: 82.01`
1813	`- type: ndcg_at_10`
1814	`value: 88.78`
1815	`- type: ndcg_at_100`
1816	`value: 89.91`
1817	`- type: ndcg_at_1000`
1818	`value: 90.013`
1819	`- type: ndcg_at_3`
1820	`value: 85.957`
1821	`- type: ndcg_at_5`
1822	`value: 87.56`
1823	`- type: precision_at_1`
1824	`value: 82.01`
1825	`- type: precision_at_10`
1826	`value: 13.462`
1827	`- type: precision_at_100`
1828	`value: 1.528`
1829	`- type: precision_at_1000`
1830	`value: 0.157`
1831	`- type: precision_at_3`
1832	`value: 37.553`
1833	`- type: precision_at_5`
1834	`value: 24.732000000000003`
1835	`- type: recall_at_1`
1836	`value: 71.22800000000001`
1837	`- type: recall_at_10`
1838	`value: 95.69`
1839	`- type: recall_at_100`
1840	`value: 99.531`
1841	`- type: recall_at_1000`
1842	`value: 99.98`
1843	`- type: recall_at_3`
1844	`value: 87.632`
1845	`- type: recall_at_5`
1846	`value: 92.117`
1847	`- task:`
1848	`type: Clustering`
1849	`dataset:`
1850	`type: mteb/reddit-clustering`
1851	`name: MTEB RedditClustering`
1852	`config: default`
1853	`split: test`
1854	`revision: 24640382cdbf8abc73003fb0fa6d111a705499eb`
1855	`metrics:`
1856	`- type: v_measure`
1857	`value: 52.31768034366916`
1858	`- task:`
1859	`type: Clustering`
1860	`dataset:`
1861	`type: mteb/reddit-clustering-p2p`
1862	`name: MTEB RedditClusteringP2P`
1863	`config: default`
1864	`split: test`
1865	`revision: 282350215ef01743dc01b456c7f5241fa8937f16`
1866	`metrics:`
1867	`- type: v_measure`
1868	`value: 60.640266772723606`
1869	`- task:`
1870	`type: Retrieval`
1871	`dataset:`
1872	`type: scidocs`
1873	`name: MTEB SCIDOCS`
1874	`config: default`
1875	`split: test`
1876	`revision: None`
1877	`metrics:`
1878	`- type: map_at_1`
1879	`value: 4.7780000000000005`
1880	`- type: map_at_10`
1881	`value: 12.299`
1882	`- type: map_at_100`
1883	`value: 14.363000000000001`
1884	`- type: map_at_1000`
1885	`value: 14.71`
1886	`- type: map_at_3`
1887	`value: 8.738999999999999`
1888	`- type: map_at_5`
1889	`value: 10.397`
1890	`- type: mrr_at_1`
1891	`value: 23.599999999999998`
1892	`- type: mrr_at_10`
1893	`value: 34.845`
1894	`- type: mrr_at_100`
1895	`value: 35.916`
1896	`- type: mrr_at_1000`
1897	`value: 35.973`
1898	`- type: mrr_at_3`
1899	`value: 31.7`
1900	`- type: mrr_at_5`
1901	`value: 33.535`
1902	`- type: ndcg_at_1`
1903	`value: 23.599999999999998`
1904	`- type: ndcg_at_10`
1905	`value: 20.522000000000002`
1906	`- type: ndcg_at_100`
1907	`value: 28.737000000000002`
1908	`- type: ndcg_at_1000`
1909	`value: 34.596`
1910	`- type: ndcg_at_3`
1911	`value: 19.542`
1912	`- type: ndcg_at_5`
1913	`value: 16.958000000000002`
1914	`- type: precision_at_1`
1915	`value: 23.599999999999998`
1916	`- type: precision_at_10`
1917	`value: 10.67`
1918	`- type: precision_at_100`
1919	`value: 2.259`
1920	`- type: precision_at_1000`
1921	`value: 0.367`
1922	`- type: precision_at_3`
1923	`value: 18.333`
1924	`- type: precision_at_5`
1925	`value: 14.879999999999999`
1926	`- type: recall_at_1`
1927	`value: 4.7780000000000005`
1928	`- type: recall_at_10`
1929	`value: 21.617`
1930	`- type: recall_at_100`
1931	`value: 45.905`
1932	`- type: recall_at_1000`
1933	`value: 74.42`
1934	`- type: recall_at_3`
1935	`value: 11.148`
1936	`- type: recall_at_5`
1937	`value: 15.082999999999998`
1938	`- task:`
1939	`type: STS`
1940	`dataset:`
1941	`type: mteb/sickr-sts`
1942	`name: MTEB SICK-R`
1943	`config: default`
1944	`split: test`
1945	`revision: a6ea5a8cab320b040a23452cc28066d9beae2cee`
1946	`metrics:`
1947	`- type: cos_sim_pearson`
1948	`value: 83.22372750297885`
1949	`- type: cos_sim_spearman`
1950	`value: 79.40972617119405`
1951	`- type: euclidean_pearson`
1952	`value: 80.6101072020434`
1953	`- type: euclidean_spearman`
1954	`value: 79.53844217225202`
1955	`- type: manhattan_pearson`
1956	`value: 80.57265975286111`
1957	`- type: manhattan_spearman`
1958	`value: 79.46335611792958`
1959	`- task:`
1960	`type: STS`
1961	`dataset:`
1962	`type: mteb/sts12-sts`
1963	`name: MTEB STS12`
1964	`config: default`
1965	`split: test`
1966	`revision: a0d554a64d88156834ff5ae9920b964011b16384`
1967	`metrics:`
1968	`- type: cos_sim_pearson`
1969	`value: 85.43713315520749`
1970	`- type: cos_sim_spearman`
1971	`value: 77.44128693329532`
1972	`- type: euclidean_pearson`
1973	`value: 81.63869928101123`
1974	`- type: euclidean_spearman`
1975	`value: 77.29512977961515`
1976	`- type: manhattan_pearson`
1977	`value: 81.63704185566183`
1978	`- type: manhattan_spearman`
1979	`value: 77.29909412738657`
1980	`- task:`
1981	`type: STS`
1982	`dataset:`
1983	`type: mteb/sts13-sts`
1984	`name: MTEB STS13`
1985	`config: default`
1986	`split: test`
1987	`revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca`
1988	`metrics:`
1989	`- type: cos_sim_pearson`
1990	`value: 81.59451537860527`
1991	`- type: cos_sim_spearman`
1992	`value: 82.97994638856723`
1993	`- type: euclidean_pearson`
1994	`value: 82.89478688288412`
1995	`- type: euclidean_spearman`
1996	`value: 83.58740751053104`
1997	`- type: manhattan_pearson`
1998	`value: 82.69140840941608`
1999	`- type: manhattan_spearman`
2000	`value: 83.33665956040555`
2001	`- task:`
2002	`type: STS`
2003	`dataset:`
2004	`type: mteb/sts14-sts`
2005	`name: MTEB STS14`
2006	`config: default`
2007	`split: test`
2008	`revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375`
2009	`metrics:`
2010	`- type: cos_sim_pearson`
2011	`value: 82.00756527711764`
2012	`- type: cos_sim_spearman`
2013	`value: 81.83560996841379`
2014	`- type: euclidean_pearson`
2015	`value: 82.07684151976518`
2016	`- type: euclidean_spearman`
2017	`value: 82.00913052060511`
2018	`- type: manhattan_pearson`
2019	`value: 82.05690778488794`
2020	`- type: manhattan_spearman`
2021	`value: 82.02260252019525`
2022	`- task:`
2023	`type: STS`
2024	`dataset:`
2025	`type: mteb/sts15-sts`
2026	`name: MTEB STS15`
2027	`config: default`
2028	`split: test`
2029	`revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3`
2030	`metrics:`
2031	`- type: cos_sim_pearson`
2032	`value: 86.13710262895447`
2033	`- type: cos_sim_spearman`
2034	`value: 87.26412811156248`
2035	`- type: euclidean_pearson`
2036	`value: 86.94151453230228`
2037	`- type: euclidean_spearman`
2038	`value: 87.5363796699571`
2039	`- type: manhattan_pearson`
2040	`value: 86.86989424083748`
2041	`- type: manhattan_spearman`
2042	`value: 87.47315940781353`
2043	`- task:`
2044	`type: STS`
2045	`dataset:`
2046	`type: mteb/sts16-sts`
2047	`name: MTEB STS16`
2048	`config: default`
2049	`split: test`
2050	`revision: 4d8694f8f0e0100860b497b999b3dbed754a0513`
2051	`metrics:`
2052	`- type: cos_sim_pearson`
2053	`value: 83.0230597603627`
2054	`- type: cos_sim_spearman`
2055	`value: 84.93344499318864`
2056	`- type: euclidean_pearson`
2057	`value: 84.23754743431141`
2058	`- type: euclidean_spearman`
2059	`value: 85.09707376597099`
2060	`- type: manhattan_pearson`
2061	`value: 84.04325160987763`
2062	`- type: manhattan_spearman`
2063	`value: 84.89353071339909`
2064	`- task:`
2065	`type: STS`
2066	`dataset:`
2067	`type: mteb/sts17-crosslingual-sts`
2068	`name: MTEB STS17 (en-en)`
2069	`config: en-en`
2070	`split: test`
2071	`revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d`
2072	`metrics:`
2073	`- type: cos_sim_pearson`
2074	`value: 86.75620824563921`
2075	`- type: cos_sim_spearman`
2076	`value: 87.15065513706398`
2077	`- type: euclidean_pearson`
2078	`value: 88.26281533633521`
2079	`- type: euclidean_spearman`
2080	`value: 87.51963738643983`
2081	`- type: manhattan_pearson`
2082	`value: 88.25599267618065`
2083	`- type: manhattan_spearman`
2084	`value: 87.58048736047483`
2085	`- task:`
2086	`type: STS`
2087	`dataset:`
2088	`type: mteb/sts22-crosslingual-sts`
2089	`name: MTEB STS22 (en)`
2090	`config: en`
2091	`split: test`
2092	`revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80`
2093	`metrics:`
2094	`- type: cos_sim_pearson`
2095	`value: 64.74645319195137`
2096	`- type: cos_sim_spearman`
2097	`value: 65.29996325037214`
2098	`- type: euclidean_pearson`
2099	`value: 67.04297794086443`
2100	`- type: euclidean_spearman`
2101	`value: 65.43841726694343`
2102	`- type: manhattan_pearson`
2103	`value: 67.39459955690904`
2104	`- type: manhattan_spearman`
2105	`value: 65.92864704413651`
2106	`- task:`
2107	`type: STS`
2108	`dataset:`
2109	`type: mteb/stsbenchmark-sts`
2110	`name: MTEB STSBenchmark`
2111	`config: default`
2112	`split: test`
2113	`revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831`
2114	`metrics:`
2115	`- type: cos_sim_pearson`
2116	`value: 84.31291020270801`
2117	`- type: cos_sim_spearman`
2118	`value: 85.86473738688068`
2119	`- type: euclidean_pearson`
2120	`value: 85.65537275064152`
2121	`- type: euclidean_spearman`
2122	`value: 86.13087454209642`
2123	`- type: manhattan_pearson`
2124	`value: 85.43946955047609`
2125	`- type: manhattan_spearman`
2126	`value: 85.91568175344916`
2127	`- task:`
2128	`type: Reranking`
2129	`dataset:`
2130	`type: mteb/scidocs-reranking`
2131	`name: MTEB SciDocsRR`
2132	`config: default`
2133	`split: test`
2134	`revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab`
2135	`metrics:`
2136	`- type: map`
2137	`value: 85.93798118350695`
2138	`- type: mrr`
2139	`value: 95.93536274908824`
2140	`- task:`
2141	`type: Retrieval`
2142	`dataset:`
2143	`type: scifact`
2144	`name: MTEB SciFact`
2145	`config: default`
2146	`split: test`
2147	`revision: None`
2148	`metrics:`
2149	`- type: map_at_1`
2150	`value: 57.594`
2151	`- type: map_at_10`
2152	`value: 66.81899999999999`
2153	`- type: map_at_100`
2154	`value: 67.368`
2155	`- type: map_at_1000`
2156	`value: 67.4`
2157	`- type: map_at_3`
2158	`value: 64.061`
2159	`- type: map_at_5`
2160	`value: 65.47`
2161	`- type: mrr_at_1`
2162	`value: 60.667`
2163	`- type: mrr_at_10`
2164	`value: 68.219`
2165	`- type: mrr_at_100`
2166	`value: 68.655`
2167	`- type: mrr_at_1000`
2168	`value: 68.684`
2169	`- type: mrr_at_3`
2170	`value: 66.22200000000001`
2171	`- type: mrr_at_5`
2172	`value: 67.289`
2173	`- type: ndcg_at_1`
2174	`value: 60.667`
2175	`- type: ndcg_at_10`
2176	`value: 71.275`
2177	`- type: ndcg_at_100`
2178	`value: 73.642`
2179	`- type: ndcg_at_1000`
2180	`value: 74.373`
2181	`- type: ndcg_at_3`
2182	`value: 66.521`
2183	`- type: ndcg_at_5`
2184	`value: 68.581`
2185	`- type: precision_at_1`
2186	`value: 60.667`
2187	`- type: precision_at_10`
2188	`value: 9.433`
2189	`- type: precision_at_100`
2190	`value: 1.0699999999999998`
2191	`- type: precision_at_1000`
2192	`value: 0.11299999999999999`
2193	`- type: precision_at_3`
2194	`value: 25.556`
2195	`- type: precision_at_5`
2196	`value: 16.8`
2197	`- type: recall_at_1`
2198	`value: 57.594`
2199	`- type: recall_at_10`
2200	`value: 83.622`
2201	`- type: recall_at_100`
2202	`value: 94.167`
2203	`- type: recall_at_1000`
2204	`value: 99.667`
2205	`- type: recall_at_3`
2206	`value: 70.64399999999999`
2207	`- type: recall_at_5`
2208	`value: 75.983`
2209	`- task:`
2210	`type: PairClassification`
2211	`dataset:`
2212	`type: mteb/sprintduplicatequestions-pairclassification`
2213	`name: MTEB SprintDuplicateQuestions`
2214	`config: default`
2215	`split: test`
2216	`revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46`
2217	`metrics:`
2218	`- type: cos_sim_accuracy`
2219	`value: 99.85841584158416`
2220	`- type: cos_sim_ap`
2221	`value: 96.66996142314342`
2222	`- type: cos_sim_f1`
2223	`value: 92.83208020050125`
2224	`- type: cos_sim_precision`
2225	`value: 93.06532663316584`
2226	`- type: cos_sim_recall`
2227	`value: 92.60000000000001`
2228	`- type: dot_accuracy`
2229	`value: 99.85841584158416`
2230	`- type: dot_ap`
2231	`value: 96.6775307676576`
2232	`- type: dot_f1`
2233	`value: 92.69289729177312`
2234	`- type: dot_precision`
2235	`value: 94.77533960292581`
2236	`- type: dot_recall`
2237	`value: 90.7`
2238	`- type: euclidean_accuracy`
2239	`value: 99.86138613861387`
2240	`- type: euclidean_ap`
2241	`value: 96.6338454403108`
2242	`- type: euclidean_f1`
2243	`value: 92.92214357937311`
2244	`- type: euclidean_precision`
2245	`value: 93.96728016359918`
2246	`- type: euclidean_recall`
2247	`value: 91.9`
2248	`- type: manhattan_accuracy`
2249	`value: 99.86237623762376`
2250	`- type: manhattan_ap`
2251	`value: 96.60370449645053`
2252	`- type: manhattan_f1`
2253	`value: 92.91177970423253`
2254	`- type: manhattan_precision`
2255	`value: 94.7970863683663`
2256	`- type: manhattan_recall`
2257	`value: 91.10000000000001`
2258	`- type: max_accuracy`
2259	`value: 99.86237623762376`
2260	`- type: max_ap`
2261	`value: 96.6775307676576`
2262	`- type: max_f1`
2263	`value: 92.92214357937311`
2264	`- task:`
2265	`type: Clustering`
2266	`dataset:`
2267	`type: mteb/stackexchange-clustering`
2268	`name: MTEB StackExchangeClustering`
2269	`config: default`
2270	`split: test`
2271	`revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259`
2272	`metrics:`
2273	`- type: v_measure`
2274	`value: 60.77977058695198`
2275	`- task:`
2276	`type: Clustering`
2277	`dataset:`
2278	`type: mteb/stackexchange-clustering-p2p`
2279	`name: MTEB StackExchangeClusteringP2P`
2280	`config: default`
2281	`split: test`
2282	`revision: 815ca46b2622cec33ccafc3735d572c266efdb44`
2283	`metrics:`
2284	`- type: v_measure`
2285	`value: 35.2725272535638`
2286	`- task:`
2287	`type: Reranking`
2288	`dataset:`
2289	`type: mteb/stackoverflowdupquestions-reranking`
2290	`name: MTEB StackOverflowDupQuestions`
2291	`config: default`
2292	`split: test`
2293	`revision: e185fbe320c72810689fc5848eb6114e1ef5ec69`
2294	`metrics:`
2295	`- type: map`
2296	`value: 53.64052466362125`
2297	`- type: mrr`
2298	`value: 54.533067014684654`
2299	`- task:`
2300	`type: Summarization`
2301	`dataset:`
2302	`type: mteb/summeval`
2303	`name: MTEB SummEval`
2304	`config: default`
2305	`split: test`
2306	`revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c`
2307	`metrics:`
2308	`- type: cos_sim_pearson`
2309	`value: 30.677624219206578`
2310	`- type: cos_sim_spearman`
2311	`value: 30.121368518123447`
2312	`- type: dot_pearson`
2313	`value: 30.69870088041608`
2314	`- type: dot_spearman`
2315	`value: 29.61284927093751`
2316	`- task:`
2317	`type: Retrieval`
2318	`dataset:`
2319	`type: trec-covid`
2320	`name: MTEB TRECCOVID`
2321	`config: default`
2322	`split: test`
2323	`revision: None`
2324	`metrics:`
2325	`- type: map_at_1`
2326	`value: 0.22`
2327	`- type: map_at_10`
2328	`value: 1.855`
2329	`- type: map_at_100`
2330	`value: 9.885`
2331	`- type: map_at_1000`
2332	`value: 23.416999999999998`
2333	`- type: map_at_3`
2334	`value: 0.637`
2335	`- type: map_at_5`
2336	`value: 1.024`
2337	`- type: mrr_at_1`
2338	`value: 88.0`
2339	`- type: mrr_at_10`
2340	`value: 93.067`
2341	`- type: mrr_at_100`
2342	`value: 93.067`
2343	`- type: mrr_at_1000`
2344	`value: 93.067`
2345	`- type: mrr_at_3`
2346	`value: 92.667`
2347	`- type: mrr_at_5`
2348	`value: 93.067`
2349	`- type: ndcg_at_1`
2350	`value: 82.0`
2351	`- type: ndcg_at_10`
2352	`value: 75.899`
2353	`- type: ndcg_at_100`
2354	`value: 55.115`
2355	`- type: ndcg_at_1000`
2356	`value: 48.368`
2357	`- type: ndcg_at_3`
2358	`value: 79.704`
2359	`- type: ndcg_at_5`
2360	`value: 78.39699999999999`
2361	`- type: precision_at_1`
2362	`value: 88.0`
2363	`- type: precision_at_10`
2364	`value: 79.60000000000001`
2365	`- type: precision_at_100`
2366	`value: 56.06`
2367	`- type: precision_at_1000`
2368	`value: 21.206`
2369	`- type: precision_at_3`
2370	`value: 84.667`
2371	`- type: precision_at_5`
2372	`value: 83.2`
2373	`- type: recall_at_1`
2374	`value: 0.22`
2375	`- type: recall_at_10`
2376	`value: 2.078`
2377	`- type: recall_at_100`
2378	`value: 13.297`
2379	`- type: recall_at_1000`
2380	`value: 44.979`
2381	`- type: recall_at_3`
2382	`value: 0.6689999999999999`
2383	`- type: recall_at_5`
2384	`value: 1.106`
2385	`- task:`
2386	`type: Retrieval`
2387	`dataset:`
2388	`type: webis-touche2020`
2389	`name: MTEB Touche2020`
2390	`config: default`
2391	`split: test`
2392	`revision: None`
2393	`metrics:`
2394	`- type: map_at_1`
2395	`value: 2.258`
2396	`- type: map_at_10`
2397	`value: 10.439`
2398	`- type: map_at_100`
2399	`value: 16.89`
2400	`- type: map_at_1000`
2401	`value: 18.407999999999998`
2402	`- type: map_at_3`
2403	`value: 5.668`
2404	`- type: map_at_5`
2405	`value: 7.718`
2406	`- type: mrr_at_1`
2407	`value: 32.653`
2408	`- type: mrr_at_10`
2409	`value: 51.159`
2410	`- type: mrr_at_100`
2411	`value: 51.714000000000006`
2412	`- type: mrr_at_1000`
2413	`value: 51.714000000000006`
2414	`- type: mrr_at_3`
2415	`value: 47.959`
2416	`- type: mrr_at_5`
2417	`value: 50.407999999999994`
2418	`- type: ndcg_at_1`
2419	`value: 29.592000000000002`
2420	`- type: ndcg_at_10`
2421	`value: 26.037`
2422	`- type: ndcg_at_100`
2423	`value: 37.924`
2424	`- type: ndcg_at_1000`
2425	`value: 49.126999999999995`
2426	`- type: ndcg_at_3`
2427	`value: 30.631999999999998`
2428	`- type: ndcg_at_5`
2429	`value: 28.571`
2430	`- type: precision_at_1`
2431	`value: 32.653`
2432	`- type: precision_at_10`
2433	`value: 22.857`
2434	`- type: precision_at_100`
2435	`value: 7.754999999999999`
2436	`- type: precision_at_1000`
2437	`value: 1.529`
2438	`- type: precision_at_3`
2439	`value: 34.014`
2440	`- type: precision_at_5`
2441	`value: 29.796`
2442	`- type: recall_at_1`
2443	`value: 2.258`
2444	`- type: recall_at_10`
2445	`value: 16.554`
2446	`- type: recall_at_100`
2447	`value: 48.439`
2448	`- type: recall_at_1000`
2449	`value: 82.80499999999999`
2450	`- type: recall_at_3`
2451	`value: 7.283`
2452	`- type: recall_at_5`
2453	`value: 10.732`
2454	`- task:`
2455	`type: Classification`
2456	`dataset:`
2457	`type: mteb/toxic_conversations_50k`
2458	`name: MTEB ToxicConversationsClassification`
2459	`config: default`
2460	`split: test`
2461	`revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c`
2462	`metrics:`
2463	`- type: accuracy`
2464	`value: 69.8858`
2465	`- type: ap`
2466	`value: 13.835684144362109`
2467	`- type: f1`
2468	`value: 53.803351693244586`
2469	`- task:`
2470	`type: Classification`
2471	`dataset:`
2472	`type: mteb/tweet_sentiment_extraction`
2473	`name: MTEB TweetSentimentExtractionClassification`
2474	`config: default`
2475	`split: test`
2476	`revision: d604517c81ca91fe16a244d1248fc021f9ecee7a`
2477	`metrics:`
2478	`- type: accuracy`
2479	`value: 60.50650820599886`
2480	`- type: f1`
2481	`value: 60.84357825979259`
2482	`- task:`
2483	`type: Clustering`
2484	`dataset:`
2485	`type: mteb/twentynewsgroups-clustering`
2486	`name: MTEB TwentyNewsgroupsClustering`
2487	`config: default`
2488	`split: test`
2489	`revision: 6125ec4e24fa026cec8a478383ee943acfbd5449`
2490	`metrics:`
2491	`- type: v_measure`
2492	`value: 48.52131044852134`
2493	`- task:`
2494	`type: PairClassification`
2495	`dataset:`
2496	`type: mteb/twittersemeval2015-pairclassification`
2497	`name: MTEB TwitterSemEval2015`
2498	`config: default`
2499	`split: test`
2500	`revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1`
2501	`metrics:`
2502	`- type: cos_sim_accuracy`
2503	`value: 85.59337187816654`
2504	`- type: cos_sim_ap`
2505	`value: 73.23925826533437`
2506	`- type: cos_sim_f1`
2507	`value: 67.34693877551021`
2508	`- type: cos_sim_precision`
2509	`value: 62.40432237730752`
2510	`- type: cos_sim_recall`
2511	`value: 73.13984168865434`
2512	`- type: dot_accuracy`
2513	`value: 85.31322644096085`
2514	`- type: dot_ap`
2515	`value: 72.30723963807422`
2516	`- type: dot_f1`
2517	`value: 66.47051612112296`
2518	`- type: dot_precision`
2519	`value: 62.0792305930845`
2520	`- type: dot_recall`
2521	`value: 71.53034300791556`
2522	`- type: euclidean_accuracy`
2523	`value: 85.61125350181797`
2524	`- type: euclidean_ap`
2525	`value: 73.32843720487845`
2526	`- type: euclidean_f1`
2527	`value: 67.36549633745895`
2528	`- type: euclidean_precision`
2529	`value: 64.60755813953489`
2530	`- type: euclidean_recall`
2531	`value: 70.36939313984169`
2532	`- type: manhattan_accuracy`
2533	`value: 85.63509566668654`
2534	`- type: manhattan_ap`
2535	`value: 73.16658488311325`
2536	`- type: manhattan_f1`
2537	`value: 67.20597386434349`
2538	`- type: manhattan_precision`
2539	`value: 63.60424028268551`
2540	`- type: manhattan_recall`
2541	`value: 71.2401055408971`
2542	`- type: max_accuracy`
2543	`value: 85.63509566668654`
2544	`- type: max_ap`
2545	`value: 73.32843720487845`
2546	`- type: max_f1`
2547	`value: 67.36549633745895`
2548	`- task:`
2549	`type: PairClassification`
2550	`dataset:`
2551	`type: mteb/twitterurlcorpus-pairclassification`
2552	`name: MTEB TwitterURLCorpus`
2553	`config: default`
2554	`split: test`
2555	`revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf`
2556	`metrics:`
2557	`- type: cos_sim_accuracy`
2558	`value: 88.33779640625606`
2559	`- type: cos_sim_ap`
2560	`value: 84.83868375898157`
2561	`- type: cos_sim_f1`
2562	`value: 77.16506154017773`
2563	`- type: cos_sim_precision`
2564	`value: 74.62064005753327`
2565	`- type: cos_sim_recall`
2566	`value: 79.88912842623961`
2567	`- type: dot_accuracy`
2568	`value: 88.02732176815307`
2569	`- type: dot_ap`
2570	`value: 83.95089283763002`
2571	`- type: dot_f1`
2572	`value: 76.29635101196631`
2573	`- type: dot_precision`
2574	`value: 73.31771720613288`
2575	`- type: dot_recall`
2576	`value: 79.52725592854944`
2577	`- type: euclidean_accuracy`
2578	`value: 88.44452206310397`
2579	`- type: euclidean_ap`
2580	`value: 84.98384576824827`
2581	`- type: euclidean_f1`
2582	`value: 77.29311047696697`
2583	`- type: euclidean_precision`
2584	`value: 74.51232583065381`
2585	`- type: euclidean_recall`
2586	`value: 80.28949799815214`
2587	`- type: manhattan_accuracy`
2588	`value: 88.47362906042613`
2589	`- type: manhattan_ap`
2590	`value: 84.91421462218432`
2591	`- type: manhattan_f1`
2592	`value: 77.05107637204792`
2593	`- type: manhattan_precision`
2594	`value: 74.74484256243214`
2595	`- type: manhattan_recall`
2596	`value: 79.50415768401602`
2597	`- type: max_accuracy`
2598	`value: 88.47362906042613`
2599	`- type: max_ap`
2600	`value: 84.98384576824827`
2601	`- type: max_f1`
2602	`value: 77.29311047696697`
2603	`license: mit`
2604	`language:`
2605	`- en`
2606	`---`
2607
2608
2609	`<h1 align="center">FlagEmbedding</h1>`
2610
2611
<h4 align="center">
    <p>
        <a href="#model-list">Model List</a> |
        <a href="#frequently-asked-questions">FAQ</a> |
        <a href="#usage">Usage</a> |
        <a href="#evaluation">Evaluation</a> |
        <a href="#train">Train</a> |
        <a href="#contact">Contact</a> |
        <a href="#citation">Citation</a> |
        <a href="#license">License</a>
    </p>
</h4>
2624
For more details, please refer to our GitHub repository: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding).
2626
2627	`If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using [bge-m3](https://huggingface.co/BAAI/bge-m3).`
2628
2629
2630	`[English](README.md) \| [中文](https://github.com/FlagOpen/FlagEmbedding/blob/master/README_zh.md)`
2631
FlagEmbedding focuses on retrieval-augmented LLMs and currently consists of the following projects:

- Long-Context LLM: [Activation Beacon](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon)
- Fine-tuning of LM: [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail)
- Dense Retrieval: [BGE-M3](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3), [LLM Embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder), [BGE Embedding](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding)
- Reranker Model: [BGE Reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker)
- Benchmark: [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB)
2639
2640	`## News`
- 1/30/2024: Release BGE-M3, a new member of the BGE model series! M3 stands for Multi-linguality (100+ languages), Multi-granularities (input length up to 8192), and Multi-Functionality (unification of dense, lexical, and multi-vec/colbert retrieval).
It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks.
[Technical Report](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/BGE_M3/BGE_M3.pdf) and [Code](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3). :fire:
- 1/9/2024: Release [Activation-Beacon](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon), an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs. [Technical Report](https://arxiv.org/abs/2401.03462) :fire:
- 12/24/2023: Release LLaRA, a LLaMA-7B based dense retriever that achieves state-of-the-art performance on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. [Technical Report](https://arxiv.org/abs/2312.15503) :fire:
- 11/23/2023: Release [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail), a method to maintain general capabilities during fine-tuning by merging multiple language models. [Technical Report](https://arxiv.org/abs/2311.13534) :fire:
- 10/12/2023: Release [LLM-Embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder), a unified embedding model to support diverse retrieval augmentation needs for LLMs. [Technical Report](https://arxiv.org/pdf/2310.07554.pdf)
- 09/15/2023: The [technical report](https://arxiv.org/pdf/2309.07597.pdf) of BGE has been released.
- 09/15/2023: The [massive training data](https://data.baai.ac.cn/details/BAAI-MTP) of BGE has been released.
- 09/12/2023: New models:
    - New reranker models: release the cross-encoder models `BAAI/bge-reranker-base` and `BAAI/bge-reranker-large`, which are more powerful than the embedding models. We recommend using or fine-tuning them to re-rank the top-k documents returned by embedding models.
    - Updated embedding models: release the `bge-*-v1.5` embedding models to alleviate the issue of the similarity distribution and to enhance retrieval ability without instruction.
2653
2654
2655	`<details>`
2656	`<summary>More</summary>`
2657	`<!-- ### More -->`
2658
- 09/07/2023: Update [fine-tune code](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md): add a script to mine hard negatives and support adding instructions during fine-tuning.
- 08/09/2023: BGE models are integrated into Langchain; you can use them like [this](#using-langchain). The C-MTEB leaderboard is [available](https://huggingface.co/spaces/mteb/leaderboard).
- 08/05/2023: Release base-scale and small-scale models, with the best performance among models of the same size 🤗
- 08/02/2023: Release `bge-large-*` (short for BAAI General Embedding) models, **rank 1st on the MTEB and C-MTEB benchmarks!** :tada: :tada:
- 08/01/2023: We release the [Chinese Massive Text Embedding Benchmark](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB) (C-MTEB), consisting of 31 test datasets.
2664
2665	`</details>`
2666
2667
2668	`## Model List`
2669
2670	`bge` is short for `BAAI general embedding`.
2671
| Model | Language | Usage | Description | Query instruction for retrieval [1] |
|:-------------------------------|:--------:| :--------:| :--------:|:--------:|
| [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | Multilingual | [Inference](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3#usage) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3) | Multi-Functionality (dense retrieval, sparse retrieval, multi-vector (colbert)), Multi-Linguality, and Multi-Granularity (8192 tokens) | |
| [BAAI/llm-embedder](https://huggingface.co/BAAI/llm-embedder) | English | [Inference](./FlagEmbedding/llm_embedder/README.md) [Fine-tune](./FlagEmbedding/llm_embedder/README.md) | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See [README](./FlagEmbedding/llm_embedder/README.md) |
| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank 1st in [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-en` | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with competitive performance | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank 1st in [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-zh` | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with competitive performance | `为这个句子生成表示以用于检索相关文章：` |
2690
[1]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed and you can use the original query directly. In all cases, no instruction needs to be added to passages.

[2]: Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, a cross-encoder is widely used to re-rank the top-k documents retrieved by other, simpler models.
For example, use a bge embedding model to retrieve the top 100 relevant documents, and then use a bge reranker to re-rank those 100 documents to get the final top-3 results.

All models have been uploaded to the Huggingface Hub, and you can see them at https://huggingface.co/BAAI.
If you cannot open the Huggingface Hub, you can also download the models at https://model.baai.ac.cn/models .
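
The retrieve-then-rerank flow described in footnote [2] can be sketched as follows. This is a minimal illustration in plain Python, not the FlagEmbedding implementation: `retrieve_then_rerank` and `rerank_score` are hypothetical names, the vectors are toy values, and `rerank_score` merely stands in for a cross-encoder call.

```python
from typing import Callable, List, Sequence, Tuple


def retrieve_then_rerank(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    rerank_score: Callable[[int], float],
    k_retrieve: int = 100,
    k_final: int = 3,
) -> List[Tuple[int, float]]:
    """Two-stage search: cheap embedding retrieval, then costlier re-ranking.

    `rerank_score(i)` is a hypothetical stand-in for a cross-encoder that
    scores the (query, passage i) pair; it is not a library function.
    """
    # Stage 1: rank all passages by dot product (cosine if vectors are normalized).
    sims = [sum(q * p for q, p in zip(query_vec, vec)) for vec in passage_vecs]
    candidates = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k_retrieve]

    # Stage 2: score only the shortlisted candidates with the reranker.
    reranked = sorted(
        ((i, rerank_score(i)) for i in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return reranked[:k_final]


# Toy data: 2-d "embeddings" and made-up reranker scores.
passages = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]]
toy_rerank = {0: 0.2, 1: 2.5, 2: -3.0, 3: 1.1}
top = retrieve_then_rerank([1.0, 0.0], passages, toy_rerank.__getitem__, k_retrieve=3, k_final=2)
print(top)
```

The reranker only sees the `k_retrieve` shortlisted passages, which is what keeps the more expensive model affordable.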
2698
2699
2700	`## Frequently asked questions`
2701
2702	`<details>`
2703	`<summary>1. How to fine-tune bge embedding model?</summary>`
2704
2705	`<!-- ### How to fine-tune bge embedding model? -->`
Follow this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) to prepare data and fine-tune your model.
Some suggestions:
- Mine hard negatives following this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune#hard-negatives), which can improve retrieval performance.
- If you pre-train bge on your data, the pre-trained model cannot be used directly to calculate similarity; it must first be fine-tuned with contrastive learning.
- If the accuracy of the fine-tuned model is still not high, we recommend using or fine-tuning the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker.
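
As a rough illustration of what mining hard negatives means, the sketch below takes a ranked retrieval result for one query and keeps highly ranked passages that are not labeled positives. The function name, the rank window, and the defaults are assumptions for illustration; the linked script may work differently (for instance, by sampling randomly within the window).

```python
from typing import List, Sequence, Set


def mine_hard_negatives(
    ranked_ids: Sequence[int],
    positive_ids: Set[int],
    start: int = 2,
    end: int = 200,
    num_negatives: int = 15,
) -> List[int]:
    """Pick negatives from ranks [start, end) of a retrieval result, skipping positives.

    Passages that rank highly yet are not positives are "hard": the current
    model already finds them confusingly similar to the query.
    """
    window = ranked_ids[start:end]
    negatives = [pid for pid in window if pid not in positive_ids]
    return negatives[:num_negatives]


# Toy example: passage ids ordered by retrieval score for one query.
ranked = [7, 3, 9, 1, 4, 8, 2]
hard = mine_hard_negatives(ranked, positive_ids={7, 1}, start=1, end=6, num_negatives=3)
print(hard)
```

Skipping the very top ranks (the `start` offset) is one way to reduce the chance of picking unlabeled positives as negatives.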
2711
2712
2713	`</details>`
2714
2715	`<details>`
2716	`<summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary>`
2717
2718	`<!-- ### The similarity score between two dissimilar sentences is higher than 0.5 -->`
We suggest using bge v1.5, which alleviates the issue of the similarity distribution.

Since we fine-tune the models by contrastive learning with a temperature of 0.01,
the similarity distribution of the current BGE model falls roughly in the interval [0.6, 1].
So a similarity score greater than 0.5 does not indicate that the two sentences are similar.

For downstream tasks, such as passage retrieval or semantic similarity,
what matters is the relative order of the scores, not the absolute value.
If you need to filter similar sentences based on a similarity threshold,
please select an appropriate threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9).
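
One simple way to pick such a threshold from your own data is sketched below, assuming you have similarity scores for a small set of pairs you have labeled as similar or dissimilar. The helper `pick_threshold` and the scores are hypothetical placeholders, not measurements from any bge model.

```python
from typing import Sequence, Tuple


def pick_threshold(
    similar_scores: Sequence[float], dissimilar_scores: Sequence[float]
) -> Tuple[float, float]:
    """Return the (threshold, accuracy) that best separates two labeled score sets.

    A pair is predicted "similar" when its score is >= threshold.
    """
    candidates = sorted(set(similar_scores) | set(dissimilar_scores))
    total = len(similar_scores) + len(dissimilar_scores)
    best_threshold, best_accuracy = candidates[0], -1.0
    for t in candidates:
        correct = sum(s >= t for s in similar_scores) + sum(s < t for s in dissimilar_scores)
        accuracy = correct / total
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy


# Placeholder scores for labeled pairs from your own data.
similar = [0.91, 0.88, 0.86, 0.93]
dissimilar = [0.62, 0.71, 0.79, 0.84]
threshold, accuracy = pick_threshold(similar, dissimilar)
print(threshold, accuracy)
```

Note that every "dissimilar" score in this toy set is above 0.5, which is why a fixed cutoff of 0.5 would not separate the two groups.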
2729
2730	`</details>`
2731
2732	`<details>`
2733	`<summary>3. When does the query instruction need to be used</summary>`
2734
2735	`<!-- ### When does the query instruction need to be used -->`
2736
For the `bge-*-v1.5` models, we improved retrieval ability when no instruction is used.
Omitting the instruction causes only a slight degradation in retrieval performance compared with using it.
So, for convenience, you can generate embeddings without an instruction in all cases.

For a retrieval task that uses short queries to find long related documents,
it is recommended to add the instruction to these short queries.
The best way to decide whether to add the instruction to queries is to choose the setting that achieves better performance on your task.
In all cases, no instruction needs to be added to the documents/passages.
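
A minimal sketch of how the two settings might be compared on your own task is shown below. `with_instruction` and `hit_rate_at_k` are hypothetical helpers, and the score matrices are made-up placeholders; in practice they would come from encoding your queries (with and without the instruction) and your passages.

```python
from typing import List, Sequence


def with_instruction(queries: Sequence[str], instruction: str) -> List[str]:
    """Prepend the retrieval instruction to queries only; passages stay untouched."""
    return [instruction + q for q in queries]


def hit_rate_at_k(scores: Sequence[Sequence[float]], gold: Sequence[int], k: int) -> float:
    """Fraction of queries whose gold passage index appears among the top-k scores."""
    hits = 0
    for row, gold_idx in zip(scores, gold):
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += gold_idx in top_k
    return hits / len(gold)


instruction = "Represent this sentence for searching relevant passages: "
queries = ["what is a panda?", "capital of France"]
print(with_instruction(queries, instruction)[0])

# Placeholder query-by-passage score matrices for the two settings.
gold = [0, 1]
scores_plain = [[0.70, 0.72, 0.60], [0.65, 0.80, 0.66]]
scores_instr = [[0.78, 0.72, 0.60], [0.64, 0.82, 0.66]]
print(hit_rate_at_k(scores_plain, gold, k=1), hit_rate_at_k(scores_instr, gold, k=1))
```

Whichever setting scores better on a held-out sample of your queries is the one to keep.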
2745
2746	`</details>`
2747
2748
2749	`## Usage`
2750
2751	`### Usage for Embedding Model`
2752
Here are some examples of using `bge` models with
[FlagEmbedding](#using-flagembedding), [Sentence-Transformers](#using-sentence-transformers), [Langchain](#using-langchain), or [Huggingface Transformers](#using-huggingface-transformers).

#### Using FlagEmbedding
```
pip install -U FlagEmbedding
```
If that doesn't work for you, see [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md) for other ways to install FlagEmbedding.
2761
```python
from FlagEmbedding import FlagModel

# The Chinese strings are placeholder inputs for the Chinese model ("样例数据" means "sample data").
sentences_1 = ["样例数据-1", "样例数据-2"]
sentences_2 = ["样例数据-3", "样例数据-4"]
model = FlagModel('BAAI/bge-large-zh-v1.5',
                  query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章：",
                  use_fp16=True)  # use_fp16=True speeds up computation with a slight performance degradation
embeddings_1 = model.encode(sentences_1)
embeddings_2 = model.encode(sentences_2)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)

# For s2p (short query to long passage) retrieval, use encode_queries(), which automatically adds the instruction to each query.
# The corpus can still use encode() or encode_corpus(), since passages don't need the instruction.
queries = ['query_1', 'query_2']
passages = ["样例文档-1", "样例文档-2"]  # "样例文档" means "sample document"
q_embeddings = model.encode_queries(queries)
p_embeddings = model.encode(passages)
scores = q_embeddings @ p_embeddings.T
```
For the value of the argument `query_instruction_for_retrieval`, see [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list).

By default, FlagModel uses all available GPUs when encoding. Set `os.environ["CUDA_VISIBLE_DEVICES"]` to select specific GPUs.
You can also set `os.environ["CUDA_VISIBLE_DEVICES"]=""` to make all GPUs unavailable.
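
As a small sketch of the GPU selection described above: the variable has to be set before any CUDA-using library initializes, so it typically goes at the very top of the script. The device ids `"0,1"` are only an example.

```python
import os

# Set this before importing or using any library that initializes CUDA
# (for example torch or FlagEmbedding); later changes are usually ignored.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"   # expose only GPUs 0 and 1
# os.environ["CUDA_VISIBLE_DEVICES"] = ""    # hide all GPUs and run on CPU

print(os.environ["CUDA_VISIBLE_DEVICES"])
```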
2786
2787
#### Using Sentence-Transformers

You can also use the `bge` models with [sentence-transformers](https://www.SBERT.net):

```
pip install -U sentence-transformers
```
```python
from sentence_transformers import SentenceTransformer

# Placeholder inputs for the Chinese model ("样例数据" means "sample data").
sentences_1 = ["样例数据-1", "样例数据-2"]
sentences_2 = ["样例数据-3", "样例数据-4"]
model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)
```
For s2p (short query to long passage) retrieval tasks,
each short query should start with an instruction (see the [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list) for the instructions).
The instruction is not needed for passages.
```python
from sentence_transformers import SentenceTransformer

queries = ['query_1', 'query_2']
passages = ["样例文档-1", "样例文档-2"]  # "样例文档" means "sample document"
instruction = "为这个句子生成表示以用于检索相关文章："

model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
# Add the instruction to queries only, not to passages
q_embeddings = model.encode([instruction + q for q in queries], normalize_embeddings=True)
p_embeddings = model.encode(passages, normalize_embeddings=True)
scores = q_embeddings @ p_embeddings.T
```
2819
#### Using Langchain

You can use `bge` in langchain like this:
```python
from langchain.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-large-en-v1.5"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': True}  # set True to compute cosine similarity
model = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    # English instruction from the Model List, matching the English model above;
    # use the Chinese instruction for the *-zh models
    query_instruction="Represent this sentence for searching relevant passages: "
)
```
2836
2837
2838	`#### Using HuggingFace Transformers`
2839
2840	`With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding.`
2841
```python
from transformers import AutoTokenizer, AutoModel
import torch

# Sentences we want sentence embeddings for ("样例数据" means "sample data")
sentences = ["样例数据-1", "样例数据-2"]

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-large-zh-v1.5')
model = AutoModel.from_pretrained('BAAI/bge-large-zh-v1.5')
model.eval()

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# For s2p (short query to long passage) retrieval, add an instruction to each query (not to passages):
# encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
    # Perform pooling. In this case, cls pooling.
    sentence_embeddings = model_output[0][:, 0]
# Normalize embeddings
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
```
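
The usage examples L2-normalize the embeddings before taking dot products. The plain-Python sketch below, using toy vectors and helper names chosen purely for illustration, shows why: after normalization the dot product equals the cosine similarity.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
cos = cosine(a, b)                                 # 24 / (5 * 5)
dot_norm = dot(l2_normalize(a), l2_normalize(b))   # same value after normalization
print(cos, dot_norm)
```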
2867
2868	`### Usage for Reranker`
2869
Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding.
You can get a relevance score by passing a query and a passage to the reranker.
The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range.
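
If a bounded score is more convenient, one common option (an assumption here, not something this README prescribes) is to pass the raw scores through a sigmoid. The mapping is monotonic, so the ranking is unchanged. The helper name and the scores below are made up for illustration.

```python
import math
from typing import List, Sequence


def to_unit_interval(logits: Sequence[float]) -> List[float]:
    """Map raw reranker scores to (0, 1) with the logistic sigmoid."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


raw_scores = [-5.6, 0.0, 5.3]  # made-up logits, not real model output
probs = to_unit_interval(raw_scores)
print(probs)
```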
2873
2874
#### Using FlagEmbedding
```
pip install -U FlagEmbedding
```

Get relevance scores (higher scores indicate more relevance):
```python
from FlagEmbedding import FlagReranker

reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True)  # use_fp16=True speeds up computation with a slight performance degradation

# Score a single (query, passage) pair
score = reranker.compute_score(['query', 'passage'])
print(score)

# Score a batch of (query, passage) pairs
scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)
```
2891
2892
#### Using Huggingface transformers

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-large')
model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-large')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    # One logit per (query, passage) pair; higher means more relevant
    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
    print(scores)
```
2909
#### Usage of the ONNX files

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction  # type: ignore

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-small-en-v1.5')
model = AutoModel.from_pretrained('BAAI/bge-small-en-v1.5')
model_ort = ORTModelForFeatureExtraction.from_pretrained('BAAI/bge-small-en-v1.5', file_name="onnx/model.onnx")

# Sentences we want sentence embeddings for
sentences = ["sample data-1", "sample data-2"]

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# For s2p (short query to long passage) retrieval, add an instruction to each query (not to passages):
# encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings with ONNX Runtime
model_output_ort = model_ort(**encoded_input)
# Compute token embeddings with PyTorch
with torch.no_grad():
    model_output = model(**encoded_input)

# model_output and model_output_ort should match (up to small numerical differences)
```
2938
#### Usage via infinity
It's also possible to deploy the ONNX files with the [infinity_emb](https://github.com/michaelfeil/infinity) pip package.
We recommend `device="cuda", engine="torch"` with flash attention on GPU, and `device="cpu", engine="optimum"` for ONNX inference.

```python
import asyncio
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

sentences = ["Embed this sentence via Infinity.", "Paris is in France."]
engine = AsyncEmbeddingEngine.from_args(
    EngineArgs(model_name_or_path = "BAAI/bge-small-en-v1.5", device="cpu", engine="optimum" # or engine="torch"
))

async def main():
    async with engine:
        embeddings, usage = await engine.embed(sentences=sentences)
asyncio.run(main())
```
2957
2958
2959	`## Evaluation`
2960
`baai-general-embedding` models achieve state-of-the-art performance on both the MTEB and C-MTEB leaderboards!
For more details and evaluation tools, see our [scripts](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md).
2963
2964	`- MTEB:`
2965
| Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) | Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1024 | 512 | 64.23 | 54.29 | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 |
| [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 |
| [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 384 | 512 | 62.17 | 51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 |
| [bge-large-en](https://huggingface.co/BAAI/bge-large-en) | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 |
| [bge-base-en](https://huggingface.co/BAAI/bge-base-en) | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 |
| [gte-large](https://huggingface.co/thenlper/gte-large) | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 |
| [gte-base](https://huggingface.co/thenlper/gte-base) | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 |
| [e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) | 1024 | 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 |
| [bge-small-en](https://huggingface.co/BAAI/bge-small-en) | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 |
| [instructor-xl](https://huggingface.co/hkunlp/instructor-xl) | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 |
| [e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 |
| [gte-small](https://huggingface.co/thenlper/gte-small) | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 |
| [text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings) | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 |
| [e5-small-v2](https://huggingface.co/intfloat/e5-small-v2) | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 |
| [sentence-t5-xxl](https://huggingface.co/sentence-transformers/sentence-t5-xxl) | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 |
| [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 |
| [sgpt-bloom-7b1-msmarco](https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco) | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 |
2985
2986
2987
2988	`- C-MTEB:`
We created the C-MTEB benchmark for Chinese text embeddings; it consists of 31 datasets across 6 tasks.
Please refer to [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md) for a detailed introduction.
2991
| Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5) | 1024 | 64.53 | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 |
| [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 |
| [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 |
| [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 |
| [bge-large-zh-noinstruct](https://huggingface.co/BAAI/bge-large-zh-noinstruct) | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 |
| [multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 |
| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 |
| [m3e-base](https://huggingface.co/moka-ai/m3e-base) | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 |
| [m3e-large](https://huggingface.co/moka-ai/m3e-large) | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 |
| [multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 |
| [multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 |
| [text-embedding-ada-002(OpenAI)](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 |
| [luotuo](https://huggingface.co/silk-road/luotuo-bert-medium) | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 |
| [text2vec-base](https://huggingface.co/shibing624/text2vec-base-chinese) | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 |
| [text2vec-large](https://huggingface.co/GanymedeNil/text2vec-large-chinese) | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 |
3010
3011
- Reranking:
See [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/) for the evaluation script.

| Model | T2Reranking | T2RerankingZh2En\* | T2RerankingEn2Zh\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 |
| multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 |
| multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 |
| multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 |
| m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 |
| m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 |
| bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 |
| bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 |
| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 |
| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 |

\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks
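
The `Avg` column of the reranking table appears to be the unweighted mean of the six dataset columns. This is an observation from the numbers, not something the evaluation script is quoted as confirming here. A minimal check, with scores copied from two rows of the table and a hypothetical helper name `unweighted_mean`:

```python
# Scores copied from two rows of the reranking table.
# Column order: T2Reranking, T2RerankingZh2En, T2RerankingEn2Zh,
# MMarcoReranking, CMedQAv1, CMedQAv2. The last value is the reported Avg.
rows = {
    "text2vec-base-multilingual": ([64.66, 62.94, 62.51, 14.37, 48.46, 48.6], 50.26),
    "BAAI/bge-reranker-large": ([67.6, 64.03, 61.44, 37.16, 82.15, 84.18], 66.09),
}


def unweighted_mean(scores):
    """Return the plain arithmetic mean, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


for name, (scores, reported_avg) in rows.items():
    # Print the recomputed mean next to the value reported in the table.
    print(name, unweighted_mean(scores), reported_avg)
```

The same check does not carry over to the C-MTEB table, where `Avg` is not the plain mean of the six task columns.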

## Train

### BAAI Embedding

We pre-train the models using [RetroMAE](https://github.com/staoxiao/RetroMAE) and then train them on large-scale pair data using contrastive learning.
You can fine-tune the embedding model on your own data by following our [examples](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune).
We also provide a [pre-training example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/pretrain).
Note that the goal of pre-training is to reconstruct the text, so the pre-trained model cannot be used directly for similarity calculation; it needs to be fine-tuned first.
For more training details for bge, see [baai_general_embedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md).
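
To make "contrastive learning on pair data" concrete, here is a minimal pure-Python sketch of an InfoNCE-style loss with in-batch negatives. It is an illustration under assumptions, not the training code from the repository: the function names, the use of in-batch negatives, and the temperature value of 0.05 are all chosen for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(query_embs, passage_embs, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    passage_embs[i] is treated as the positive for query_embs[i]; every other
    passage in the batch acts as a negative (assumption for this sketch).
    """
    queries = [l2_normalize(q) for q in query_embs]
    passages = [l2_normalize(p) for p in passage_embs]
    total = 0.0
    for i, q in enumerate(queries):
        # Cosine similarity to every passage, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(q, p)) / temperature for p in passages]
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits)
        log_denom = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        # Negative log-probability of the positive passage.
        total += log_denom - logits[i]
    return total / len(queries)


# Toy 2-d "embeddings": the loss is small when each query is closest to its own passage.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(aligned < swapped)
```

Minimizing this loss pulls each query toward its paired passage and pushes it away from the other passages in the batch, which is why a model trained only to reconstruct text still needs this stage before its embeddings are useful for similarity.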

### BGE Reranker

A cross-encoder performs full attention over the input pair,
which is more accurate than an embedding model (i.e., a bi-encoder) but also more time-consuming.
Therefore, it can be used to re-rank the top-k documents returned by the embedding model.
We train the cross-encoder on multilingual pair data.
The data format is the same as for the embedding model, so you can easily fine-tune it by following our [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker).
For more details, please refer to [./FlagEmbedding/reranker/README.md](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker).
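
The two-stage flow described above (cheap retrieval over everything, expensive re-ranking over a shortlist) can be sketched as follows. The function `retrieve_then_rerank` and both toy scorers are hypothetical stand-ins written for this example; they are not real models or part of any library, and in practice the two callables would wrap an embedding model and a cross-encoder.

```python
def retrieve_then_rerank(query, documents, embed_score, cross_score, top_k=2):
    """Shortlist documents with a cheap scorer, then re-rank the shortlist with a costly one."""
    # Stage 1: the cheap bi-encoder-style score is computed for every document.
    shortlist = sorted(documents, key=lambda d: embed_score(query, d), reverse=True)[:top_k]
    # Stage 2: the expensive cross-encoder-style score only sees the top_k candidates.
    return sorted(shortlist, key=lambda d: cross_score(query, d), reverse=True)


def toy_embed_score(query, doc):
    """Toy stand-in for embedding similarity: count occurrences of query words."""
    doc_words = doc.lower().split()
    return sum(doc_words.count(word) for word in set(query.lower().split()))


def toy_cross_score(query, doc):
    """Toy stand-in for a cross-encoder: reward an exact phrase match on top of word counts."""
    return 10 * (query.lower() in doc.lower()) + toy_embed_score(query, doc)


docs = [
    "panda bear facts",
    "the giant panda is a bear",
    "giant trees and a giant bear near a panda",
    "weather report",
]
ranked = retrieve_then_rerank("giant panda", docs, toy_embed_score, toy_cross_score)
print(ranked)
# → ['the giant panda is a bear', 'giant trees and a giant bear near a panda']
```

In this toy run the first stage ranks the "giant trees" document highest because it repeats a query word, and the second stage corrects the order because it looks at the query and document together. Keeping `top_k` small bounds how often the slower scorer is called.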

## Contact
If you have any questions or suggestions related to this project, feel free to open an issue or pull request.
You can also email Shitao Xiao (stxiao@baai.ac.cn) and Zheng Liu (liuzheng@baai.ac.cn).

## Citation

If you find this repository useful, please consider giving it a star :star: and citing it:

```
@misc{bge_embedding,
      title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
      author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
      year={2023},
      eprint={2309.07597},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License
FlagEmbedding is licensed under the [MIT License](https://github.com/FlagOpen/FlagEmbedding/blob/master/LICENSE). The released models can be used for commercial purposes free of charge.