README.md · bge-base-en-v1.5

---
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- mteb
model-index:
- name: bge-base-en-v1.5
  results:
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_counterfactual
      name: MTEB AmazonCounterfactualClassification (en)
      config: en
      split: test
      revision: e8379541af4e31359cca9fbcf4b00f2671dba205
    metrics:
    - type: accuracy
      value: 76.14925373134328
    - type: ap
      value: 39.32336517995478
    - type: f1
      value: 70.16902252611425
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_polarity
      name: MTEB AmazonPolarityClassification
      config: default
      split: test
      revision: e2d317d38cd51312af73b3d32a06d1a08b442046
    metrics:
    - type: accuracy
      value: 93.386825
    - type: ap
      value: 90.21276917991995
    - type: f1
      value: 93.37741030006174
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (en)
      config: en
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 48.846000000000004
    - type: f1
      value: 48.14646269778261
  - task:
      type: Retrieval
    dataset:
      type: arguana
      name: MTEB ArguAna
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 40.754000000000005
    - type: map_at_10
      value: 55.761
    - type: map_at_100
      value: 56.330999999999996
    - type: map_at_1000
      value: 56.333999999999996
    - type: map_at_3
      value: 51.92
    - type: map_at_5
      value: 54.010999999999996
    - type: mrr_at_1
      value: 41.181
    - type: mrr_at_10
      value: 55.967999999999996
    - type: mrr_at_100
      value: 56.538
    - type: mrr_at_1000
      value: 56.542
    - type: mrr_at_3
      value: 51.980000000000004
    - type: mrr_at_5
      value: 54.208999999999996
    - type: ndcg_at_1
      value: 40.754000000000005
    - type: ndcg_at_10
      value: 63.605000000000004
    - type: ndcg_at_100
      value: 66.05199999999999
    - type: ndcg_at_1000
      value: 66.12
    - type: ndcg_at_3
      value: 55.708
    - type: ndcg_at_5
      value: 59.452000000000005
    - type: precision_at_1
      value: 40.754000000000005
    - type: precision_at_10
      value: 8.841000000000001
    - type: precision_at_100
      value: 0.991
    - type: precision_at_1000
      value: 0.1
    - type: precision_at_3
      value: 22.238
    - type: precision_at_5
      value: 15.149000000000001
    - type: recall_at_1
      value: 40.754000000000005
    - type: recall_at_10
      value: 88.407
    - type: recall_at_100
      value: 99.14699999999999
    - type: recall_at_1000
      value: 99.644
    - type: recall_at_3
      value: 66.714
    - type: recall_at_5
      value: 75.747
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-p2p
      name: MTEB ArxivClusteringP2P
      config: default
      split: test
      revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
    metrics:
    - type: v_measure
      value: 48.74884539679369
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-s2s
      name: MTEB ArxivClusteringS2S
      config: default
      split: test
      revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
    metrics:
    - type: v_measure
      value: 42.8075893810716
  - task:
      type: Reranking
    dataset:
      type: mteb/askubuntudupquestions-reranking
      name: MTEB AskUbuntuDupQuestions
      config: default
      split: test
      revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
    metrics:
    - type: map
      value: 62.128470519187736
    - type: mrr
      value: 74.28065778481289
  - task:
      type: STS
    dataset:
      type: mteb/biosses-sts
      name: MTEB BIOSSES
      config: default
      split: test
      revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
    metrics:
    - type: cos_sim_pearson
      value: 89.24629081484655
    - type: cos_sim_spearman
      value: 86.93752309911496
    - type: euclidean_pearson
      value: 87.58589628573816
    - type: euclidean_spearman
      value: 88.05622328825284
    - type: manhattan_pearson
      value: 87.5594959805773
    - type: manhattan_spearman
      value: 88.19658793233961
  - task:
      type: Classification
    dataset:
      type: mteb/banking77
      name: MTEB Banking77Classification
      config: default
      split: test
      revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
    metrics:
    - type: accuracy
      value: 86.9512987012987
    - type: f1
      value: 86.92515357973708
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-p2p
      name: MTEB BiorxivClusteringP2P
      config: default
      split: test
      revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
    metrics:
    - type: v_measure
      value: 39.10263762928872
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-s2s
      name: MTEB BiorxivClusteringS2S
      config: default
      split: test
      revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
    metrics:
    - type: v_measure
      value: 36.69711517426737
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackAndroidRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 32.327
    - type: map_at_10
      value: 44.099
    - type: map_at_100
      value: 45.525
    - type: map_at_1000
      value: 45.641999999999996
    - type: map_at_3
      value: 40.47
    - type: map_at_5
      value: 42.36
    - type: mrr_at_1
      value: 39.199
    - type: mrr_at_10
      value: 49.651
    - type: mrr_at_100
      value: 50.29
    - type: mrr_at_1000
      value: 50.329
    - type: mrr_at_3
      value: 46.924
    - type: mrr_at_5
      value: 48.548
    - type: ndcg_at_1
      value: 39.199
    - type: ndcg_at_10
      value: 50.773
    - type: ndcg_at_100
      value: 55.67999999999999
    - type: ndcg_at_1000
      value: 57.495
    - type: ndcg_at_3
      value: 45.513999999999996
    - type: ndcg_at_5
      value: 47.703
    - type: precision_at_1
      value: 39.199
    - type: precision_at_10
      value: 9.914000000000001
    - type: precision_at_100
      value: 1.5310000000000001
    - type: precision_at_1000
      value: 0.198
    - type: precision_at_3
      value: 21.984
    - type: precision_at_5
      value: 15.737000000000002
    - type: recall_at_1
      value: 32.327
    - type: recall_at_10
      value: 63.743
    - type: recall_at_100
      value: 84.538
    - type: recall_at_1000
      value: 96.089
    - type: recall_at_3
      value: 48.065000000000005
    - type: recall_at_5
      value: 54.519
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackEnglishRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 32.671
    - type: map_at_10
      value: 42.954
    - type: map_at_100
      value: 44.151
    - type: map_at_1000
      value: 44.287
    - type: map_at_3
      value: 39.912
    - type: map_at_5
      value: 41.798
    - type: mrr_at_1
      value: 41.465
    - type: mrr_at_10
      value: 49.351
    - type: mrr_at_100
      value: 49.980000000000004
    - type: mrr_at_1000
      value: 50.016000000000005
    - type: mrr_at_3
      value: 47.144000000000005
    - type: mrr_at_5
      value: 48.592999999999996
    - type: ndcg_at_1
      value: 41.465
    - type: ndcg_at_10
      value: 48.565999999999995
    - type: ndcg_at_100
      value: 52.76499999999999
    - type: ndcg_at_1000
      value: 54.749
    - type: ndcg_at_3
      value: 44.57
    - type: ndcg_at_5
      value: 46.759
    - type: precision_at_1
      value: 41.465
    - type: precision_at_10
      value: 9.107999999999999
    - type: precision_at_100
      value: 1.433
    - type: precision_at_1000
      value: 0.191
    - type: precision_at_3
      value: 21.423000000000002
    - type: precision_at_5
      value: 15.414
    - type: recall_at_1
      value: 32.671
    - type: recall_at_10
      value: 57.738
    - type: recall_at_100
      value: 75.86500000000001
    - type: recall_at_1000
      value: 88.36
    - type: recall_at_3
      value: 45.626
    - type: recall_at_5
      value: 51.812000000000005
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackGamingRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 41.185
    - type: map_at_10
      value: 53.929
    - type: map_at_100
      value: 54.92
    - type: map_at_1000
      value: 54.967999999999996
    - type: map_at_3
      value: 50.70400000000001
    - type: map_at_5
      value: 52.673
    - type: mrr_at_1
      value: 47.398
    - type: mrr_at_10
      value: 57.303000000000004
    - type: mrr_at_100
      value: 57.959
    - type: mrr_at_1000
      value: 57.985
    - type: mrr_at_3
      value: 54.932
    - type: mrr_at_5
      value: 56.464999999999996
    - type: ndcg_at_1
      value: 47.398
    - type: ndcg_at_10
      value: 59.653
    - type: ndcg_at_100
      value: 63.627
    - type: ndcg_at_1000
      value: 64.596
    - type: ndcg_at_3
      value: 54.455
    - type: ndcg_at_5
      value: 57.245000000000005
    - type: precision_at_1
      value: 47.398
    - type: precision_at_10
      value: 9.524000000000001
    - type: precision_at_100
      value: 1.243
    - type: precision_at_1000
      value: 0.13699999999999998
    - type: precision_at_3
      value: 24.389
    - type: precision_at_5
      value: 16.752
    - type: recall_at_1
      value: 41.185
    - type: recall_at_10
      value: 73.193
    - type: recall_at_100
      value: 90.357
    - type: recall_at_1000
      value: 97.253
    - type: recall_at_3
      value: 59.199999999999996
    - type: recall_at_5
      value: 66.118
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackGisRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 27.27
    - type: map_at_10
      value: 36.223
    - type: map_at_100
      value: 37.218
    - type: map_at_1000
      value: 37.293
    - type: map_at_3
      value: 33.503
    - type: map_at_5
      value: 35.097
    - type: mrr_at_1
      value: 29.492
    - type: mrr_at_10
      value: 38.352000000000004
    - type: mrr_at_100
      value: 39.188
    - type: mrr_at_1000
      value: 39.247
    - type: mrr_at_3
      value: 35.876000000000005
    - type: mrr_at_5
      value: 37.401
    - type: ndcg_at_1
      value: 29.492
    - type: ndcg_at_10
      value: 41.239
    - type: ndcg_at_100
      value: 46.066
    - type: ndcg_at_1000
      value: 47.992000000000004
    - type: ndcg_at_3
      value: 36.11
    - type: ndcg_at_5
      value: 38.772
    - type: precision_at_1
      value: 29.492
    - type: precision_at_10
      value: 6.260000000000001
    - type: precision_at_100
      value: 0.914
    - type: precision_at_1000
      value: 0.11100000000000002
    - type: precision_at_3
      value: 15.104000000000001
    - type: precision_at_5
      value: 10.644
    - type: recall_at_1
      value: 27.27
    - type: recall_at_10
      value: 54.589
    - type: recall_at_100
      value: 76.70700000000001
    - type: recall_at_1000
      value: 91.158
    - type: recall_at_3
      value: 40.974
    - type: recall_at_5
      value: 47.327000000000005
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackMathematicaRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 17.848
    - type: map_at_10
      value: 26.207
    - type: map_at_100
      value: 27.478
    - type: map_at_1000
      value: 27.602
    - type: map_at_3
      value: 23.405
    - type: map_at_5
      value: 24.98
    - type: mrr_at_1
      value: 21.891
    - type: mrr_at_10
      value: 31.041999999999998
    - type: mrr_at_100
      value: 32.092
    - type: mrr_at_1000
      value: 32.151999999999994
    - type: mrr_at_3
      value: 28.358
    - type: mrr_at_5
      value: 29.969
    - type: ndcg_at_1
      value: 21.891
    - type: ndcg_at_10
      value: 31.585
    - type: ndcg_at_100
      value: 37.531
    - type: ndcg_at_1000
      value: 40.256
    - type: ndcg_at_3
      value: 26.508
    - type: ndcg_at_5
      value: 28.894
    - type: precision_at_1
      value: 21.891
    - type: precision_at_10
      value: 5.795999999999999
    - type: precision_at_100
      value: 0.9990000000000001
    - type: precision_at_1000
      value: 0.13799999999999998
    - type: precision_at_3
      value: 12.769
    - type: precision_at_5
      value: 9.279
    - type: recall_at_1
      value: 17.848
    - type: recall_at_10
      value: 43.452
    - type: recall_at_100
      value: 69.216
    - type: recall_at_1000
      value: 88.102
    - type: recall_at_3
      value: 29.18
    - type: recall_at_5
      value: 35.347
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackPhysicsRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 30.94
    - type: map_at_10
      value: 41.248000000000005
    - type: map_at_100
      value: 42.495
    - type: map_at_1000
      value: 42.602000000000004
    - type: map_at_3
      value: 37.939
    - type: map_at_5
      value: 39.924
    - type: mrr_at_1
      value: 37.824999999999996
    - type: mrr_at_10
      value: 47.041
    - type: mrr_at_100
      value: 47.83
    - type: mrr_at_1000
      value: 47.878
    - type: mrr_at_3
      value: 44.466
    - type: mrr_at_5
      value: 46.111999999999995
    - type: ndcg_at_1
      value: 37.824999999999996
    - type: ndcg_at_10
      value: 47.223
    - type: ndcg_at_100
      value: 52.394
    - type: ndcg_at_1000
      value: 54.432
    - type: ndcg_at_3
      value: 42.032000000000004
    - type: ndcg_at_5
      value: 44.772
    - type: precision_at_1
      value: 37.824999999999996
    - type: precision_at_10
      value: 8.393
    - type: precision_at_100
      value: 1.2890000000000001
    - type: precision_at_1000
      value: 0.164
    - type: precision_at_3
      value: 19.698
    - type: precision_at_5
      value: 14.013
    - type: recall_at_1
      value: 30.94
    - type: recall_at_10
      value: 59.316
    - type: recall_at_100
      value: 80.783
    - type: recall_at_1000
      value: 94.15400000000001
    - type: recall_at_3
      value: 44.712
    - type: recall_at_5
      value: 51.932
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackProgrammersRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 27.104
    - type: map_at_10
      value: 36.675999999999995
    - type: map_at_100
      value: 38.076
    - type: map_at_1000
      value: 38.189
    - type: map_at_3
      value: 33.733999999999995
    - type: map_at_5
      value: 35.287
    - type: mrr_at_1
      value: 33.904
    - type: mrr_at_10
      value: 42.55
    - type: mrr_at_100
      value: 43.434
    - type: mrr_at_1000
      value: 43.494
    - type: mrr_at_3
      value: 40.126
    - type: mrr_at_5
      value: 41.473
    - type: ndcg_at_1
      value: 33.904
    - type: ndcg_at_10
      value: 42.414
    - type: ndcg_at_100
      value: 48.203
    - type: ndcg_at_1000
      value: 50.437
    - type: ndcg_at_3
      value: 37.633
    - type: ndcg_at_5
      value: 39.67
    - type: precision_at_1
      value: 33.904
    - type: precision_at_10
      value: 7.82
    - type: precision_at_100
      value: 1.2409999999999999
    - type: precision_at_1000
      value: 0.159
    - type: precision_at_3
      value: 17.884
    - type: precision_at_5
      value: 12.648000000000001
    - type: recall_at_1
      value: 27.104
    - type: recall_at_10
      value: 53.563
    - type: recall_at_100
      value: 78.557
    - type: recall_at_1000
      value: 93.533
    - type: recall_at_3
      value: 39.92
    - type: recall_at_5
      value: 45.457
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 27.707749999999997
    - type: map_at_10
      value: 36.961
    - type: map_at_100
      value: 38.158833333333334
    - type: map_at_1000
      value: 38.270333333333326
    - type: map_at_3
      value: 34.07183333333334
    - type: map_at_5
      value: 35.69533333333334
    - type: mrr_at_1
      value: 32.81875
    - type: mrr_at_10
      value: 41.293
    - type: mrr_at_100
      value: 42.116499999999995
    - type: mrr_at_1000
      value: 42.170249999999996
    - type: mrr_at_3
      value: 38.83983333333333
    - type: mrr_at_5
      value: 40.29775
    - type: ndcg_at_1
      value: 32.81875
    - type: ndcg_at_10
      value: 42.355
    - type: ndcg_at_100
      value: 47.41374999999999
    - type: ndcg_at_1000
      value: 49.5805
    - type: ndcg_at_3
      value: 37.52825
    - type: ndcg_at_5
      value: 39.83266666666667
    - type: precision_at_1
      value: 32.81875
    - type: precision_at_10
      value: 7.382416666666666
    - type: precision_at_100
      value: 1.1640833333333334
    - type: precision_at_1000
      value: 0.15383333333333335
    - type: precision_at_3
      value: 17.134166666666665
    - type: precision_at_5
      value: 12.174833333333336
    - type: recall_at_1
      value: 27.707749999999997
    - type: recall_at_10
      value: 53.945
    - type: recall_at_100
      value: 76.191
    - type: recall_at_1000
      value: 91.101
    - type: recall_at_3
      value: 40.39083333333334
    - type: recall_at_5
      value: 46.40083333333333
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackStatsRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 26.482
    - type: map_at_10
      value: 33.201
    - type: map_at_100
      value: 34.107
    - type: map_at_1000
      value: 34.197
    - type: map_at_3
      value: 31.174000000000003
    - type: map_at_5
      value: 32.279
    - type: mrr_at_1
      value: 29.908
    - type: mrr_at_10
      value: 36.235
    - type: mrr_at_100
      value: 37.04
    - type: mrr_at_1000
      value: 37.105
    - type: mrr_at_3
      value: 34.355999999999995
    - type: mrr_at_5
      value: 35.382999999999996
    - type: ndcg_at_1
      value: 29.908
    - type: ndcg_at_10
      value: 37.325
    - type: ndcg_at_100
      value: 41.795
    - type: ndcg_at_1000
      value: 44.105
    - type: ndcg_at_3
      value: 33.555
    - type: ndcg_at_5
      value: 35.266999999999996
    - type: precision_at_1
      value: 29.908
    - type: precision_at_10
      value: 5.721
    - type: precision_at_100
      value: 0.8630000000000001
    - type: precision_at_1000
      value: 0.11299999999999999
    - type: precision_at_3
      value: 14.008000000000001
    - type: precision_at_5
      value: 9.754999999999999
    - type: recall_at_1
      value: 26.482
    - type: recall_at_10
      value: 47.072
    - type: recall_at_100
      value: 67.27
    - type: recall_at_1000
      value: 84.371
    - type: recall_at_3
      value: 36.65
    - type: recall_at_5
      value: 40.774
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackTexRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 18.815
    - type: map_at_10
      value: 26.369999999999997
    - type: map_at_100
      value: 27.458
    - type: map_at_1000
      value: 27.588
    - type: map_at_3
      value: 23.990000000000002
    - type: map_at_5
      value: 25.345000000000002
    - type: mrr_at_1
      value: 22.953000000000003
    - type: mrr_at_10
      value: 30.342999999999996
    - type: mrr_at_100
      value: 31.241000000000003
    - type: mrr_at_1000
      value: 31.319000000000003
    - type: mrr_at_3
      value: 28.16
    - type: mrr_at_5
      value: 29.406
    - type: ndcg_at_1
      value: 22.953000000000003
    - type: ndcg_at_10
      value: 31.151
    - type: ndcg_at_100
      value: 36.309000000000005
    - type: ndcg_at_1000
      value: 39.227000000000004
    - type: ndcg_at_3
      value: 26.921
    - type: ndcg_at_5
      value: 28.938000000000002
    - type: precision_at_1
      value: 22.953000000000003
    - type: precision_at_10
      value: 5.602
    - type: precision_at_100
      value: 0.9530000000000001
    - type: precision_at_1000
      value: 0.13899999999999998
    - type: precision_at_3
      value: 12.606
    - type: precision_at_5
      value: 9.119
    - type: recall_at_1
      value: 18.815
    - type: recall_at_10
      value: 41.574
    - type: recall_at_100
      value: 64.84400000000001
    - type: recall_at_1000
      value: 85.406
    - type: recall_at_3
      value: 29.694
    - type: recall_at_5
      value: 34.935
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackUnixRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 27.840999999999998
    - type: map_at_10
      value: 36.797999999999995
    - type: map_at_100
      value: 37.993
    - type: map_at_1000
      value: 38.086999999999996
    - type: map_at_3
      value: 34.050999999999995
    - type: map_at_5
      value: 35.379
    - type: mrr_at_1
      value: 32.649
    - type: mrr_at_10
      value: 41.025
    - type: mrr_at_100
      value: 41.878
    - type: mrr_at_1000
      value: 41.929
    - type: mrr_at_3
      value: 38.573
    - type: mrr_at_5
      value: 39.715
    - type: ndcg_at_1
      value: 32.649
    - type: ndcg_at_10
      value: 42.142
    - type: ndcg_at_100
      value: 47.558
    - type: ndcg_at_1000
      value: 49.643
    - type: ndcg_at_3
      value: 37.12
    - type: ndcg_at_5
      value: 38.983000000000004
    - type: precision_at_1
      value: 32.649
    - type: precision_at_10
      value: 7.08
    - type: precision_at_100
      value: 1.1039999999999999
    - type: precision_at_1000
      value: 0.13899999999999998
    - type: precision_at_3
      value: 16.698
    - type: precision_at_5
      value: 11.511000000000001
    - type: recall_at_1
      value: 27.840999999999998
    - type: recall_at_10
      value: 54.245
    - type: recall_at_100
      value: 77.947
    - type: recall_at_1000
      value: 92.36999999999999
    - type: recall_at_3
      value: 40.146
    - type: recall_at_5
      value: 44.951
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackWebmastersRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 26.529000000000003
    - type: map_at_10
      value: 35.010000000000005
    - type: map_at_100
      value: 36.647
    - type: map_at_1000
      value: 36.857
    - type: map_at_3
      value: 31.968000000000004
    - type: map_at_5
      value: 33.554
    - type: mrr_at_1
      value: 31.818
    - type: mrr_at_10
      value: 39.550999999999995
    - type: mrr_at_100
      value: 40.54
    - type: mrr_at_1000
      value: 40.596
    - type: mrr_at_3
      value: 36.726
    - type: mrr_at_5
      value: 38.416
    - type: ndcg_at_1
      value: 31.818
    - type: ndcg_at_10
      value: 40.675
    - type: ndcg_at_100
      value: 46.548
    - type: ndcg_at_1000
      value: 49.126
    - type: ndcg_at_3
      value: 35.829
    - type: ndcg_at_5
      value: 38.0
    - type: precision_at_1
      value: 31.818
    - type: precision_at_10
      value: 7.826
    - type: precision_at_100
      value: 1.538
    - type: precision_at_1000
      value: 0.24
    - type: precision_at_3
      value: 16.601
    - type: precision_at_5
      value: 12.095
    - type: recall_at_1
      value: 26.529000000000003
    - type: recall_at_10
      value: 51.03
    - type: recall_at_100
      value: 77.556
    - type: recall_at_1000
      value: 93.804
    - type: recall_at_3
      value: 36.986000000000004
    - type: recall_at_5
      value: 43.096000000000004
  - task:
      type: Retrieval
    dataset:
      type: BeIR/cqadupstack
      name: MTEB CQADupstackWordpressRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 23.480999999999998
    - type: map_at_10
      value: 30.817
    - type: map_at_100
      value: 31.838
    - type: map_at_1000
      value: 31.932
    - type: map_at_3
      value: 28.011999999999997
    - type: map_at_5
      value: 29.668
    - type: mrr_at_1
      value: 25.323
    - type: mrr_at_10
      value: 33.072
    - type: mrr_at_100
      value: 33.926
    - type: mrr_at_1000
      value: 33.993
    - type: mrr_at_3
      value: 30.436999999999998
    - type: mrr_at_5
      value: 32.092
    - type: ndcg_at_1
      value: 25.323
    - type: ndcg_at_10
      value: 35.514
    - type: ndcg_at_100
      value: 40.489000000000004
    - type: ndcg_at_1000
      value: 42.908
    - type: ndcg_at_3
      value: 30.092000000000002
    - type: ndcg_at_5
      value: 32.989000000000004
    - type: precision_at_1
      value: 25.323
    - type: precision_at_10
      value: 5.545
    - type: precision_at_100
      value: 0.861
    - type: precision_at_1000
      value: 0.117
    - type: precision_at_3
      value: 12.446
    - type: precision_at_5
      value: 9.131
    - type: recall_at_1
      value: 23.480999999999998
    - type: recall_at_10
      value: 47.825
    - type: recall_at_100
      value: 70.652
    - type: recall_at_1000
      value: 88.612
    - type: recall_at_3
      value: 33.537
    - type: recall_at_5
      value: 40.542
  - task:
      type: Retrieval
    dataset:
      type: climate-fever
      name: MTEB ClimateFEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 13.333999999999998
    - type: map_at_10
      value: 22.524
    - type: map_at_100
      value: 24.506
    - type: map_at_1000
      value: 24.715
    - type: map_at_3
      value: 19.022
    - type: map_at_5
      value: 20.693
    - type: mrr_at_1
      value: 29.186
    - type: mrr_at_10
      value: 41.22
    - type: mrr_at_100
      value: 42.16
    - type: mrr_at_1000
      value: 42.192
    - type: mrr_at_3
      value: 38.013000000000005
    - type: mrr_at_5
      value: 39.704
    - type: ndcg_at_1
      value: 29.186
    - type: ndcg_at_10
      value: 31.167
    - type: ndcg_at_100
      value: 38.879000000000005
    - type: ndcg_at_1000
      value: 42.376000000000005
    - type: ndcg_at_3
      value: 25.817
    - type: ndcg_at_5
      value: 27.377000000000002
    - type: precision_at_1
      value: 29.186
    - type: precision_at_10
      value: 9.693999999999999
    - type: precision_at_100
      value: 1.8030000000000002
    - type: precision_at_1000
      value: 0.246
    - type: precision_at_3
      value: 19.11
    - type: precision_at_5
      value: 14.344999999999999
    - type: recall_at_1
      value: 13.333999999999998
    - type: recall_at_10
      value: 37.092000000000006
    - type: recall_at_100
      value: 63.651
    - type: recall_at_1000
      value: 83.05
    - type: recall_at_3
      value: 23.74
    - type: recall_at_5
      value: 28.655
  - task:
      type: Retrieval
    dataset:
      type: dbpedia-entity
      name: MTEB DBPedia
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 9.151
    - type: map_at_10
      value: 19.653000000000002
    - type: map_at_100
      value: 28.053
    - type: map_at_1000
      value: 29.709000000000003
    - type: map_at_3
      value: 14.191
    - type: map_at_5
      value: 16.456
    - type: mrr_at_1
      value: 66.25
    - type: mrr_at_10
      value: 74.4
    - type: mrr_at_100
      value: 74.715
    - type: mrr_at_1000
      value: 74.726
    - type: mrr_at_3
      value: 72.417
    - type: mrr_at_5
      value: 73.667
    - type: ndcg_at_1
      value: 54.25
    - type: ndcg_at_10
      value: 40.77
    - type: ndcg_at_100
      value: 46.359
    - type: ndcg_at_1000
      value: 54.193000000000005
    - type: ndcg_at_3
      value: 44.832
    - type: ndcg_at_5
      value: 42.63
    - type: precision_at_1
      value: 66.25
    - type: precision_at_10
      value: 32.175
    - type: precision_at_100
      value: 10.668
    - type: precision_at_1000
      value: 2.067
    - type: precision_at_3
      value: 47.667
    - type: precision_at_5
      value: 41.3
    - type: recall_at_1
      value: 9.151
    - type: recall_at_10
      value: 25.003999999999998
    - type: recall_at_100
      value: 52.976
    - type: recall_at_1000
      value: 78.315
    - type: recall_at_3
      value: 15.487
    - type: recall_at_5
      value: 18.999
  - task:
      type: Classification
    dataset:
      type: mteb/emotion
      name: MTEB EmotionClassification
      config: default
      split: test
      revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
    metrics:
    - type: accuracy
      value: 51.89999999999999
    - type: f1
      value: 46.47777925067403
  - task:
      type: Retrieval
    dataset:
      type: fever
      name: MTEB FEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 73.706
    - type: map_at_10
      value: 82.423
    - type: map_at_100
      value: 82.67999999999999
    - type: map_at_1000
      value: 82.694
    - type: map_at_3
      value: 81.328
    - type: map_at_5
      value: 82.001
    - type: mrr_at_1
      value: 79.613
    - type: mrr_at_10
      value: 87.07000000000001
    - type: mrr_at_100
      value: 87.169
    - type: mrr_at_1000
      value: 87.17
    - type: mrr_at_3
      value: 86.404
    - type: mrr_at_5
      value: 86.856
    - type: ndcg_at_1
      value: 79.613
    - type: ndcg_at_10
      value: 86.289
    - type: ndcg_at_100
      value: 87.201
    - type: ndcg_at_1000
      value: 87.428
    - type: ndcg_at_3
      value: 84.625
    - type: ndcg_at_5
      value: 85.53699999999999
    - type: precision_at_1
      value: 79.613
    - type: precision_at_10
      value: 10.399
    - type: precision_at_100
      value: 1.1079999999999999
    - type: precision_at_1000
      value: 0.11499999999999999
    - type: precision_at_3
      value: 32.473
    - type: precision_at_5
      value: 20.132
    - type: recall_at_1
      value: 73.706
    - type: recall_at_10
      value: 93.559
    - type: recall_at_100
      value: 97.188
    - type: recall_at_1000
      value: 98.555
    - type: recall_at_3
      value: 88.98700000000001
    - type: recall_at_5
      value: 91.373
1331	`- task:`
1332	`type: Retrieval`
1333	`dataset:`
1334	`type: fiqa`
1335	`name: MTEB FiQA2018`
1336	`config: default`
1337	`split: test`
1338	`revision: None`
1339	`metrics:`
1340	`- type: map_at_1`
1341	`value: 19.841`
1342	`- type: map_at_10`
1343	`value: 32.643`
1344	`- type: map_at_100`
1345	`value: 34.575`
1346	`- type: map_at_1000`
1347	`value: 34.736`
1348	`- type: map_at_3`
1349	`value: 28.317999999999998`
1350	`- type: map_at_5`
1351	`value: 30.964000000000002`
1352	`- type: mrr_at_1`
1353	`value: 39.660000000000004`
1354	`- type: mrr_at_10`
1355	`value: 48.620000000000005`
1356	`- type: mrr_at_100`
1357	`value: 49.384`
1358	`- type: mrr_at_1000`
1359	`value: 49.415`
1360	`- type: mrr_at_3`
1361	`value: 45.988`
1362	`- type: mrr_at_5`
1363	`value: 47.361`
1364	`- type: ndcg_at_1`
1365	`value: 39.660000000000004`
1366	`- type: ndcg_at_10`
1367	`value: 40.646`
1368	`- type: ndcg_at_100`
1369	`value: 47.657`
1370	`- type: ndcg_at_1000`
1371	`value: 50.428`
1372	`- type: ndcg_at_3`
1373	`value: 36.689`
1374	`- type: ndcg_at_5`
1375	`value: 38.211`
1376	`- type: precision_at_1`
1377	`value: 39.660000000000004`
1378	`- type: precision_at_10`
1379	`value: 11.235000000000001`
1380	`- type: precision_at_100`
1381	`value: 1.8530000000000002`
1382	`- type: precision_at_1000`
1383	`value: 0.23600000000000002`
1384	`- type: precision_at_3`
1385	`value: 24.587999999999997`
1386	`- type: precision_at_5`
1387	`value: 18.395`
1388	`- type: recall_at_1`
1389	`value: 19.841`
1390	`- type: recall_at_10`
1391	`value: 48.135`
1392	`- type: recall_at_100`
1393	`value: 74.224`
1394	`- type: recall_at_1000`
1395	`value: 90.826`
1396	`- type: recall_at_3`
1397	`value: 33.536`
1398	`- type: recall_at_5`
1399	`value: 40.311`
1400	`- task:`
1401	`type: Retrieval`
1402	`dataset:`
1403	`type: hotpotqa`
1404	`name: MTEB HotpotQA`
1405	`config: default`
1406	`split: test`
1407	`revision: None`
1408	`metrics:`
1409	`- type: map_at_1`
1410	`value: 40.358`
1411	`- type: map_at_10`
1412	`value: 64.497`
1413	`- type: map_at_100`
1414	`value: 65.362`
1415	`- type: map_at_1000`
1416	`value: 65.41900000000001`
1417	`- type: map_at_3`
1418	`value: 61.06700000000001`
1419	`- type: map_at_5`
1420	`value: 63.317`
1421	`- type: mrr_at_1`
1422	`value: 80.716`
1423	`- type: mrr_at_10`
1424	`value: 86.10799999999999`
1425	`- type: mrr_at_100`
1426	`value: 86.265`
1427	`- type: mrr_at_1000`
1428	`value: 86.27`
1429	`- type: mrr_at_3`
1430	`value: 85.271`
1431	`- type: mrr_at_5`
1432	`value: 85.82499999999999`
1433	`- type: ndcg_at_1`
1434	`value: 80.716`
1435	`- type: ndcg_at_10`
1436	`value: 72.597`
1437	`- type: ndcg_at_100`
1438	`value: 75.549`
1439	`- type: ndcg_at_1000`
1440	`value: 76.61`
1441	`- type: ndcg_at_3`
1442	`value: 67.874`
1443	`- type: ndcg_at_5`
1444	`value: 70.655`
1445	`- type: precision_at_1`
1446	`value: 80.716`
1447	`- type: precision_at_10`
1448	`value: 15.148`
1449	`- type: precision_at_100`
1450	`value: 1.745`
1451	`- type: precision_at_1000`
1452	`value: 0.188`
1453	`- type: precision_at_3`
1454	`value: 43.597`
1455	`- type: precision_at_5`
1456	`value: 28.351`
1457	`- type: recall_at_1`
1458	`value: 40.358`
1459	`- type: recall_at_10`
1460	`value: 75.739`
1461	`- type: recall_at_100`
1462	`value: 87.259`
1463	`- type: recall_at_1000`
1464	`value: 94.234`
1465	`- type: recall_at_3`
1466	`value: 65.39500000000001`
1467	`- type: recall_at_5`
1468	`value: 70.878`
1469	`- task:`
1470	`type: Classification`
1471	`dataset:`
1472	`type: mteb/imdb`
1473	`name: MTEB ImdbClassification`
1474	`config: default`
1475	`split: test`
1476	`revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7`
1477	`metrics:`
1478	`- type: accuracy`
1479	`value: 90.80799999999998`
1480	`- type: ap`
1481	`value: 86.81350378180757`
1482	`- type: f1`
1483	`value: 90.79901248314215`
1484	`- task:`
1485	`type: Retrieval`
1486	`dataset:`
1487	`type: msmarco`
1488	`name: MTEB MSMARCO`
1489	`config: default`
1490	`split: dev`
1491	`revision: None`
1492	`metrics:`
1493	`- type: map_at_1`
1494	`value: 22.096`
1495	`- type: map_at_10`
1496	`value: 34.384`
1497	`- type: map_at_100`
1498	`value: 35.541`
1499	`- type: map_at_1000`
1500	`value: 35.589999999999996`
1501	`- type: map_at_3`
1502	`value: 30.496000000000002`
1503	`- type: map_at_5`
1504	`value: 32.718`
1505	`- type: mrr_at_1`
1506	`value: 22.750999999999998`
1507	`- type: mrr_at_10`
1508	`value: 35.024`
1509	`- type: mrr_at_100`
1510	`value: 36.125`
1511	`- type: mrr_at_1000`
1512	`value: 36.168`
1513	`- type: mrr_at_3`
1514	`value: 31.225`
1515	`- type: mrr_at_5`
1516	`value: 33.416000000000004`
1517	`- type: ndcg_at_1`
1518	`value: 22.750999999999998`
1519	`- type: ndcg_at_10`
1520	`value: 41.351`
1521	`- type: ndcg_at_100`
1522	`value: 46.92`
1523	`- type: ndcg_at_1000`
1524	`value: 48.111`
1525	`- type: ndcg_at_3`
1526	`value: 33.439`
1527	`- type: ndcg_at_5`
1528	`value: 37.407000000000004`
1529	`- type: precision_at_1`
1530	`value: 22.750999999999998`
1531	`- type: precision_at_10`
1532	`value: 6.564`
1533	`- type: precision_at_100`
1534	`value: 0.935`
1535	`- type: precision_at_1000`
1536	`value: 0.104`
1537	`- type: precision_at_3`
1538	`value: 14.288`
1539	`- type: precision_at_5`
1540	`value: 10.581999999999999`
1541	`- type: recall_at_1`
1542	`value: 22.096`
1543	`- type: recall_at_10`
1544	`value: 62.771`
1545	`- type: recall_at_100`
1546	`value: 88.529`
1547	`- type: recall_at_1000`
1548	`value: 97.55`
1549	`- type: recall_at_3`
1550	`value: 41.245`
1551	`- type: recall_at_5`
1552	`value: 50.788`
1553	`- task:`
1554	`type: Classification`
1555	`dataset:`
1556	`type: mteb/mtop_domain`
1557	`name: MTEB MTOPDomainClassification (en)`
1558	`config: en`
1559	`split: test`
1560	`revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf`
1561	`metrics:`
1562	`- type: accuracy`
1563	`value: 94.16780665754673`
1564	`- type: f1`
1565	`value: 93.96331194859894`
1566	`- task:`
1567	`type: Classification`
1568	`dataset:`
1569	`type: mteb/mtop_intent`
1570	`name: MTEB MTOPIntentClassification (en)`
1571	`config: en`
1572	`split: test`
1573	`revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba`
1574	`metrics:`
1575	`- type: accuracy`
1576	`value: 76.90606475148198`
1577	`- type: f1`
1578	`value: 58.58344986604187`
1579	`- task:`
1580	`type: Classification`
1581	`dataset:`
1582	`type: mteb/amazon_massive_intent`
1583	`name: MTEB MassiveIntentClassification (en)`
1584	`config: en`
1585	`split: test`
1586	`revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7`
1587	`metrics:`
1588	`- type: accuracy`
1589	`value: 76.14660390047075`
1590	`- type: f1`
1591	`value: 74.31533923533614`
1592	`- task:`
1593	`type: Classification`
1594	`dataset:`
1595	`type: mteb/amazon_massive_scenario`
1596	`name: MTEB MassiveScenarioClassification (en)`
1597	`config: en`
1598	`split: test`
1599	`revision: 7d571f92784cd94a019292a1f45445077d0ef634`
1600	`metrics:`
1601	`- type: accuracy`
1602	`value: 80.16139878950908`
1603	`- type: f1`
1604	`value: 80.18532656824924`
1605	`- task:`
1606	`type: Clustering`
1607	`dataset:`
1608	`type: mteb/medrxiv-clustering-p2p`
1609	`name: MTEB MedrxivClusteringP2P`
1610	`config: default`
1611	`split: test`
1612	`revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73`
1613	`metrics:`
1614	`- type: v_measure`
1615	`value: 32.949880906135085`
1616	`- task:`
1617	`type: Clustering`
1618	`dataset:`
1619	`type: mteb/medrxiv-clustering-s2s`
1620	`name: MTEB MedrxivClusteringS2S`
1621	`config: default`
1622	`split: test`
1623	`revision: 35191c8c0dca72d8ff3efcd72aa802307d469663`
1624	`metrics:`
1625	`- type: v_measure`
1626	`value: 31.56300351524862`
1627	`- task:`
1628	`type: Reranking`
1629	`dataset:`
1630	`type: mteb/mind_small`
1631	`name: MTEB MindSmallReranking`
1632	`config: default`
1633	`split: test`
1634	`revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69`
1635	`metrics:`
1636	`- type: map`
1637	`value: 31.196521894371315`
1638	`- type: mrr`
1639	`value: 32.22644231694389`
1640	`- task:`
1641	`type: Retrieval`
1642	`dataset:`
1643	`type: nfcorpus`
1644	`name: MTEB NFCorpus`
1645	`config: default`
1646	`split: test`
1647	`revision: None`
1648	`metrics:`
1649	`- type: map_at_1`
1650	`value: 6.783`
1651	`- type: map_at_10`
1652	`value: 14.549000000000001`
1653	`- type: map_at_100`
1654	`value: 18.433`
1655	`- type: map_at_1000`
1656	`value: 19.949`
1657	`- type: map_at_3`
1658	`value: 10.936`
1659	`- type: map_at_5`
1660	`value: 12.514`
1661	`- type: mrr_at_1`
1662	`value: 47.368`
1663	`- type: mrr_at_10`
1664	`value: 56.42`
1665	`- type: mrr_at_100`
1666	`value: 56.908`
1667	`- type: mrr_at_1000`
1668	`value: 56.95`
1669	`- type: mrr_at_3`
1670	`value: 54.283`
1671	`- type: mrr_at_5`
1672	`value: 55.568`
1673	`- type: ndcg_at_1`
1674	`value: 45.666000000000004`
1675	`- type: ndcg_at_10`
1676	`value: 37.389`
1677	`- type: ndcg_at_100`
1678	`value: 34.253`
1679	`- type: ndcg_at_1000`
1680	`value: 43.059999999999995`
1681	`- type: ndcg_at_3`
1682	`value: 42.725`
1683	`- type: ndcg_at_5`
1684	`value: 40.193`
1685	`- type: precision_at_1`
1686	`value: 47.368`
1687	`- type: precision_at_10`
1688	`value: 27.988000000000003`
1689	`- type: precision_at_100`
1690	`value: 8.672`
1691	`- type: precision_at_1000`
1692	`value: 2.164`
1693	`- type: precision_at_3`
1694	`value: 40.248`
1695	`- type: precision_at_5`
1696	`value: 34.737`
1697	`- type: recall_at_1`
1698	`value: 6.783`
1699	`- type: recall_at_10`
1700	`value: 17.838`
1701	`- type: recall_at_100`
1702	`value: 33.672000000000004`
1703	`- type: recall_at_1000`
1704	`value: 66.166`
1705	`- type: recall_at_3`
1706	`value: 11.849`
1707	`- type: recall_at_5`
1708	`value: 14.205000000000002`
1709	`- task:`
1710	`type: Retrieval`
1711	`dataset:`
1712	`type: nq`
1713	`name: MTEB NQ`
1714	`config: default`
1715	`split: test`
1716	`revision: None`
1717	`metrics:`
1718	`- type: map_at_1`
1719	`value: 31.698999999999998`
1720	`- type: map_at_10`
1721	`value: 46.556`
1722	`- type: map_at_100`
1723	`value: 47.652`
1724	`- type: map_at_1000`
1725	`value: 47.68`
1726	`- type: map_at_3`
1727	`value: 42.492000000000004`
1728	`- type: map_at_5`
1729	`value: 44.763999999999996`
1730	`- type: mrr_at_1`
1731	`value: 35.747`
1732	`- type: mrr_at_10`
1733	`value: 49.242999999999995`
1734	`- type: mrr_at_100`
1735	`value: 50.052`
1736	`- type: mrr_at_1000`
1737	`value: 50.068`
1738	`- type: mrr_at_3`
1739	`value: 45.867000000000004`
1740	`- type: mrr_at_5`
1741	`value: 47.778999999999996`
1742	`- type: ndcg_at_1`
1743	`value: 35.717999999999996`
1744	`- type: ndcg_at_10`
1745	`value: 54.14600000000001`
1746	`- type: ndcg_at_100`
1747	`value: 58.672999999999995`
1748	`- type: ndcg_at_1000`
1749	`value: 59.279`
1750	`- type: ndcg_at_3`
1751	`value: 46.407`
1752	`- type: ndcg_at_5`
1753	`value: 50.181`
1754	`- type: precision_at_1`
1755	`value: 35.717999999999996`
1756	`- type: precision_at_10`
1757	`value: 8.844000000000001`
1758	`- type: precision_at_100`
1759	`value: 1.139`
1760	`- type: precision_at_1000`
1761	`value: 0.12`
1762	`- type: precision_at_3`
1763	`value: 20.993000000000002`
1764	`- type: precision_at_5`
1765	`value: 14.791000000000002`
1766	`- type: recall_at_1`
1767	`value: 31.698999999999998`
1768	`- type: recall_at_10`
1769	`value: 74.693`
1770	`- type: recall_at_100`
1771	`value: 94.15299999999999`
1772	`- type: recall_at_1000`
1773	`value: 98.585`
1774	`- type: recall_at_3`
1775	`value: 54.388999999999996`
1776	`- type: recall_at_5`
1777	`value: 63.08200000000001`
1778	`- task:`
1779	`type: Retrieval`
1780	`dataset:`
1781	`type: quora`
1782	`name: MTEB QuoraRetrieval`
1783	`config: default`
1784	`split: test`
1785	`revision: None`
1786	`metrics:`
1787	`- type: map_at_1`
1788	`value: 71.283`
1789	`- type: map_at_10`
1790	`value: 85.24000000000001`
1791	`- type: map_at_100`
1792	`value: 85.882`
1793	`- type: map_at_1000`
1794	`value: 85.897`
1795	`- type: map_at_3`
1796	`value: 82.326`
1797	`- type: map_at_5`
1798	`value: 84.177`
1799	`- type: mrr_at_1`
1800	`value: 82.21000000000001`
1801	`- type: mrr_at_10`
1802	`value: 88.228`
1803	`- type: mrr_at_100`
1804	`value: 88.32`
1805	`- type: mrr_at_1000`
1806	`value: 88.32`
1807	`- type: mrr_at_3`
1808	`value: 87.323`
1809	`- type: mrr_at_5`
1810	`value: 87.94800000000001`
1811	`- type: ndcg_at_1`
1812	`value: 82.17999999999999`
1813	`- type: ndcg_at_10`
1814	`value: 88.9`
1815	`- type: ndcg_at_100`
1816	`value: 90.079`
1817	`- type: ndcg_at_1000`
1818	`value: 90.158`
1819	`- type: ndcg_at_3`
1820	`value: 86.18299999999999`
1821	`- type: ndcg_at_5`
1822	`value: 87.71799999999999`
1823	`- type: precision_at_1`
1824	`value: 82.17999999999999`
1825	`- type: precision_at_10`
1826	`value: 13.464`
1827	`- type: precision_at_100`
1828	`value: 1.533`
1829	`- type: precision_at_1000`
1830	`value: 0.157`
1831	`- type: precision_at_3`
1832	`value: 37.693`
1833	`- type: precision_at_5`
1834	`value: 24.792`
1835	`- type: recall_at_1`
1836	`value: 71.283`
1837	`- type: recall_at_10`
1838	`value: 95.742`
1839	`- type: recall_at_100`
1840	`value: 99.67200000000001`
1841	`- type: recall_at_1000`
1842	`value: 99.981`
1843	`- type: recall_at_3`
1844	`value: 87.888`
1845	`- type: recall_at_5`
1846	`value: 92.24`
1847	`- task:`
1848	`type: Clustering`
1849	`dataset:`
1850	`type: mteb/reddit-clustering`
1851	`name: MTEB RedditClustering`
1852	`config: default`
1853	`split: test`
1854	`revision: 24640382cdbf8abc73003fb0fa6d111a705499eb`
1855	`metrics:`
1856	`- type: v_measure`
1857	`value: 56.24267063669042`
1858	`- task:`
1859	`type: Clustering`
1860	`dataset:`
1861	`type: mteb/reddit-clustering-p2p`
1862	`name: MTEB RedditClusteringP2P`
1863	`config: default`
1864	`split: test`
1865	`revision: 282350215ef01743dc01b456c7f5241fa8937f16`
1866	`metrics:`
1867	`- type: v_measure`
1868	`value: 62.88056988932578`
1869	`- task:`
1870	`type: Retrieval`
1871	`dataset:`
1872	`type: scidocs`
1873	`name: MTEB SCIDOCS`
1874	`config: default`
1875	`split: test`
1876	`revision: None`
1877	`metrics:`
1878	`- type: map_at_1`
1879	`value: 4.903`
1880	`- type: map_at_10`
1881	`value: 13.202`
1882	`- type: map_at_100`
1883	`value: 15.5`
1884	`- type: map_at_1000`
1885	`value: 15.870999999999999`
1886	`- type: map_at_3`
1887	`value: 9.407`
1888	`- type: map_at_5`
1889	`value: 11.238`
1890	`- type: mrr_at_1`
1891	`value: 24.2`
1892	`- type: mrr_at_10`
1893	`value: 35.867`
1894	`- type: mrr_at_100`
1895	`value: 37.001`
1896	`- type: mrr_at_1000`
1897	`value: 37.043`
1898	`- type: mrr_at_3`
1899	`value: 32.5`
1900	`- type: mrr_at_5`
1901	`value: 34.35`
1902	`- type: ndcg_at_1`
1903	`value: 24.2`
1904	`- type: ndcg_at_10`
1905	`value: 21.731`
1906	`- type: ndcg_at_100`
1907	`value: 30.7`
1908	`- type: ndcg_at_1000`
1909	`value: 36.618`
1910	`- type: ndcg_at_3`
1911	`value: 20.72`
1912	`- type: ndcg_at_5`
1913	`value: 17.954`
1914	`- type: precision_at_1`
1915	`value: 24.2`
1916	`- type: precision_at_10`
1917	`value: 11.33`
1918	`- type: precision_at_100`
1919	`value: 2.4410000000000003`
1920	`- type: precision_at_1000`
1921	`value: 0.386`
1922	`- type: precision_at_3`
1923	`value: 19.667`
1924	`- type: precision_at_5`
1925	`value: 15.86`
1926	`- type: recall_at_1`
1927	`value: 4.903`
1928	`- type: recall_at_10`
1929	`value: 22.962`
1930	`- type: recall_at_100`
1931	`value: 49.563`
1932	`- type: recall_at_1000`
1933	`value: 78.238`
1934	`- type: recall_at_3`
1935	`value: 11.953`
1936	`- type: recall_at_5`
1937	`value: 16.067999999999998`
1938	`- task:`
1939	`type: STS`
1940	`dataset:`
1941	`type: mteb/sickr-sts`
1942	`name: MTEB SICK-R`
1943	`config: default`
1944	`split: test`
1945	`revision: a6ea5a8cab320b040a23452cc28066d9beae2cee`
1946	`metrics:`
1947	`- type: cos_sim_pearson`
1948	`value: 84.12694254604078`
1949	`- type: cos_sim_spearman`
1950	`value: 80.30141815181918`
1951	`- type: euclidean_pearson`
1952	`value: 81.34015449877128`
1953	`- type: euclidean_spearman`
1954	`value: 80.13984197010849`
1955	`- type: manhattan_pearson`
1956	`value: 81.31767068124086`
1957	`- type: manhattan_spearman`
1958	`value: 80.11720513114103`
1959	`- task:`
1960	`type: STS`
1961	`dataset:`
1962	`type: mteb/sts12-sts`
1963	`name: MTEB STS12`
1964	`config: default`
1965	`split: test`
1966	`revision: a0d554a64d88156834ff5ae9920b964011b16384`
1967	`metrics:`
1968	`- type: cos_sim_pearson`
1969	`value: 86.13112984010417`
1970	`- type: cos_sim_spearman`
1971	`value: 78.03063573402875`
1972	`- type: euclidean_pearson`
1973	`value: 83.51928418844804`
1974	`- type: euclidean_spearman`
1975	`value: 78.4045235411144`
1976	`- type: manhattan_pearson`
1977	`value: 83.49981637388689`
1978	`- type: manhattan_spearman`
1979	`value: 78.4042575139372`
1980	`- task:`
1981	`type: STS`
1982	`dataset:`
1983	`type: mteb/sts13-sts`
1984	`name: MTEB STS13`
1985	`config: default`
1986	`split: test`
1987	`revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca`
1988	`metrics:`
1989	`- type: cos_sim_pearson`
1990	`value: 82.50327987379504`
1991	`- type: cos_sim_spearman`
1992	`value: 84.18556767756205`
1993	`- type: euclidean_pearson`
1994	`value: 82.69684424327679`
1995	`- type: euclidean_spearman`
1996	`value: 83.5368106038335`
1997	`- type: manhattan_pearson`
1998	`value: 82.57967581007374`
1999	`- type: manhattan_spearman`
2000	`value: 83.43009053133697`
2001	`- task:`
2002	`type: STS`
2003	`dataset:`
2004	`type: mteb/sts14-sts`
2005	`name: MTEB STS14`
2006	`config: default`
2007	`split: test`
2008	`revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375`
2009	`metrics:`
2010	`- type: cos_sim_pearson`
2011	`value: 82.50756863007814`
2012	`- type: cos_sim_spearman`
2013	`value: 82.27204331279108`
2014	`- type: euclidean_pearson`
2015	`value: 81.39535251429741`
2016	`- type: euclidean_spearman`
2017	`value: 81.84386626336239`
2018	`- type: manhattan_pearson`
2019	`value: 81.34281737280695`
2020	`- type: manhattan_spearman`
2021	`value: 81.81149375673166`
2022	`- task:`
2023	`type: STS`
2024	`dataset:`
2025	`type: mteb/sts15-sts`
2026	`name: MTEB STS15`
2027	`config: default`
2028	`split: test`
2029	`revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3`
2030	`metrics:`
2031	`- type: cos_sim_pearson`
2032	`value: 86.8727714856726`
2033	`- type: cos_sim_spearman`
2034	`value: 87.95738287792312`
2035	`- type: euclidean_pearson`
2036	`value: 86.62920602795887`
2037	`- type: euclidean_spearman`
2038	`value: 87.05207355381243`
2039	`- type: manhattan_pearson`
2040	`value: 86.53587918472225`
2041	`- type: manhattan_spearman`
2042	`value: 86.95382961029586`
2043	`- task:`
2044	`type: STS`
2045	`dataset:`
2046	`type: mteb/sts16-sts`
2047	`name: MTEB STS16`
2048	`config: default`
2049	`split: test`
2050	`revision: 4d8694f8f0e0100860b497b999b3dbed754a0513`
2051	`metrics:`
2052	`- type: cos_sim_pearson`
2053	`value: 83.52240359769479`
2054	`- type: cos_sim_spearman`
2055	`value: 85.47685776238286`
2056	`- type: euclidean_pearson`
2057	`value: 84.25815333483058`
2058	`- type: euclidean_spearman`
2059	`value: 85.27415639683198`
2060	`- type: manhattan_pearson`
2061	`value: 84.29127757025637`
2062	`- type: manhattan_spearman`
2063	`value: 85.30226224917351`
2064	`- task:`
2065	`type: STS`
2066	`dataset:`
2067	`type: mteb/sts17-crosslingual-sts`
2068	`name: MTEB STS17 (en-en)`
2069	`config: en-en`
2070	`split: test`
2071	`revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d`
2072	`metrics:`
2073	`- type: cos_sim_pearson`
2074	`value: 86.42501708915708`
2075	`- type: cos_sim_spearman`
2076	`value: 86.42276182795041`
2077	`- type: euclidean_pearson`
2078	`value: 86.5408207354761`
2079	`- type: euclidean_spearman`
2080	`value: 85.46096321750838`
2081	`- type: manhattan_pearson`
2082	`value: 86.54177303026881`
2083	`- type: manhattan_spearman`
2084	`value: 85.50313151916117`
2085	`- task:`
2086	`type: STS`
2087	`dataset:`
2088	`type: mteb/sts22-crosslingual-sts`
2089	`name: MTEB STS22 (en)`
2090	`config: en`
2091	`split: test`
2092	`revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80`
2093	`metrics:`
2094	`- type: cos_sim_pearson`
2095	`value: 64.86521089250766`
2096	`- type: cos_sim_spearman`
2097	`value: 65.94868540323003`
2098	`- type: euclidean_pearson`
2099	`value: 67.16569626533084`
2100	`- type: euclidean_spearman`
2101	`value: 66.37667004134917`
2102	`- type: manhattan_pearson`
2103	`value: 67.1482365102333`
2104	`- type: manhattan_spearman`
2105	`value: 66.53240122580029`
2106	`- task:`
2107	`type: STS`
2108	`dataset:`
2109	`type: mteb/stsbenchmark-sts`
2110	`name: MTEB STSBenchmark`
2111	`config: default`
2112	`split: test`
2113	`revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831`
2114	`metrics:`
2115	`- type: cos_sim_pearson`
2116	`value: 84.64746265365318`
2117	`- type: cos_sim_spearman`
2118	`value: 86.41888825906786`
2119	`- type: euclidean_pearson`
2120	`value: 85.27453642725811`
2121	`- type: euclidean_spearman`
2122	`value: 85.94095796602544`
2123	`- type: manhattan_pearson`
2124	`value: 85.28643660505334`
2125	`- type: manhattan_spearman`
2126	`value: 85.95028003260744`
2127	`- task:`
2128	`type: Reranking`
2129	`dataset:`
2130	`type: mteb/scidocs-reranking`
2131	`name: MTEB SciDocsRR`
2132	`config: default`
2133	`split: test`
2134	`revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab`
2135	`metrics:`
2136	`- type: map`
2137	`value: 87.48903153618527`
2138	`- type: mrr`
2139	`value: 96.41081503826601`
2140	`- task:`
2141	`type: Retrieval`
2142	`dataset:`
2143	`type: scifact`
2144	`name: MTEB SciFact`
2145	`config: default`
2146	`split: test`
2147	`revision: None`
2148	`metrics:`
2149	`- type: map_at_1`
2150	`value: 58.594`
2151	`- type: map_at_10`
2152	`value: 69.296`
2153	`- type: map_at_100`
2154	`value: 69.782`
2155	`- type: map_at_1000`
2156	`value: 69.795`
2157	`- type: map_at_3`
2158	`value: 66.23`
2159	`- type: map_at_5`
2160	`value: 68.293`
2161	`- type: mrr_at_1`
2162	`value: 61.667`
2163	`- type: mrr_at_10`
2164	`value: 70.339`
2165	`- type: mrr_at_100`
2166	`value: 70.708`
2167	`- type: mrr_at_1000`
2168	`value: 70.722`
2169	`- type: mrr_at_3`
2170	`value: 68.0`
2171	`- type: mrr_at_5`
2172	`value: 69.56700000000001`
2173	`- type: ndcg_at_1`
2174	`value: 61.667`
2175	`- type: ndcg_at_10`
2176	`value: 74.039`
2177	`- type: ndcg_at_100`
2178	`value: 76.103`
2179	`- type: ndcg_at_1000`
2180	`value: 76.47800000000001`
2181	`- type: ndcg_at_3`
2182	`value: 68.967`
2183	`- type: ndcg_at_5`
2184	`value: 71.96900000000001`
2185	`- type: precision_at_1`
2186	`value: 61.667`
2187	`- type: precision_at_10`
2188	`value: 9.866999999999999`
2189	`- type: precision_at_100`
2190	`value: 1.097`
2191	`- type: precision_at_1000`
2192	`value: 0.11299999999999999`
2193	`- type: precision_at_3`
2194	`value: 27.111`
2195	`- type: precision_at_5`
2196	`value: 18.2`
2197	`- type: recall_at_1`
2198	`value: 58.594`
2199	`- type: recall_at_10`
2200	`value: 87.422`
2201	`- type: recall_at_100`
2202	`value: 96.667`
2203	`- type: recall_at_1000`
2204	`value: 99.667`
2205	`- type: recall_at_3`
2206	`value: 74.217`
2207	`- type: recall_at_5`
2208	`value: 81.539`
2209	`- task:`
2210	`type: PairClassification`
2211	`dataset:`
2212	`type: mteb/sprintduplicatequestions-pairclassification`
2213	`name: MTEB SprintDuplicateQuestions`
2214	`config: default`
2215	`split: test`
2216	`revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46`
2217	`metrics:`
2218	`- type: cos_sim_accuracy`
2219	`value: 99.85049504950496`
2220	`- type: cos_sim_ap`
2221	`value: 96.33111544137081`
2222	`- type: cos_sim_f1`
2223	`value: 92.35443037974684`
2224	`- type: cos_sim_precision`
2225	`value: 93.53846153846153`
2226	`- type: cos_sim_recall`
2227	`value: 91.2`
2228	`- type: dot_accuracy`
2229	`value: 99.82376237623762`
2230	`- type: dot_ap`
2231	`value: 95.38082527310888`
2232	`- type: dot_f1`
2233	`value: 90.90909090909092`
2234	`- type: dot_precision`
2235	`value: 92.90187891440502`
2236	`- type: dot_recall`
2237	`value: 89.0`
2238	`- type: euclidean_accuracy`
2239	`value: 99.84851485148515`
2240	`- type: euclidean_ap`
2241	`value: 96.32316003996347`
2242	`- type: euclidean_f1`
2243	`value: 92.2071392659628`
2244	`- type: euclidean_precision`
2245	`value: 92.71991911021233`
2246	`- type: euclidean_recall`
2247	`value: 91.7`
2248	`- type: manhattan_accuracy`
2249	`value: 99.84851485148515`
2250	`- type: manhattan_ap`
2251	`value: 96.3655668249217`
2252	`- type: manhattan_f1`
2253	`value: 92.18356026222895`
2254	`- type: manhattan_precision`
2255	`value: 92.98067141403867`
2256	`- type: manhattan_recall`
2257	`value: 91.4`
2258	`- type: max_accuracy`
2259	`value: 99.85049504950496`
2260	`- type: max_ap`
2261	`value: 96.3655668249217`
2262	`- type: max_f1`
2263	`value: 92.35443037974684`
2264	`- task:`
2265	`type: Clustering`
2266	`dataset:`
2267	`type: mteb/stackexchange-clustering`
2268	`name: MTEB StackExchangeClustering`
2269	`config: default`
2270	`split: test`
2271	`revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259`
2272	`metrics:`
2273	`- type: v_measure`
2274	`value: 65.94861371629051`
2275	`- task:`
2276	`type: Clustering`
2277	`dataset:`
2278	`type: mteb/stackexchange-clustering-p2p`
2279	`name: MTEB StackExchangeClusteringP2P`
2280	`config: default`
2281	`split: test`
2282	`revision: 815ca46b2622cec33ccafc3735d572c266efdb44`
2283	`metrics:`
2284	`- type: v_measure`
2285	`value: 35.009430451385`
2286	`- task:`
2287	`type: Reranking`
2288	`dataset:`
2289	`type: mteb/stackoverflowdupquestions-reranking`
2290	`name: MTEB StackOverflowDupQuestions`
2291	`config: default`
2292	`split: test`
2293	`revision: e185fbe320c72810689fc5848eb6114e1ef5ec69`
2294	`metrics:`
2295	`- type: map`
2296	`value: 54.61164066427969`
2297	`- type: mrr`
2298	`value: 55.49710603938544`
2299	`- task:`
2300	`type: Summarization`
2301	`dataset:`
2302	`type: mteb/summeval`
2303	`name: MTEB SummEval`
2304	`config: default`
2305	`split: test`
2306	`revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c`
2307	`metrics:`
2308	`- type: cos_sim_pearson`
2309	`value: 30.622620124907662`
2310	`- type: cos_sim_spearman`
2311	`value: 31.0678351356163`
2312	`- type: dot_pearson`
2313	`value: 30.863727693306814`
2314	`- type: dot_spearman`
2315	`value: 31.230306567021255`
2316	`- task:`
2317	`type: Retrieval`
2318	`dataset:`
2319	`type: trec-covid`
2320	`name: MTEB TRECCOVID`
2321	`config: default`
2322	`split: test`
2323	`revision: None`
2324	`metrics:`
2325	`- type: map_at_1`
2326	`value: 0.22`
2327	`- type: map_at_10`
2328	`value: 2.011`
2329	`- type: map_at_100`
2330	`value: 10.974`
2331	`- type: map_at_1000`
2332	`value: 25.819`
2333	`- type: map_at_3`
2334	`value: 0.6649999999999999`
2335	`- type: map_at_5`
2336	`value: 1.076`
2337	`- type: mrr_at_1`
2338	`value: 86.0`
2339	`- type: mrr_at_10`
2340	`value: 91.8`
2341	`- type: mrr_at_100`
2342	`value: 91.8`
2343	`- type: mrr_at_1000`
2344	`value: 91.8`
2345	`- type: mrr_at_3`
2346	`value: 91.0`
2347	`- type: mrr_at_5`
2348	`value: 91.8`
2349	`- type: ndcg_at_1`
2350	`value: 82.0`
2351	`- type: ndcg_at_10`
2352	`value: 78.07300000000001`
2353	`- type: ndcg_at_100`
2354	`value: 58.231`
2355	`- type: ndcg_at_1000`
2356	`value: 51.153000000000006`
2357	`- type: ndcg_at_3`
2358	`value: 81.123`
2359	`- type: ndcg_at_5`
2360	`value: 81.059`
2361	`- type: precision_at_1`
2362	`value: 86.0`
2363	`- type: precision_at_10`
2364	`value: 83.0`
2365	`- type: precision_at_100`
2366	`value: 59.38`
2367	`- type: precision_at_1000`
2368	`value: 22.55`
2369	`- type: precision_at_3`
2370	`value: 87.333`
2371	`- type: precision_at_5`
2372	`value: 86.8`
2373	`- type: recall_at_1`
2374	`value: 0.22`
2375	`- type: recall_at_10`
2376	`value: 2.2079999999999997`
2377	`- type: recall_at_100`
2378	`value: 14.069`
2379	`- type: recall_at_1000`
2380	`value: 47.678`
2381	`- type: recall_at_3`
2382	`value: 0.7040000000000001`
2383	`- type: recall_at_5`
2384	`value: 1.161`
2385	`- task:`
2386	`type: Retrieval`
2387	`dataset:`
2388	`type: webis-touche2020`
2389	`name: MTEB Touche2020`
2390	`config: default`
2391	`split: test`
2392	`revision: None`
2393	`metrics:`
2394	`- type: map_at_1`
2395	`value: 2.809`
2396	`- type: map_at_10`
2397	`value: 10.394`
2398	`- type: map_at_100`
2399	`value: 16.598`
2400	`- type: map_at_1000`
2401	`value: 18.142`
2402	`- type: map_at_3`
2403	`value: 5.572`
2404	`- type: map_at_5`
2405	`value: 7.1370000000000005`
2406	`- type: mrr_at_1`
2407	`value: 32.653`
2408	`- type: mrr_at_10`
2409	`value: 46.564`
2410	`- type: mrr_at_100`
2411	`value: 47.469`
2412	`- type: mrr_at_1000`
2413	`value: 47.469`
2414	`- type: mrr_at_3`
2415	`value: 42.177`
2416	`- type: mrr_at_5`
2417	`value: 44.524`
2418	`- type: ndcg_at_1`
2419	`value: 30.612000000000002`
2420	`- type: ndcg_at_10`
2421	`value: 25.701`
2422	`- type: ndcg_at_100`
2423	`value: 37.532`
2424	`- type: ndcg_at_1000`
2425	`value: 48.757`
2426	`- type: ndcg_at_3`
2427	`value: 28.199999999999996`
2428	`- type: ndcg_at_5`
2429	`value: 25.987`
2430	`- type: precision_at_1`
2431	`value: 32.653`
2432	`- type: precision_at_10`
2433	`value: 23.469`
2434	`- type: precision_at_100`
2435	`value: 7.9799999999999995`
2436	`- type: precision_at_1000`
2437	`value: 1.5350000000000001`
2438	`- type: precision_at_3`
2439	`value: 29.932`
2440	`- type: precision_at_5`
2441	`value: 26.122`
2442	`- type: recall_at_1`
2443	`value: 2.809`
2444	`- type: recall_at_10`
2445	`value: 16.887`
2446	`- type: recall_at_100`
2447	`value: 48.67`
2448	`- type: recall_at_1000`
2449	`value: 82.89699999999999`
2450	`- type: recall_at_3`
2451	`value: 6.521000000000001`
2452	`- type: recall_at_5`
2453	`value: 9.609`
2454	`- task:`
2455	`type: Classification`
2456	`dataset:`
2457	`type: mteb/toxic_conversations_50k`
2458	`name: MTEB ToxicConversationsClassification`
2459	`config: default`
2460	`split: test`
2461	`revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c`
2462	`metrics:`
2463	`- type: accuracy`
2464	`value: 71.57860000000001`
2465	`- type: ap`
2466	`value: 13.82629211536393`
2467	`- type: f1`
2468	`value: 54.59860966183956`
2469	`- task:`
2470	`type: Classification`
2471	`dataset:`
2472	`type: mteb/tweet_sentiment_extraction`
2473	`name: MTEB TweetSentimentExtractionClassification`
2474	`config: default`
2475	`split: test`
2476	`revision: d604517c81ca91fe16a244d1248fc021f9ecee7a`
2477	`metrics:`
2478	`- type: accuracy`
2479	`value: 59.38030560271647`
2480	`- type: f1`
2481	`value: 59.69685552567865`
2482	`- task:`
2483	`type: Clustering`
2484	`dataset:`
2485	`type: mteb/twentynewsgroups-clustering`
2486	`name: MTEB TwentyNewsgroupsClustering`
2487	`config: default`
2488	`split: test`
2489	`revision: 6125ec4e24fa026cec8a478383ee943acfbd5449`
2490	`metrics:`
2491	`- type: v_measure`
2492	`value: 51.4736717043405`
2493	`- task:`
2494	`type: PairClassification`
2495	`dataset:`
2496	`type: mteb/twittersemeval2015-pairclassification`
2497	`name: MTEB TwitterSemEval2015`
2498	`config: default`
2499	`split: test`
2500	`revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1`
2501	`metrics:`
2502	`- type: cos_sim_accuracy`
2503	`value: 86.92853311080646`
2504	`- type: cos_sim_ap`
2505	`value: 77.67872502591382`
2506	`- type: cos_sim_f1`
2507	`value: 70.33941236068895`
2508	`- type: cos_sim_precision`
2509	`value: 67.63273258645884`
2510	`- type: cos_sim_recall`
2511	`value: 73.27176781002639`
2512	`- type: dot_accuracy`
2513	`value: 85.79603027954938`
2514	`- type: dot_ap`
2515	`value: 73.73786190233379`
2516	`- type: dot_f1`
2517	`value: 67.3437901774235`
2518	`- type: dot_precision`
2519	`value: 65.67201604814443`
2520	`- type: dot_recall`
2521	`value: 69.10290237467018`
2522	`- type: euclidean_accuracy`
2523	`value: 86.94045419324074`
2524	`- type: euclidean_ap`
2525	`value: 77.6687791535167`
2526	`- type: euclidean_f1`
2527	`value: 70.47209214023542`
2528	`- type: euclidean_precision`
2529	`value: 67.7207492094381`
2530	`- type: euclidean_recall`
2531	`value: 73.45646437994723`
2532	`- type: manhattan_accuracy`
2533	`value: 86.87488823985218`
2534	`- type: manhattan_ap`
2535	`value: 77.63373392430728`
2536	`- type: manhattan_f1`
2537	`value: 70.40920716112532`
2538	`- type: manhattan_precision`
2539	`value: 68.31265508684864`
2540	`- type: manhattan_recall`
2541	`value: 72.63852242744063`
2542	`- type: max_accuracy`
2543	`value: 86.94045419324074`
2544	`- type: max_ap`
2545	`value: 77.67872502591382`
2546	`- type: max_f1`
2547	`value: 70.47209214023542`
2548	`- task:`
2549	`type: PairClassification`
2550	`dataset:`
2551	`type: mteb/twitterurlcorpus-pairclassification`
2552	`name: MTEB TwitterURLCorpus`
2553	`config: default`
2554	`split: test`
2555	`revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf`
2556	`metrics:`
2557	`- type: cos_sim_accuracy`
2558	`value: 88.67155664221679`
2559	`- type: cos_sim_ap`
2560	`value: 85.64591703003417`
2561	`- type: cos_sim_f1`
2562	`value: 77.59531005352656`
2563	`- type: cos_sim_precision`
2564	`value: 73.60967184801382`
2565	`- type: cos_sim_recall`
2566	`value: 82.03726516784724`
2567	`- type: dot_accuracy`
2568	`value: 88.41541506578181`
2569	`- type: dot_ap`
2570	`value: 84.6482788957769`
2571	`- type: dot_f1`
2572	`value: 77.04748541466657`
2573	`- type: dot_precision`
2574	`value: 74.02440754931176`
2575	`- type: dot_recall`
2576	`value: 80.3279950723745`
2577	`- type: euclidean_accuracy`
2578	`value: 88.63080684596576`
2579	`- type: euclidean_ap`
2580	`value: 85.44570045321562`
2581	`- type: euclidean_f1`
2582	`value: 77.28769403336106`
2583	`- type: euclidean_precision`
2584	`value: 72.90600040958427`
2585	`- type: euclidean_recall`
2586	`value: 82.22975053895904`
2587	`- type: manhattan_accuracy`
2588	`value: 88.59393798269105`
2589	`- type: manhattan_ap`
2590	`value: 85.40271361038187`
2591	`- type: manhattan_f1`
2592	`value: 77.17606419344392`
2593	`- type: manhattan_precision`
2594	`value: 72.4447747078295`
2595	`- type: manhattan_recall`
2596	`value: 82.5685247921158`
2597	`- type: max_accuracy`
2598	`value: 88.67155664221679`
2599	`- type: max_ap`
2600	`value: 85.64591703003417`
2601	`- type: max_f1`
2602	`value: 77.59531005352656`
2603	`license: mit`
2604	`language:`
2605	`- en`
2606	`---`
2607
2608
2609	`<h1 align="center">FlagEmbedding</h1>`
2610
2611
2612	`<h4 align="center">`
2613	`<p>`
2614	`<a href=#model-list>Model List</a> \|`
2615	`<a href=#frequently-asked-questions>FAQ</a> \|`
2616	`<a href=#usage>Usage</a> \|`
2617	`<a href="#evaluation">Evaluation</a> \|`
2618	`<a href="#train">Train</a> \|`
2619	`<a href="#contact">Contact</a> \|`
2620	`<a href="#citation">Citation</a> \|`
2621	`<a href="#license">License</a>`
</p>
2623	`</h4>`
2624
2625
For more details, please refer to our GitHub repository: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding).
2627
2628	`If you are looking for a model that supports more languages, longer texts, and other retrieval methods, you can try using [bge-m3](https://huggingface.co/BAAI/bge-m3).`
2629
2630
2631	`[English](README.md) \| [中文](https://github.com/FlagOpen/FlagEmbedding/blob/master/README_zh.md)`
2632
2633	`FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently:`
2634
2635	`- Long-Context LLM: [Activation Beacon](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon)`
2636	`- Fine-tuning of LM : [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail)`
2637	`- Dense Retrieval: [BGE-M3](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3), [LLM Embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder), [BGE Embedding](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding)`
2638	`- Reranker Model: [BGE Reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker)`
2639	`- Benchmark: [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB)`
2640
2641	`## News`
- 1/30/2024: Release BGE-M3, a new member of the BGE model series! M3 stands for Multi-linguality (100+ languages), Multi-granularities (input length up to 8192), and Multi-Functionality (unification of dense, lexical, and multi-vec/colbert retrieval).
It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks.
[Technical Report](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/BGE_M3/BGE_M3.pdf) and [Code](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3). :fire:
- 1/9/2024: Release [Activation-Beacon](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon), an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs. [Technical Report](https://arxiv.org/abs/2401.03462) :fire:
- 12/24/2023: Release LLaRA, a LLaMA-7B based dense retriever that achieves state-of-the-art performance on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. [Technical Report](https://arxiv.org/abs/2312.15503) :fire:
- 11/23/2023: Release [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail), a method to maintain general capabilities during fine-tuning by merging multiple language models. [Technical Report](https://arxiv.org/abs/2311.13534) :fire:
- 10/12/2023: Release [LLM-Embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder), a unified embedding model to support diverse retrieval augmentation needs for LLMs. [Technical Report](https://arxiv.org/pdf/2310.07554.pdf)
- 09/15/2023: The [technical report](https://arxiv.org/pdf/2309.07597.pdf) and [massive training data](https://data.baai.ac.cn/details/BAAI-MTP) of BGE have been released.
- 09/12/2023: New models:
    - New reranker model: release the cross-encoder models `BAAI/bge-reranker-base` and `BAAI/bge-reranker-large`, which are more powerful than embedding models. We recommend using or fine-tuning them to re-rank the top-k documents returned by embedding models.
    - Updated embedding model: release the `bge-*-v1.5` embedding models to alleviate the issue of the similarity distribution and enhance retrieval ability without instruction.
2653
2654
2655	`<details>`
2656	`<summary>More</summary>`
2657	`<!-- ### More -->`
2658
- 09/07/2023: Update [fine-tune code](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md): add a script to mine hard negatives and support adding an instruction during fine-tuning.
- 08/09/2023: BGE models are integrated into Langchain; you can use them like [this](#using-langchain). The C-MTEB leaderboard is [available](https://huggingface.co/spaces/mteb/leaderboard).
- 08/05/2023: Release base-scale and small-scale models, with the best performance among models of the same size 🤗
- 08/02/2023: Release `bge-large-*` (short for BAAI General Embedding) models, which rank 1st on the MTEB and C-MTEB benchmarks! :tada: :tada:
- 08/01/2023: We release the [Chinese Massive Text Embedding Benchmark](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB) (C-MTEB), consisting of 31 test datasets.
2664
2665	`</details>`
2666
2667
2668	`## Model List`
2669
2670	`bge` is short for `BAAI general embedding`.
2671
| Model | Language | Usage | Description | Query instruction for retrieval [1\] |
|:-------------------------------|:--------:| :--------:| :--------:|:--------:|
| [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | Multilingual | [Inference](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3#usage) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3) | Multi-Functionality (dense retrieval, sparse retrieval, multi-vector (colbert)), Multi-Linguality, and Multi-Granularity (8192 tokens) | |
| [BAAI/llm-embedder](https://huggingface.co/BAAI/llm-embedder) | English | [Inference](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder) | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See [README](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder) |
| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2\] | |
| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2\] | |
| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank 1st in [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-en` | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with competitive performance | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank 1st in [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-zh` | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with competitive performance | `为这个句子生成表示以用于检索相关文章：` |
2690
2691
[1\]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed, and you can use the original query directly. In all cases, no instruction needs to be added to passages.

[2\]: Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, a cross-encoder is widely used to re-rank the top-k documents retrieved by other, simpler models.
For example, use a bge embedding model to retrieve the top 100 relevant documents, and then use a bge reranker to re-rank those 100 documents to get the final top-3 results.
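
The retrieve-then-rerank flow described in note [2\] can be sketched in plain Python. This is a minimal illustration rather than the FlagEmbedding API: `retrieve_then_rerank` and `rerank_fn` are hypothetical names, the vectors are toy data standing in for normalized embeddings, and the reranker scores are made up in place of a real cross-encoder.

```python
def retrieve_then_rerank(query_vec, passage_vecs, rerank_fn, top_k=100, final_k=3):
    """Two-stage search: cheap embedding similarity first, costly reranker second."""
    # Stage 1: score every passage by dot product (cosine, if vectors are normalized).
    sims = [sum(q * p for q, p in zip(query_vec, vec)) for vec in passage_vecs]
    candidates = sorted(range(len(passage_vecs)), key=lambda i: sims[i], reverse=True)[:top_k]
    # Stage 2: re-score only the surviving candidates with the reranker.
    reranked = sorted(candidates, key=rerank_fn, reverse=True)
    return reranked[:final_k]


# Toy data (hypothetical): four passages embedded in two dimensions.
passage_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
query_vec = [1.0, 0.0]
# Stand-in for a cross-encoder: a fixed relevance score per passage index.
toy_reranker_scores = {0: 0.1, 1: 2.5, 2: 9.0, 3: 1.0}

result = retrieve_then_rerank(
    query_vec, passage_vecs, toy_reranker_scores.get, top_k=3, final_k=2
)
print(result)  # indices of the final passages, best first
```

Passage 2 has the highest toy reranker score but is never reranked, because it did not survive the first stage. This is the trade-off of the two-stage design: the reranker can only reorder what the embedding model retrieved.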
2696
All models have been uploaded to the Hugging Face Hub, and you can find them at https://huggingface.co/BAAI.
If you cannot open the Hugging Face Hub, you can also download the models at https://model.baai.ac.cn/models .
2699
2700
2701	`## Frequently asked questions`
2702
2703	`<details>`
2704	`<summary>1. How to fine-tune bge embedding model?</summary>`
2705
2706	`<!-- ### How to fine-tune bge embedding model? -->`
Follow this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) to prepare data and fine-tune your model.
Some suggestions:
- Mine hard negatives following this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune#hard-negatives), which can improve retrieval performance.
- If you pre-train bge on your data, the pre-trained model cannot be used directly to calculate similarity; it must first be fine-tuned with contrastive learning.
- If the accuracy of the fine-tuned model is still not high enough, we recommend using or fine-tuning the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker.
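
To give a rough idea of what hard-negative mining involves, here is a minimal sketch. It is not the script from the linked example: the function name `mine_hard_negatives` and its parameters are hypothetical, and it assumes you already have, for each query, a list of passage ids ranked by an embedding model. Negatives are sampled from a rank window of that list, skipping known positives.

```python
import random


def mine_hard_negatives(ranked_ids, positive_ids, sample_range=(2, 200),
                        num_negatives=15, seed=0):
    """Sample negatives from a rank window of retrieved passages.

    ranked_ids:   passage ids sorted by similarity to the query (best first)
    positive_ids: ids known to be relevant; these are never returned
    sample_range: (start, end) slice of the ranking to draw from (hypothetical default)
    """
    start, end = sample_range
    pool = [pid for pid in ranked_ids[start:end] if pid not in positive_ids]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return rng.sample(pool, min(num_negatives, len(pool)))


# Toy ranking (hypothetical ids): p0 and p4 are the labeled positives.
ranked_ids = [f"p{i}" for i in range(10)]
negatives = mine_hard_negatives(ranked_ids, {"p0", "p4"},
                                sample_range=(2, 8), num_negatives=3)
print(negatives)
```

Passages that rank high for a query but are not labeled as relevant are "hard" because the current model already confuses them with true matches.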
2712
2713
2714	`</details>`
2715
2716	`<details>`
2717	`<summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary>`
2718
2719	`<!-- ### The similarity score between two dissimilar sentences is higher than 0.5 -->`
We suggest using bge v1.5, which alleviates the issue of the similarity distribution.

Since we fine-tune the models by contrastive learning with a temperature of 0.01,
the similarity scores of the current BGE model fall roughly in the interval \[0.6, 1\].
So a similarity score greater than 0.5 does not indicate that the two sentences are similar.

For downstream tasks, such as passage retrieval or semantic similarity,
what matters is the relative order of the scores, not the absolute value.
If you need to filter similar sentences based on a similarity threshold,
please select an appropriate threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9).
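
One way to pick such a threshold is to look at the scores the model assigns to pairs you know are unrelated and take a high quantile of them. The sketch below assumes this approach; `threshold_from_scores` is a hypothetical helper, and the score values are invented for illustration.

```python
def threshold_from_scores(scores, quantile=0.95):
    """Return the score at the given quantile (nearest rank) of a sample."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must be in [0, 1]")
    ordered = sorted(scores)
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[idx]


# Invented cosine similarities between pairs known to be unrelated.
background = [0.61, 0.64, 0.66, 0.68, 0.70, 0.71, 0.73, 0.75, 0.78, 0.86]
threshold = threshold_from_scores(background, quantile=0.9)

# Keep only candidate pairs that score at or above the data-driven threshold.
candidates = {"pair_a": 0.72, "pair_b": 0.91}
similar = [name for name, score in candidates.items() if score >= threshold]
print(threshold, similar)
```

Note that `pair_a` scores above 0.5 yet is filtered out, which matches the point above: the cutoff has to come from your own score distribution.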
2730
2731	`</details>`
2732
2733	`<details>`
2734	`<summary>3. When does the query instruction need to be used</summary>`
2735
2736	`<!-- ### When does the query instruction need to be used -->`
2737
For `bge-*-v1.5`, we improved the retrieval ability when no instruction is used.
Omitting the instruction causes only a slight degradation in retrieval performance compared with using it.
So, for convenience, you can generate embeddings without an instruction in all cases.

For a retrieval task that uses short queries to find long related documents,
we recommend adding the instruction to these short queries.
The best way to decide whether to add the instruction to queries is to choose the setting that performs better on your task.
In all cases, no instruction needs to be added to the documents/passages.
2746
2747	`</details>`
2748
2749
2750	`## Usage`
2751
2752	`### Usage for Embedding Model`
2753
2754	Here are some examples for using `bge` models with
2755	`[FlagEmbedding](#using-flagembedding), [Sentence-Transformers](#using-sentence-transformers), [Langchain](#using-langchain), or [Huggingface Transformers](#using-huggingface-transformers).`
2756
#### Using FlagEmbedding
```
pip install -U FlagEmbedding
```
If it doesn't work for you, see [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md) for more ways to install FlagEmbedding.

```python
from FlagEmbedding import FlagModel

sentences_1 = ["样例数据-1", "样例数据-2"]  # "sample data-1", "sample data-2"
sentences_2 = ["样例数据-3", "样例数据-4"]  # "sample data-3", "sample data-4"
# The Chinese instruction means:
# "Generate a representation for this sentence to retrieve relevant articles:"
model = FlagModel('BAAI/bge-large-zh-v1.5',
                  query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章：",
                  use_fp16=True)  # use_fp16=True speeds up computation with a slight performance degradation
embeddings_1 = model.encode(sentences_1)
embeddings_2 = model.encode(sentences_2)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)

# For an s2p (short query to long passage) retrieval task, we suggest encode_queries(),
# which automatically adds the instruction to each query.
# The corpus can still use encode() or encode_corpus(), since passages don't need the instruction.
queries = ['query_1', 'query_2']
passages = ["样例文档-1", "样例文档-2"]  # "sample document-1", "sample document-2"
q_embeddings = model.encode_queries(queries)
p_embeddings = model.encode(passages)
scores = q_embeddings @ p_embeddings.T
```
For the value of the argument `query_instruction_for_retrieval`, see [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list).

By default, FlagModel will use all available GPUs when encoding. Set `os.environ["CUDA_VISIBLE_DEVICES"]` to select specific GPUs.
You can also set `os.environ["CUDA_VISIBLE_DEVICES"]=""` to make all GPUs unavailable.
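
As a small sketch of the GPU selection described above (the device ids `0,1` are an assumption about your machine), set the variable before any CUDA-using library is imported or initialized:

```python
import os

# Assumption: GPUs 0 and 1 exist on this machine; adjust the ids to your setup.
# This must run before the model library initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

# An empty string hides every GPU instead:
# os.environ["CUDA_VISIBLE_DEVICES"] = ""

visible = [d for d in os.environ["CUDA_VISIBLE_DEVICES"].split(",") if d]
print(f"Visible GPU ids: {visible}")

# Import and create the model only after the variable is set, e.g.:
# from FlagEmbedding import FlagModel
```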
2787
2788
#### Using Sentence-Transformers

You can also use the `bge` models with [sentence-transformers](https://www.SBERT.net):

```
pip install -U sentence-transformers
```
```python
from sentence_transformers import SentenceTransformer

sentences_1 = ["样例数据-1", "样例数据-2"]  # "sample data-1", "sample data-2"
sentences_2 = ["样例数据-3", "样例数据-4"]  # "sample data-3", "sample data-4"
model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)
```
For an s2p (short query to long passage) retrieval task,
each short query should start with an instruction (see the [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list) for the instructions).
The instruction is not needed for passages.
```python
from sentence_transformers import SentenceTransformer

queries = ['query_1', 'query_2']
passages = ["样例文档-1", "样例文档-2"]  # "sample document-1", "sample document-2"
# "Generate a representation for this sentence to retrieve relevant articles:"
instruction = "为这个句子生成表示以用于检索相关文章："

model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
q_embeddings = model.encode([instruction + q for q in queries], normalize_embeddings=True)
p_embeddings = model.encode(passages, normalize_embeddings=True)
scores = q_embeddings @ p_embeddings.T
```
2820
#### Using Langchain

You can use `bge` in langchain like this:
```python
from langchain.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-large-en-v1.5"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': True}  # set True to compute cosine similarity
# Use the English instruction from the Model List, since this is an English model
query_instruction = "Represent this sentence for searching relevant passages: "
model = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    query_instruction=query_instruction
)
model.query_instruction = query_instruction
```
2837
2838
#### Using HuggingFace Transformers

With the transformers package, you can use the model like this: first, pass your input through the transformer model, then select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding.

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Sentences we want sentence embeddings for
sentences = ["样例数据-1", "样例数据-2"]  # "sample data-1", "sample data-2"

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-large-zh-v1.5')
model = AutoModel.from_pretrained('BAAI/bge-large-zh-v1.5')
model.eval()

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# For an s2p (short query to long passage) retrieval task, add an instruction to each query (not to passages)
# encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
    # Perform pooling. In this case, cls pooling.
    sentence_embeddings = model_output[0][:, 0]
# Normalize embeddings
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
```
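
To make the two post-processing steps concrete without downloading a model, the sketch below reproduces CLS pooling and L2 normalization on a tiny hand-written "hidden state". The numbers and helper names (`cls_pool`, `l2_normalize`, `dot`) are illustrative assumptions, not part of the transformers API.

```python
import math


def cls_pool(last_hidden_state):
    """Keep the first-token vector of each sequence: (batch, seq, hidden) -> (batch, hidden)."""
    return [tokens[0] for tokens in last_hidden_state]


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy hidden states: 2 sentences, 2 tokens each, hidden size 2.
hidden = [
    [[3.0, 4.0], [9.0, 9.0]],  # only the first token ([CLS]) is used
    [[4.0, 3.0], [1.0, 1.0]],
]
embeddings = [l2_normalize(v) for v in cls_pool(hidden)]

# After normalization, the dot product equals the cosine similarity.
cosine = dot(embeddings[0], embeddings[1])
print(embeddings, cosine)
```

Because every embedding has unit length after normalization, the matrix product used in the earlier examples (`embeddings_1 @ embeddings_2.T`) yields cosine similarities directly.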
2868
2869
#### Usage of the ONNX files

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction  # type: ignore

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-large-en-v1.5')
model = AutoModel.from_pretrained('BAAI/bge-large-en-v1.5', revision="refs/pr/13")
model_ort = ORTModelForFeatureExtraction.from_pretrained('BAAI/bge-large-en-v1.5', revision="refs/pr/13", file_name="onnx/model.onnx")

# Sentences we want sentence embeddings for
sentences = ["sample data-1", "sample data-2"]

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# For an s2p (short query to long passage) retrieval task, add an instruction to each query (not to passages)
# encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings with the ONNX model
model_output_ort = model_ort(**encoded_input)
# Compute token embeddings with the PyTorch model
with torch.no_grad():
    model_output = model(**encoded_input)

# model_output and model_output_ort should match (up to numerical precision)
```
2898
#### Usage via infinity
It's also possible to deploy the ONNX files with the [infinity_emb](https://github.com/michaelfeil/infinity) pip package.
```python
import asyncio
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

sentences = ["Embed this sentence via Infinity.", "Paris is in France."]
engine = AsyncEmbeddingEngine.from_args(
    EngineArgs(model_name_or_path="BAAI/bge-large-en-v1.5", device="cpu", engine="optimum")  # or engine="torch"
)

async def main():
    async with engine:
        embeddings, usage = await engine.embed(sentences=sentences)

asyncio.run(main())
```
2915
### Usage for Reranker

Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding.
You can get a relevance score by passing a query and a passage to the reranker.
The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range.
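
Because the raw scores are unbounded, you may want to squash them into (0, 1) when you need values that are easier to compare against a fixed cutoff. The sketch below assumes a sigmoid mapping as an optional post-processing step; it is not something this README prescribes, and the raw scores are invented.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function mapping any real score into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Invented raw reranker scores (logits) for three query-passage pairs.
raw_scores = [-5.6, 0.0, 5.3]
squashed = [sigmoid(s) for s in raw_scores]
print(squashed)
```

The sigmoid is monotonic, so the ranking of passages is unchanged; only the scale of the scores differs.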
2921
2922
#### Using FlagEmbedding
```
pip install -U FlagEmbedding
```

Get relevance scores (higher scores indicate more relevance):
```python
from FlagEmbedding import FlagReranker

# use_fp16=True speeds up computation with a slight performance degradation
reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True)

score = reranker.compute_score(['query', 'passage'])
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)
```
2939
2940
#### Using Huggingface transformers

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-large')
model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-large')
model.eval()

# Each pair is [query, passage]
pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    # One logit per pair; flatten to a 1-D tensor of relevance scores
    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
    print(scores)
```
2957
## Evaluation

`baai-general-embedding` models achieve state-of-the-art performance on both the MTEB and C-MTEB leaderboards.
For more details and evaluation tools, see our [scripts](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md).
2962
2963	`- MTEB:`
2964
| Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) | Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1024 | 512 | 64.23 | 54.29 | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 |
| [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 |
| [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 384 | 512 | 62.17 | 51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 |
| [bge-large-en](https://huggingface.co/BAAI/bge-large-en) | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 |
| [bge-base-en](https://huggingface.co/BAAI/bge-base-en) | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 |
| [gte-large](https://huggingface.co/thenlper/gte-large) | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 |
| [gte-base](https://huggingface.co/thenlper/gte-base) | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 |
| [e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) | 1024 | 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 |
| [bge-small-en](https://huggingface.co/BAAI/bge-small-en) | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 |
| [instructor-xl](https://huggingface.co/hkunlp/instructor-xl) | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 |
| [e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 |
| [gte-small](https://huggingface.co/thenlper/gte-small) | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 |
| [text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings) | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 |
| [e5-small-v2](https://huggingface.co/intfloat/e5-small-v2) | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 |
| [sentence-t5-xxl](https://huggingface.co/sentence-transformers/sentence-t5-xxl) | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 |
| [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 |
| [sgpt-bloom-7b1-msmarco](https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco) | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 |
2984
2985
2986
- C-MTEB:
We created the C-MTEB benchmark for Chinese text embedding, which consists of 31 datasets across 6 tasks.
Please refer to [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md) for a detailed introduction.

| Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5) | 1024 | 64.53 | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 |
| [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 |
| [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 |
| [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 |
| [bge-large-zh-noinstruct](https://huggingface.co/BAAI/bge-large-zh-noinstruct) | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 |
| [multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 |
| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 |
| [m3e-base](https://huggingface.co/moka-ai/m3e-base) | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 |
| [m3e-large](https://huggingface.co/moka-ai/m3e-large) | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 |
| [multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 |
| [multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 |
| [text-embedding-ada-002(OpenAI)](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 |
| [luotuo](https://huggingface.co/silk-road/luotuo-bert-medium) | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 |
| [text2vec-base](https://huggingface.co/shibing624/text2vec-base-chinese) | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 |
| [text2vec-large](https://huggingface.co/GanymedeNil/text2vec-large-chinese) | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 |

- Reranking:
See [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/) for the evaluation script.

| Model | T2Reranking | T2RerankingZh2En\* | T2RerankingEn2Zh\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 |
| multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 |
| multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 |
| multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 |
| m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 |
| m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 |
| bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 |
| bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 |
| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 |
| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 |

\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks.
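
The README does not say how the `Avg` column of the reranking table is computed. The numbers are consistent with an unweighted mean of the six per-task scores, and the short check below illustrates this for three rows. This is an observation from the table, not a documented definition, and the helper name `mean` is only illustrative.

```python
# Per-task reranking scores copied from three rows of the table, in column order:
# T2Reranking, T2RerankingZh2En, T2RerankingEn2Zh, MMarcoReranking, CMedQAv1, CMedQAv2
scores = {
    "text2vec-base-multilingual": [64.66, 62.94, 62.51, 14.37, 48.46, 48.6],
    "bge-base-zh-v1.5": [66.49, 63.25, 57.02, 29.74, 80.47, 84.88],
    "BAAI/bge-reranker-large": [67.6, 64.03, 61.44, 37.16, 82.15, 84.18],
}

# Values reported in the table's Avg column for the same rows.
reported_avg = {
    "text2vec-base-multilingual": 50.26,
    "bge-base-zh-v1.5": 63.64,
    "BAAI/bge-reranker-large": 66.09,
}


def mean(values):
    """Unweighted arithmetic mean (assumed aggregation, not stated in the README)."""
    return sum(values) / len(values)


for model, task_scores in scores.items():
    computed = mean(task_scores)
    # Compare the recomputed mean with the reported value at two decimals.
    print(f"{model}: computed {computed:.2f}, reported {reported_avg[model]:.2f}")
```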

## Train

### BAAI Embedding

We pre-train the models using [RetroMAE](https://github.com/staoxiao/RetroMAE) and then train them on large-scale pair data using contrastive learning.
You can fine-tune the embedding model on your own data by following our [examples](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune).
We also provide a [pre-train example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/pretrain).
Note that the goal of pre-training is to reconstruct the text, so the pre-trained model cannot be used directly for similarity calculation; it needs to be fine-tuned first.
For more bge training details, see [baai_general_embedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md).
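
To make "contrastive learning on pair data" more concrete, here is a minimal sketch of an InfoNCE-style loss with in-batch negatives, written in plain Python. It is an illustration only, not the FlagEmbedding training code: the function names, the toy vectors, and the temperature value of 0.05 are all assumptions chosen for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(query_vecs, passage_vecs, temperature=0.05):
    """Mean cross-entropy where passage i is the positive for query i and
    every other passage in the batch serves as a negative.
    The temperature is an illustrative value, not one taken from the README."""
    losses = []
    for i, query in enumerate(query_vecs):
        logits = [cosine(query, passage) / temperature for passage in passage_vecs]
        # Numerically stable log-sum-exp over all passages in the batch.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy batch: each query points in roughly the same direction as its own passage.
queries = [[1.0, 0.0], [0.0, 1.0]]
passages = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = info_nce_loss(queries, passages)
# Reversing the passages pairs each query with the wrong positive.
shuffled_loss = info_nce_loss(queries, list(reversed(passages)))
print(f"aligned: {aligned_loss:.4f}, shuffled: {shuffled_loss:.4f}")
```

The loss is small when each query is closest to its own passage and large when the pairs are mismatched, which is the signal that pulls matching pairs together during training.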


### BGE Reranker

A cross-encoder performs full attention over the input pair,
which is more accurate than an embedding model (i.e., a bi-encoder) but also more time-consuming.
Therefore, it can be used to re-rank the top-k documents returned by the embedding model.
We train the cross-encoder on multilingual pair data.
The data format is the same as for the embedding model, so you can fine-tune it easily by following our [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker).
For more details, please refer to [./FlagEmbedding/reranker/README.md](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker).
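
The two-stage pattern described above can be sketched as follows. Both scoring functions are deliberately simple stand-ins so that the example runs without any model: `embed` is a bag-of-words "bi-encoder" and `cross_score` is a bigram-overlap "cross-encoder" that looks at the query and document together. All names here are hypothetical; in practice you would replace them with a bge embedding model and a bge reranker.

```python
import math
from collections import Counter


def embed(text):
    """Toy bi-encoder stand-in: each text is encoded independently as word counts."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[token] * v[token] for token in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def bigrams(text):
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def cross_score(query, doc):
    """Toy cross-encoder stand-in: scores the pair jointly using word order,
    as the fraction of query bigrams that also occur in the document."""
    query_bigrams = bigrams(query)
    if not query_bigrams:
        return 0.0
    return len(query_bigrams & bigrams(doc)) / len(query_bigrams)


def retrieve_then_rerank(query, docs, top_k=2):
    # Stage 1: cheap retrieval of the top-k candidates by embedding similarity.
    query_vec = embed(query)
    candidates = sorted(
        docs, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True
    )[:top_k]
    # Stage 2: the more expensive pairwise scorer runs only on those candidates.
    return sorted(candidates, key=lambda doc: cross_score(query, doc), reverse=True)


docs = [
    "bamboo eats panda",
    "the panda eats bamboo daily",
    "stock markets fell today",
]
ranked = retrieve_then_rerank("panda eats bamboo", docs, top_k=2)
print(ranked)
```

In this toy setup the bag-of-words stage cannot tell "bamboo eats panda" from the query because it ignores word order, while the pairwise stage can, so the second stage changes the final ordering. The expensive scorer only ever sees `top_k` documents, which is the reason for splitting the work into two stages.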


## Contact
If you have any questions or suggestions related to this project, feel free to open an issue or pull request.
You can also email Shitao Xiao (stxiao@baai.ac.cn) and Zheng Liu (liuzheng@baai.ac.cn).


## Citation

If you find this repository useful, please consider giving it a star :star: and citing it:

```
@misc{bge_embedding,
      title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
      author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
      year={2023},
      eprint={2309.07597},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License
FlagEmbedding is licensed under the [MIT License](https://github.com/FlagOpen/FlagEmbedding/blob/master/LICENSE). The released models can be used for commercial purposes free of charge.