README.md · bge-large-en-v1.5

1	`---`
2	`tags:`
3	`- sentence-transformers`
4	`- feature-extraction`
5	`- sentence-similarity`
6	`- transformers`
7	`- mteb`
8	`model-index:`
9	`- name: bge-large-en-v1.5`
10	`results:`
11	`- task:`
12	`type: Classification`
13	`dataset:`
14	`type: mteb/amazon_counterfactual`
15	`name: MTEB AmazonCounterfactualClassification (en)`
16	`config: en`
17	`split: test`
18	`revision: e8379541af4e31359cca9fbcf4b00f2671dba205`
19	`metrics:`
20	`- type: accuracy`
21	`value: 75.8507462686567`
22	`- type: ap`
23	`value: 38.566457320228245`
24	`- type: f1`
25	`value: 69.69386648043475`
26	`- task:`
27	`type: Classification`
28	`dataset:`
29	`type: mteb/amazon_polarity`
30	`name: MTEB AmazonPolarityClassification`
31	`config: default`
32	`split: test`
33	`revision: e2d317d38cd51312af73b3d32a06d1a08b442046`
34	`metrics:`
35	`- type: accuracy`
36	`value: 92.416675`
37	`- type: ap`
38	`value: 89.1928861155922`
39	`- type: f1`
40	`value: 92.39477019574215`
41	`- task:`
42	`type: Classification`
43	`dataset:`
44	`type: mteb/amazon_reviews_multi`
45	`name: MTEB AmazonReviewsClassification (en)`
46	`config: en`
47	`split: test`
48	`revision: 1399c76144fd37290681b995c656ef9b2e06e26d`
49	`metrics:`
50	`- type: accuracy`
51	`value: 48.175999999999995`
52	`- type: f1`
53	`value: 47.80712792870253`
54	`- task:`
55	`type: Retrieval`
56	`dataset:`
57	`type: arguana`
58	`name: MTEB ArguAna`
59	`config: default`
60	`split: test`
61	`revision: None`
62	`metrics:`
63	`- type: map_at_1`
64	`value: 40.184999999999995`
65	`- type: map_at_10`
66	`value: 55.654`
67	`- type: map_at_100`
68	`value: 56.25`
69	`- type: map_at_1000`
70	`value: 56.255`
71	`- type: map_at_3`
72	`value: 51.742999999999995`
73	`- type: map_at_5`
74	`value: 54.129000000000005`
75	`- type: mrr_at_1`
76	`value: 40.967`
77	`- type: mrr_at_10`
78	`value: 55.96`
79	`- type: mrr_at_100`
80	`value: 56.54900000000001`
81	`- type: mrr_at_1000`
82	`value: 56.554`
83	`- type: mrr_at_3`
84	`value: 51.980000000000004`
85	`- type: mrr_at_5`
86	`value: 54.44`
87	`- type: ndcg_at_1`
88	`value: 40.184999999999995`
89	`- type: ndcg_at_10`
90	`value: 63.542`
91	`- type: ndcg_at_100`
92	`value: 65.96499999999999`
93	`- type: ndcg_at_1000`
94	`value: 66.08699999999999`
95	`- type: ndcg_at_3`
96	`value: 55.582`
97	`- type: ndcg_at_5`
98	`value: 59.855000000000004`
99	`- type: precision_at_1`
100	`value: 40.184999999999995`
101	`- type: precision_at_10`
102	`value: 8.841000000000001`
103	`- type: precision_at_100`
104	`value: 0.987`
105	`- type: precision_at_1000`
106	`value: 0.1`
107	`- type: precision_at_3`
108	`value: 22.238`
109	`- type: precision_at_5`
110	`value: 15.405`
111	`- type: recall_at_1`
112	`value: 40.184999999999995`
113	`- type: recall_at_10`
114	`value: 88.407`
115	`- type: recall_at_100`
116	`value: 98.72`
117	`- type: recall_at_1000`
118	`value: 99.644`
119	`- type: recall_at_3`
120	`value: 66.714`
121	`- type: recall_at_5`
122	`value: 77.027`
123	`- task:`
124	`type: Clustering`
125	`dataset:`
126	`type: mteb/arxiv-clustering-p2p`
127	`name: MTEB ArxivClusteringP2P`
128	`config: default`
129	`split: test`
130	`revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d`
131	`metrics:`
132	`- type: v_measure`
133	`value: 48.567077926750066`
134	`- task:`
135	`type: Clustering`
136	`dataset:`
137	`type: mteb/arxiv-clustering-s2s`
138	`name: MTEB ArxivClusteringS2S`
139	`config: default`
140	`split: test`
141	`revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53`
142	`metrics:`
143	`- type: v_measure`
144	`value: 43.19453389182364`
145	`- task:`
146	`type: Reranking`
147	`dataset:`
148	`type: mteb/askubuntudupquestions-reranking`
149	`name: MTEB AskUbuntuDupQuestions`
150	`config: default`
151	`split: test`
152	`revision: 2000358ca161889fa9c082cb41daa8dcfb161a54`
153	`metrics:`
154	`- type: map`
155	`value: 64.46555939623092`
156	`- type: mrr`
157	`value: 77.82361605768807`
158	`- task:`
159	`type: STS`
160	`dataset:`
161	`type: mteb/biosses-sts`
162	`name: MTEB BIOSSES`
163	`config: default`
164	`split: test`
165	`revision: d3fb88f8f02e40887cd149695127462bbcf29b4a`
166	`metrics:`
167	`- type: cos_sim_pearson`
168	`value: 84.9554128814735`
169	`- type: cos_sim_spearman`
170	`value: 84.65373612172036`
171	`- type: euclidean_pearson`
172	`value: 83.2905059954138`
173	`- type: euclidean_spearman`
174	`value: 84.52240782811128`
175	`- type: manhattan_pearson`
176	`value: 82.99533802997436`
177	`- type: manhattan_spearman`
178	`value: 84.20673798475734`
179	`- task:`
180	`type: Classification`
181	`dataset:`
182	`type: mteb/banking77`
183	`name: MTEB Banking77Classification`
184	`config: default`
185	`split: test`
186	`revision: 0fd18e25b25c072e09e0d92ab615fda904d66300`
187	`metrics:`
188	`- type: accuracy`
189	`value: 87.78896103896103`
190	`- type: f1`
191	`value: 87.77189310964883`
192	`- task:`
193	`type: Clustering`
194	`dataset:`
195	`type: mteb/biorxiv-clustering-p2p`
196	`name: MTEB BiorxivClusteringP2P`
197	`config: default`
198	`split: test`
199	`revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40`
200	`metrics:`
201	`- type: v_measure`
202	`value: 39.714538337650495`
203	`- task:`
204	`type: Clustering`
205	`dataset:`
206	`type: mteb/biorxiv-clustering-s2s`
207	`name: MTEB BiorxivClusteringS2S`
208	`config: default`
209	`split: test`
210	`revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908`
211	`metrics:`
212	`- type: v_measure`
213	`value: 36.90108349284447`
214	`- task:`
215	`type: Retrieval`
216	`dataset:`
217	`type: BeIR/cqadupstack`
218	`name: MTEB CQADupstackAndroidRetrieval`
219	`config: default`
220	`split: test`
221	`revision: None`
222	`metrics:`
223	`- type: map_at_1`
224	`value: 32.795`
225	`- type: map_at_10`
226	`value: 43.669000000000004`
227	`- type: map_at_100`
228	`value: 45.151`
229	`- type: map_at_1000`
230	`value: 45.278`
231	`- type: map_at_3`
232	`value: 40.006`
233	`- type: map_at_5`
234	`value: 42.059999999999995`
235	`- type: mrr_at_1`
236	`value: 39.771`
237	`- type: mrr_at_10`
238	`value: 49.826`
239	`- type: mrr_at_100`
240	`value: 50.504000000000005`
241	`- type: mrr_at_1000`
242	`value: 50.549`
243	`- type: mrr_at_3`
244	`value: 47.115`
245	`- type: mrr_at_5`
246	`value: 48.832`
247	`- type: ndcg_at_1`
248	`value: 39.771`
249	`- type: ndcg_at_10`
250	`value: 50.217999999999996`
251	`- type: ndcg_at_100`
252	`value: 55.454`
253	`- type: ndcg_at_1000`
254	`value: 57.37`
255	`- type: ndcg_at_3`
256	`value: 44.885000000000005`
257	`- type: ndcg_at_5`
258	`value: 47.419`
259	`- type: precision_at_1`
260	`value: 39.771`
261	`- type: precision_at_10`
262	`value: 9.642000000000001`
263	`- type: precision_at_100`
264	`value: 1.538`
265	`- type: precision_at_1000`
266	`value: 0.198`
267	`- type: precision_at_3`
268	`value: 21.268`
269	`- type: precision_at_5`
270	`value: 15.536`
271	`- type: recall_at_1`
272	`value: 32.795`
273	`- type: recall_at_10`
274	`value: 62.580999999999996`
275	`- type: recall_at_100`
276	`value: 84.438`
277	`- type: recall_at_1000`
278	`value: 96.492`
279	`- type: recall_at_3`
280	`value: 47.071000000000005`
281	`- type: recall_at_5`
282	`value: 54.079`
283	`- task:`
284	`type: Retrieval`
285	`dataset:`
286	`type: BeIR/cqadupstack`
287	`name: MTEB CQADupstackEnglishRetrieval`
288	`config: default`
289	`split: test`
290	`revision: None`
291	`metrics:`
292	`- type: map_at_1`
293	`value: 32.671`
294	`- type: map_at_10`
295	`value: 43.334`
296	`- type: map_at_100`
297	`value: 44.566`
298	`- type: map_at_1000`
299	`value: 44.702999999999996`
300	`- type: map_at_3`
301	`value: 40.343`
302	`- type: map_at_5`
303	`value: 41.983`
304	`- type: mrr_at_1`
305	`value: 40.764`
306	`- type: mrr_at_10`
307	`value: 49.382`
308	`- type: mrr_at_100`
309	`value: 49.988`
310	`- type: mrr_at_1000`
311	`value: 50.03300000000001`
312	`- type: mrr_at_3`
313	`value: 47.293`
314	`- type: mrr_at_5`
315	`value: 48.51`
316	`- type: ndcg_at_1`
317	`value: 40.764`
318	`- type: ndcg_at_10`
319	`value: 49.039`
320	`- type: ndcg_at_100`
321	`value: 53.259`
322	`- type: ndcg_at_1000`
323	`value: 55.253`
324	`- type: ndcg_at_3`
325	`value: 45.091`
326	`- type: ndcg_at_5`
327	`value: 46.839999999999996`
328	`- type: precision_at_1`
329	`value: 40.764`
330	`- type: precision_at_10`
331	`value: 9.191`
332	`- type: precision_at_100`
333	`value: 1.476`
334	`- type: precision_at_1000`
335	`value: 0.19499999999999998`
336	`- type: precision_at_3`
337	`value: 21.72`
338	`- type: precision_at_5`
339	`value: 15.299`
340	`- type: recall_at_1`
341	`value: 32.671`
342	`- type: recall_at_10`
343	`value: 58.816`
344	`- type: recall_at_100`
345	`value: 76.654`
346	`- type: recall_at_1000`
347	`value: 89.05999999999999`
348	`- type: recall_at_3`
349	`value: 46.743`
350	`- type: recall_at_5`
351	`value: 51.783`
352	`- task:`
353	`type: Retrieval`
354	`dataset:`
355	`type: BeIR/cqadupstack`
356	`name: MTEB CQADupstackGamingRetrieval`
357	`config: default`
358	`split: test`
359	`revision: None`
360	`metrics:`
361	`- type: map_at_1`
362	`value: 40.328`
363	`- type: map_at_10`
364	`value: 53.32599999999999`
365	`- type: map_at_100`
366	`value: 54.37499999999999`
367	`- type: map_at_1000`
368	`value: 54.429`
369	`- type: map_at_3`
370	`value: 49.902`
371	`- type: map_at_5`
372	`value: 52.002`
373	`- type: mrr_at_1`
374	`value: 46.332`
375	`- type: mrr_at_10`
376	`value: 56.858`
377	`- type: mrr_at_100`
378	`value: 57.522`
379	`- type: mrr_at_1000`
380	`value: 57.54899999999999`
381	`- type: mrr_at_3`
382	`value: 54.472`
383	`- type: mrr_at_5`
384	`value: 55.996`
385	`- type: ndcg_at_1`
386	`value: 46.332`
387	`- type: ndcg_at_10`
388	`value: 59.313`
389	`- type: ndcg_at_100`
390	`value: 63.266999999999996`
391	`- type: ndcg_at_1000`
392	`value: 64.36`
393	`- type: ndcg_at_3`
394	`value: 53.815000000000005`
395	`- type: ndcg_at_5`
396	`value: 56.814`
397	`- type: precision_at_1`
398	`value: 46.332`
399	`- type: precision_at_10`
400	`value: 9.53`
401	`- type: precision_at_100`
402	`value: 1.238`
403	`- type: precision_at_1000`
404	`value: 0.13699999999999998`
405	`- type: precision_at_3`
406	`value: 24.054000000000002`
407	`- type: precision_at_5`
408	`value: 16.589000000000002`
409	`- type: recall_at_1`
410	`value: 40.328`
411	`- type: recall_at_10`
412	`value: 73.421`
413	`- type: recall_at_100`
414	`value: 90.059`
415	`- type: recall_at_1000`
416	`value: 97.81`
417	`- type: recall_at_3`
418	`value: 59.009`
419	`- type: recall_at_5`
420	`value: 66.352`
421	`- task:`
422	`type: Retrieval`
423	`dataset:`
424	`type: BeIR/cqadupstack`
425	`name: MTEB CQADupstackGisRetrieval`
426	`config: default`
427	`split: test`
428	`revision: None`
429	`metrics:`
430	`- type: map_at_1`
431	`value: 27.424`
432	`- type: map_at_10`
433	`value: 36.332`
434	`- type: map_at_100`
435	`value: 37.347`
436	`- type: map_at_1000`
437	`value: 37.422`
438	`- type: map_at_3`
439	`value: 33.743`
440	`- type: map_at_5`
441	`value: 35.176`
442	`- type: mrr_at_1`
443	`value: 29.153000000000002`
444	`- type: mrr_at_10`
445	`value: 38.233`
446	`- type: mrr_at_100`
447	`value: 39.109`
448	`- type: mrr_at_1000`
449	`value: 39.164`
450	`- type: mrr_at_3`
451	`value: 35.876000000000005`
452	`- type: mrr_at_5`
453	`value: 37.169000000000004`
454	`- type: ndcg_at_1`
455	`value: 29.153000000000002`
456	`- type: ndcg_at_10`
457	`value: 41.439`
458	`- type: ndcg_at_100`
459	`value: 46.42`
460	`- type: ndcg_at_1000`
461	`value: 48.242000000000004`
462	`- type: ndcg_at_3`
463	`value: 36.362`
464	`- type: ndcg_at_5`
465	`value: 38.743`
466	`- type: precision_at_1`
467	`value: 29.153000000000002`
468	`- type: precision_at_10`
469	`value: 6.315999999999999`
470	`- type: precision_at_100`
471	`value: 0.927`
472	`- type: precision_at_1000`
473	`value: 0.11199999999999999`
474	`- type: precision_at_3`
475	`value: 15.443000000000001`
476	`- type: precision_at_5`
477	`value: 10.644`
478	`- type: recall_at_1`
479	`value: 27.424`
480	`- type: recall_at_10`
481	`value: 55.364000000000004`
482	`- type: recall_at_100`
483	`value: 78.211`
484	`- type: recall_at_1000`
485	`value: 91.74600000000001`
486	`- type: recall_at_3`
487	`value: 41.379`
488	`- type: recall_at_5`
489	`value: 47.14`
490	`- task:`
491	`type: Retrieval`
492	`dataset:`
493	`type: BeIR/cqadupstack`
494	`name: MTEB CQADupstackMathematicaRetrieval`
495	`config: default`
496	`split: test`
497	`revision: None`
498	`metrics:`
499	`- type: map_at_1`
500	`value: 19.601`
501	`- type: map_at_10`
502	`value: 27.826`
503	`- type: map_at_100`
504	`value: 29.017`
505	`- type: map_at_1000`
506	`value: 29.137`
507	`- type: map_at_3`
508	`value: 25.125999999999998`
509	`- type: map_at_5`
510	`value: 26.765`
511	`- type: mrr_at_1`
512	`value: 24.005000000000003`
513	`- type: mrr_at_10`
514	`value: 32.716`
515	`- type: mrr_at_100`
516	`value: 33.631`
517	`- type: mrr_at_1000`
518	`value: 33.694`
519	`- type: mrr_at_3`
520	`value: 29.934`
521	`- type: mrr_at_5`
522	`value: 31.630999999999997`
523	`- type: ndcg_at_1`
524	`value: 24.005000000000003`
525	`- type: ndcg_at_10`
526	`value: 33.158`
527	`- type: ndcg_at_100`
528	`value: 38.739000000000004`
529	`- type: ndcg_at_1000`
530	`value: 41.495`
531	`- type: ndcg_at_3`
532	`value: 28.185`
533	`- type: ndcg_at_5`
534	`value: 30.796`
535	`- type: precision_at_1`
536	`value: 24.005000000000003`
537	`- type: precision_at_10`
538	`value: 5.908`
539	`- type: precision_at_100`
540	`value: 1.005`
541	`- type: precision_at_1000`
542	`value: 0.13899999999999998`
543	`- type: precision_at_3`
544	`value: 13.391`
545	`- type: precision_at_5`
546	`value: 9.876`
547	`- type: recall_at_1`
548	`value: 19.601`
549	`- type: recall_at_10`
550	`value: 44.746`
551	`- type: recall_at_100`
552	`value: 68.82300000000001`
553	`- type: recall_at_1000`
554	`value: 88.215`
555	`- type: recall_at_3`
556	`value: 31.239`
557	`- type: recall_at_5`
558	`value: 37.695`
559	`- task:`
560	`type: Retrieval`
561	`dataset:`
562	`type: BeIR/cqadupstack`
563	`name: MTEB CQADupstackPhysicsRetrieval`
564	`config: default`
565	`split: test`
566	`revision: None`
567	`metrics:`
568	`- type: map_at_1`
569	`value: 30.130000000000003`
570	`- type: map_at_10`
571	`value: 40.96`
572	`- type: map_at_100`
573	`value: 42.282`
574	`- type: map_at_1000`
575	`value: 42.392`
576	`- type: map_at_3`
577	`value: 37.889`
578	`- type: map_at_5`
579	`value: 39.661`
580	`- type: mrr_at_1`
581	`value: 36.958999999999996`
582	`- type: mrr_at_10`
583	`value: 46.835`
584	`- type: mrr_at_100`
585	`value: 47.644`
586	`- type: mrr_at_1000`
587	`value: 47.688`
588	`- type: mrr_at_3`
589	`value: 44.562000000000005`
590	`- type: mrr_at_5`
591	`value: 45.938`
592	`- type: ndcg_at_1`
593	`value: 36.958999999999996`
594	`- type: ndcg_at_10`
595	`value: 47.06`
596	`- type: ndcg_at_100`
597	`value: 52.345`
598	`- type: ndcg_at_1000`
599	`value: 54.35`
600	`- type: ndcg_at_3`
601	`value: 42.301`
602	`- type: ndcg_at_5`
603	`value: 44.635999999999996`
604	`- type: precision_at_1`
605	`value: 36.958999999999996`
606	`- type: precision_at_10`
607	`value: 8.479000000000001`
608	`- type: precision_at_100`
609	`value: 1.284`
610	`- type: precision_at_1000`
611	`value: 0.163`
612	`- type: precision_at_3`
613	`value: 20.244`
614	`- type: precision_at_5`
615	`value: 14.224999999999998`
616	`- type: recall_at_1`
617	`value: 30.130000000000003`
618	`- type: recall_at_10`
619	`value: 59.27`
620	`- type: recall_at_100`
621	`value: 81.195`
622	`- type: recall_at_1000`
623	`value: 94.21199999999999`
624	`- type: recall_at_3`
625	`value: 45.885`
626	`- type: recall_at_5`
627	`value: 52.016`
628	`- task:`
629	`type: Retrieval`
630	`dataset:`
631	`type: BeIR/cqadupstack`
632	`name: MTEB CQADupstackProgrammersRetrieval`
633	`config: default`
634	`split: test`
635	`revision: None`
636	`metrics:`
637	`- type: map_at_1`
638	`value: 26.169999999999998`
639	`- type: map_at_10`
640	`value: 36.451`
641	`- type: map_at_100`
642	`value: 37.791000000000004`
643	`- type: map_at_1000`
644	`value: 37.897`
645	`- type: map_at_3`
646	`value: 33.109`
647	`- type: map_at_5`
648	`value: 34.937000000000005`
649	`- type: mrr_at_1`
650	`value: 32.877`
651	`- type: mrr_at_10`
652	`value: 42.368`
653	`- type: mrr_at_100`
654	`value: 43.201`
655	`- type: mrr_at_1000`
656	`value: 43.259`
657	`- type: mrr_at_3`
658	`value: 39.763999999999996`
659	`- type: mrr_at_5`
660	`value: 41.260000000000005`
661	`- type: ndcg_at_1`
662	`value: 32.877`
663	`- type: ndcg_at_10`
664	`value: 42.659000000000006`
665	`- type: ndcg_at_100`
666	`value: 48.161`
667	`- type: ndcg_at_1000`
668	`value: 50.345`
669	`- type: ndcg_at_3`
670	`value: 37.302`
671	`- type: ndcg_at_5`
672	`value: 39.722`
673	`- type: precision_at_1`
674	`value: 32.877`
675	`- type: precision_at_10`
676	`value: 7.9`
677	`- type: precision_at_100`
678	`value: 1.236`
679	`- type: precision_at_1000`
680	`value: 0.158`
681	`- type: precision_at_3`
682	`value: 17.846`
683	`- type: precision_at_5`
684	`value: 12.9`
685	`- type: recall_at_1`
686	`value: 26.169999999999998`
687	`- type: recall_at_10`
688	`value: 55.35`
689	`- type: recall_at_100`
690	`value: 78.755`
691	`- type: recall_at_1000`
692	`value: 93.518`
693	`- type: recall_at_3`
694	`value: 40.176`
695	`- type: recall_at_5`
696	`value: 46.589000000000006`
697	`- task:`
698	`type: Retrieval`
699	`dataset:`
700	`type: BeIR/cqadupstack`
701	`name: MTEB CQADupstackRetrieval`
702	`config: default`
703	`split: test`
704	`revision: None`
705	`metrics:`
706	`- type: map_at_1`
707	`value: 27.15516666666667`
708	`- type: map_at_10`
709	`value: 36.65741666666667`
710	`- type: map_at_100`
711	`value: 37.84991666666666`
712	`- type: map_at_1000`
713	`value: 37.96316666666667`
714	`- type: map_at_3`
715	`value: 33.74974999999999`
716	`- type: map_at_5`
717	`value: 35.3765`
718	`- type: mrr_at_1`
719	`value: 32.08233333333334`
720	`- type: mrr_at_10`
721	`value: 41.033833333333334`
722	`- type: mrr_at_100`
723	`value: 41.84524999999999`
724	`- type: mrr_at_1000`
725	`value: 41.89983333333333`
726	`- type: mrr_at_3`
727	`value: 38.62008333333333`
728	`- type: mrr_at_5`
729	`value: 40.03441666666666`
730	`- type: ndcg_at_1`
731	`value: 32.08233333333334`
732	`- type: ndcg_at_10`
733	`value: 42.229`
734	`- type: ndcg_at_100`
735	`value: 47.26716666666667`
736	`- type: ndcg_at_1000`
737	`value: 49.43466666666667`
738	`- type: ndcg_at_3`
739	`value: 37.36408333333333`
740	`- type: ndcg_at_5`
741	`value: 39.6715`
742	`- type: precision_at_1`
743	`value: 32.08233333333334`
744	`- type: precision_at_10`
745	`value: 7.382583333333334`
746	`- type: precision_at_100`
747	`value: 1.16625`
748	`- type: precision_at_1000`
749	`value: 0.15408333333333332`
750	`- type: precision_at_3`
751	`value: 17.218`
752	`- type: precision_at_5`
753	`value: 12.21875`
754	`- type: recall_at_1`
755	`value: 27.15516666666667`
756	`- type: recall_at_10`
757	`value: 54.36683333333333`
758	`- type: recall_at_100`
759	`value: 76.37183333333333`
760	`- type: recall_at_1000`
761	`value: 91.26183333333333`
762	`- type: recall_at_3`
763	`value: 40.769916666666674`
764	`- type: recall_at_5`
765	`value: 46.702333333333335`
766	`- task:`
767	`type: Retrieval`
768	`dataset:`
769	`type: BeIR/cqadupstack`
770	`name: MTEB CQADupstackStatsRetrieval`
771	`config: default`
772	`split: test`
773	`revision: None`
774	`metrics:`
775	`- type: map_at_1`
776	`value: 25.749`
777	`- type: map_at_10`
778	`value: 33.001999999999995`
779	`- type: map_at_100`
780	`value: 33.891`
781	`- type: map_at_1000`
782	`value: 33.993`
783	`- type: map_at_3`
784	`value: 30.703999999999997`
785	`- type: map_at_5`
786	`value: 31.959`
787	`- type: mrr_at_1`
788	`value: 28.834`
789	`- type: mrr_at_10`
790	`value: 35.955`
791	`- type: mrr_at_100`
792	`value: 36.709`
793	`- type: mrr_at_1000`
794	`value: 36.779`
795	`- type: mrr_at_3`
796	`value: 33.947`
797	`- type: mrr_at_5`
798	`value: 35.089`
799	`- type: ndcg_at_1`
800	`value: 28.834`
801	`- type: ndcg_at_10`
802	`value: 37.329`
803	`- type: ndcg_at_100`
804	`value: 41.79`
805	`- type: ndcg_at_1000`
806	`value: 44.169000000000004`
807	`- type: ndcg_at_3`
808	`value: 33.184999999999995`
809	`- type: ndcg_at_5`
810	`value: 35.107`
811	`- type: precision_at_1`
812	`value: 28.834`
813	`- type: precision_at_10`
814	`value: 5.7669999999999995`
815	`- type: precision_at_100`
816	`value: 0.876`
817	`- type: precision_at_1000`
818	`value: 0.11399999999999999`
819	`- type: precision_at_3`
820	`value: 14.213000000000001`
821	`- type: precision_at_5`
822	`value: 9.754999999999999`
823	`- type: recall_at_1`
824	`value: 25.749`
825	`- type: recall_at_10`
826	`value: 47.791`
827	`- type: recall_at_100`
828	`value: 68.255`
829	`- type: recall_at_1000`
830	`value: 85.749`
831	`- type: recall_at_3`
832	`value: 36.199`
833	`- type: recall_at_5`
834	`value: 41.071999999999996`
835	`- task:`
836	`type: Retrieval`
837	`dataset:`
838	`type: BeIR/cqadupstack`
839	`name: MTEB CQADupstackTexRetrieval`
840	`config: default`
841	`split: test`
842	`revision: None`
843	`metrics:`
844	`- type: map_at_1`
845	`value: 17.777`
846	`- type: map_at_10`
847	`value: 25.201`
848	`- type: map_at_100`
849	`value: 26.423999999999996`
850	`- type: map_at_1000`
851	`value: 26.544`
852	`- type: map_at_3`
853	`value: 22.869`
854	`- type: map_at_5`
855	`value: 24.023`
856	`- type: mrr_at_1`
857	`value: 21.473`
858	`- type: mrr_at_10`
859	`value: 29.12`
860	`- type: mrr_at_100`
861	`value: 30.144`
862	`- type: mrr_at_1000`
863	`value: 30.215999999999998`
864	`- type: mrr_at_3`
865	`value: 26.933`
866	`- type: mrr_at_5`
867	`value: 28.051`
868	`- type: ndcg_at_1`
869	`value: 21.473`
870	`- type: ndcg_at_10`
871	`value: 30.003`
872	`- type: ndcg_at_100`
873	`value: 35.766`
874	`- type: ndcg_at_1000`
875	`value: 38.501000000000005`
876	`- type: ndcg_at_3`
877	`value: 25.773000000000003`
878	`- type: ndcg_at_5`
879	`value: 27.462999999999997`
880	`- type: precision_at_1`
881	`value: 21.473`
882	`- type: precision_at_10`
883	`value: 5.482`
884	`- type: precision_at_100`
885	`value: 0.975`
886	`- type: precision_at_1000`
887	`value: 0.13799999999999998`
888	`- type: precision_at_3`
889	`value: 12.205`
890	`- type: precision_at_5`
891	`value: 8.692`
892	`- type: recall_at_1`
893	`value: 17.777`
894	`- type: recall_at_10`
895	`value: 40.582`
896	`- type: recall_at_100`
897	`value: 66.305`
898	`- type: recall_at_1000`
899	`value: 85.636`
900	`- type: recall_at_3`
901	`value: 28.687`
902	`- type: recall_at_5`
903	`value: 33.089`
904	`- task:`
905	`type: Retrieval`
906	`dataset:`
907	`type: BeIR/cqadupstack`
908	`name: MTEB CQADupstackUnixRetrieval`
909	`config: default`
910	`split: test`
911	`revision: None`
912	`metrics:`
913	`- type: map_at_1`
914	`value: 26.677`
915	`- type: map_at_10`
916	`value: 36.309000000000005`
917	`- type: map_at_100`
918	`value: 37.403999999999996`
919	`- type: map_at_1000`
920	`value: 37.496`
921	`- type: map_at_3`
922	`value: 33.382`
923	`- type: map_at_5`
924	`value: 34.98`
925	`- type: mrr_at_1`
926	`value: 31.343`
927	`- type: mrr_at_10`
928	`value: 40.549`
929	`- type: mrr_at_100`
930	`value: 41.342`
931	`- type: mrr_at_1000`
932	`value: 41.397`
933	`- type: mrr_at_3`
934	`value: 38.029`
935	`- type: mrr_at_5`
936	`value: 39.451`
937	`- type: ndcg_at_1`
938	`value: 31.343`
939	`- type: ndcg_at_10`
940	`value: 42.1`
941	`- type: ndcg_at_100`
942	`value: 47.089999999999996`
943	`- type: ndcg_at_1000`
944	`value: 49.222`
945	`- type: ndcg_at_3`
946	`value: 36.836999999999996`
947	`- type: ndcg_at_5`
948	`value: 39.21`
949	`- type: precision_at_1`
950	`value: 31.343`
951	`- type: precision_at_10`
952	`value: 7.164`
953	`- type: precision_at_100`
954	`value: 1.0959999999999999`
955	`- type: precision_at_1000`
956	`value: 0.13899999999999998`
957	`- type: precision_at_3`
958	`value: 16.915`
959	`- type: precision_at_5`
960	`value: 11.940000000000001`
961	`- type: recall_at_1`
962	`value: 26.677`
963	`- type: recall_at_10`
964	`value: 55.54599999999999`
965	`- type: recall_at_100`
966	`value: 77.094`
967	`- type: recall_at_1000`
968	`value: 92.01`
969	`- type: recall_at_3`
970	`value: 41.191`
971	`- type: recall_at_5`
972	`value: 47.006`
973	`- task:`
974	`type: Retrieval`
975	`dataset:`
976	`type: BeIR/cqadupstack`
977	`name: MTEB CQADupstackWebmastersRetrieval`
978	`config: default`
979	`split: test`
980	`revision: None`
981	`metrics:`
982	`- type: map_at_1`
983	`value: 24.501`
984	`- type: map_at_10`
985	`value: 33.102`
986	`- type: map_at_100`
987	`value: 34.676`
988	`- type: map_at_1000`
989	`value: 34.888000000000005`
990	`- type: map_at_3`
991	`value: 29.944`
992	`- type: map_at_5`
993	`value: 31.613999999999997`
994	`- type: mrr_at_1`
995	`value: 29.447000000000003`
996	`- type: mrr_at_10`
997	`value: 37.996`
998	`- type: mrr_at_100`
999	`value: 38.946`
1000	`- type: mrr_at_1000`
1001	`value: 38.995000000000005`
1002	`- type: mrr_at_3`
1003	`value: 35.079`
1004	`- type: mrr_at_5`
1005	`value: 36.69`
1006	`- type: ndcg_at_1`
1007	`value: 29.447000000000003`
1008	`- type: ndcg_at_10`
1009	`value: 39.232`
1010	`- type: ndcg_at_100`
1011	`value: 45.247`
1012	`- type: ndcg_at_1000`
1013	`value: 47.613`
1014	`- type: ndcg_at_3`
1015	`value: 33.922999999999995`
1016	`- type: ndcg_at_5`
1017	`value: 36.284`
1018	`- type: precision_at_1`
1019	`value: 29.447000000000003`
1020	`- type: precision_at_10`
1021	`value: 7.648000000000001`
1022	`- type: precision_at_100`
1023	`value: 1.516`
1024	`- type: precision_at_1000`
1025	`value: 0.23900000000000002`
1026	`- type: precision_at_3`
1027	`value: 16.008`
1028	`- type: precision_at_5`
1029	`value: 11.779`
1030	`- type: recall_at_1`
1031	`value: 24.501`
1032	`- type: recall_at_10`
1033	`value: 51.18899999999999`
1034	`- type: recall_at_100`
1035	`value: 78.437`
1036	`- type: recall_at_1000`
1037	`value: 92.842`
1038	`- type: recall_at_3`
1039	`value: 35.808`
1040	`- type: recall_at_5`
1041	`value: 42.197`
1042	`- task:`
1043	`type: Retrieval`
1044	`dataset:`
1045	`type: BeIR/cqadupstack`
1046	`name: MTEB CQADupstackWordpressRetrieval`
1047	`config: default`
1048	`split: test`
1049	`revision: None`
1050	`metrics:`
1051	`- type: map_at_1`
1052	`value: 22.039`
1053	`- type: map_at_10`
1054	`value: 30.377`
1055	`- type: map_at_100`
1056	`value: 31.275`
1057	`- type: map_at_1000`
1058	`value: 31.379`
1059	`- type: map_at_3`
1060	`value: 27.98`
1061	`- type: map_at_5`
1062	`value: 29.358`
1063	`- type: mrr_at_1`
1064	`value: 24.03`
1065	`- type: mrr_at_10`
1066	`value: 32.568000000000005`
1067	`- type: mrr_at_100`
1068	`value: 33.403`
1069	`- type: mrr_at_1000`
1070	`value: 33.475`
1071	`- type: mrr_at_3`
1072	`value: 30.436999999999998`
1073	`- type: mrr_at_5`
1074	`value: 31.796000000000003`
1075	`- type: ndcg_at_1`
1076	`value: 24.03`
1077	`- type: ndcg_at_10`
1078	`value: 35.198`
1079	`- type: ndcg_at_100`
1080	`value: 39.668`
1081	`- type: ndcg_at_1000`
1082	`value: 42.296`
1083	`- type: ndcg_at_3`
1084	`value: 30.709999999999997`
1085	`- type: ndcg_at_5`
1086	`value: 33.024`
1087	`- type: precision_at_1`
1088	`value: 24.03`
1089	`- type: precision_at_10`
1090	`value: 5.564`
1091	`- type: precision_at_100`
1092	`value: 0.828`
1093	`- type: precision_at_1000`
1094	`value: 0.117`
1095	`- type: precision_at_3`
1096	`value: 13.309000000000001`
1097	`- type: precision_at_5`
1098	`value: 9.39`
1099	`- type: recall_at_1`
1100	`value: 22.039`
1101	`- type: recall_at_10`
1102	`value: 47.746`
1103	`- type: recall_at_100`
1104	`value: 68.23599999999999`
1105	`- type: recall_at_1000`
1106	`value: 87.852`
1107	`- type: recall_at_3`
1108	`value: 35.852000000000004`
1109	`- type: recall_at_5`
1110	`value: 41.410000000000004`
1111	`- task:`
1112	`type: Retrieval`
1113	`dataset:`
1114	`type: climate-fever`
1115	`name: MTEB ClimateFEVER`
1116	`config: default`
1117	`split: test`
1118	`revision: None`
1119	`metrics:`
1120	`- type: map_at_1`
1121	`value: 15.692999999999998`
1122	`- type: map_at_10`
1123	`value: 26.903`
1124	`- type: map_at_100`
1125	`value: 28.987000000000002`
1126	`- type: map_at_1000`
1127	`value: 29.176999999999996`
1128	`- type: map_at_3`
1129	`value: 22.137`
1130	`- type: map_at_5`
1131	`value: 24.758`
1132	`- type: mrr_at_1`
1133	`value: 35.57`
1134	`- type: mrr_at_10`
1135	`value: 47.821999999999996`
1136	`- type: mrr_at_100`
1137	`value: 48.608000000000004`
1138	`- type: mrr_at_1000`
1139	`value: 48.638999999999996`
1140	`- type: mrr_at_3`
1141	`value: 44.452000000000005`
1142	`- type: mrr_at_5`
1143	`value: 46.546`
1144	`- type: ndcg_at_1`
1145	`value: 35.57`
1146	`- type: ndcg_at_10`
1147	`value: 36.567`
1148	`- type: ndcg_at_100`
1149	`value: 44.085`
1150	`- type: ndcg_at_1000`
1151	`value: 47.24`
1152	`- type: ndcg_at_3`
1153	`value: 29.964000000000002`
1154	`- type: ndcg_at_5`
1155	`value: 32.511`
1156	`- type: precision_at_1`
1157	`value: 35.57`
1158	`- type: precision_at_10`
1159	`value: 11.485`
1160	`- type: precision_at_100`
1161	`value: 1.9619999999999997`
1162	`- type: precision_at_1000`
1163	`value: 0.256`
1164	`- type: precision_at_3`
1165	`value: 22.237000000000002`
1166	`- type: precision_at_5`
1167	`value: 17.471999999999998`
1168	`- type: recall_at_1`
1169	`value: 15.692999999999998`
1170	`- type: recall_at_10`
1171	`value: 43.056`
1172	`- type: recall_at_100`
1173	`value: 68.628`
1174	`- type: recall_at_1000`
1175	`value: 86.075`
1176	`- type: recall_at_3`
1177	`value: 26.918999999999997`
1178	`- type: recall_at_5`
1179	`value: 34.14`
1180	`- task:`
1181	`type: Retrieval`
1182	`dataset:`
1183	`type: dbpedia-entity`
1184	`name: MTEB DBPedia`
1185	`config: default`
1186	`split: test`
1187	`revision: None`
1188	`metrics:`
1189	`- type: map_at_1`
1190	`value: 9.53`
1191	`- type: map_at_10`
1192	`value: 20.951`
1193	`- type: map_at_100`
1194	`value: 30.136000000000003`
1195	`- type: map_at_1000`
1196	`value: 31.801000000000002`
1197	`- type: map_at_3`
1198	`value: 15.021`
1199	`- type: map_at_5`
1200	`value: 17.471999999999998`
1201	`- type: mrr_at_1`
1202	`value: 71.0`
1203	`- type: mrr_at_10`
1204	`value: 79.176`
1205	`- type: mrr_at_100`
1206	`value: 79.418`
1207	`- type: mrr_at_1000`
1208	`value: 79.426`
1209	`- type: mrr_at_3`
1210	`value: 78.125`
1211	`- type: mrr_at_5`
1212	`value: 78.61200000000001`
1213	`- type: ndcg_at_1`
1214	`value: 58.5`
1215	`- type: ndcg_at_10`
1216	`value: 44.106`
1217	`- type: ndcg_at_100`
1218	`value: 49.268`
1219	`- type: ndcg_at_1000`
1220	`value: 56.711999999999996`
1221	`- type: ndcg_at_3`
1222	`value: 48.934`
1223	`- type: ndcg_at_5`
1224	`value: 45.826`
1225	`- type: precision_at_1`
1226	`value: 71.0`
1227	`- type: precision_at_10`
1228	`value: 35.0`
1229	`- type: precision_at_100`
1230	`value: 11.360000000000001`
1231	`- type: precision_at_1000`
1232	`value: 2.046`
1233	`- type: precision_at_3`
1234	`value: 52.833`
1235	`- type: precision_at_5`
1236	`value: 44.15`
1237	`- type: recall_at_1`
1238	`value: 9.53`
1239	`- type: recall_at_10`
1240	`value: 26.811`
1241	`- type: recall_at_100`
1242	`value: 55.916999999999994`
1243	`- type: recall_at_1000`
1244	`value: 79.973`
1245	`- type: recall_at_3`
1246	`value: 16.413`
1247	`- type: recall_at_5`
1248	`value: 19.980999999999998`
1249	`- task:`
1250	`type: Classification`
1251	`dataset:`
1252	`type: mteb/emotion`
1253	`name: MTEB EmotionClassification`
1254	`config: default`
1255	`split: test`
1256	`revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37`
1257	`metrics:`
1258	`- type: accuracy`
1259	`value: 51.519999999999996`
1260	`- type: f1`
1261	`value: 46.36601294761231`
1262	`- task:`
1263	`type: Retrieval`
1264	`dataset:`
1265	`type: fever`
1266	`name: MTEB FEVER`
1267	`config: default`
1268	`split: test`
1269	`revision: None`
1270	`metrics:`
1271	`- type: map_at_1`
1272	`value: 74.413`
1273	`- type: map_at_10`
1274	`value: 83.414`
1275	`- type: map_at_100`
1276	`value: 83.621`
1277	`- type: map_at_1000`
1278	`value: 83.635`
1279	`- type: map_at_3`
1280	`value: 82.337`
1281	`- type: map_at_5`
1282	`value: 83.039`
1283	`- type: mrr_at_1`
1284	`value: 80.19800000000001`
1285	`- type: mrr_at_10`
1286	`value: 87.715`
1287	`- type: mrr_at_100`
1288	`value: 87.778`
1289	`- type: mrr_at_1000`
1290	`value: 87.779`
1291	`- type: mrr_at_3`
1292	`value: 87.106`
1293	`- type: mrr_at_5`
1294	`value: 87.555`
1295	`- type: ndcg_at_1`
1296	`value: 80.19800000000001`
1297	`- type: ndcg_at_10`
1298	`value: 87.182`
1299	`- type: ndcg_at_100`
1300	`value: 87.90299999999999`
1301	`- type: ndcg_at_1000`
1302	`value: 88.143`
1303	`- type: ndcg_at_3`
1304	`value: 85.60600000000001`
1305	`- type: ndcg_at_5`
1306	`value: 86.541`
1307	`- type: precision_at_1`
1308	`value: 80.19800000000001`
1309	`- type: precision_at_10`
1310	`value: 10.531`
1311	`- type: precision_at_100`
1312	`value: 1.113`
1313	`- type: precision_at_1000`
1314	`value: 0.11499999999999999`
1315	`- type: precision_at_3`
1316	`value: 32.933`
1317	`- type: precision_at_5`
1318	`value: 20.429`
1319	`- type: recall_at_1`
1320	`value: 74.413`
1321	`- type: recall_at_10`
1322	`value: 94.363`
1323	`- type: recall_at_100`
1324	`value: 97.165`
1325	`- type: recall_at_1000`
1326	`value: 98.668`
1327	`- type: recall_at_3`
1328	`value: 90.108`
1329	`- type: recall_at_5`
1330	`value: 92.52`
1331	`- task:`
1332	`type: Retrieval`
1333	`dataset:`
1334	`type: fiqa`
1335	`name: MTEB FiQA2018`
1336	`config: default`
1337	`split: test`
1338	`revision: None`
1339	`metrics:`
1340	`- type: map_at_1`
1341	`value: 22.701`
1342	`- type: map_at_10`
1343	`value: 37.122`
1344	`- type: map_at_100`
1345	`value: 39.178000000000004`
1346	`- type: map_at_1000`
1347	`value: 39.326`
1348	`- type: map_at_3`
1349	`value: 32.971000000000004`
1350	`- type: map_at_5`
1351	`value: 35.332`
1352	`- type: mrr_at_1`
1353	`value: 44.753`
1354	`- type: mrr_at_10`
1355	`value: 53.452`
1356	`- type: mrr_at_100`
1357	`value: 54.198`
1358	`- type: mrr_at_1000`
1359	`value: 54.225`
1360	`- type: mrr_at_3`
1361	`value: 50.952`
1362	`- type: mrr_at_5`
1363	`value: 52.464`
1364	`- type: ndcg_at_1`
1365	`value: 44.753`
1366	`- type: ndcg_at_10`
1367	`value: 45.021`
1368	`- type: ndcg_at_100`
1369	`value: 52.028`
1370	`- type: ndcg_at_1000`
1371	`value: 54.596000000000004`
1372	`- type: ndcg_at_3`
1373	`value: 41.622`
1374	`- type: ndcg_at_5`
1375	`value: 42.736000000000004`
1376	`- type: precision_at_1`
1377	`value: 44.753`
1378	`- type: precision_at_10`
1379	`value: 12.284`
1380	`- type: precision_at_100`
1381	`value: 1.955`
1382	`- type: precision_at_1000`
1383	`value: 0.243`
1384	`- type: precision_at_3`
1385	`value: 27.828999999999997`
1386	`- type: precision_at_5`
1387	`value: 20.061999999999998`
1388	`- type: recall_at_1`
1389	`value: 22.701`
1390	`- type: recall_at_10`
1391	`value: 51.432`
1392	`- type: recall_at_100`
1393	`value: 77.009`
1394	`- type: recall_at_1000`
1395	`value: 92.511`
1396	`- type: recall_at_3`
1397	`value: 37.919000000000004`
1398	`- type: recall_at_5`
1399	`value: 44.131`
1400	`- task:`
1401	`type: Retrieval`
1402	`dataset:`
1403	`type: hotpotqa`
1404	`name: MTEB HotpotQA`
1405	`config: default`
1406	`split: test`
1407	`revision: None`
1408	`metrics:`
1409	`- type: map_at_1`
1410	`value: 40.189`
1411	`- type: map_at_10`
1412	`value: 66.24600000000001`
1413	`- type: map_at_100`
1414	`value: 67.098`
1415	`- type: map_at_1000`
1416	`value: 67.149`
1417	`- type: map_at_3`
1418	`value: 62.684`
1419	`- type: map_at_5`
1420	`value: 64.974`
1421	`- type: mrr_at_1`
1422	`value: 80.378`
1423	`- type: mrr_at_10`
1424	`value: 86.127`
1425	`- type: mrr_at_100`
1426	`value: 86.29299999999999`
1427	`- type: mrr_at_1000`
1428	`value: 86.297`
1429	`- type: mrr_at_3`
1430	`value: 85.31400000000001`
1431	`- type: mrr_at_5`
1432	`value: 85.858`
1433	`- type: ndcg_at_1`
1434	`value: 80.378`
1435	`- type: ndcg_at_10`
1436	`value: 74.101`
1437	`- type: ndcg_at_100`
1438	`value: 76.993`
1439	`- type: ndcg_at_1000`
1440	`value: 77.948`
1441	`- type: ndcg_at_3`
1442	`value: 69.232`
1443	`- type: ndcg_at_5`
1444	`value: 72.04599999999999`
1445	`- type: precision_at_1`
1446	`value: 80.378`
1447	`- type: precision_at_10`
1448	`value: 15.595999999999998`
1449	`- type: precision_at_100`
1450	`value: 1.7840000000000003`
1451	`- type: precision_at_1000`
1452	`value: 0.191`
1453	`- type: precision_at_3`
1454	`value: 44.884`
1455	`- type: precision_at_5`
1456	`value: 29.145`
1457	`- type: recall_at_1`
1458	`value: 40.189`
1459	`- type: recall_at_10`
1460	`value: 77.981`
1461	`- type: recall_at_100`
1462	`value: 89.21`
1463	`- type: recall_at_1000`
1464	`value: 95.48299999999999`
1465	`- type: recall_at_3`
1466	`value: 67.326`
1467	`- type: recall_at_5`
1468	`value: 72.863`
1469	`- task:`
1470	`type: Classification`
1471	`dataset:`
1472	`type: mteb/imdb`
1473	`name: MTEB ImdbClassification`
1474	`config: default`
1475	`split: test`
1476	`revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7`
1477	`metrics:`
1478	`- type: accuracy`
1479	`value: 92.84599999999999`
1480	`- type: ap`
1481	`value: 89.4710787567357`
1482	`- type: f1`
1483	`value: 92.83752676932258`
1484	`- task:`
1485	`type: Retrieval`
1486	`dataset:`
1487	`type: msmarco`
1488	`name: MTEB MSMARCO`
1489	`config: default`
1490	`split: dev`
1491	`revision: None`
1492	`metrics:`
1493	`- type: map_at_1`
1494	`value: 23.132`
1495	`- type: map_at_10`
1496	`value: 35.543`
1497	`- type: map_at_100`
1498	`value: 36.702`
1499	`- type: map_at_1000`
1500	`value: 36.748999999999995`
1501	`- type: map_at_3`
1502	`value: 31.737`
1503	`- type: map_at_5`
1504	`value: 33.927`
1505	`- type: mrr_at_1`
1506	`value: 23.782`
1507	`- type: mrr_at_10`
1508	`value: 36.204`
1509	`- type: mrr_at_100`
1510	`value: 37.29`
1511	`- type: mrr_at_1000`
1512	`value: 37.330999999999996`
1513	`- type: mrr_at_3`
1514	`value: 32.458999999999996`
1515	`- type: mrr_at_5`
1516	`value: 34.631`
1517	`- type: ndcg_at_1`
1518	`value: 23.782`
1519	`- type: ndcg_at_10`
1520	`value: 42.492999999999995`
1521	`- type: ndcg_at_100`
1522	`value: 47.985`
1523	`- type: ndcg_at_1000`
1524	`value: 49.141`
1525	`- type: ndcg_at_3`
1526	`value: 34.748000000000005`
1527	`- type: ndcg_at_5`
1528	`value: 38.651`
1529	`- type: precision_at_1`
1530	`value: 23.782`
1531	`- type: precision_at_10`
1532	`value: 6.665`
1533	`- type: precision_at_100`
1534	`value: 0.941`
1535	`- type: precision_at_1000`
1536	`value: 0.104`
1537	`- type: precision_at_3`
1538	`value: 14.776`
1539	`- type: precision_at_5`
1540	`value: 10.84`
1541	`- type: recall_at_1`
1542	`value: 23.132`
1543	`- type: recall_at_10`
1544	`value: 63.794`
1545	`- type: recall_at_100`
1546	`value: 89.027`
1547	`- type: recall_at_1000`
1548	`value: 97.807`
1549	`- type: recall_at_3`
1550	`value: 42.765`
1551	`- type: recall_at_5`
1552	`value: 52.11`
1553	`- task:`
1554	`type: Classification`
1555	`dataset:`
1556	`type: mteb/mtop_domain`
1557	`name: MTEB MTOPDomainClassification (en)`
1558	`config: en`
1559	`split: test`
1560	`revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf`
1561	`metrics:`
1562	`- type: accuracy`
1563	`value: 94.59188326493388`
1564	`- type: f1`
1565	`value: 94.3842594786827`
1566	`- task:`
1567	`type: Classification`
1568	`dataset:`
1569	`type: mteb/mtop_intent`
1570	`name: MTEB MTOPIntentClassification (en)`
1571	`config: en`
1572	`split: test`
1573	`revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba`
1574	`metrics:`
1575	`- type: accuracy`
1576	`value: 79.49384404924761`
1577	`- type: f1`
1578	`value: 59.7580539534629`
1579	`- task:`
1580	`type: Classification`
1581	`dataset:`
1582	`type: mteb/amazon_massive_intent`
1583	`name: MTEB MassiveIntentClassification (en)`
1584	`config: en`
1585	`split: test`
1586	`revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7`
1587	`metrics:`
1588	`- type: accuracy`
1589	`value: 77.56220578345663`
1590	`- type: f1`
1591	`value: 75.27228165561478`
1592	`- task:`
1593	`type: Classification`
1594	`dataset:`
1595	`type: mteb/amazon_massive_scenario`
1596	`name: MTEB MassiveScenarioClassification (en)`
1597	`config: en`
1598	`split: test`
1599	`revision: 7d571f92784cd94a019292a1f45445077d0ef634`
1600	`metrics:`
1601	`- type: accuracy`
1602	`value: 80.53463349024884`
1603	`- type: f1`
1604	`value: 80.4893958236536`
1605	`- task:`
1606	`type: Clustering`
1607	`dataset:`
1608	`type: mteb/medrxiv-clustering-p2p`
1609	`name: MTEB MedrxivClusteringP2P`
1610	`config: default`
1611	`split: test`
1612	`revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73`
1613	`metrics:`
1614	`- type: v_measure`
1615	`value: 32.56100273484962`
1616	`- task:`
1617	`type: Clustering`
1618	`dataset:`
1619	`type: mteb/medrxiv-clustering-s2s`
1620	`name: MTEB MedrxivClusteringS2S`
1621	`config: default`
1622	`split: test`
1623	`revision: 35191c8c0dca72d8ff3efcd72aa802307d469663`
1624	`metrics:`
1625	`- type: v_measure`
1626	`value: 31.470380028839607`
1627	`- task:`
1628	`type: Reranking`
1629	`dataset:`
1630	`type: mteb/mind_small`
1631	`name: MTEB MindSmallReranking`
1632	`config: default`
1633	`split: test`
1634	`revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69`
1635	`metrics:`
1636	`- type: map`
1637	`value: 32.06102792457849`
1638	`- type: mrr`
1639	`value: 33.30709199672238`
1640	`- task:`
1641	`type: Retrieval`
1642	`dataset:`
1643	`type: nfcorpus`
1644	`name: MTEB NFCorpus`
1645	`config: default`
1646	`split: test`
1647	`revision: None`
1648	`metrics:`
1649	`- type: map_at_1`
1650	`value: 6.776999999999999`
1651	`- type: map_at_10`
1652	`value: 14.924000000000001`
1653	`- type: map_at_100`
1654	`value: 18.955`
1655	`- type: map_at_1000`
1656	`value: 20.538999999999998`
1657	`- type: map_at_3`
1658	`value: 10.982`
1659	`- type: map_at_5`
1660	`value: 12.679000000000002`
1661	`- type: mrr_at_1`
1662	`value: 47.988`
1663	`- type: mrr_at_10`
1664	`value: 57.232000000000006`
1665	`- type: mrr_at_100`
1666	`value: 57.818999999999996`
1667	`- type: mrr_at_1000`
1668	`value: 57.847`
1669	`- type: mrr_at_3`
1670	`value: 54.901999999999994`
1671	`- type: mrr_at_5`
1672	`value: 56.481`
1673	`- type: ndcg_at_1`
1674	`value: 46.594`
1675	`- type: ndcg_at_10`
1676	`value: 38.129000000000005`
1677	`- type: ndcg_at_100`
1678	`value: 35.54`
1679	`- type: ndcg_at_1000`
1680	`value: 44.172`
1681	`- type: ndcg_at_3`
1682	`value: 43.025999999999996`
1683	`- type: ndcg_at_5`
1684	`value: 41.052`
1685	`- type: precision_at_1`
1686	`value: 47.988`
1687	`- type: precision_at_10`
1688	`value: 28.111000000000004`
1689	`- type: precision_at_100`
1690	`value: 8.929`
1691	`- type: precision_at_1000`
1692	`value: 2.185`
1693	`- type: precision_at_3`
1694	`value: 40.144000000000005`
1695	`- type: precision_at_5`
1696	`value: 35.232`
1697	`- type: recall_at_1`
1698	`value: 6.776999999999999`
1699	`- type: recall_at_10`
1700	`value: 19.289`
1701	`- type: recall_at_100`
1702	`value: 36.359`
1703	`- type: recall_at_1000`
1704	`value: 67.54`
1705	`- type: recall_at_3`
1706	`value: 11.869`
1707	`- type: recall_at_5`
1708	`value: 14.999`
1709	`- task:`
1710	`type: Retrieval`
1711	`dataset:`
1712	`type: nq`
1713	`name: MTEB NQ`
1714	`config: default`
1715	`split: test`
1716	`revision: None`
1717	`metrics:`
1718	`- type: map_at_1`
1719	`value: 31.108000000000004`
1720	`- type: map_at_10`
1721	`value: 47.126000000000005`
1722	`- type: map_at_100`
1723	`value: 48.171`
1724	`- type: map_at_1000`
1725	`value: 48.199`
1726	`- type: map_at_3`
1727	`value: 42.734`
1728	`- type: map_at_5`
1729	`value: 45.362`
1730	`- type: mrr_at_1`
1731	`value: 34.936`
1732	`- type: mrr_at_10`
1733	`value: 49.571`
1734	`- type: mrr_at_100`
1735	`value: 50.345`
1736	`- type: mrr_at_1000`
1737	`value: 50.363`
1738	`- type: mrr_at_3`
1739	`value: 45.959`
1740	`- type: mrr_at_5`
1741	`value: 48.165`
1742	`- type: ndcg_at_1`
1743	`value: 34.936`
1744	`- type: ndcg_at_10`
1745	`value: 55.028999999999996`
1746	`- type: ndcg_at_100`
1747	`value: 59.244`
1748	`- type: ndcg_at_1000`
1749	`value: 59.861`
1750	`- type: ndcg_at_3`
1751	`value: 46.872`
1752	`- type: ndcg_at_5`
1753	`value: 51.217999999999996`
1754	`- type: precision_at_1`
1755	`value: 34.936`
1756	`- type: precision_at_10`
1757	`value: 9.099`
1758	`- type: precision_at_100`
1759	`value: 1.145`
1760	`- type: precision_at_1000`
1761	`value: 0.12`
1762	`- type: precision_at_3`
1763	`value: 21.456`
1764	`- type: precision_at_5`
1765	`value: 15.411`
1766	`- type: recall_at_1`
1767	`value: 31.108000000000004`
1768	`- type: recall_at_10`
1769	`value: 76.53999999999999`
1770	`- type: recall_at_100`
1771	`value: 94.39`
1772	`- type: recall_at_1000`
1773	`value: 98.947`
1774	`- type: recall_at_3`
1775	`value: 55.572`
1776	`- type: recall_at_5`
1777	`value: 65.525`
1778	`- task:`
1779	`type: Retrieval`
1780	`dataset:`
1781	`type: quora`
1782	`name: MTEB QuoraRetrieval`
1783	`config: default`
1784	`split: test`
1785	`revision: None`
1786	`metrics:`
1787	`- type: map_at_1`
1788	`value: 71.56400000000001`
1789	`- type: map_at_10`
1790	`value: 85.482`
1791	`- type: map_at_100`
1792	`value: 86.114`
1793	`- type: map_at_1000`
1794	`value: 86.13`
1795	`- type: map_at_3`
1796	`value: 82.607`
1797	`- type: map_at_5`
1798	`value: 84.405`
1799	`- type: mrr_at_1`
1800	`value: 82.42`
1801	`- type: mrr_at_10`
1802	`value: 88.304`
1803	`- type: mrr_at_100`
1804	`value: 88.399`
1805	`- type: mrr_at_1000`
1806	`value: 88.399`
1807	`- type: mrr_at_3`
1808	`value: 87.37`
1809	`- type: mrr_at_5`
1810	`value: 88.024`
1811	`- type: ndcg_at_1`
1812	`value: 82.45`
1813	`- type: ndcg_at_10`
1814	`value: 89.06500000000001`
1815	`- type: ndcg_at_100`
1816	`value: 90.232`
1817	`- type: ndcg_at_1000`
1818	`value: 90.305`
1819	`- type: ndcg_at_3`
1820	`value: 86.375`
1821	`- type: ndcg_at_5`
1822	`value: 87.85300000000001`
1823	`- type: precision_at_1`
1824	`value: 82.45`
1825	`- type: precision_at_10`
1826	`value: 13.486999999999998`
1827	`- type: precision_at_100`
1828	`value: 1.534`
1829	`- type: precision_at_1000`
1830	`value: 0.157`
1831	`- type: precision_at_3`
1832	`value: 37.813`
1833	`- type: precision_at_5`
1834	`value: 24.773999999999997`
1835	`- type: recall_at_1`
1836	`value: 71.56400000000001`
1837	`- type: recall_at_10`
1838	`value: 95.812`
1839	`- type: recall_at_100`
1840	`value: 99.7`
1841	`- type: recall_at_1000`
1842	`value: 99.979`
1843	`- type: recall_at_3`
1844	`value: 87.966`
1845	`- type: recall_at_5`
1846	`value: 92.268`
1847	`- task:`
1848	`type: Clustering`
1849	`dataset:`
1850	`type: mteb/reddit-clustering`
1851	`name: MTEB RedditClustering`
1852	`config: default`
1853	`split: test`
1854	`revision: 24640382cdbf8abc73003fb0fa6d111a705499eb`
1855	`metrics:`
1856	`- type: v_measure`
1857	`value: 57.241876648614145`
1858	`- task:`
1859	`type: Clustering`
1860	`dataset:`
1861	`type: mteb/reddit-clustering-p2p`
1862	`name: MTEB RedditClusteringP2P`
1863	`config: default`
1864	`split: test`
1865	`revision: 282350215ef01743dc01b456c7f5241fa8937f16`
1866	`metrics:`
1867	`- type: v_measure`
1868	`value: 64.66212576446223`
1869	`- task:`
1870	`type: Retrieval`
1871	`dataset:`
1872	`type: scidocs`
1873	`name: MTEB SCIDOCS`
1874	`config: default`
1875	`split: test`
1876	`revision: None`
1877	`metrics:`
1878	`- type: map_at_1`
1879	`value: 5.308`
1880	`- type: map_at_10`
1881	`value: 13.803`
1882	`- type: map_at_100`
1883	`value: 16.176`
1884	`- type: map_at_1000`
1885	`value: 16.561`
1886	`- type: map_at_3`
1887	`value: 9.761000000000001`
1888	`- type: map_at_5`
1889	`value: 11.802`
1890	`- type: mrr_at_1`
1891	`value: 26.200000000000003`
1892	`- type: mrr_at_10`
1893	`value: 37.621`
1894	`- type: mrr_at_100`
1895	`value: 38.767`
1896	`- type: mrr_at_1000`
1897	`value: 38.815`
1898	`- type: mrr_at_3`
1899	`value: 34.117`
1900	`- type: mrr_at_5`
1901	`value: 36.107`
1902	`- type: ndcg_at_1`
1903	`value: 26.200000000000003`
1904	`- type: ndcg_at_10`
1905	`value: 22.64`
1906	`- type: ndcg_at_100`
1907	`value: 31.567`
1908	`- type: ndcg_at_1000`
1909	`value: 37.623`
1910	`- type: ndcg_at_3`
1911	`value: 21.435000000000002`
1912	`- type: ndcg_at_5`
1913	`value: 18.87`
1914	`- type: precision_at_1`
1915	`value: 26.200000000000003`
1916	`- type: precision_at_10`
1917	`value: 11.74`
1918	`- type: precision_at_100`
1919	`value: 2.465`
1920	`- type: precision_at_1000`
1921	`value: 0.391`
1922	`- type: precision_at_3`
1923	`value: 20.033`
1924	`- type: precision_at_5`
1925	`value: 16.64`
1926	`- type: recall_at_1`
1927	`value: 5.308`
1928	`- type: recall_at_10`
1929	`value: 23.794999999999998`
1930	`- type: recall_at_100`
1931	`value: 50.015`
1932	`- type: recall_at_1000`
1933	`value: 79.283`
1934	`- type: recall_at_3`
1935	`value: 12.178`
1936	`- type: recall_at_5`
1937	`value: 16.882`
1938	`- task:`
1939	`type: STS`
1940	`dataset:`
1941	`type: mteb/sickr-sts`
1942	`name: MTEB SICK-R`
1943	`config: default`
1944	`split: test`
1945	`revision: a6ea5a8cab320b040a23452cc28066d9beae2cee`
1946	`metrics:`
1947	`- type: cos_sim_pearson`
1948	`value: 84.93231134675553`
1949	`- type: cos_sim_spearman`
1950	`value: 81.68319292603205`
1951	`- type: euclidean_pearson`
1952	`value: 81.8396814380367`
1953	`- type: euclidean_spearman`
1954	`value: 81.24641903349945`
1955	`- type: manhattan_pearson`
1956	`value: 81.84698799204274`
1957	`- type: manhattan_spearman`
1958	`value: 81.24269997904105`
1959	`- task:`
1960	`type: STS`
1961	`dataset:`
1962	`type: mteb/sts12-sts`
1963	`name: MTEB STS12`
1964	`config: default`
1965	`split: test`
1966	`revision: a0d554a64d88156834ff5ae9920b964011b16384`
1967	`metrics:`
1968	`- type: cos_sim_pearson`
1969	`value: 86.73241671587446`
1970	`- type: cos_sim_spearman`
1971	`value: 79.05091082971826`
1972	`- type: euclidean_pearson`
1973	`value: 83.91146869578044`
1974	`- type: euclidean_spearman`
1975	`value: 79.87978465370936`
1976	`- type: manhattan_pearson`
1977	`value: 83.90888338917678`
1978	`- type: manhattan_spearman`
1979	`value: 79.87482848584241`
1980	`- task:`
1981	`type: STS`
1982	`dataset:`
1983	`type: mteb/sts13-sts`
1984	`name: MTEB STS13`
1985	`config: default`
1986	`split: test`
1987	`revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca`
1988	`metrics:`
1989	`- type: cos_sim_pearson`
1990	`value: 85.14970731146177`
1991	`- type: cos_sim_spearman`
1992	`value: 86.37363490084627`
1993	`- type: euclidean_pearson`
1994	`value: 83.02154218530433`
1995	`- type: euclidean_spearman`
1996	`value: 83.80258761957367`
1997	`- type: manhattan_pearson`
1998	`value: 83.01664495119347`
1999	`- type: manhattan_spearman`
2000	`value: 83.77567458007952`
2001	`- task:`
2002	`type: STS`
2003	`dataset:`
2004	`type: mteb/sts14-sts`
2005	`name: MTEB STS14`
2006	`config: default`
2007	`split: test`
2008	`revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375`
2009	`metrics:`
2010	`- type: cos_sim_pearson`
2011	`value: 83.40474139886784`
2012	`- type: cos_sim_spearman`
2013	`value: 82.77768789165984`
2014	`- type: euclidean_pearson`
2015	`value: 80.7065877443695`
2016	`- type: euclidean_spearman`
2017	`value: 81.375940662505`
2018	`- type: manhattan_pearson`
2019	`value: 80.6507552270278`
2020	`- type: manhattan_spearman`
2021	`value: 81.32782179098741`
2022	`- task:`
2023	`type: STS`
2024	`dataset:`
2025	`type: mteb/sts15-sts`
2026	`name: MTEB STS15`
2027	`config: default`
2028	`split: test`
2029	`revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3`
2030	`metrics:`
2031	`- type: cos_sim_pearson`
2032	`value: 87.08585968722274`
2033	`- type: cos_sim_spearman`
2034	`value: 88.03110031451399`
2035	`- type: euclidean_pearson`
2036	`value: 85.74012019602384`
2037	`- type: euclidean_spearman`
2038	`value: 86.13592849438209`
2039	`- type: manhattan_pearson`
2040	`value: 85.74404842369206`
2041	`- type: manhattan_spearman`
2042	`value: 86.14492318960154`
2043	`- task:`
2044	`type: STS`
2045	`dataset:`
2046	`type: mteb/sts16-sts`
2047	`name: MTEB STS16`
2048	`config: default`
2049	`split: test`
2050	`revision: 4d8694f8f0e0100860b497b999b3dbed754a0513`
2051	`metrics:`
2052	`- type: cos_sim_pearson`
2053	`value: 84.95069052788875`
2054	`- type: cos_sim_spearman`
2055	`value: 86.4867991595147`
2056	`- type: euclidean_pearson`
2057	`value: 84.31013325754635`
2058	`- type: euclidean_spearman`
2059	`value: 85.01529258006482`
2060	`- type: manhattan_pearson`
2061	`value: 84.26995570085374`
2062	`- type: manhattan_spearman`
2063	`value: 84.96982104986162`
2064	`- task:`
2065	`type: STS`
2066	`dataset:`
2067	`type: mteb/sts17-crosslingual-sts`
2068	`name: MTEB STS17 (en-en)`
2069	`config: en-en`
2070	`split: test`
2071	`revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d`
2072	`metrics:`
2073	`- type: cos_sim_pearson`
2074	`value: 87.54617647971897`
2075	`- type: cos_sim_spearman`
2076	`value: 87.49834181751034`
2077	`- type: euclidean_pearson`
2078	`value: 86.01015322577122`
2079	`- type: euclidean_spearman`
2080	`value: 84.63362652063199`
2081	`- type: manhattan_pearson`
2082	`value: 86.13807574475706`
2083	`- type: manhattan_spearman`
2084	`value: 84.7772370721132`
2085	`- task:`
2086	`type: STS`
2087	`dataset:`
2088	`type: mteb/sts22-crosslingual-sts`
2089	`name: MTEB STS22 (en)`
2090	`config: en`
2091	`split: test`
2092	`revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80`
2093	`metrics:`
2094	`- type: cos_sim_pearson`
2095	`value: 67.20047755786615`
2096	`- type: cos_sim_spearman`
2097	`value: 67.05324077987636`
2098	`- type: euclidean_pearson`
2099	`value: 66.91930642976601`
2100	`- type: euclidean_spearman`
2101	`value: 65.21491856099105`
2102	`- type: manhattan_pearson`
2103	`value: 66.78756851976624`
2104	`- type: manhattan_spearman`
2105	`value: 65.12356257740728`
2106	`- task:`
2107	`type: STS`
2108	`dataset:`
2109	`type: mteb/stsbenchmark-sts`
2110	`name: MTEB STSBenchmark`
2111	`config: default`
2112	`split: test`
2113	`revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831`
2114	`metrics:`
2115	`- type: cos_sim_pearson`
2116	`value: 86.19852871539686`
2117	`- type: cos_sim_spearman`
2118	`value: 87.5161895296395`
2119	`- type: euclidean_pearson`
2120	`value: 84.59848645207485`
2121	`- type: euclidean_spearman`
2122	`value: 85.26427328757919`
2123	`- type: manhattan_pearson`
2124	`value: 84.59747366996524`
2125	`- type: manhattan_spearman`
2126	`value: 85.24045855146915`
2127	`- task:`
2128	`type: Reranking`
2129	`dataset:`
2130	`type: mteb/scidocs-reranking`
2131	`name: MTEB SciDocsRR`
2132	`config: default`
2133	`split: test`
2134	`revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab`
2135	`metrics:`
2136	`- type: map`
2137	`value: 87.63320317811032`
2138	`- type: mrr`
2139	`value: 96.26242947321379`
2140	`- task:`
2141	`type: Retrieval`
2142	`dataset:`
2143	`type: scifact`
2144	`name: MTEB SciFact`
2145	`config: default`
2146	`split: test`
2147	`revision: None`
2148	`metrics:`
2149	`- type: map_at_1`
2150	`value: 60.928000000000004`
2151	`- type: map_at_10`
2152	`value: 70.112`
2153	`- type: map_at_100`
2154	`value: 70.59299999999999`
2155	`- type: map_at_1000`
2156	`value: 70.623`
2157	`- type: map_at_3`
2158	`value: 66.846`
2159	`- type: map_at_5`
2160	`value: 68.447`
2161	`- type: mrr_at_1`
2162	`value: 64.0`
2163	`- type: mrr_at_10`
2164	`value: 71.212`
2165	`- type: mrr_at_100`
2166	`value: 71.616`
2167	`- type: mrr_at_1000`
2168	`value: 71.64500000000001`
2169	`- type: mrr_at_3`
2170	`value: 68.77799999999999`
2171	`- type: mrr_at_5`
2172	`value: 70.094`
2173	`- type: ndcg_at_1`
2174	`value: 64.0`
2175	`- type: ndcg_at_10`
2176	`value: 74.607`
2177	`- type: ndcg_at_100`
2178	`value: 76.416`
2179	`- type: ndcg_at_1000`
2180	`value: 77.102`
2181	`- type: ndcg_at_3`
2182	`value: 69.126`
2183	`- type: ndcg_at_5`
2184	`value: 71.41300000000001`
2185	`- type: precision_at_1`
2186	`value: 64.0`
2187	`- type: precision_at_10`
2188	`value: 9.933`
2189	`- type: precision_at_100`
2190	`value: 1.077`
2191	`- type: precision_at_1000`
2192	`value: 0.11299999999999999`
2193	`- type: precision_at_3`
2194	`value: 26.556`
2195	`- type: precision_at_5`
2196	`value: 17.467`
2197	`- type: recall_at_1`
2198	`value: 60.928000000000004`
2199	`- type: recall_at_10`
2200	`value: 87.322`
2201	`- type: recall_at_100`
2202	`value: 94.833`
2203	`- type: recall_at_1000`
2204	`value: 100.0`
2205	`- type: recall_at_3`
2206	`value: 72.628`
2207	`- type: recall_at_5`
2208	`value: 78.428`
2209	`- task:`
2210	`type: PairClassification`
2211	`dataset:`
2212	`type: mteb/sprintduplicatequestions-pairclassification`
2213	`name: MTEB SprintDuplicateQuestions`
2214	`config: default`
2215	`split: test`
2216	`revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46`
2217	`metrics:`
2218	`- type: cos_sim_accuracy`
2219	`value: 99.86237623762376`
2220	`- type: cos_sim_ap`
2221	`value: 96.72586477206649`
2222	`- type: cos_sim_f1`
2223	`value: 93.01858362631845`
2224	`- type: cos_sim_precision`
2225	`value: 93.4409687184662`
2226	`- type: cos_sim_recall`
2227	`value: 92.60000000000001`
2228	`- type: dot_accuracy`
2229	`value: 99.78019801980199`
2230	`- type: dot_ap`
2231	`value: 93.72748205246228`
2232	`- type: dot_f1`
2233	`value: 89.04109589041096`
2234	`- type: dot_precision`
2235	`value: 87.16475095785441`
2236	`- type: dot_recall`
2237	`value: 91.0`
2238	`- type: euclidean_accuracy`
2239	`value: 99.85445544554456`
2240	`- type: euclidean_ap`
2241	`value: 96.6661459876145`
2242	`- type: euclidean_f1`
2243	`value: 92.58337481333997`
2244	`- type: euclidean_precision`
2245	`value: 92.17046580773042`
2246	`- type: euclidean_recall`
2247	`value: 93.0`
2248	`- type: manhattan_accuracy`
2249	`value: 99.85445544554456`
2250	`- type: manhattan_ap`
2251	`value: 96.6883549244056`
2252	`- type: manhattan_f1`
2253	`value: 92.57598405580468`
2254	`- type: manhattan_precision`
2255	`value: 92.25422045680239`
2256	`- type: manhattan_recall`
2257	`value: 92.9`
2258	`- type: max_accuracy`
2259	`value: 99.86237623762376`
2260	`- type: max_ap`
2261	`value: 96.72586477206649`
2262	`- type: max_f1`
2263	`value: 93.01858362631845`
2264	`- task:`
2265	`type: Clustering`
2266	`dataset:`
2267	`type: mteb/stackexchange-clustering`
2268	`name: MTEB StackExchangeClustering`
2269	`config: default`
2270	`split: test`
2271	`revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259`
2272	`metrics:`
2273	`- type: v_measure`
2274	`value: 66.39930057069995`
2275	`- task:`
2276	`type: Clustering`
2277	`dataset:`
2278	`type: mteb/stackexchange-clustering-p2p`
2279	`name: MTEB StackExchangeClusteringP2P`
2280	`config: default`
2281	`split: test`
2282	`revision: 815ca46b2622cec33ccafc3735d572c266efdb44`
2283	`metrics:`
2284	`- type: v_measure`
2285	`value: 34.96398659903402`
2286	`- task:`
2287	`type: Reranking`
2288	`dataset:`
2289	`type: mteb/stackoverflowdupquestions-reranking`
2290	`name: MTEB StackOverflowDupQuestions`
2291	`config: default`
2292	`split: test`
2293	`revision: e185fbe320c72810689fc5848eb6114e1ef5ec69`
2294	`metrics:`
2295	`- type: map`
2296	`value: 55.946944700355395`
2297	`- type: mrr`
2298	`value: 56.97151398438164`
2299	`- task:`
2300	`type: Summarization`
2301	`dataset:`
2302	`type: mteb/summeval`
2303	`name: MTEB SummEval`
2304	`config: default`
2305	`split: test`
2306	`revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c`
2307	`metrics:`
2308	`- type: cos_sim_pearson`
2309	`value: 31.541657650692905`
2310	`- type: cos_sim_spearman`
2311	`value: 31.605804192286303`
2312	`- type: dot_pearson`
2313	`value: 28.26905996736398`
2314	`- type: dot_spearman`
2315	`value: 27.864801765851187`
2316	`- task:`
2317	`type: Retrieval`
2318	`dataset:`
2319	`type: trec-covid`
2320	`name: MTEB TRECCOVID`
2321	`config: default`
2322	`split: test`
2323	`revision: None`
2324	`metrics:`
2325	`- type: map_at_1`
2326	`value: 0.22599999999999998`
2327	`- type: map_at_10`
2328	`value: 1.8870000000000002`
2329	`- type: map_at_100`
2330	`value: 9.78`
2331	`- type: map_at_1000`
2332	`value: 22.514`
2333	`- type: map_at_3`
2334	`value: 0.6669999999999999`
2335	`- type: map_at_5`
2336	`value: 1.077`
2337	`- type: mrr_at_1`
2338	`value: 82.0`
2339	`- type: mrr_at_10`
2340	`value: 89.86699999999999`
2341	`- type: mrr_at_100`
2342	`value: 89.86699999999999`
2343	`- type: mrr_at_1000`
2344	`value: 89.86699999999999`
2345	`- type: mrr_at_3`
2346	`value: 89.667`
2347	`- type: mrr_at_5`
2348	`value: 89.667`
2349	`- type: ndcg_at_1`
2350	`value: 79.0`
2351	`- type: ndcg_at_10`
2352	`value: 74.818`
2353	`- type: ndcg_at_100`
2354	`value: 53.715999999999994`
2355	`- type: ndcg_at_1000`
2356	`value: 47.082`
2357	`- type: ndcg_at_3`
2358	`value: 82.134`
2359	`- type: ndcg_at_5`
2360	`value: 79.81899999999999`
2361	`- type: precision_at_1`
2362	`value: 82.0`
2363	`- type: precision_at_10`
2364	`value: 78.0`
2365	`- type: precision_at_100`
2366	`value: 54.48`
2367	`- type: precision_at_1000`
2368	`value: 20.518`
2369	`- type: precision_at_3`
2370	`value: 87.333`
2371	`- type: precision_at_5`
2372	`value: 85.2`
2373	`- type: recall_at_1`
2374	`value: 0.22599999999999998`
2375	`- type: recall_at_10`
2376	`value: 2.072`
2377	`- type: recall_at_100`
2378	`value: 13.013`
2379	`- type: recall_at_1000`
2380	`value: 43.462`
2381	`- type: recall_at_3`
2382	`value: 0.695`
2383	`- type: recall_at_5`
2384	`value: 1.139`
2385	`- task:`
2386	`type: Retrieval`
2387	`dataset:`
2388	`type: webis-touche2020`
2389	`name: MTEB Touche2020`
2390	`config: default`
2391	`split: test`
2392	`revision: None`
2393	`metrics:`
2394	`- type: map_at_1`
2395	`value: 2.328`
2396	`- type: map_at_10`
2397	`value: 9.795`
2398	`- type: map_at_100`
2399	`value: 15.801000000000002`
2400	`- type: map_at_1000`
2401	`value: 17.23`
2402	`- type: map_at_3`
2403	`value: 4.734`
2404	`- type: map_at_5`
2405	`value: 6.644`
2406	`- type: mrr_at_1`
2407	`value: 30.612000000000002`
2408	`- type: mrr_at_10`
2409	`value: 46.902`
2410	`- type: mrr_at_100`
2411	`value: 47.495`
2412	`- type: mrr_at_1000`
2413	`value: 47.495`
2414	`- type: mrr_at_3`
2415	`value: 41.156`
2416	`- type: mrr_at_5`
2417	`value: 44.218`
2418	`- type: ndcg_at_1`
2419	`value: 28.571`
2420	`- type: ndcg_at_10`
2421	`value: 24.806`
2422	`- type: ndcg_at_100`
2423	`value: 36.419000000000004`
2424	`- type: ndcg_at_1000`
2425	`value: 47.272999999999996`
2426	`- type: ndcg_at_3`
2427	`value: 25.666`
2428	`- type: ndcg_at_5`
2429	`value: 25.448999999999998`
2430	`- type: precision_at_1`
2431	`value: 30.612000000000002`
2432	`- type: precision_at_10`
2433	`value: 23.061`
2434	`- type: precision_at_100`
2435	`value: 7.714`
2436	`- type: precision_at_1000`
2437	`value: 1.484`
2438	`- type: precision_at_3`
2439	`value: 26.531`
2440	`- type: precision_at_5`
2441	`value: 26.122`
2442	`- type: recall_at_1`
2443	`value: 2.328`
2444	`- type: recall_at_10`
2445	`value: 16.524`
2446	`- type: recall_at_100`
2447	`value: 47.179`
2448	`- type: recall_at_1000`
2449	`value: 81.22200000000001`
2450	`- type: recall_at_3`
2451	`value: 5.745`
2452	`- type: recall_at_5`
2453	`value: 9.339`
2454	`- task:`
2455	`type: Classification`
2456	`dataset:`
2457	`type: mteb/toxic_conversations_50k`
2458	`name: MTEB ToxicConversationsClassification`
2459	`config: default`
2460	`split: test`
2461	`revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c`
2462	`metrics:`
2463	`- type: accuracy`
2464	`value: 70.9142`
2465	`- type: ap`
2466	`value: 14.335574772555415`
2467	`- type: f1`
2468	`value: 54.62839595194111`
2469	`- task:`
2470	`type: Classification`
2471	`dataset:`
2472	`type: mteb/tweet_sentiment_extraction`
2473	`name: MTEB TweetSentimentExtractionClassification`
2474	`config: default`
2475	`split: test`
2476	`revision: d604517c81ca91fe16a244d1248fc021f9ecee7a`
2477	`metrics:`
2478	`- type: accuracy`
2479	`value: 59.94340690435768`
2480	`- type: f1`
2481	`value: 60.286487936731916`
2482	`- task:`
2483	`type: Clustering`
2484	`dataset:`
2485	`type: mteb/twentynewsgroups-clustering`
2486	`name: MTEB TwentyNewsgroupsClustering`
2487	`config: default`
2488	`split: test`
2489	`revision: 6125ec4e24fa026cec8a478383ee943acfbd5449`
2490	`metrics:`
2491	`- type: v_measure`
2492	`value: 51.26597708987974`
2493	`- task:`
2494	`type: PairClassification`
2495	`dataset:`
2496	`type: mteb/twittersemeval2015-pairclassification`
2497	`name: MTEB TwitterSemEval2015`
2498	`config: default`
2499	`split: test`
2500	`revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1`
2501	`metrics:`
2502	`- type: cos_sim_accuracy`
2503	`value: 87.48882398521786`
2504	`- type: cos_sim_ap`
2505	`value: 79.04326607602204`
2506	`- type: cos_sim_f1`
2507	`value: 71.64566826860633`
2508	`- type: cos_sim_precision`
2509	`value: 70.55512918905092`
2510	`- type: cos_sim_recall`
2511	`value: 72.77044854881267`
2512	`- type: dot_accuracy`
2513	`value: 84.19264469213805`
2514	`- type: dot_ap`
2515	`value: 67.96360043562528`
2516	`- type: dot_f1`
2517	`value: 64.06418393006827`
2518	`- type: dot_precision`
2519	`value: 58.64941898706424`
2520	`- type: dot_recall`
2521	`value: 70.58047493403694`
2522	`- type: euclidean_accuracy`
2523	`value: 87.45902127913214`
2524	`- type: euclidean_ap`
2525	`value: 78.9742237648272`
2526	`- type: euclidean_f1`
2527	`value: 71.5553235908142`
2528	`- type: euclidean_precision`
2529	`value: 70.77955601445535`
2530	`- type: euclidean_recall`
2531	`value: 72.34828496042216`
2532	`- type: manhattan_accuracy`
2533	`value: 87.41729749061214`
2534	`- type: manhattan_ap`
2535	`value: 78.90073137580596`
2536	`- type: manhattan_f1`
2537	`value: 71.3942611553533`
2538	`- type: manhattan_precision`
2539	`value: 68.52705653967483`
2540	`- type: manhattan_recall`
2541	`value: 74.51187335092348`
2542	`- type: max_accuracy`
2543	`value: 87.48882398521786`
2544	`- type: max_ap`
2545	`value: 79.04326607602204`
2546	`- type: max_f1`
2547	`value: 71.64566826860633`
2548	`- task:`
2549	`type: PairClassification`
2550	`dataset:`
2551	`type: mteb/twitterurlcorpus-pairclassification`
2552	`name: MTEB TwitterURLCorpus`
2553	`config: default`
2554	`split: test`
2555	`revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf`
2556	`metrics:`
2557	`- type: cos_sim_accuracy`
2558	`value: 88.68125897465751`
2559	`- type: cos_sim_ap`
2560	`value: 85.6003454431979`
2561	`- type: cos_sim_f1`
2562	`value: 77.6957163958641`
2563	`- type: cos_sim_precision`
2564	`value: 73.0110366307807`
2565	`- type: cos_sim_recall`
2566	`value: 83.02279026793964`
2567	`- type: dot_accuracy`
2568	`value: 87.7672992587418`
2569	`- type: dot_ap`
2570	`value: 82.4971301112899`
2571	`- type: dot_f1`
2572	`value: 75.90528233151184`
2573	`- type: dot_precision`
2574	`value: 72.0370626469368`
2575	`- type: dot_recall`
2576	`value: 80.21250384970742`
2577	`- type: euclidean_accuracy`
2578	`value: 88.4503434625684`
2579	`- type: euclidean_ap`
2580	`value: 84.91949884748384`
2581	`- type: euclidean_f1`
2582	`value: 76.92365018444684`
2583	`- type: euclidean_precision`
2584	`value: 74.53245721712759`
2585	`- type: euclidean_recall`
2586	`value: 79.47336002463813`
2587	`- type: manhattan_accuracy`
2588	`value: 88.47556952691427`
2589	`- type: manhattan_ap`
2590	`value: 84.8963689101517`
2591	`- type: manhattan_f1`
2592	`value: 76.85901249256395`
2593	`- type: manhattan_precision`
2594	`value: 74.31693989071039`
2595	`- type: manhattan_recall`
2596	`value: 79.58115183246073`
2597	`- type: max_accuracy`
2598	`value: 88.68125897465751`
2599	`- type: max_ap`
2600	`value: 85.6003454431979`
2601	`- type: max_f1`
2602	`value: 77.6957163958641`
2603	`license: mit`
2604	`language:`
2605	`- en`
2606	`---`


<h1 align="center">FlagEmbedding</h1>


<h4 align="center">
    <p>
        <a href="#model-list">Model List</a> |
        <a href="#frequently-asked-questions">FAQ</a> |
        <a href="#usage">Usage</a> |
        <a href="#evaluation">Evaluation</a> |
        <a href="#train">Train</a> |
        <a href="#contact">Contact</a> |
        <a href="#citation">Citation</a> |
        <a href="#license">License</a>
    </p>
</h4>

For more details, please refer to our GitHub repository: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding).

If you are looking for a model that supports more languages, longer texts, and other retrieval methods, try [bge-m3](https://huggingface.co/BAAI/bge-m3).


[English](README.md) | [中文](https://github.com/FlagOpen/FlagEmbedding/blob/master/README_zh.md)

FlagEmbedding focuses on retrieval-augmented LLMs and currently consists of the following projects:

- Long-Context LLM: [Activation Beacon](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon)
- Fine-tuning of LM: [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail)
- Dense Retrieval: [BGE-M3](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3), [LLM Embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder), [BGE Embedding](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding)
- Reranker Model: [BGE Reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker)
- Benchmark: [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB)

## News
- 1/30/2024: Release BGE-M3, a new member of the BGE model series! M3 stands for Multi-Linguality (100+ languages), Multi-Granularity (input length up to 8192 tokens), and Multi-Functionality (unification of dense, lexical, and multi-vector/ColBERT retrieval).
It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks.
[Technical Report](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/BGE_M3/BGE_M3.pdf) and [Code](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3). :fire:
- 1/9/2024: Release [Activation-Beacon](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon), an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs. [Technical Report](https://arxiv.org/abs/2401.03462) :fire:
- 12/24/2023: Release LLaRA, a LLaMA-7B based dense retriever that achieves state-of-the-art performance on MS MARCO and BEIR. Model and code will be open-sourced. Please stay tuned. [Technical Report](https://arxiv.org/abs/2312.15503) :fire:
- 11/23/2023: Release [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail), a method to maintain general capabilities during fine-tuning by merging multiple language models. [Technical Report](https://arxiv.org/abs/2311.13534) :fire:
- 10/12/2023: Release [LLM-Embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder), a unified embedding model to support diverse retrieval augmentation needs for LLMs. [Technical Report](https://arxiv.org/pdf/2310.07554.pdf)
- 09/15/2023: The [technical report](https://arxiv.org/pdf/2309.07597.pdf) and [massive training data](https://data.baai.ac.cn/details/BAAI-MTP) of BGE have been released.
- 09/12/2023: New models:
    - New reranker models: release the cross-encoder models `BAAI/bge-reranker-base` and `BAAI/bge-reranker-large`, which are more powerful than the embedding models. We recommend using or fine-tuning them to re-rank the top-k documents returned by embedding models.
    - Updated embedding models: release the `bge-*-v1.5` embedding models to alleviate the issue of the similarity distribution and enhance retrieval ability without an instruction.


<details>
  <summary>More</summary>
<!-- ### More -->

- 09/07/2023: Update [fine-tune code](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md): add a script to mine hard negatives and support adding an instruction during fine-tuning.
- 08/09/2023: BGE models are integrated into Langchain; you can use them like [this](#using-langchain). The C-MTEB leaderboard is [available](https://huggingface.co/spaces/mteb/leaderboard).
- 08/05/2023: Release base-scale and small-scale models, which achieve the best performance among models of the same size 🤗
- 08/02/2023: Release `bge-large-*` (short for BAAI General Embedding) models, which rank 1st on the MTEB and C-MTEB benchmarks! :tada: :tada:
- 08/01/2023: We release the [Chinese Massive Text Embedding Benchmark](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB) (C-MTEB), consisting of 31 test datasets.

</details>


## Model List

`bge` is short for `BAAI general embedding`.

| Model | Language | Usage | Description | Query instruction for retrieval [1] |
|:-------------------------------|:--------:| :--------:| :--------:|:--------:|
| [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | Multilingual | [Inference](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3#usage) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3) | Multi-Functionality (dense retrieval, sparse retrieval, multi-vector (ColBERT)), Multi-Linguality, and Multi-Granularity (8192 tokens) | |
| [BAAI/llm-embedder](https://huggingface.co/BAAI/llm-embedder) | English | [Inference](./FlagEmbedding/llm_embedder/README.md) [Fine-tune](./FlagEmbedding/llm_embedder/README.md) | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See [README](./FlagEmbedding/llm_embedder/README.md) |
| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank 1st in [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-en` | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with competitive performance | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank 1st in [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-zh` | `为这个句子生成表示以用于检索相关文章：` |
| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with competitive performance | `为这个句子生成表示以用于检索相关文章：` |

[1\]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed and you can use the original query directly. In all cases, no instruction needs to be added to passages.

[2\]: Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, a cross-encoder is widely used to re-rank the top-k documents retrieved by other, simpler models.
For example, use a bge embedding model to retrieve the top 100 relevant documents, and then use a bge reranker to re-rank those 100 documents to get the final top-3 results.
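
The two-stage flow described in note [2] can be sketched in plain Python. This is only an illustration under stated assumptions: `retrieve_then_rerank` is a hypothetical helper, the 2-d vectors stand in for real embeddings, and the `toy_reranker_scores` dictionary stands in for a cross-encoder. None of these names come from FlagEmbedding.

```python
import math

def cosine(u, v):
    """Cosine similarity between two plain-Python vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_then_rerank(query_vec, doc_vecs, rerank_score, retrieve_k=100, final_k=3):
    """Stage 1: keep the retrieve_k documents closest to the query embedding.
    Stage 2: re-score only those candidates with the more expensive reranker."""
    by_similarity = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:retrieve_k]
    reranked = sorted(candidates, key=rerank_score, reverse=True)
    return reranked[:final_k]

# Toy data (hypothetical): 2-d "embeddings" and made-up reranker scores per document index.
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query_vec = [1.0, 0.0]
toy_reranker_scores = {0: 0.2, 1: 3.1, 2: 5.0, 3: 1.4}

top = retrieve_then_rerank(query_vec, doc_vecs, toy_reranker_scores.get,
                           retrieve_k=3, final_k=2)
print(top)
```

In this toy run, document 2 has the highest reranker score but never reaches the reranker because the first stage drops it. That is the trade-off of the two-stage design: the reranker only sees what the embedding model retrieves.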

All models have been uploaded to the Hugging Face Hub; you can find them at https://huggingface.co/BAAI.
If you cannot access the Hugging Face Hub, you can also download the models at https://model.baai.ac.cn/models .


## Frequently asked questions

<details>
  <summary>1. How do I fine-tune a bge embedding model?</summary>

  <!-- ### How to fine-tune bge embedding model? -->
Follow this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) to prepare data and fine-tune your model.
Some suggestions:
- Mine hard negatives following this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune#hard-negatives), which can improve retrieval performance.
- If you pre-train bge on your own data, the pre-trained model cannot be used directly to calculate similarity; it must first be fine-tuned with contrastive learning.
- If the accuracy of the fine-tuned model is still not high enough, we recommend using or fine-tuning the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker.


</details>

<details>
  <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary>

  <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 -->
We suggest using bge v1.5, which alleviates the issue of the similarity distribution.

Since we fine-tune the models by contrastive learning with a temperature of 0.01,
the similarity scores of the current BGE model fall roughly in the interval \[0.6, 1\].
So a similarity score greater than 0.5 does not indicate that the two sentences are similar.

For downstream tasks, such as passage retrieval or semantic similarity,
what matters is the relative order of the scores, not the absolute value.
If you need to filter similar sentences based on a similarity threshold,
please select an appropriate threshold based on the similarity distribution on your data (such as 0.8, 0.85, or even 0.9).

</details>
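
One possible way to choose such a threshold from your own data is sketched below. It assumes you have similarity scores for a sample of pairs you know to be dissimilar. `pick_threshold` is a hypothetical helper and all scores are made up for illustration; this is not a procedure prescribed by the BGE authors.

```python
def pick_threshold(dissimilar_scores, max_false_positive_rate=0.05):
    """Return a threshold such that at most the given fraction of
    known-dissimilar pairs score strictly above it."""
    ranked = sorted(dissimilar_scores)
    allowed = int(len(ranked) * max_false_positive_rate)  # pairs allowed above the threshold
    return ranked[len(ranked) - 1 - allowed]

# Hypothetical scores of pairs labelled as dissimilar; note they all exceed 0.5.
dissimilar = [0.62, 0.65, 0.66, 0.68, 0.70, 0.71, 0.73, 0.74, 0.76, 0.81]
threshold = pick_threshold(dissimilar, max_false_positive_rate=0.1)

# Hypothetical scores of candidate pairs to filter.
candidates = [0.83, 0.88, 0.79, 0.91, 0.70]
kept = [s for s in candidates if s > threshold]
print(threshold, kept)
```

A fixed cut-off of 0.5 would accept every dissimilar pair in this sample, which is why the threshold should come from the score distribution rather than from intuition about cosine similarity.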

<details>
  <summary>3. When does the query instruction need to be used</summary>

  <!-- ### When does the query instruction need to be used -->

For `bge-*-v1.5`, we improved the retrieval ability when no instruction is used.
Omitting the instruction causes only a slight degradation in retrieval performance compared with using it.
So, for convenience, you can generate embeddings without an instruction in all cases.

For a retrieval task that uses short queries to find long related documents,
we recommend adding the instruction to these short queries.
The best way to decide whether to add the instruction to queries is to choose the setting that achieves better performance on your task.
In all cases, no instruction needs to be added to the documents/passages.

</details>
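
Choosing the better setting on your own task can be as simple as comparing one retrieval metric for both runs. The sketch below assumes you have already produced ranked document ids for each query with and without the instruction. The rankings, document ids, and helper names are hypothetical, and recall@k is just one metric you might pick.

```python
def recall_at_k(ranked_doc_ids, relevant_doc_id, k):
    """1.0 if the relevant document appears in the top k, else 0.0."""
    return 1.0 if relevant_doc_id in ranked_doc_ids[:k] else 0.0

def mean_recall_at_k(rankings, relevant, k):
    """Average recall@k over all queries."""
    hits = [recall_at_k(r, rel, k) for r, rel in zip(rankings, relevant)]
    return sum(hits) / len(hits)

# Hypothetical evaluation set: one relevant document per query.
relevant = ["d3", "d7", "d1"]
# Hypothetical rankings from the same model under the two settings.
rankings_with_instruction = [["d3", "d2", "d9"], ["d5", "d7", "d4"], ["d1", "d8", "d6"]]
rankings_without_instruction = [["d2", "d3", "d9"], ["d5", "d4", "d7"], ["d1", "d8", "d6"]]

k = 2
with_instruction = mean_recall_at_k(rankings_with_instruction, relevant, k)
without_instruction = mean_recall_at_k(rankings_without_instruction, relevant, k)
best = "with instruction" if with_instruction > without_instruction else "without instruction"
print(with_instruction, without_instruction, best)
```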


## Usage

### Usage for Embedding Model

Here are some examples of using `bge` models with
[FlagEmbedding](#using-flagembedding), [Sentence-Transformers](#using-sentence-transformers), [Langchain](#using-langchain), or [Huggingface Transformers](#using-huggingface-transformers).

#### Using FlagEmbedding
```
pip install -U FlagEmbedding
```
If it doesn't work for you, see [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md) for more ways to install FlagEmbedding.

```python
from FlagEmbedding import FlagModel
sentences_1 = ["样例数据-1", "样例数据-2"]  # "sample data-1", "sample data-2"
sentences_2 = ["样例数据-3", "样例数据-4"]
model = FlagModel('BAAI/bge-large-zh-v1.5',
                  query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章：",
                  use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
embeddings_1 = model.encode(sentences_1)
embeddings_2 = model.encode(sentences_2)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)

# for s2p (short query to long passage) retrieval tasks, we suggest using encode_queries(), which automatically adds the instruction to each query
# the corpus in a retrieval task can still use encode() or encode_corpus(), since passages don't need the instruction
queries = ['query_1', 'query_2']
passages = ["样例文档-1", "样例文档-2"]  # "sample document-1", "sample document-2"
q_embeddings = model.encode_queries(queries)
p_embeddings = model.encode(passages)
scores = q_embeddings @ p_embeddings.T
```
For the value of the argument `query_instruction_for_retrieval`, see [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list).

By default, FlagModel uses all available GPUs when encoding. Set `os.environ["CUDA_VISIBLE_DEVICES"]` to select specific GPUs.
You can also set `os.environ["CUDA_VISIBLE_DEVICES"]=""` to make all GPUs unavailable.
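
A minimal sketch of the GPU selection described above. It assumes the variable is set at the top of the script, before CUDA is initialized (in practice, before importing the model library); the device indices are only examples.

```python
import os

# Assumption: this runs before any CUDA-using library initializes the GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"        # expose only the first GPU
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"    # expose the first two GPUs
# os.environ["CUDA_VISIBLE_DEVICES"] = ""       # hide all GPUs (CPU only)

visible_devices = os.environ["CUDA_VISIBLE_DEVICES"]
print(visible_devices)
```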


#### Using Sentence-Transformers

You can also use the `bge` models with [sentence-transformers](https://www.SBERT.net):

```
pip install -U sentence-transformers
```
```python
from sentence_transformers import SentenceTransformer
sentences_1 = ["样例数据-1", "样例数据-2"]  # "sample data-1", "sample data-2"
sentences_2 = ["样例数据-3", "样例数据-4"]
model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)
```
For an s2p (short query to long passage) retrieval task,
each short query should start with an instruction (for the instructions, see [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list)).
The instruction is not needed for passages.
```python
from sentence_transformers import SentenceTransformer
queries = ['query_1', 'query_2']
passages = ["样例文档-1", "样例文档-2"]  # "sample document-1", "sample document-2"
instruction = "为这个句子生成表示以用于检索相关文章："  # query instruction for Chinese models (see Model List)

model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
q_embeddings = model.encode([instruction+q for q in queries], normalize_embeddings=True)
p_embeddings = model.encode(passages, normalize_embeddings=True)
scores = q_embeddings @ p_embeddings.T
```

#### Using Langchain

You can use `bge` in langchain like this:
```python
from langchain.embeddings import HuggingFaceBgeEmbeddings
model_name = "BAAI/bge-large-en-v1.5"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity
model = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    # English models use the English query instruction from the Model List
    query_instruction="Represent this sentence for searching relevant passages: "
)
```


#### Using HuggingFace Transformers

With the transformers package, you can use the model like this: first, pass your input through the transformer model, then select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding.

```python
from transformers import AutoTokenizer, AutoModel
import torch
# Sentences we want sentence embeddings for
sentences = ["样例数据-1", "样例数据-2"]  # "sample data-1", "sample data-2"

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-large-zh-v1.5')
model = AutoModel.from_pretrained('BAAI/bge-large-zh-v1.5')
model.eval()

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# for s2p (short query to long passage) retrieval tasks, add an instruction to each query (do not add the instruction to passages)
# encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
    # Perform pooling. In this case, cls pooling.
    sentence_embeddings = model_output[0][:, 0]
# normalize embeddings
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
```

#### Usage of the ONNX files

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction  # type: ignore

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-large-en-v1.5')
model = AutoModel.from_pretrained('BAAI/bge-large-en-v1.5', revision="refs/pr/13")
model_ort = ORTModelForFeatureExtraction.from_pretrained('BAAI/bge-large-en-v1.5', revision="refs/pr/13", file_name="onnx/model.onnx")

# Sentences we want sentence embeddings for
sentences = ["sample data-1", "sample data-2"]

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# for s2p (short query to long passage) retrieval tasks, add an instruction to each query (do not add the instruction to passages)
# encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')

model_output_ort = model_ort(**encoded_input)
# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# model_output and model_output_ort are identical

```

It's also possible to deploy the ONNX files with the [infinity_emb](https://github.com/michaelfeil/infinity) pip package.
```python
import asyncio
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

sentences = ["Embed this sentence via Infinity.", "Paris is in France."]
engine = AsyncEmbeddingEngine.from_args(
    EngineArgs(model_name_or_path = "BAAI/bge-large-en-v1.5", device="cpu", engine="optimum" # or engine="torch"
))

async def main():
    async with engine:
        embeddings, usage = await engine.embed(sentences=sentences)
asyncio.run(main())
```

### Usage for Reranker

Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding.
You can get a relevance score by passing a query and a passage to the reranker.
The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range.
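
If you need scores in a fixed range, for example to display them, one common choice is to pass the raw scores through a sigmoid. The sketch below is an assumption about how you might post-process the scores, not a description of how the reranker was trained or evaluated, and the raw scores are made up. Because the sigmoid is monotonic, the ranking does not change.

```python
import math

def sigmoid(x):
    """Map an unbounded score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

raw_scores = [-5.6, 0.0, 5.3]  # hypothetical unbounded reranker outputs
bounded = [sigmoid(s) for s in raw_scores]
print(bounded)
```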


#### Using FlagEmbedding
```
pip install -U FlagEmbedding
```

Get relevance scores (higher scores indicate more relevance):
```python
from FlagEmbedding import FlagReranker
reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'])
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)
```


#### Using Huggingface transformers

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-large')
model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-large')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
    print(scores)
```

## Evaluation

`baai-general-embedding` models achieve state-of-the-art performance on both the MTEB and C-MTEB leaderboards!
For more details and evaluation tools, see our [scripts](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md).

- MTEB:

| Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) | Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1024 | 512 | 64.23 | 54.29 | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 |
| [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 |
| [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 384 | 512 | 62.17 | 51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 |
| [bge-large-en](https://huggingface.co/BAAI/bge-large-en) | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 |
| [bge-base-en](https://huggingface.co/BAAI/bge-base-en) | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 |
| [gte-large](https://huggingface.co/thenlper/gte-large) | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 |
| [gte-base](https://huggingface.co/thenlper/gte-base) | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 |
| [e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) | 1024 | 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 |
| [bge-small-en](https://huggingface.co/BAAI/bge-small-en) | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 |
| [instructor-xl](https://huggingface.co/hkunlp/instructor-xl) | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 |
| [e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 |
| [gte-small](https://huggingface.co/thenlper/gte-small) | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 |
| [text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings) | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 |
| [e5-small-v2](https://huggingface.co/intfloat/e5-small-v2) | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 |
| [sentence-t5-xxl](https://huggingface.co/sentence-transformers/sentence-t5-xxl) | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 |
| [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 |
| [sgpt-bloom-7b1-msmarco](https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco) | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 |
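
The numbers in parentheses in the header are the dataset counts per task category. As a reading aid, the sketch below checks how the `Average (56)` column relates to the per-category columns for `BAAI/bge-large-en-v1.5`. It assumes the overall average is the dataset-count-weighted mean of the category averages; the table itself does not state this, and the check only reproduces the value up to rounding.

```python
# (number of datasets, category average) for BAAI/bge-large-en-v1.5, copied from the MTEB table
categories = {
    "Retrieval": (15, 54.29),
    "Clustering": (11, 46.08),
    "Pair Classification": (3, 87.12),
    "Reranking": (4, 60.03),
    "STS": (10, 83.11),
    "Summarization": (1, 31.61),
    "Classification": (12, 75.97),
}

total_datasets = sum(count for count, _ in categories.values())
weighted_average = sum(count * avg for count, avg in categories.values()) / total_datasets
print(total_datasets, round(weighted_average, 2))  # 56 64.23
```

Note that a plain mean of the seven category columns would weight Summarization (1 dataset) as heavily as Retrieval (15 datasets), so the two ways of averaging are not interchangeable when comparing models.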



- C-MTEB:
We create the benchmark C-MTEB for Chinese text embedding, which consists of 31 datasets from 6 tasks.
Please refer to [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md) for a detailed introduction.

| Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
| [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5) | 1024 | 64.53 | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 |
| [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 |
| [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 |
| [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 |
| [bge-large-zh-noinstruct](https://huggingface.co/BAAI/bge-large-zh-noinstruct) | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 |
| [multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 |
| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 |
| [m3e-base](https://huggingface.co/moka-ai/m3e-base) | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 |
| [m3e-large](https://huggingface.co/moka-ai/m3e-large) | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 |
| [multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 |
| [multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 |
| [text-embedding-ada-002(OpenAI)](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 |
| [luotuo](https://huggingface.co/silk-road/luotuo-bert-medium) | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 |
| [text2vec-base](https://huggingface.co/shibing624/text2vec-base-chinese) | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 |
| [text2vec-large](https://huggingface.co/GanymedeNil/text2vec-large-chinese) | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 |
3005
3006
3007	`- Reranking:`
3008	`See [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/) for the evaluation script.`
3009
3010	`\| Model \| T2Reranking \| T2RerankingZh2En\* \| T2RerankingEn2Zh\* \| MMarcoReranking \| CMedQAv1 \| CMedQAv2 \| Avg \|`
3011	`\|:-------------------------------\|:--------:\|:--------:\|:--------:\|:--------:\|:--------:\|:--------:\|:--------:\|`
3012	`\| text2vec-base-multilingual \| 64.66 \| 62.94 \| 62.51 \| 14.37 \| 48.46 \| 48.6 \| 50.26 \|`
3013	`\| multilingual-e5-small \| 65.62 \| 60.94 \| 56.41 \| 29.91 \| 67.26 \| 66.54 \| 57.78 \|`
3014	`\| multilingual-e5-large \| 64.55 \| 61.61 \| 54.28 \| 28.6 \| 67.42 \| 67.92 \| 57.4 \|`
3015	`\| multilingual-e5-base \| 64.21 \| 62.13 \| 54.68 \| 29.5 \| 66.23 \| 66.98 \| 57.29 \|`
3016	`\| m3e-base \| 66.03 \| 62.74 \| 56.07 \| 17.51 \| 77.05 \| 76.76 \| 59.36 \|`
3017	`\| m3e-large \| 66.13 \| 62.72 \| 56.1 \| 16.46 \| 77.76 \| 78.27 \| 59.57 \|`
3018	`\| bge-base-zh-v1.5 \| 66.49 \| 63.25 \| 57.02 \| 29.74 \| 80.47 \| 84.88 \| 63.64 \|`
3019	`\| bge-large-zh-v1.5 \| 65.74 \| 63.39 \| 57.03 \| 28.74 \| 83.45 \| 85.44 \| 63.97 \|`
3020	`\| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) \| 67.28 \| 63.95 \| 60.45 \| 35.46 \| 81.26 \| 84.1 \| 65.42 \|`
3021	`\| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) \| 67.6 \| 64.03 \| 61.44 \| 37.16 \| 82.15 \| 84.18 \| 66.09 \|`
3022
3023	`\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks.`
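
As a quick sanity check, the `Avg` column of the reranking table appears to be the unweighted mean of the six task columns. This is inferred from the numbers, not stated by the authors, and it does not hold for the `Avg` column of the overall C-MTEB table. A minimal sketch using two rows copied from the reranking table:

```python
# Per-task reranking scores copied from two rows of the reranking table.
# Column order: T2Reranking, T2RerankingZh2En, T2RerankingEn2Zh,
# MMarcoReranking, CMedQAv1, CMedQAv2.
rows = {
    "text2vec-base-multilingual": [64.66, 62.94, 62.51, 14.37, 48.46, 48.6],
    "BAAI/bge-reranker-large": [67.6, 64.03, 61.44, 37.16, 82.15, 84.18],
}


def mean_score(scores):
    """Unweighted mean over the task scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


averages = {name: mean_score(scores) for name, scores in rows.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")
```

The results match the reported `Avg` values of 50.26 and 66.09 for these two rows.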
3024
3025	`## Train`
3026
3027	`### BAAI Embedding`
3028
3029	`We pre-train the models using [retromae](https://github.com/staoxiao/RetroMAE) and then train them on large-scale pair data using contrastive learning.`
3030	`You can fine-tune the embedding model on your data following our [examples](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune).`
3031	`We also provide a [pre-train example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/pretrain).`
3032	`Note that the goal of pre-training is to reconstruct the text, so the pre-trained model cannot be used for similarity calculation directly; it needs to be fine-tuned first.`
3033	`For more training details for bge, see [baai_general_embedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md).`
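
To make "contrastive learning on pair data" concrete, here is a minimal sketch of one common form of the objective: an InfoNCE loss with in-batch negatives. This section does not specify the exact loss, so the in-batch-negative setup, the function name `in_batch_contrastive_loss`, and the temperature value are all assumptions for illustration, not the FlagEmbedding training code.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def in_batch_contrastive_loss(queries, passages, temperature=0.05):
    """Mean InfoNCE loss over a batch of (query, passage) embedding pairs.

    passages[i] is the positive for queries[i]; every other passage in the
    batch is treated as a negative. The temperature is an assumed value.
    """
    q = [normalize(v) for v in queries]
    p = [normalize(v) for v in passages]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, pj) / temperature for pj in p]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-softmax probability of the positive passage.
        total += log_z - logits[i]
    return total / len(q)


# Toy 2-d embeddings: "aligned" pairs each query with a nearby passage,
# "swapped" pairs each query with the other query's passage.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = in_batch_contrastive_loss(queries, aligned)
loss_swapped = in_batch_contrastive_loss(queries, swapped)
print(loss_aligned, loss_swapped)
```

The loss is low when each query is closest to its own passage and high when the pairs are mismatched, which is the signal that pulls paired texts together in embedding space.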
3034
3035
3036
3037	`### BGE Reranker`
3038
3039	`A cross-encoder performs full attention over the input pair,`
3040	`which is more accurate than an embedding model (i.e., bi-encoder) but also more time-consuming.`
3041	`Therefore, it can be used to re-rank the top-k documents returned by an embedding model.`
3042	`We train the cross-encoder on multilingual pair data.`
3043	`The data format is the same as for the embedding model, so you can fine-tune it easily following our [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker).`
3044	`For more details, please refer to [./FlagEmbedding/reranker/README.md](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker).`
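
The two-stage pattern described above (retrieve the top-k with the embedding model, then re-rank them with the cross-encoder) can be sketched as follows. Everything here is a toy stand-in: the embeddings are hand-written vectors, and `toy_cross_encoder_score` is a hypothetical token-overlap scorer standing in for a real cross-encoder forward pass. Only the control flow is the point.

```python
def bi_encoder_score(query_vec, doc_vec):
    """Stage 1: cheap similarity between independently computed embeddings."""
    return sum(a * b for a, b in zip(query_vec, doc_vec))


def toy_cross_encoder_score(query, doc):
    """Stage 2 stand-in (hypothetical): a real cross-encoder would run the
    model on the (query, doc) pair; this uses the fraction of query tokens
    that also appear in the document."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens)


def retrieve_then_rerank(query, query_vec, docs, doc_vecs, top_k, pair_scorer):
    # Stage 1: rank all documents by embedding similarity and keep top_k.
    candidates = sorted(
        range(len(docs)),
        key=lambda i: bi_encoder_score(query_vec, doc_vecs[i]),
        reverse=True,
    )[:top_k]
    # Stage 2: re-score only the candidates with the more expensive scorer.
    reranked = sorted(
        candidates, key=lambda i: pair_scorer(query, docs[i]), reverse=True
    )
    return [docs[i] for i in reranked]


docs = [
    "pandas are bears native to china",
    "the panda eats bamboo",
    "stock prices fell today",
]
# Toy precomputed embeddings (assumed values, not model outputs).
doc_vecs = [[0.8, 0.2], [0.9, 0.1], [0.1, 0.9]]
query = "where are pandas native"
query_vec = [1.0, 0.0]

results = retrieve_then_rerank(
    query, query_vec, docs, doc_vecs, top_k=2, pair_scorer=toy_cross_encoder_score
)
print(results)
```

In this toy setup the first stage ranks "the panda eats bamboo" slightly higher, and the second stage moves "pandas are bears native to china" to the top. The pair scorer runs on only `top_k` documents instead of the whole collection, which is why the slower model is affordable at this stage.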
3045
3046
3047	`## Contact`
3048	`If you have any questions or suggestions related to this project, feel free to open an issue or pull request.`
3049	`You can also email Shitao Xiao (stxiao@baai.ac.cn) and Zheng Liu (liuzheng@baai.ac.cn).`
3050
3051
3052	`## Citation`
3053
3054	`If you find this repository useful, please consider giving it a star :star: and citing it:`
3055
3056	```
3057	`@misc{bge_embedding,`
3058	`title={C-Pack: Packaged Resources To Advance General Chinese Embedding},`
3059	`author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},`
3060	`year={2023},`
3061	`eprint={2309.07597},`
3062	`archivePrefix={arXiv},`
3063	`primaryClass={cs.CL}`
3064	`}`
3065	```
3066
3067	`## License`
3068	`FlagEmbedding is licensed under the [MIT License](https://github.com/FlagOpen/FlagEmbedding/blob/master/LICENSE). The released models can be used for commercial purposes free of charge.`
3069
3070