---
tags:
- mteb
- sentence-transformers
- transformers
- multilingual
- sentence-similarity
- text-embeddings-inference
license: apache-2.0
language:
- af
- ar
- az
- be
- bg
- bn
- ca
- ceb
- cs
- cy
- da
- de
- el
- en
- es
- et
- eu
- fa
- fi
- fr
- gl
- gu
- he
- hi
- hr
- ht
- hu
- hy
- id
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ky
- lo
- lt
- lv
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- 'no'
- pa
- pl
- pt
- qu
- ro
- ru
- si
- sk
- sl
- so
- sq
- sr
- sv
- sw
- ta
- te
- th
- tl
- tr
- uk
- ur
- vi
- yo
- zh
model-index:
- name: gte-multilingual-base (dense)
  results:
  - task:
      type: Clustering
    dataset:
      type: PL-MTEB/8tags-clustering
      name: MTEB 8TagsClustering
      config: default
      split: test
      revision: None
    metrics:
    - type: v_measure
      value: 33.66681726329994
  - task:
      type: STS
    dataset:
      type: C-MTEB/AFQMC
      name: MTEB AFQMC
      config: default
      split: validation
      revision: b44c3b011063adb25877c13823db83bb193913c4
    metrics:
    - type: cos_sim_spearman
      value: 43.54760696384009
  - task:
      type: STS
    dataset:
      type: C-MTEB/ATEC
      name: MTEB ATEC
      config: default
      split: test
      revision: 0f319b1142f28d00e055a6770f3f726ae9b7d865
    metrics:
    - type: cos_sim_spearman
      value: 48.91186363417501
  - task:
      type: Classification
    dataset:
      type: PL-MTEB/allegro-reviews
      name: MTEB AllegroReviews
      config: default
      split: test
      revision: None
    metrics:
    - type: accuracy
      value: 41.689860834990064
  - task:
      type: Clustering
    dataset:
      type: lyon-nlp/alloprof
      name: MTEB AlloProfClusteringP2P
      config: default
      split: test
      revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
    metrics:
    - type: v_measure
      value: 54.20241337977897
  - task:
      type: Clustering
    dataset:
      type: lyon-nlp/alloprof
      name: MTEB AlloProfClusteringS2S
      config: default
      split: test
      revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
    metrics:
    - type: v_measure
      value: 44.34083695608643
  - task:
      type: Reranking
    dataset:
      type: lyon-nlp/mteb-fr-reranking-alloprof-s2p
      name: MTEB AlloprofReranking
      config: default
      split: test
      revision: 666fdacebe0291776e86f29345663dfaf80a0db9
    metrics:
    - type: map
      value: 64.91495250072002
  - task:
      type: Retrieval
    dataset:
      type: lyon-nlp/alloprof
      name: MTEB AlloprofRetrieval
      config: default
      split: test
      revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
    metrics:
    - type: ndcg_at_10
      value: 53.638
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_counterfactual
      name: MTEB AmazonCounterfactualClassification (en)
      config: en
      split: test
      revision: e8379541af4e31359cca9fbcf4b00f2671dba205
    metrics:
    - type: accuracy
      value: 75.95522388059702
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_polarity
      name: MTEB AmazonPolarityClassification
      config: default
      split: test
      revision: e2d317d38cd51312af73b3d32a06d1a08b442046
    metrics:
    - type: accuracy
      value: 80.717625
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (en)
      config: en
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 43.64199999999999
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (de)
      config: de
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 40.108
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (es)
      config: es
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 40.169999999999995
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (fr)
      config: fr
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 39.56799999999999
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (ja)
      config: ja
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 35.75000000000001
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (zh)
      config: zh
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 33.342000000000006
  - task:
      type: Retrieval
    dataset:
      type: mteb/arguana
      name: MTEB ArguAna
      config: default
      split: test
      revision: c22ab2a51041ffd869aaddef7af8d8215647e41a
    metrics:
    - type: ndcg_at_10
      value: 58.231
  - task:
      type: Retrieval
    dataset:
      type: clarin-knext/arguana-pl
      name: MTEB ArguAna-PL
      config: default
      split: test
      revision: 63fc86750af76253e8c760fc9e534bbf24d260a2
    metrics:
    - type: ndcg_at_10
      value: 53.166000000000004
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-p2p
      name: MTEB ArxivClusteringP2P
      config: default
      split: test
      revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
    metrics:
    - type: v_measure
      value: 46.01900557959478
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-s2s
      name: MTEB ArxivClusteringS2S
      config: default
      split: test
      revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
    metrics:
    - type: v_measure
      value: 41.06626465345723
  - task:
      type: Reranking
    dataset:
      type: mteb/askubuntudupquestions-reranking
      name: MTEB AskUbuntuDupQuestions
      config: default
      split: test
      revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
    metrics:
    - type: map
      value: 61.87514497610431
  - task:
      type: STS
    dataset:
      type: mteb/biosses-sts
      name: MTEB BIOSSES
      config: default
      split: test
      revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
    metrics:
    - type: cos_sim_spearman
      value: 81.21450112991194
  - task:
      type: STS
    dataset:
      type: C-MTEB/BQ
      name: MTEB BQ
      config: default
      split: test
      revision: e3dda5e115e487b39ec7e618c0c6a29137052a55
    metrics:
    - type: cos_sim_spearman
      value: 51.71589543397271
  - task:
      type: Retrieval
    dataset:
      type: maastrichtlawtech/bsard
      name: MTEB BSARDRetrieval
      config: default
      split: test
      revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59
    metrics:
    - type: ndcg_at_10
      value: 26.115
  - task:
      type: BitextMining
    dataset:
      type: mteb/bucc-bitext-mining
      name: MTEB BUCC (de-en)
      config: de-en
      split: test
      revision: d51519689f32196a32af33b075a01d0e7c51e252
    metrics:
    - type: f1
      value: 98.6169102296451
  - task:
      type: BitextMining
    dataset:
      type: mteb/bucc-bitext-mining
      name: MTEB BUCC (fr-en)
      config: fr-en
      split: test
      revision: d51519689f32196a32af33b075a01d0e7c51e252
    metrics:
    - type: f1
      value: 97.89603052314916
  - task:
      type: BitextMining
    dataset:
      type: mteb/bucc-bitext-mining
      name: MTEB BUCC (ru-en)
      config: ru-en
      split: test
      revision: d51519689f32196a32af33b075a01d0e7c51e252
    metrics:
    - type: f1
      value: 97.12388869645537
  - task:
      type: BitextMining
    dataset:
      type: mteb/bucc-bitext-mining
      name: MTEB BUCC (zh-en)
      config: zh-en
      split: test
      revision: d51519689f32196a32af33b075a01d0e7c51e252
    metrics:
    - type: f1
      value: 98.15692469720906
  - task:
      type: Classification
    dataset:
      type: mteb/banking77
      name: MTEB Banking77Classification
      config: default
      split: test
      revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
    metrics:
    - type: accuracy
      value: 85.36038961038962
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-p2p
      name: MTEB BiorxivClusteringP2P
      config: default
      split: test
      revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
    metrics:
    - type: v_measure
      value: 37.5903826674123
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-s2s
      name: MTEB BiorxivClusteringS2S
      config: default
      split: test
      revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
    metrics:
    - type: v_measure
      value: 34.21474277151329
  - task:
      type: Classification
    dataset:
      type: PL-MTEB/cbd
      name: MTEB CBD
      config: default
      split: test
      revision: None
    metrics:
    - type: accuracy
      value: 62.519999999999996
  - task:
      type: PairClassification
    dataset:
      type: PL-MTEB/cdsce-pairclassification
      name: MTEB CDSC-E
      config: default
      split: test
      revision: None
    metrics:
    - type: cos_sim_ap
      value: 74.90132799162956
  - task:
      type: STS
    dataset:
      type: PL-MTEB/cdscr-sts
      name: MTEB CDSC-R
      config: default
      split: test
      revision: None
    metrics:
    - type: cos_sim_spearman
      value: 90.30727955142524
  - task:
      type: Clustering
    dataset:
      type: C-MTEB/CLSClusteringP2P
      name: MTEB CLSClusteringP2P
      config: default
      split: test
      revision: 4b6227591c6c1a73bc76b1055f3b7f3588e72476
    metrics:
    - type: v_measure
      value: 37.94850105022274
  - task:
      type: Clustering
    dataset:
      type: C-MTEB/CLSClusteringS2S
      name: MTEB CLSClusteringS2S
      config: default
      split: test
      revision: e458b3f5414b62b7f9f83499ac1f5497ae2e869f
    metrics:
    - type: v_measure
      value: 38.11958675421534
  - task:
      type: Reranking
    dataset:
      type: C-MTEB/CMedQAv1-reranking
      name: MTEB CMedQAv1
      config: default
      split: test
      revision: 8d7f1e942507dac42dc58017c1a001c3717da7df
    metrics:
    - type: map
      value: 86.10950950485399
  - task:
      type: Reranking
    dataset:
      type: C-MTEB/CMedQAv2-reranking
      name: MTEB CMedQAv2
      config: default
      split: test
      revision: 23d186750531a14a0357ca22cd92d712fd512ea0
    metrics:
    - type: map
      value: 87.28038294231966
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack-android
      name: MTEB CQADupstackAndroidRetrieval
      config: default
      split: test
      revision: f46a197baaae43b4f621051089b82a364682dfeb
    metrics:
    - type: ndcg_at_10
      value: 47.099000000000004
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack-english
      name: MTEB CQADupstackEnglishRetrieval
      config: default
      split: test
      revision: ad9991cb51e31e31e430383c75ffb2885547b5f0
    metrics:
    - type: ndcg_at_10
      value: 45.973000000000006
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack-gaming
      name: MTEB CQADupstackGamingRetrieval
      config: default
      split: test
      revision: 4885aa143210c98657558c04aaf3dc47cfb54340
    metrics:
    - type: ndcg_at_10
      value: 55.606
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack-gis
      name: MTEB CQADupstackGisRetrieval
      config: default
      split: test
      revision: 5003b3064772da1887988e05400cf3806fe491f2
    metrics:
    - type: ndcg_at_10
      value: 36.638
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack-mathematica
      name: MTEB CQADupstackMathematicaRetrieval
      config: default
      split: test
      revision: 90fceea13679c63fe563ded68f3b6f06e50061de
    metrics:
    - type: ndcg_at_10
      value: 30.711
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack-physics
      name: MTEB CQADupstackPhysicsRetrieval
      config: default
      split: test
      revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4
    metrics:
    - type: ndcg_at_10
      value: 44.523
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack-programmers
      name: MTEB CQADupstackProgrammersRetrieval
      config: default
      split: test
      revision: 6184bc1440d2dbc7612be22b50686b8826d22b32
    metrics:
    - type: ndcg_at_10
      value: 37.940000000000005
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack
      name: MTEB CQADupstackRetrieval
      config: default
      split: test
      revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4
    metrics:
    - type: ndcg_at_10
      value: 38.12183333333333
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack-stats
      name: MTEB CQADupstackStatsRetrieval
      config: default
      split: test
      revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a
    metrics:
    - type: ndcg_at_10
      value: 32.684000000000005
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack-tex
      name: MTEB CQADupstackTexRetrieval
      config: default
      split: test
      revision: 46989137a86843e03a6195de44b09deda022eec7
    metrics:
    - type: ndcg_at_10
      value: 26.735
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack-unix
      name: MTEB CQADupstackUnixRetrieval
      config: default
      split: test
      revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53
    metrics:
    - type: ndcg_at_10
      value: 36.933
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack-webmasters
      name: MTEB CQADupstackWebmastersRetrieval
      config: default
      split: test
      revision: 160c094312a0e1facb97e55eeddb698c0abe3571
    metrics:
    - type: ndcg_at_10
      value: 33.747
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack-wordpress
      name: MTEB CQADupstackWordpressRetrieval
      config: default
      split: test
      revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4
    metrics:
    - type: ndcg_at_10
      value: 28.872999999999998
  - task:
      type: Retrieval
    dataset:
      type: mteb/climate-fever
      name: MTEB ClimateFEVER
      config: default
      split: test
      revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380
    metrics:
    - type: ndcg_at_10
      value: 34.833
  - task:
      type: Retrieval
    dataset:
      type: C-MTEB/CmedqaRetrieval
      name: MTEB CmedqaRetrieval
      config: default
      split: dev
      revision: cd540c506dae1cf9e9a59c3e06f42030d54e7301
    metrics:
    - type: ndcg_at_10
      value: 43.78
  - task:
      type: PairClassification
    dataset:
      type: C-MTEB/CMNLI
      name: MTEB Cmnli
      config: default
      split: validation
      revision: 41bc36f332156f7adc9e38f53777c959b2ae9766
    metrics:
    - type: cos_sim_ap
      value: 84.00640599186677
  - task:
      type: Retrieval
    dataset:
      type: C-MTEB/CovidRetrieval
      name: MTEB CovidRetrieval
      config: default
      split: dev
      revision: 1271c7809071a13532e05f25fb53511ffce77117
    metrics:
    - type: ndcg_at_10
      value: 80.60000000000001
  - task:
      type: Retrieval
    dataset:
      type: mteb/dbpedia
      name: MTEB DBPedia
      config: default
      split: test
      revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659
    metrics:
    - type: ndcg_at_10
      value: 40.116
  - task:
      type: Retrieval
    dataset:
      type: clarin-knext/dbpedia-pl
      name: MTEB DBPedia-PL
      config: default
      split: test
      revision: 76afe41d9af165cc40999fcaa92312b8b012064a
    metrics:
    - type: ndcg_at_10
      value: 32.498
  - task:
      type: Retrieval
    dataset:
      type: C-MTEB/DuRetrieval
      name: MTEB DuRetrieval
      config: default
      split: dev
      revision: a1a333e290fe30b10f3f56498e3a0d911a693ced
    metrics:
    - type: ndcg_at_10
      value: 87.547
  - task:
      type: Retrieval
    dataset:
      type: C-MTEB/EcomRetrieval
      name: MTEB EcomRetrieval
      config: default
      split: dev
      revision: 687de13dc7294d6fd9be10c6945f9e8fec8166b9
    metrics:
    - type: ndcg_at_10
      value: 64.85
  - task:
      type: Classification
    dataset:
      type: mteb/emotion
      name: MTEB EmotionClassification
      config: default
      split: test
      revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
    metrics:
    - type: accuracy
      value: 47.949999999999996
  - task:
      type: Retrieval
    dataset:
      type: mteb/fever
      name: MTEB FEVER
      config: default
      split: test
      revision: bea83ef9e8fb933d90a2f1d5515737465d613e12
    metrics:
    - type: ndcg_at_10
      value: 92.111
  - task:
      type: Retrieval
    dataset:
      type: clarin-knext/fiqa-pl
      name: MTEB FiQA-PL
      config: default
      split: test
      revision: 2e535829717f8bf9dc829b7f911cc5bbd4e6608e
    metrics:
    - type: ndcg_at_10
      value: 28.962
  - task:
      type: Retrieval
    dataset:
      type: mteb/fiqa
      name: MTEB FiQA2018
      config: default
      split: test
      revision: 27a168819829fe9bcd655c2df245fb19452e8e06
    metrics:
    - type: ndcg_at_10
      value: 45.005
  - task:
      type: Clustering
    dataset:
      type: lyon-nlp/clustering-hal-s2s
      name: MTEB HALClusteringS2S
      config: default
      split: test
      revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915
    metrics:
    - type: v_measure
      value: 25.133776435657595
  - task:
      type: Retrieval
    dataset:
      type: mteb/hotpotqa
      name: MTEB HotpotQA
      config: default
      split: test
      revision: ab518f4d6fcca38d87c25209f94beba119d02014
    metrics:
    - type: ndcg_at_10
      value: 63.036
  - task:
      type: Retrieval
    dataset:
      type: clarin-knext/hotpotqa-pl
      name: MTEB HotpotQA-PL
      config: default
      split: test
      revision: a0bd479ac97b4ccb5bd6ce320c415d0bb4beb907
    metrics:
    - type: ndcg_at_10
      value: 56.904999999999994
  - task:
      type: Classification
    dataset:
      type: C-MTEB/IFlyTek-classification
      name: MTEB IFlyTek
      config: default
      split: validation
      revision: 421605374b29664c5fc098418fe20ada9bd55f8a
    metrics:
    - type: accuracy
      value: 44.59407464409388
  - task:
      type: Classification
    dataset:
      type: mteb/imdb
      name: MTEB ImdbClassification
      config: default
      split: test
      revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
    metrics:
    - type: accuracy
      value: 74.912
  - task:
      type: Classification
    dataset:
      type: C-MTEB/JDReview-classification
      name: MTEB JDReview
      config: default
      split: test
      revision: b7c64bd89eb87f8ded463478346f76731f07bf8b
    metrics:
    - type: accuracy
      value: 79.26829268292683
  - task:
      type: STS
    dataset:
      type: C-MTEB/LCQMC
      name: MTEB LCQMC
      config: default
      split: test
      revision: 17f9b096f80380fce5ed12a9be8be7784b337daf
    metrics:
    - type: cos_sim_spearman
      value: 74.8601229809791
  - task:
      type: Clustering
    dataset:
      type: mlsum
      name: MTEB MLSUMClusteringP2P
      config: default
      split: test
      revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
    metrics:
    - type: v_measure
      value: 42.331902754246556
  - task:
      type: Clustering
    dataset:
      type: mlsum
      name: MTEB MLSUMClusteringS2S
      config: default
      split: test
      revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
    metrics:
    - type: v_measure
      value: 40.92029335502153
  - task:
      type: Reranking
    dataset:
      type: C-MTEB/Mmarco-reranking
      name: MTEB MMarcoReranking
      config: default
      split: dev
      revision: 8e0c766dbe9e16e1d221116a3f36795fbade07f6
    metrics:
    - type: map
      value: 32.19266316591337
  - task:
      type: Retrieval
    dataset:
      type: C-MTEB/MMarcoRetrieval
      name: MTEB MMarcoRetrieval
      config: default
      split: dev
      revision: 539bbde593d947e2a124ba72651aafc09eb33fc2
    metrics:
    - type: ndcg_at_10
      value: 79.346
  - task:
      type: Retrieval
    dataset:
      type: mteb/msmarco
      name: MTEB MSMARCO
      config: default
      split: dev
      revision: c5a29a104738b98a9e76336939199e264163d4a0
    metrics:
    - type: ndcg_at_10
      value: 39.922999999999995
  - task:
      type: Retrieval
    dataset:
      type: clarin-knext/msmarco-pl
      name: MTEB MSMARCO-PL
      config: default
      split: test
      revision: 8634c07806d5cce3a6138e260e59b81760a0a640
    metrics:
    - type: ndcg_at_10
      value: 55.620999999999995
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_domain
      name: MTEB MTOPDomainClassification (en)
      config: en
      split: test
      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
    metrics:
    - type: accuracy
      value: 92.53989968080255
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_domain
      name: MTEB MTOPDomainClassification (de)
      config: de
      split: test
      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
    metrics:
    - type: accuracy
      value: 88.26993519301212
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_domain
      name: MTEB MTOPDomainClassification (es)
      config: es
      split: test
      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
    metrics:
    - type: accuracy
      value: 90.87725150100067
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_domain
      name: MTEB MTOPDomainClassification (fr)
      config: fr
      split: test
      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
    metrics:
    - type: accuracy
      value: 87.48512370811149
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_domain
      name: MTEB MTOPDomainClassification (hi)
      config: hi
      split: test
      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
    metrics:
    - type: accuracy
      value: 89.45141627823591
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_domain
      name: MTEB MTOPDomainClassification (th)
      config: th
      split: test
      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
    metrics:
    - type: accuracy
      value: 83.45750452079565
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_intent
      name: MTEB MTOPIntentClassification (en)
      config: en
      split: test
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
    metrics:
    - type: accuracy
      value: 72.57637938896488
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_intent
      name: MTEB MTOPIntentClassification (de)
      config: de
      split: test
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
    metrics:
    - type: accuracy
      value: 63.50803043110736
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_intent
      name: MTEB MTOPIntentClassification (es)
      config: es
      split: test
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
    metrics:
    - type: accuracy
      value: 71.6577718478986
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_intent
      name: MTEB MTOPIntentClassification (fr)
      config: fr
      split: test
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
    metrics:
    - type: accuracy
      value: 64.05887879736925
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_intent
      name: MTEB MTOPIntentClassification (hi)
      config: hi
      split: test
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
    metrics:
    - type: accuracy
      value: 65.27070634636071
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_intent
      name: MTEB MTOPIntentClassification (th)
      config: th
      split: test
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
    metrics:
    - type: accuracy
      value: 63.04520795660037
  - task:
      type: Classification
    dataset:
      type: masakhane/masakhanews
      name: MTEB MasakhaNEWSClassification (fra)
      config: fra
      split: test
      revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
    metrics:
    - type: accuracy
      value: 80.66350710900474
  - task:
      type: Clustering
    dataset:
      type: masakhane/masakhanews
      name: MTEB MasakhaNEWSClusteringP2P (fra)
      config: fra
      split: test
      revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
    metrics:
    - type: v_measure
      value: 44.016506455899425
  - task:
      type: Clustering
    dataset:
      type: masakhane/masakhanews
      name: MTEB MasakhaNEWSClusteringS2S (fra)
      config: fra
      split: test
      revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
    metrics:
    - type: v_measure
      value: 40.67730129573544
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (af)
      config: af
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 57.94552790854068
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (am)
      config: am
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 49.273705447209146
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (ar)
      config: ar
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 55.490921318090116
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (az)
      config: az
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 60.97511768661733
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (bn)
      config: bn
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 57.5689307330195
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (cy)
      config: cy
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 48.34902488231337
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (da)
      config: da
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 63.6684599865501
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (de)
      config: de
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 62.54539340954942
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (el)
      config: el
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 63.08675184936112
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (en)
      config: en
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 72.12508406186953
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (es)
      config: es
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 67.41425689307331
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (fa)
      config: fa
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 65.59515803631474
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (fi)
      config: fi
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 62.90517821116342
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (fr)
      config: fr
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 67.91526563550774
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (he)
      config: he
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 55.198386012104905
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (hi)
      config: hi
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 65.04371217215869
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (hu)
      config: hu
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 63.31203765971756
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (hy)
      config: hy
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 55.521183591123055
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (id)
      config: id
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 66.06254203093476
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (is)
      config: is
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 56.01546738399461
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (it)
      config: it
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 67.27975790181574
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (ja)
      config: ja
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 66.79556153328849
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (jv)
      config: jv
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 50.18493611297915
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (ka)
      config: ka
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 47.888365837256224
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (km)
      config: km
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 50.79690652320108
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (kn)
      config: kn
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 57.225958305312716
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (ko)
      config: ko
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 64.58641560188299
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (lv)
      config: lv
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 59.08204438466711
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (ml)
      config: ml
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 59.54606590450572
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (mn)
      config: mn
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 53.443174176193665
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (ms)
      config: ms
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 61.65097511768661
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
1433 dataset:
1434 type: mteb/amazon_massive_intent
1435 name: MTEB MassiveIntentClassification (my)
1436 config: my
1437 split: test
1438 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1439 metrics:
1440 - type: accuracy
1441 value: 53.45662407531944
1442 - task:
1443 type: Classification
1444 dataset:
1445 type: mteb/amazon_massive_intent
1446 name: MTEB MassiveIntentClassification (nb)
1447 config: nb
1448 split: test
1449 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1450 metrics:
1451 - type: accuracy
1452 value: 63.739071956960316
1453 - task:
1454 type: Classification
1455 dataset:
1456 type: mteb/amazon_massive_intent
1457 name: MTEB MassiveIntentClassification (nl)
1458 config: nl
1459 split: test
1460 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1461 metrics:
1462 - type: accuracy
1463 value: 66.36180228648286
1464 - task:
1465 type: Classification
1466 dataset:
1467 type: mteb/amazon_massive_intent
1468 name: MTEB MassiveIntentClassification (pl)
1469 config: pl
1470 split: test
1471 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1472 metrics:
1473 - type: accuracy
1474 value: 66.3920645595158
1475 - task:
1476 type: Classification
1477 dataset:
1478 type: mteb/amazon_massive_intent
1479 name: MTEB MassiveIntentClassification (pt)
1480 config: pt
1481 split: test
1482 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1483 metrics:
1484 - type: accuracy
1485 value: 68.06993947545395
1486 - task:
1487 type: Classification
1488 dataset:
1489 type: mteb/amazon_massive_intent
1490 name: MTEB MassiveIntentClassification (ro)
1491 config: ro
1492 split: test
1493 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1494 metrics:
1495 - type: accuracy
1496 value: 63.123739071956955
1497 - task:
1498 type: Classification
1499 dataset:
1500 type: mteb/amazon_massive_intent
1501 name: MTEB MassiveIntentClassification (ru)
1502 config: ru
1503 split: test
1504 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1505 metrics:
1506 - type: accuracy
1507 value: 67.46133154001346
1508 - task:
1509 type: Classification
1510 dataset:
1511 type: mteb/amazon_massive_intent
1512 name: MTEB MassiveIntentClassification (sl)
1513 config: sl
1514 split: test
1515 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1516 metrics:
1517 - type: accuracy
1518 value: 60.54472091459314
1519 - task:
1520 type: Classification
1521 dataset:
1522 type: mteb/amazon_massive_intent
1523 name: MTEB MassiveIntentClassification (sq)
1524 config: sq
1525 split: test
1526 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1527 metrics:
1528 - type: accuracy
1529 value: 58.204438466711494
1530 - task:
1531 type: Classification
1532 dataset:
1533 type: mteb/amazon_massive_intent
1534 name: MTEB MassiveIntentClassification (sv)
1535 config: sv
1536 split: test
1537 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1538 metrics:
1539 - type: accuracy
1540 value: 65.69603227975792
1541 - task:
1542 type: Classification
1543 dataset:
1544 type: mteb/amazon_massive_intent
1545 name: MTEB MassiveIntentClassification (sw)
1546 config: sw
1547 split: test
1548 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1549 metrics:
1550 - type: accuracy
1551 value: 51.684599865501
1552 - task:
1553 type: Classification
1554 dataset:
1555 type: mteb/amazon_massive_intent
1556 name: MTEB MassiveIntentClassification (ta)
1557 config: ta
1558 split: test
1559 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1560 metrics:
1561 - type: accuracy
1562 value: 58.523873570948226
1563 - task:
1564 type: Classification
1565 dataset:
1566 type: mteb/amazon_massive_intent
1567 name: MTEB MassiveIntentClassification (te)
1568 config: te
1569 split: test
1570 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1571 metrics:
1572 - type: accuracy
1573 value: 58.53396099529253
1574 - task:
1575 type: Classification
1576 dataset:
1577 type: mteb/amazon_massive_intent
1578 name: MTEB MassiveIntentClassification (th)
1579 config: th
1580 split: test
1581 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1582 metrics:
1583 - type: accuracy
1584 value: 61.88298587760591
1585 - task:
1586 type: Classification
1587 dataset:
1588 type: mteb/amazon_massive_intent
1589 name: MTEB MassiveIntentClassification (tl)
1590 config: tl
1591 split: test
1592 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1593 metrics:
1594 - type: accuracy
1595 value: 56.65097511768662
1596 - task:
1597 type: Classification
1598 dataset:
1599 type: mteb/amazon_massive_intent
1600 name: MTEB MassiveIntentClassification (tr)
1601 config: tr
1602 split: test
1603 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1604 metrics:
1605 - type: accuracy
1606 value: 64.8453261600538
1607 - task:
1608 type: Classification
1609 dataset:
1610 type: mteb/amazon_massive_intent
1611 name: MTEB MassiveIntentClassification (ur)
1612 config: ur
1613 split: test
1614 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1615 metrics:
1616 - type: accuracy
1617 value: 58.6247478143914
1618 - task:
1619 type: Classification
1620 dataset:
1621 type: mteb/amazon_massive_intent
1622 name: MTEB MassiveIntentClassification (vi)
1623 config: vi
1624 split: test
1625 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1626 metrics:
1627 - type: accuracy
1628 value: 64.16274377942166
1629 - task:
1630 type: Classification
1631 dataset:
1632 type: mteb/amazon_massive_intent
1633 name: MTEB MassiveIntentClassification (zh-CN)
1634 config: zh-CN
1635 split: test
1636 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1637 metrics:
1638 - type: accuracy
1639 value: 69.61667787491594
1640 - task:
1641 type: Classification
1642 dataset:
1643 type: mteb/amazon_massive_intent
1644 name: MTEB MassiveIntentClassification (zh-TW)
1645 config: zh-TW
1646 split: test
1647 revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1648 metrics:
1649 - type: accuracy
1650 value: 64.17283120376598
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (af)
      config: af
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 64.89912575655683
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (am)
      config: am
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 57.27975790181573
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (ar)
      config: ar
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 62.269670477471415
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (az)
      config: az
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 65.10423671822461
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (bn)
      config: bn
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 62.40753194351043
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (cy)
      config: cy
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 55.369872225958304
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (da)
      config: da
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 71.60726294552792
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (de)
      config: de
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 70.30262273032952
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (el)
      config: el
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 69.52925353059851
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (en)
      config: en
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 76.28446536650976
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (es)
      config: es
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 72.45460659045058
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (fa)
      config: fa
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 70.26563550773368
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (fi)
      config: fi
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 67.20578345662408
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (fr)
      config: fr
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 72.64963012777405
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (he)
      config: he
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 61.698049764626774
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (hi)
      config: hi
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 70.14458641560188
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (hu)
      config: hu
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 70.51445864156018
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (hy)
      config: hy
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 60.13786146603901
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (id)
      config: id
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 70.61533288500337
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (is)
      config: is
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 61.526563550773375
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (it)
      config: it
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 71.99731002017484
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (ja)
      config: ja
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 71.59381304640216
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (jv)
      config: jv
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 57.010759919300604
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (ka)
      config: ka
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 53.26160053799597
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (km)
      config: km
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 57.800941492938804
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (kn)
      config: kn
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 62.387357094821795
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (ko)
      config: ko
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 69.5359784801614
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (lv)
      config: lv
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 63.36919973100203
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (ml)
      config: ml
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 64.81506388702084
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (mn)
      config: mn
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 59.35104236718225
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (ms)
      config: ms
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 66.67787491593813
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (my)
      config: my
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 59.4250168123739
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (nb)
      config: nb
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 71.49630127774043
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (nl)
      config: nl
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 71.95696032279758
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (pl)
      config: pl
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 70.11768661735036
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (pt)
      config: pt
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 71.86953597848016
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (ro)
      config: ro
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 68.51042367182247
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (ru)
      config: ru
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 71.65097511768661
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (sl)
      config: sl
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 66.81573638197713
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (sq)
      config: sq
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 65.26227303295225
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (sv)
      config: sv
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 72.51513113651646
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (sw)
      config: sw
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 58.29858776059179
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (ta)
      config: ta
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 62.72696704774714
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (te)
      config: te
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 66.57700067249496
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (th)
      config: th
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 68.22797579018157
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (tl)
      config: tl
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 61.97041022192333
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (tr)
      config: tr
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 70.72629455279085
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (ur)
      config: ur
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 63.16072629455278
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (vi)
      config: vi
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 67.92199058507062
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (zh-CN)
      config: zh-CN
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 74.40484196368527
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (zh-TW)
      config: zh-TW
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 71.61398789509079
  - task:
      type: Retrieval
    dataset:
      type: C-MTEB/MedicalRetrieval
      name: MTEB MedicalRetrieval
      config: default
      split: dev
      revision: 2039188fb5800a9803ba5048df7b76e6fb151fc6
    metrics:
    - type: ndcg_at_10
      value: 61.934999999999995
  - task:
      type: Clustering
    dataset:
      type: mteb/medrxiv-clustering-p2p
      name: MTEB MedrxivClusteringP2P
      config: default
      split: test
      revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
    metrics:
    - type: v_measure
      value: 33.052031054565205
  - task:
      type: Clustering
    dataset:
      type: mteb/medrxiv-clustering-s2s
      name: MTEB MedrxivClusteringS2S
      config: default
      split: test
      revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
    metrics:
    - type: v_measure
      value: 31.969909524076794
  - task:
      type: Reranking
    dataset:
      type: mteb/mind_small
      name: MTEB MindSmallReranking
      config: default
      split: test
      revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
    metrics:
    - type: map
      value: 31.7530992892652
  - task:
      type: Retrieval
    dataset:
      type: jinaai/mintakaqa
      name: MTEB MintakaRetrieval (fr)
      config: fr
      split: test
      revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
    metrics:
    - type: ndcg_at_10
      value: 34.705999999999996
  - task:
      type: Retrieval
    dataset:
      type: Shitao/MLDR
      name: MTEB MultiLongDocRetrieval (ar)
      config: ar
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 55.166000000000004
  - task:
      type: Retrieval
    dataset:
      type: Shitao/MLDR
      name: MTEB MultiLongDocRetrieval (de)
      config: de
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 55.155
  - task:
      type: Retrieval
    dataset:
      type: Shitao/MLDR
      name: MTEB MultiLongDocRetrieval (en)
      config: en
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 50.993
  - task:
      type: Retrieval
    dataset:
      type: Shitao/MLDR
      name: MTEB MultiLongDocRetrieval (es)
      config: es
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 81.228
  - task:
      type: Retrieval
    dataset:
      type: Shitao/MLDR
      name: MTEB MultiLongDocRetrieval (fr)
      config: fr
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 76.19
  - task:
      type: Retrieval
    dataset:
      type: Shitao/MLDR
      name: MTEB MultiLongDocRetrieval (hi)
      config: hi
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 45.206
  - task:
      type: Retrieval
    dataset:
      type: Shitao/MLDR
      name: MTEB MultiLongDocRetrieval (it)
      config: it
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 66.741
  - task:
      type: Retrieval
    dataset:
      type: Shitao/MLDR
      name: MTEB MultiLongDocRetrieval (ja)
      config: ja
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 52.111
  - task:
      type: Retrieval
    dataset:
      type: Shitao/MLDR
      name: MTEB MultiLongDocRetrieval (ko)
      config: ko
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 46.733000000000004
  - task:
      type: Retrieval
    dataset:
      type: Shitao/MLDR
      name: MTEB MultiLongDocRetrieval (pt)
      config: pt
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 79.105
  - task:
      type: Retrieval
    dataset:
      type: Shitao/MLDR
      name: MTEB MultiLongDocRetrieval (ru)
      config: ru
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 64.21
  - task:
      type: Retrieval
    dataset:
      type: Shitao/MLDR
      name: MTEB MultiLongDocRetrieval (th)
      config: th
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 35.467
  - task:
      type: Retrieval
    dataset:
      type: Shitao/MLDR
      name: MTEB MultiLongDocRetrieval (zh)
      config: zh
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 27.419
  - task:
      type: Classification
    dataset:
      type: C-MTEB/MultilingualSentiment-classification
      name: MTEB MultilingualSentiment
      config: default
      split: validation
      revision: 46958b007a63fdbf239b7672c25d0bea67b5ea1a
    metrics:
    - type: accuracy
      value: 61.02000000000001
  - task:
      type: Retrieval
    dataset:
      type: mteb/nfcorpus
      name: MTEB NFCorpus
      config: default
      split: test
      revision: ec0fa4fe99da2ff19ca1214b7966684033a58814
    metrics:
    - type: ndcg_at_10
      value: 36.65
  - task:
      type: Retrieval
    dataset:
      type: clarin-knext/nfcorpus-pl
      name: MTEB NFCorpus-PL
      config: default
      split: test
      revision: 9a6f9567fda928260afed2de480d79c98bf0bec0
    metrics:
    - type: ndcg_at_10
      value: 26.831
  - task:
      type: Retrieval
    dataset:
      type: mteb/nq
      name: MTEB NQ
      config: default
      split: test
      revision: b774495ed302d8c44a3a7ea25c90dbce03968f31
    metrics:
    - type: ndcg_at_10
      value: 58.111000000000004
  - task:
      type: Retrieval
    dataset:
      type: clarin-knext/nq-pl
      name: MTEB NQ-PL
      config: default
      split: test
      revision: f171245712cf85dd4700b06bef18001578d0ca8d
    metrics:
    - type: ndcg_at_10
      value: 43.126999999999995
  - task:
      type: PairClassification
    dataset:
      type: C-MTEB/OCNLI
      name: MTEB Ocnli
      config: default
      split: validation
      revision: 66e76a618a34d6d565d5538088562851e6daa7ec
    metrics:
    - type: cos_sim_ap
      value: 72.67630697316041
  - task:
      type: Classification
    dataset:
      type: C-MTEB/OnlineShopping-classification
      name: MTEB OnlineShopping
      config: default
      split: test
      revision: e610f2ebd179a8fda30ae534c3878750a96db120
    metrics:
    - type: accuracy
      value: 84.85000000000001
  - task:
      type: PairClassification
    dataset:
      type: GEM/opusparcus
      name: MTEB OpusparcusPC (fr)
      config: fr
      split: test
      revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a
    metrics:
    - type: cos_sim_ap
      value: 100
  - task:
      type: Classification
    dataset:
      type: laugustyniak/abusive-clauses-pl
      name: MTEB PAC
      config: default
      split: test
      revision: None
    metrics:
    - type: accuracy
      value: 65.99189110918043
  - task:
      type: STS
    dataset:
      type: C-MTEB/PAWSX
      name: MTEB PAWSX
      config: default
      split: test
      revision: 9c6a90e430ac22b5779fb019a23e820b11a8b5e1
    metrics:
    - type: cos_sim_spearman
      value: 16.124364530596228
  - task:
      type: PairClassification
    dataset:
      type: PL-MTEB/ppc-pairclassification
      name: MTEB PPC
      config: default
      split: test
      revision: None
    metrics:
    - type: cos_sim_ap
      value: 92.43431057460192
  - task:
      type: PairClassification
    dataset:
      type: PL-MTEB/psc-pairclassification
      name: MTEB PSC
      config: default
      split: test
      revision: None
    metrics:
    - type: cos_sim_ap
      value: 99.06090138049724
  - task:
      type: PairClassification
    dataset:
      type: paws-x
      name: MTEB PawsX (fr)
      config: fr
      split: test
      revision: 8a04d940a42cd40658986fdd8e3da561533a3646
    metrics:
    - type: cos_sim_ap
      value: 58.9314954874314
  - task:
      type: Classification
    dataset:
      type: PL-MTEB/polemo2_in
      name: MTEB PolEmo2.0-IN
      config: default
      split: test
      revision: None
    metrics:
    - type: accuracy
      value: 69.59833795013851
  - task:
      type: Classification
    dataset:
      type: PL-MTEB/polemo2_out
      name: MTEB PolEmo2.0-OUT
      config: default
      split: test
      revision: None
    metrics:
    - type: accuracy
      value: 44.73684210526315
  - task:
      type: STS
    dataset:
      type: C-MTEB/QBQTC
      name: MTEB QBQTC
      config: default
      split: test
      revision: 790b0510dc52b1553e8c49f3d2afb48c0e5c48b7
    metrics:
    - type: cos_sim_spearman
      value: 39.36450754137984
  - task:
      type: Retrieval
    dataset:
      type: clarin-knext/quora-pl
      name: MTEB Quora-PL
      config: default
      split: test
      revision: 0be27e93455051e531182b85e85e425aba12e9d4
    metrics:
    - type: ndcg_at_10
      value: 80.76299999999999
  - task:
      type: Retrieval
    dataset:
      type: mteb/quora
      name: MTEB QuoraRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 88.022
  - task:
      type: Clustering
    dataset:
      type: mteb/reddit-clustering
      name: MTEB RedditClustering
      config: default
      split: test
      revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
    metrics:
    - type: v_measure
      value: 55.719165988934385
  - task:
      type: Clustering
    dataset:
      type: mteb/reddit-clustering-p2p
      name: MTEB RedditClusteringP2P
      config: default
      split: test
      revision: 282350215ef01743dc01b456c7f5241fa8937f16
    metrics:
    - type: v_measure
      value: 62.25390069273025
  - task:
      type: Retrieval
    dataset:
      type: mteb/scidocs
      name: MTEB SCIDOCS
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 18.243000000000002
  - task:
      type: Retrieval
    dataset:
      type: clarin-knext/scidocs-pl
      name: MTEB SCIDOCS-PL
      config: default
      split: test
      revision: 45452b03f05560207ef19149545f168e596c9337
    metrics:
    - type: ndcg_at_10
      value: 14.219000000000001
  - task:
      type: PairClassification
    dataset:
      type: PL-MTEB/sicke-pl-pairclassification
      name: MTEB SICK-E-PL
      config: default
      split: test
      revision: None
    metrics:
    - type: cos_sim_ap
      value: 75.4022630307816
  - task:
      type: STS
    dataset:
      type: mteb/sickr-sts
      name: MTEB SICK-R
      config: default
      split: test
      revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
    metrics:
    - type: cos_sim_spearman
      value: 79.34269390198548
  - task:
      type: STS
    dataset:
      type: PL-MTEB/sickr-pl-sts
      name: MTEB SICK-R-PL
      config: default
      split: test
      revision: None
    metrics:
    - type: cos_sim_spearman
      value: 74.0651660446132
  - task:
      type: STS
    dataset:
      type: Lajavaness/SICK-fr
      name: MTEB SICKFr
      config: default
      split: test
      revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a
2693 metrics:
2694 - type: cos_sim_spearman
2695 value: 78.62693119733123
2696 - task:
2697 type: STS
2698 dataset:
2699 type: mteb/sts12-sts
2700 name: MTEB STS12
2701 config: default
2702 split: test
2703 revision: a0d554a64d88156834ff5ae9920b964011b16384
2704 metrics:
2705 - type: cos_sim_spearman
2706 value: 77.50660544631359
2707 - task:
2708 type: STS
2709 dataset:
2710 type: mteb/sts13-sts
2711 name: MTEB STS13
2712 config: default
2713 split: test
2714 revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
2715 metrics:
2716 - type: cos_sim_spearman
2717 value: 85.55415077723738
2718 - task:
2719 type: STS
2720 dataset:
2721 type: mteb/sts14-sts
2722 name: MTEB STS14
2723 config: default
2724 split: test
2725 revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
2726 metrics:
2727 - type: cos_sim_spearman
2728 value: 81.67550814479077
2729 - task:
2730 type: STS
2731 dataset:
2732 type: mteb/sts15-sts
2733 name: MTEB STS15
2734 config: default
2735 split: test
2736 revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
2737 metrics:
2738 - type: cos_sim_spearman
2739 value: 88.94601412322764
2740 - task:
2741 type: STS
2742 dataset:
2743 type: mteb/sts16-sts
2744 name: MTEB STS16
2745 config: default
2746 split: test
2747 revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
2748 metrics:
2749 - type: cos_sim_spearman
2750 value: 84.33844259337481
2751 - task:
2752 type: STS
2753 dataset:
2754 type: mteb/sts17-crosslingual-sts
2755 name: MTEB STS17 (ko-ko)
2756 config: ko-ko
2757 split: test
2758 revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
2759 metrics:
2760 - type: cos_sim_spearman
2761 value: 81.58650681159105
2762 - task:
2763 type: STS
2764 dataset:
2765 type: mteb/sts17-crosslingual-sts
2766 name: MTEB STS17 (ar-ar)
2767 config: ar-ar
2768 split: test
2769 revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
2770 metrics:
2771 - type: cos_sim_spearman
2772 value: 78.82472265884256
2773 - task:
2774 type: STS
2775 dataset:
2776 type: mteb/sts17-crosslingual-sts
2777 name: MTEB STS17 (en-ar)
2778 config: en-ar
2779 split: test
2780 revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
2781 metrics:
2782 - type: cos_sim_spearman
2783 value: 76.43637938260397
2784 - task:
2785 type: STS
2786 dataset:
2787 type: mteb/sts17-crosslingual-sts
2788 name: MTEB STS17 (en-de)
2789 config: en-de
2790 split: test
2791 revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
2792 metrics:
2793 - type: cos_sim_spearman
2794 value: 84.71008299464059
2795 - task:
2796 type: STS
2797 dataset:
2798 type: mteb/sts17-crosslingual-sts
2799 name: MTEB STS17 (en-en)
2800 config: en-en
2801 split: test
2802 revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
2803 metrics:
2804 - type: cos_sim_spearman
2805 value: 88.88074713413747
  - task:
      type: STS
    dataset:
      type: mteb/sts17-crosslingual-sts
      name: MTEB STS17 (en-tr)
      config: en-tr
      split: test
      revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
    metrics:
    - type: cos_sim_spearman
      value: 76.36405640457285
  - task:
      type: STS
    dataset:
      type: mteb/sts17-crosslingual-sts
      name: MTEB STS17 (es-en)
      config: es-en
      split: test
      revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
    metrics:
    - type: cos_sim_spearman
      value: 83.84737910084762
  - task:
      type: STS
    dataset:
      type: mteb/sts17-crosslingual-sts
      name: MTEB STS17 (es-es)
      config: es-es
      split: test
      revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
    metrics:
    - type: cos_sim_spearman
      value: 87.03931621433031
  - task:
      type: STS
    dataset:
      type: mteb/sts17-crosslingual-sts
      name: MTEB STS17 (fr-en)
      config: fr-en
      split: test
      revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
    metrics:
    - type: cos_sim_spearman
      value: 84.43335591752246
  - task:
      type: STS
    dataset:
      type: mteb/sts17-crosslingual-sts
      name: MTEB STS17 (it-en)
      config: it-en
      split: test
      revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
    metrics:
    - type: cos_sim_spearman
      value: 83.85268648747021
  - task:
      type: STS
    dataset:
      type: mteb/sts17-crosslingual-sts
      name: MTEB STS17 (nl-en)
      config: nl-en
      split: test
      revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
    metrics:
    - type: cos_sim_spearman
      value: 82.45786516224341
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (en)
      config: en
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 67.20227303970304
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (de)
      config: de
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 60.892838305537126
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (es)
      config: es
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 72.01876318464508
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (pl)
      config: pl
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 42.3879320510127
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (tr)
      config: tr
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 65.54048784845729
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (ar)
      config: ar
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 58.55244068334867
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (ru)
      config: ru
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 66.48710288440624
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (zh)
      config: zh
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 66.585754901838
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (fr)
      config: fr
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 81.03001290557805
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (de-en)
      config: de-en
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 62.28001859884359
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (es-en)
      config: es-en
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 79.64106342105019
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (it)
      config: it
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 78.27915339361124
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (pl-en)
      config: pl-en
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 78.28574268257462
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (zh-en)
      config: zh-en
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 72.92658860751482
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (es-it)
      config: es-it
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 74.83418886368217
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (de-fr)
      config: de-fr
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 56.01064022625769
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (de-pl)
      config: de-pl
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 53.64332829635126
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (fr-pl)
      config: fr-pl
      split: test
      revision: eea2b4fe26a775864c896887d910b76a8098ad3f
    metrics:
    - type: cos_sim_spearman
      value: 73.24670207647144
  - task:
      type: STS
    dataset:
      type: C-MTEB/STSB
      name: MTEB STSB
      config: default
      split: test
      revision: 0cde68302b3541bb8b3c340dc0644b0b745b3dc0
    metrics:
    - type: cos_sim_spearman
      value: 80.7157790971544
  - task:
      type: STS
    dataset:
      type: mteb/stsbenchmark-sts
      name: MTEB STSBenchmark
      config: default
      split: test
      revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
    metrics:
    - type: cos_sim_spearman
      value: 86.45763616928973
  - task:
      type: STS
    dataset:
      type: stsb_multi_mt
      name: MTEB STSBenchmarkMultilingualSTS (fr)
      config: fr
      split: test
      revision: 93d57ef91790589e3ce9c365164337a8a78b7632
    metrics:
    - type: cos_sim_spearman
      value: 84.4335500335282
  - task:
      type: Reranking
    dataset:
      type: mteb/scidocs-reranking
      name: MTEB SciDocsRR
      config: default
      split: test
      revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
    metrics:
    - type: map
      value: 84.15276484499303
  - task:
      type: Retrieval
    dataset:
      type: mteb/scifact
      name: MTEB SciFact
      config: default
      split: test
      revision: 0228b52cf27578f30900b9e5271d331663a030d7
    metrics:
    - type: ndcg_at_10
      value: 73.433
  - task:
      type: Retrieval
    dataset:
      type: clarin-knext/scifact-pl
      name: MTEB SciFact-PL
      config: default
      split: test
      revision: 47932a35f045ef8ed01ba82bf9ff67f6e109207e
    metrics:
    - type: ndcg_at_10
      value: 58.919999999999995
  - task:
      type: PairClassification
    dataset:
      type: mteb/sprintduplicatequestions-pairclassification
      name: MTEB SprintDuplicateQuestions
      config: default
      split: test
      revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
    metrics:
    - type: cos_sim_ap
      value: 95.40564890916419
  - task:
      type: Clustering
    dataset:
      type: mteb/stackexchange-clustering
      name: MTEB StackExchangeClustering
      config: default
      split: test
      revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
    metrics:
    - type: v_measure
      value: 63.41856697730145
  - task:
      type: Clustering
    dataset:
      type: mteb/stackexchange-clustering-p2p
      name: MTEB StackExchangeClusteringP2P
      config: default
      split: test
      revision: 815ca46b2622cec33ccafc3735d572c266efdb44
    metrics:
    - type: v_measure
      value: 31.709285904909112
  - task:
      type: Reranking
    dataset:
      type: mteb/stackoverflowdupquestions-reranking
      name: MTEB StackOverflowDupQuestions
      config: default
      split: test
      revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
    metrics:
    - type: map
      value: 52.09341030060322
  - task:
      type: Summarization
    dataset:
      type: mteb/summeval
      name: MTEB SummEval
      config: default
      split: test
      revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
    metrics:
    - type: cos_sim_spearman
      value: 30.58262517835034
  - task:
      type: Summarization
    dataset:
      type: lyon-nlp/summarization-summeval-fr-p2p
      name: MTEB SummEvalFr
      config: default
      split: test
      revision: b385812de6a9577b6f4d0f88c6a6e35395a94054
    metrics:
    - type: cos_sim_spearman
      value: 29.744542072951358
  - task:
      type: Reranking
    dataset:
      type: lyon-nlp/mteb-fr-reranking-syntec-s2p
      name: MTEB SyntecReranking
      config: default
      split: test
      revision: b205c5084a0934ce8af14338bf03feb19499c84d
    metrics:
    - type: map
      value: 88.03333333333333
  - task:
      type: Retrieval
    dataset:
      type: lyon-nlp/mteb-fr-retrieval-syntec-s2p
      name: MTEB SyntecRetrieval
      config: default
      split: test
      revision: 77f7e271bf4a92b24fce5119f3486b583ca016ff
    metrics:
    - type: ndcg_at_10
      value: 83.043
  - task:
      type: Reranking
    dataset:
      type: C-MTEB/T2Reranking
      name: MTEB T2Reranking
      config: default
      split: dev
      revision: 76631901a18387f85eaa53e5450019b87ad58ef9
    metrics:
    - type: map
      value: 67.08577894804324
  - task:
      type: Retrieval
    dataset:
      type: C-MTEB/T2Retrieval
      name: MTEB T2Retrieval
      config: default
      split: dev
      revision: 8731a845f1bf500a4f111cf1070785c793d10e64
    metrics:
    - type: ndcg_at_10
      value: 84.718
  - task:
      type: Classification
    dataset:
      type: C-MTEB/TNews-classification
      name: MTEB TNews
      config: default
      split: validation
      revision: 317f262bf1e6126357bbe89e875451e4b0938fe4
    metrics:
    - type: accuracy
      value: 48.726
  - task:
      type: Retrieval
    dataset:
      type: mteb/trec-covid
      name: MTEB TRECCOVID
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_10
      value: 57.56
  - task:
      type: Retrieval
    dataset:
      type: clarin-knext/trec-covid-pl
      name: MTEB TRECCOVID-PL
      config: default
      split: test
      revision: 81bcb408f33366c2a20ac54adafad1ae7e877fdd
    metrics:
    - type: ndcg_at_10
      value: 59.355999999999995
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (sqi-eng)
      config: sqi-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 82.765
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (fry-eng)
      config: fry-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 73.69942196531792
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (kur-eng)
      config: kur-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 32.86585365853657
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (tur-eng)
      config: tur-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 95.81666666666666
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (deu-eng)
      config: deu-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 97.75
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (nld-eng)
      config: nld-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 93.78333333333335
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ron-eng)
      config: ron-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 90.72333333333333
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ang-eng)
      config: ang-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 42.45202558635395
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ido-eng)
      config: ido-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 77.59238095238095
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (jav-eng)
      config: jav-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 35.69686411149825
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (isl-eng)
      config: isl-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 82.59333333333333
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (slv-eng)
      config: slv-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 84.1456922987907
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (cym-eng)
      config: cym-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 52.47462133594857
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (kaz-eng)
      config: kaz-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 67.62965440356746
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (est-eng)
      config: est-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 79.48412698412699
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (heb-eng)
      config: heb-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 75.85
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (gla-eng)
      config: gla-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 27.32600866497127
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (mar-eng)
      config: mar-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 84.38
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (lat-eng)
      config: lat-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 42.98888712165028
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (bel-eng)
      config: bel-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 85.55690476190476
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (pms-eng)
      config: pms-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 46.68466031323174
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (gle-eng)
      config: gle-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 32.73071428571428
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (pes-eng)
      config: pes-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 88.26333333333334
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (nob-eng)
      config: nob-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 96.61666666666666
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (bul-eng)
      config: bul-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 91.30666666666666
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (cbk-eng)
      config: cbk-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 70.03714285714285
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (hun-eng)
      config: hun-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 89.09
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (uig-eng)
      config: uig-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 59.570476190476185
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (rus-eng)
      config: rus-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 92.9
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (spa-eng)
      config: spa-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 97.68333333333334
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (hye-eng)
      config: hye-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 80.40880503144653
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (tel-eng)
      config: tel-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 89.7008547008547
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (afr-eng)
      config: afr-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 81.84833333333333
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (mon-eng)
      config: mon-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 71.69696969696969
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (arz-eng)
      config: arz-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 55.76985790822269
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (hrv-eng)
      config: hrv-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 91.66666666666666
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (nov-eng)
      config: nov-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 68.36668519547896
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (gsw-eng)
      config: gsw-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 36.73992673992674
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (nds-eng)
      config: nds-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 63.420952380952365
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ukr-eng)
      config: ukr-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 91.28999999999999
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (uzb-eng)
      config: uzb-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 40.95392490046146
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (lit-eng)
      config: lit-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 77.58936507936508
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ina-eng)
      config: ina-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 91.28999999999999
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (lfn-eng)
      config: lfn-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 63.563650793650794
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (zsm-eng)
      config: zsm-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 94.35
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ita-eng)
      config: ita-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 91.43
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (cmn-eng)
      config: cmn-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 95.73333333333332
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (lvs-eng)
      config: lvs-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 79.38666666666667
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (glg-eng)
      config: glg-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 89.64
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ceb-eng)
      config: ceb-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 21.257184628237262
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (bre-eng)
      config: bre-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 13.592316017316017
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ben-eng)
      config: ben-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 73.22666666666666
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (swg-eng)
      config: swg-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 51.711309523809526
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (arq-eng)
      config: arq-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 24.98790634904795
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (kab-eng)
      config: kab-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 17.19218192918193
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (fra-eng)
      config: fra-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 93.26666666666667
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (por-eng)
      config: por-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 94.57333333333334
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (tat-eng)
      config: tat-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 42.35127206127206
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (oci-eng)
      config: oci-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 51.12318903318903
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (pol-eng)
      config: pol-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 94.89999999999999
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (war-eng)
      config: war-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 23.856320290390055
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (aze-eng)
      config: aze-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 79.52833333333334
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (vie-eng)
      config: vie-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 95.93333333333334
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (nno-eng)
3977 config: nno-eng
3978 split: test
3979 revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
3980 metrics:
3981 - type: f1
3982 value: 90.75333333333333
3983 - task:
3984 type: BitextMining
3985 dataset:
3986 type: mteb/tatoeba-bitext-mining
3987 name: MTEB Tatoeba (cha-eng)
3988 config: cha-eng
3989 split: test
3990 revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
3991 metrics:
3992 - type: f1
3993 value: 30.802919708029197
3994 - task:
3995 type: BitextMining
3996 dataset:
3997 type: mteb/tatoeba-bitext-mining
3998 name: MTEB Tatoeba (mhr-eng)
3999 config: mhr-eng
4000 split: test
4001 revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
4002 metrics:
4003 - type: f1
4004 value: 15.984076294076294
4005 - task:
4006 type: BitextMining
4007 dataset:
4008 type: mteb/tatoeba-bitext-mining
4009 name: MTEB Tatoeba (dan-eng)
4010 config: dan-eng
4011 split: test
4012 revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
4013 metrics:
4014 - type: f1
4015 value: 91.82666666666667
4016 - task:
4017 type: BitextMining
4018 dataset:
4019 type: mteb/tatoeba-bitext-mining
4020 name: MTEB Tatoeba (ell-eng)
4021 config: ell-eng
4022 split: test
4023 revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
4024 metrics:
4025 - type: f1
4026 value: 91.9
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (amh-eng)
      config: amh-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 76.36054421768706
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (pam-eng)
      config: pam-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 9.232711399711398
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (hsb-eng)
      config: hsb-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 45.640803181175855
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (srp-eng)
      config: srp-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 86.29
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (epo-eng)
      config: epo-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 88.90833333333332
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (kzj-eng)
      config: kzj-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 11.11880248978075
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (awa-eng)
      config: awa-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 48.45839345839346
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (fao-eng)
      config: fao-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 65.68157033805888
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (mal-eng)
      config: mal-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 94.63852498786997
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ile-eng)
      config: ile-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 81.67904761904761
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (bos-eng)
      config: bos-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 89.35969868173258
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (cor-eng)
      config: cor-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 5.957229437229437
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (cat-eng)
      config: cat-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 91.50333333333333
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (eus-eng)
      config: eus-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 63.75498778998778
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (yue-eng)
      config: yue-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 82.99190476190476
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (swe-eng)
      config: swe-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 92.95
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (dtp-eng)
      config: dtp-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 9.054042624042623
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (kat-eng)
      config: kat-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 72.77064981488574
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (jpn-eng)
      config: jpn-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 93.14
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (csb-eng)
      config: csb-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 29.976786498525627
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (xho-eng)
      config: xho-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 67.6525821596244
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (orv-eng)
      config: orv-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 33.12964812964813
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ind-eng)
      config: ind-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 92.30666666666666
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (tuk-eng)
      config: tuk-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 34.36077879427633
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (max-eng)
      config: max-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 52.571845212690285
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (swh-eng)
      config: swh-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 58.13107263107262
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (hin-eng)
      config: hin-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 93.33333333333333
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (dsb-eng)
      config: dsb-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 42.87370133925458
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ber-eng)
      config: ber-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 20.394327616827614
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (tam-eng)
      config: tam-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 84.29967426710098
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (slk-eng)
      config: slk-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 88.80666666666667
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (tgl-eng)
      config: tgl-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 67.23062271062273
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ast-eng)
      config: ast-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 78.08398950131233
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (mkd-eng)
      config: mkd-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 77.85166666666666
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (khm-eng)
      config: khm-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 67.63004001231148
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ces-eng)
      config: ces-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 89.77000000000001
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (tzl-eng)
      config: tzl-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 40.2654503616042
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (urd-eng)
      config: urd-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 83.90333333333334
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (ara-eng)
      config: ara-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 77.80666666666666
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (kor-eng)
      config: kor-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 84.08
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (yid-eng)
      config: yid-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 60.43098607367475
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (fin-eng)
      config: fin-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 88.19333333333333
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (tha-eng)
      config: tha-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 90.55352798053529
  - task:
      type: BitextMining
    dataset:
      type: mteb/tatoeba-bitext-mining
      name: MTEB Tatoeba (wuu-eng)
      config: wuu-eng
      split: test
      revision: 9080400076fbadbb4c4dcb136ff4eddc40b42553
    metrics:
    - type: f1
      value: 88.44999999999999
  - task:
      type: Clustering
    dataset:
      type: C-MTEB/ThuNewsClusteringP2P
      name: MTEB ThuNewsClusteringP2P
      config: default
      split: test
      revision: 5798586b105c0434e4f0fe5e767abe619442cf93
    metrics:
    - type: v_measure
      value: 57.25416429643288
  - task:
      type: Clustering
    dataset:
      type: C-MTEB/ThuNewsClusteringS2S
      name: MTEB ThuNewsClusteringS2S
      config: default
      split: test
      revision: 8a8b2caeda43f39e13c4bc5bea0f8a667896e10d
    metrics:
    - type: v_measure
      value: 56.616646560243524
  - task:
      type: Retrieval
    dataset:
      type: mteb/touche2020
      name: MTEB Touche2020
      config: default
      split: test
      revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f
    metrics:
    - type: ndcg_at_10
      value: 22.819
  - task:
      type: Classification
    dataset:
      type: mteb/toxic_conversations_50k
      name: MTEB ToxicConversationsClassification
      config: default
      split: test
      revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
    metrics:
    - type: accuracy
      value: 71.02579999999999
  - task:
      type: Classification
    dataset:
      type: mteb/tweet_sentiment_extraction
      name: MTEB TweetSentimentExtractionClassification
      config: default
      split: test
      revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
    metrics:
    - type: accuracy
      value: 57.60045274476514
  - task:
      type: Clustering
    dataset:
      type: mteb/twentynewsgroups-clustering
      name: MTEB TwentyNewsgroupsClustering
      config: default
      split: test
      revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
    metrics:
    - type: v_measure
      value: 50.346666699466205
  - task:
      type: PairClassification
    dataset:
      type: mteb/twittersemeval2015-pairclassification
      name: MTEB TwitterSemEval2015
      config: default
      split: test
      revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
    metrics:
    - type: cos_sim_ap
      value: 71.88199004440489
  - task:
      type: PairClassification
    dataset:
      type: mteb/twitterurlcorpus-pairclassification
      name: MTEB TwitterURLCorpus
      config: default
      split: test
      revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
    metrics:
    - type: cos_sim_ap
      value: 85.41587779677383
  - task:
      type: Retrieval
    dataset:
      type: C-MTEB/VideoRetrieval
      name: MTEB VideoRetrieval
      config: default
      split: dev
      revision: 58c2597a5943a2ba48f4668c3b90d796283c5639
    metrics:
    - type: ndcg_at_10
      value: 72.792
  - task:
      type: Classification
    dataset:
      type: C-MTEB/waimai-classification
      name: MTEB Waimai
      config: default
      split: test
      revision: 339287def212450dcaa9df8c22bf93e9980c7023
    metrics:
    - type: accuracy
      value: 82.58000000000001
  - task:
      type: Retrieval
    dataset:
      type: jinaai/xpqa
      name: MTEB XPQARetrieval (fr)
      config: fr
      split: test
      revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f
    metrics:
    - type: ndcg_at_10
      value: 67.327
---

## gte-multilingual-base

The **gte-multilingual-base** model is the latest in the [GTE](https://huggingface.co/collections/Alibaba-NLP/gte-models-6680f0b13f885cb431e6d469) (General Text Embedding) family of models, featuring several key attributes:

- **High Performance**: Achieves state-of-the-art (SOTA) results in multilingual retrieval tasks and multi-task representation model evaluations when compared to models of similar size.
- **Training Architecture**: Trained using an encoder-only Transformer architecture, resulting in a smaller model size. Unlike previous models based on decoder-only LLM architectures (e.g., gte-qwen2-1.5b-instruct), this model has lower hardware requirements for inference and offers roughly 10x faster inference.
- **Long Context**: Supports text lengths up to **8192** tokens.
- **Multilingual Capability**: Supports over **70** languages.
- **Elastic Dense Embedding**: Supports elastic output dimensions for dense representations while maintaining effectiveness on downstream tasks, which significantly reduces storage costs and improves execution efficiency.
- **Sparse Vectors**: In addition to dense representations, it can also generate sparse vectors.
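
To make the elastic dense embedding concrete: a shorter embedding can be obtained by keeping only the leading components of the 768-dimensional vector and L2-normalizing them again, which is what the `dimension` setting in the Transformers usage example is meant to do. The following is a minimal pure-Python sketch under that assumption (Matryoshka-style truncation in spirit); the helper `truncate_embedding` and the toy vector are hypothetical and not part of the model's API.

```python
import math
from typing import List


def truncate_embedding(vec: List[float], dim: int) -> List[float]:
    """Keep the first `dim` components and L2-normalize them.

    Hypothetical helper; assumes `dim` lies in [128, len(vec)],
    matching the [128, 768] range stated for gte-multilingual-base.
    """
    if not 128 <= dim <= len(vec):
        raise ValueError(f"dim must be in [128, {len(vec)}], got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to normalize
    return [x / norm for x in head]


# Toy 768-dimensional vector standing in for a real model output.
full = [float(i % 7) - 3.0 for i in range(768)]
short = truncate_embedding(full, 256)
print(len(short))                           # 256
print(round(sum(x * x for x in short), 6))  # 1.0
```

Re-normalizing after truncation keeps dot-product scores interpretable as cosine similarities at every dimension.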

**Paper**: [mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval](https://arxiv.org/pdf/2407.19669)

## Model Information
- Model Size: 305M parameters
- Embedding Dimension: 768
- Max Input Tokens: 8192
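
As a rough illustration of how the embedding dimension drives storage, the sketch below estimates the raw size of a dense index. It assumes float32 values (4 bytes each), a hypothetical corpus of 10 million vectors, and ignores any index overhead; `index_size_bytes` is an illustrative helper.

```python
def index_size_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for a dense index (float32 by default), ignoring overhead."""
    return num_vectors * dim * bytes_per_value


# Hypothetical corpus size, chosen only for illustration.
for dim in (768, 512, 256, 128):
    gib = index_size_bytes(10_000_000, dim) / 2**30
    print(f"dim={dim:4d}: {gib:.2f} GiB")
```

Under these assumptions a 768-dimensional vector takes 3,072 bytes and a 128-dimensional one 512 bytes, a 6x reduction.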

## Usage

- **We recommend installing xformers and enabling unpadding for acceleration; see [enable-unpadding-and-xformers](https://huggingface.co/Alibaba-NLP/new-impl#recommendation-enable-unpadding-and-acceleration-with-xformers).**
- **How to use it offline: [new-impl/discussions/2](https://huggingface.co/Alibaba-NLP/new-impl/discussions/2#662b08d04d8c3d0a09c88fa3)**
- **How to use with [TEI](https://github.com/huggingface/text-embeddings-inference): [refs/pr/7](https://huggingface.co/Alibaba-NLP/gte-multilingual-base/discussions/7#66bfb82ea03b764ca92a2221)**

### Get Dense Embeddings with Transformers
```python
# Requires transformers>=4.36.0

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

input_texts = [
    "what is the capital of China?",
    "how to implement quick sort in python?",
    "北京",  # "Beijing"
    "快排算法介绍"  # "Introduction to the quicksort algorithm"
]

model_name_or_path = 'Alibaba-NLP/gte-multilingual-base'
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModel.from_pretrained(model_name_or_path, trust_remote_code=True)

# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=8192, padding=True, truncation=True, return_tensors='pt')

# Inference only: no gradients needed
with torch.no_grad():
    outputs = model(**batch_dict)

dimension = 768  # Output embedding dimension; should be in [128, 768]
# Take the first-token ([CLS]) hidden state, then keep the first `dimension` features
embeddings = outputs.last_hidden_state[:, 0][:, :dimension]

embeddings = F.normalize(embeddings, p=2, dim=1)
# Cosine similarity between the first text and the remaining texts
scores = embeddings[:1] @ embeddings[1:].T
print(scores.tolist())

# [[0.3016996383666992, 0.7503870129585266, 0.3203084468841553]]
```

### Use with sentence-transformers
```python
# Requires sentence-transformers>=3.0.0

from sentence_transformers import SentenceTransformer

input_texts = [
    "what is the capital of China?",
    "how to implement quick sort in python?",
    "北京",  # "Beijing"
    "快排算法介绍"  # "Introduction to the quicksort algorithm"
]

model_name_or_path = "Alibaba-NLP/gte-multilingual-base"
model = SentenceTransformer(model_name_or_path, trust_remote_code=True)
embeddings = model.encode(input_texts, normalize_embeddings=True)  # embeddings.shape (4, 768)

# Similarity scores between the first text and the remaining texts
scores = model.similarity(embeddings[:1], embeddings[1:])

print(scores.tolist())
# [[0.301699697971344, 0.7503870129585266, 0.32030850648880005]]
```
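
The Transformers and sentence-transformers examples print essentially the same scores. For L2-normalized vectors the dot product equals cosine similarity, and `SentenceTransformer.similarity` computes cosine similarity by default (unless the model configuration selects another `similarity_fn_name`). The pure-Python sketch below checks that identity on toy vectors; the helper names are illustrative.

```python
import math
from typing import List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


q = [0.2, -1.0, 3.5]  # toy "query" vector
d = [1.5, 0.3, 2.0]   # toy "document" vector
print(math.isclose(cosine(q, d), dot(l2_normalize(q), l2_normalize(d))))  # True
```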

### Use with infinity

Usage via Docker and [infinity](https://github.com/michaelfeil/infinity) (MIT licensed):

```bash
docker run --gpus all -v $PWD/data:/app/.cache -p "7997":"7997" \
    michaelf34/infinity:0.0.69 \
    v2 --model-id Alibaba-NLP/gte-multilingual-base --revision "main" --dtype float16 --batch-size 32 --device cuda --engine torch --port 7997
```

### Use with Text Embeddings Inference (TEI)

Usage via Docker and [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference):

- CPU:

```bash
docker run --platform linux/amd64 \
    -p 8080:80 \
    -v $PWD/data:/data \
    --pull always \
    ghcr.io/huggingface/text-embeddings-inference:cpu-1.7 \
    --model-id Alibaba-NLP/gte-multilingual-base \
    --dtype float16
```

- GPU:

```bash
docker run --gpus all \
    -p 8080:80 \
    -v $PWD/data:/data \
    --pull always \
    ghcr.io/huggingface/text-embeddings-inference:1.7 \
    --model-id Alibaba-NLP/gte-multilingual-base \
    --dtype float16
```

Then you can send requests to the deployed API via the OpenAI-compatible `/v1/embeddings` route (see the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings) reference for the request and response format):

```bash
curl http://localhost:8080/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
        "input": [
            "what is the capital of China?",
            "how to implement quick sort in python?",
            "北京",
            "快排算法介绍"
        ],
        "model": "Alibaba-NLP/gte-multilingual-base",
        "encoding_format": "float"
    }'
```
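
The same request can be sent from Python with only the standard library. This is a minimal sketch that assumes TEI is reachable at `http://localhost:8080` as in the Docker commands above and that the response follows the OpenAI embeddings format (a `data` list of objects carrying `embedding` and `index`); `build_payload`, `parse_embeddings`, and `embed` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: TEI runs locally and is published on port 8080.
TEI_URL = "http://localhost:8080/v1/embeddings"


def build_payload(texts: List[str]) -> bytes:
    """Encode an OpenAI-style embeddings request body as UTF-8 JSON."""
    return json.dumps({
        "input": texts,
        "model": "Alibaba-NLP/gte-multilingual-base",
        "encoding_format": "float",
    }).encode("utf-8")


def parse_embeddings(body: Dict[str, Any]) -> List[List[float]]:
    """Return the embeddings in input order, using each item's `index`."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts: List[str], url: str = TEI_URL) -> List[List[float]]:
    """POST the texts to the server and return one vector per input."""
    request = urllib.request.Request(
        url,
        data=build_payload(texts),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return parse_embeddings(json.load(response))
```

With the server running, `embed(["北京", "快排算法介绍"])` should return one vector per input, in input order.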

### Use with custom code to get dense embeddings and sparse token weights
```python
# You can find the script gte_embedding.py at https://huggingface.co/Alibaba-NLP/gte-multilingual-base/blob/main/scripts/gte_embedding.py

from gte_embedding import GTEEmbeddidng

model_name_or_path = 'Alibaba-NLP/gte-multilingual-base'
model = GTEEmbeddidng(model_name_or_path)
query = "中国的首都在哪儿"  # "Where is the capital of China?"

docs = [
    "what is the capital of China?",
    "how to implement quick sort in python?",
    "北京",  # "Beijing"
    "快排算法介绍"  # "Introduction to the quicksort algorithm"
]

embs = model.encode(docs, return_dense=True, return_sparse=True)
print('dense_embeddings vecs', embs['dense_embeddings'])
print('token_weights', embs['token_weights'])
pairs = [(query, doc) for doc in docs]
dense_scores = model.compute_scores(pairs, dense_weight=1.0, sparse_weight=0.0)
sparse_scores = model.compute_scores(pairs, dense_weight=0.0, sparse_weight=1.0)
hybrid_scores = model.compute_scores(pairs, dense_weight=1.0, sparse_weight=0.3)

print('dense_scores', dense_scores)
print('sparse_scores', sparse_scores)
print('hybrid_scores', hybrid_scores)

# dense_scores [0.85302734375, 0.257568359375, 0.76953125, 0.325439453125]
# sparse_scores [0.0, 0.0, 4.600879669189453, 1.570279598236084]
# hybrid_scores [0.85302734375, 0.257568359375, 2.1497951507568356, 0.7965233325958252]
```
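
The printed `hybrid_scores` match a plain weighted sum of the dense and sparse scores, `dense_weight * dense + sparse_weight * sparse`. The sketch below assumes that rule (inferred from the printed numbers, not read from `gte_embedding.py`) and reproduces the hybrid scores from the dense and sparse values shown above; `hybrid_scores` here is a hypothetical stand-alone function.

```python
from typing import List


def hybrid_scores(dense: List[float], sparse: List[float],
                  dense_weight: float = 1.0, sparse_weight: float = 0.3) -> List[float]:
    """Weighted sum of dense and sparse scores (assumed combination rule)."""
    return [dense_weight * d + sparse_weight * s for d, s in zip(dense, sparse)]


# Dense and sparse scores copied from the example output above.
dense = [0.85302734375, 0.257568359375, 0.76953125, 0.325439453125]
sparse = [0.0, 0.0, 4.600879669189453, 1.570279598236084]
print(hybrid_scores(dense, sparse))
```

Because the sparse scores are on a larger scale than the dense ones, the sparse weight (0.3 here) controls how much the sparse signal can reorder the dense ranking: with these numbers, "北京" moves ahead of the English question.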

## Evaluation

We evaluated the **gte-multilingual-base** model on multiple downstream tasks, including multilingual retrieval, cross-lingual retrieval, long-text retrieval, and general text representation on the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard), among others.

### Retrieval Task

Retrieval results on [MIRACL](https://arxiv.org/abs/2210.09984) and [MLDR](https://arxiv.org/abs/2402.03216) (multilingual), [MKQA](https://arxiv.org/abs/2007.15207) (cross-lingual), and [BEIR](https://arxiv.org/abs/2104.08663) and [LoCo](https://arxiv.org/abs/2402.07440) (English):

![image](./images/mgte-retrieval.png)

- Detailed results on [MLDR](https://arxiv.org/abs/2402.03216)

![image](./images/mgte-retrieval.png)

- Detailed results on [LoCo](https://arxiv.org/abs/2402.07440)
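
The retrieval entries in the model-index metadata report `ndcg_at_10` on a 0 to 100 scale (for example `22.819` for Touche2020). As a reminder of what that metric measures, here is a minimal nDCG@k sketch using linear gains and a `log2(rank + 1)` discount, one common convention; it is illustrative only and not the MTEB evaluation code.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranked_ids: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """nDCG@k with linear gains and a log2(rank + 1) discount (rank starts at 1)."""
    def dcg(gains: List[float]) -> float:
        # enumerate() starts at 0, so position i has rank i + 1 and discount log2(i + 2)
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    actual = dcg([relevance.get(doc_id, 0.0) for doc_id in ranked_ids])
    ideal = dcg(sorted(relevance.values(), reverse=True))
    return actual / ideal if ideal > 0 else 0.0


# Toy example: the only relevant document is retrieved at rank 2.
print(round(ndcg_at_k(["d3", "d1", "d7"], {"d1": 1.0}), 4))  # 0.6309
```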

### MTEB

Results on MTEB English, Chinese, French, and Polish:

![image](./images/mgte-mteb.png)

**More detailed experimental results can be found in the [paper](https://arxiv.org/pdf/2407.19669)**.

## Cloud API Services

In addition to the open-source [GTE](https://huggingface.co/collections/Alibaba-NLP/gte-models-6680f0b13f885cb431e6d469) series models, GTE series models are also available as commercial API services on Alibaba Cloud.

- [Embedding Models](https://help.aliyun.com/zh/model-studio/developer-reference/general-text-embedding/): Three versions of the text embedding models are available: text-embedding-v1/v2/v3, with v3 being the latest API service.
- [ReRank Models](https://help.aliyun.com/zh/model-studio/developer-reference/general-text-sorting-model/): The gte-rerank model service is available.

Note that the models behind the commercial APIs are not entirely identical to the open-source models.

## Citation
If you find our paper or models helpful, please consider citing:

```bibtex
@inproceedings{zhang2024mgte,
  title={mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval},
  author={Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Wen and Dai, Ziqi and Tang, Jialong and Lin, Huan and Yang, Baosong and Xie, Pengjun and Huang, Fei and others},
  booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track},
  pages={1393--1412},
  year={2024}
}
```