---
license: mit
datasets:
- wmt/europarl
metrics:
- f1
- recall
- precision
---
This model is based on [Oliver Guhr's work](https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large). The difference is that it is a fine-tuned xlm-roberta-base instead of an xlm-roberta-large, and it was trained on twelve languages instead of four: English, German, French, Spanish, Bulgarian, Italian, Polish, Dutch, Czech, Portuguese, Slovak, and Slovenian.

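The model treats punctuation restoration as token classification: each word is labeled with the punctuation mark that should follow it, with `0` meaning no mark. A minimal usage sketch with the `transformers` pipeline, assuming the model is published on the Hub under a placeholder id (substitute the actual repository id):

```python
from transformers import pipeline

# Placeholder model id -- replace with the actual Hub repository.
pipe = pipeline(
    "token-classification",
    model="your-username/fullstop-punctuation-multilingual-base",
    aggregation_strategy="simple",  # merge sub-word tokens back into words
)

text = "my name is clara and i live in berkeley california"
for entity in pipe(text):
    # entity_group is the predicted punctuation label; "0" means no mark follows
    print(entity["word"], "->", entity["entity_group"])
```
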
----- report -----

| label        | precision | recall | f1-score | support  |
|:-------------|----------:|-------:|---------:|---------:|
| 0            | 0.99      | 0.99   | 0.99     | 73317475 |
| .            | 0.94      | 0.95   | 0.95     | 4484845  |
| ,            | 0.86      | 0.86   | 0.86     | 6100650  |
| ?            | 0.88      | 0.85   | 0.86     | 136479   |
| -            | 0.60      | 0.29   | 0.39     | 233630   |
| :            | 0.71      | 0.49   | 0.58     | 152424   |
| accuracy     |           |        | 0.98     | 84425503 |
| macro avg    | 0.83      | 0.74   | 0.77     | 84425503 |
| weighted avg | 0.98      | 0.98   | 0.98     | 84425503 |

----- confusion matrix -----

Rows are true labels (t), columns are predicted labels (p); values are row-normalized and rounded.

| t/p | 0   | .   | ,   | ?   | -   | :   |
|:----|----:|----:|----:|----:|----:|----:|
| 0   | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| .   | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ,   | 0.1 | 0.0 | 0.9 | 0.0 | 0.0 | 0.0 |
| ?   | 0.0 | 0.1 | 0.0 | 0.8 | 0.0 | 0.0 |
| -   | 0.1 | 0.1 | 0.5 | 0.0 | 0.3 | 0.0 |
| :   | 0.0 | 0.3 | 0.1 | 0.0 | 0.0 | 0.5 |
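
The report above follows the format of scikit-learn's `classification_report`. A sketch of how such numbers can be produced from flattened per-token labels (the `y_true`/`y_pred` toy arrays are hypothetical; in practice they would hold the gold and predicted label for every token in the test set):

```python
from sklearn.metrics import classification_report, confusion_matrix

labels = ["0", ".", ",", "?", "-", ":"]

# Hypothetical per-token gold labels and model predictions.
y_true = ["0", ".", ",", "?", "-", ":"]
y_pred = ["0", ".", ",", "?", ",", ":"]

print(classification_report(y_true, y_pred, labels=labels, zero_division=0))

# Row-normalized confusion matrix (rows = true labels, columns = predictions),
# matching the layout of the table above.
print(confusion_matrix(y_true, y_pred, labels=labels, normalize="true"))
```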