---
license: mit
datasets:
- wmt/europarl
metrics:
- f1
- recall
- precision
---
This model is based on [Oliver Guhr's work](https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large). The difference is that it is a fine-tuned xlm-roberta-base instead of an xlm-roberta-large, and that it covers twelve languages instead of four: English, German, French, Spanish, Bulgarian, Italian, Polish, Dutch, Czech, Portuguese, Slovak, and Slovenian.

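The model is a token classifier: for each word it predicts which punctuation mark, if any, should follow (the class `0` in the report below means "no punctuation"). A minimal sketch of turning such per-word labels back into punctuated text; the helper name is illustrative, not part of the model's API:

```python
def restore_punctuation(words, labels):
    """Rejoin words, appending each word's predicted punctuation mark.

    `labels` uses the classes from the report below: "0" means no mark
    after the word; otherwise the label is the mark itself
    (".", ",", "?", "-", ":").
    """
    out = []
    for word, label in zip(words, labels):
        out.append(word if label == "0" else word + label)
    return " ".join(out)


# Example: the classifier would tag the last word with "?"
print(restore_punctuation(["hello", "how", "are", "you"],
                          ["0", "0", "0", "?"]))  # hello how are you?
```

In practice the labels would come from running the model with a `token-classification` pipeline and aggregating sub-word predictions per word; this sketch only shows the post-processing step.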
----- report -----

```
              precision    recall  f1-score   support

           0       0.99      0.99      0.99  73317475
           .       0.94      0.95      0.95   4484845
           ,       0.86      0.86      0.86   6100650
           ?       0.88      0.85      0.86    136479
           -       0.60      0.29      0.39    233630
           :       0.71      0.49      0.58    152424

    accuracy                           0.98  84425503
   macro avg       0.83      0.74      0.77  84425503
weighted avg       0.98      0.98      0.98  84425503
```

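The macro average is the unweighted mean of the per-class scores, while the weighted average weights each class by its support, which is why the weighted F1 stays high despite the weak `-` and `:` classes. A quick check against the F1 column above:

```python
# Per-class F1 and support, in the order of the report above
f1 = [0.99, 0.95, 0.86, 0.86, 0.39, 0.58]
support = [73317475, 4484845, 6100650, 136479, 233630, 152424]

macro = sum(f1) / len(f1)
weighted = sum(v * s for v, s in zip(f1, support)) / sum(support)

print(round(macro, 2))     # 0.77 -- dragged down by the rare "-" and ":" classes
print(round(weighted, 2))  # 0.98 -- dominated by the huge "0" class
```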
----- confusion matrix -----

```
t/p    0    .    ,    ?    -    :
0    1.0  0.0  0.0  0.0  0.0  0.0
.    0.0  1.0  0.0  0.0  0.0  0.0
,    0.1  0.0  0.9  0.0  0.0  0.0
?    0.0  0.1  0.0  0.8  0.0  0.0
-    0.1  0.1  0.5  0.0  0.3  0.0
:    0.0  0.3  0.1  0.0  0.0  0.5
```