---
tags:
- translation
license: cc-by-4.0
---

### opus-mt-en-de

## Table of Contents
- [Model Details](#model-details)
- [Uses](#uses)
- [Risks, Limitations and Biases](#risks-limitations-and-biases)
- [Training](#training)
- [Evaluation](#evaluation)
- [Citation Information](#citation-information)
- [How to Get Started With the Model](#how-to-get-started-with-the-model)

## Model Details
**Model Description:**
- **Developed by:** Language Technology Research Group at the University of Helsinki
- **Model Type:** Translation
- **Language(s):**
  - Source Language: English
  - Target Language: German
- **License:** CC-BY-4.0
- **Resources for more information:**
  - [GitHub Repo](https://github.com/Helsinki-NLP/OPUS-MT-train)

## Uses

#### Direct Use

This model can be used for English-to-German translation and, more generally, for text-to-text generation.

## Risks, Limitations and Biases

**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).

Further details about the dataset for this model can be found in the OPUS readme: [en-de](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/en-de/README.md)

## Training

#### Training Data
##### Preprocessing
* pre-processing: normalization + SentencePiece
* dataset: [opus](https://github.com/Helsinki-NLP/Opus-MT)
* download original weights: [opus-2020-02-26.zip](https://object.pouta.csc.fi/OPUS-MT-models/en-de/opus-2020-02-26.zip)
* test set translations: [opus-2020-02-26.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-de/opus-2020-02-26.test.txt)

## Evaluation

#### Results

* test set scores: [opus-2020-02-26.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-de/opus-2020-02-26.eval.txt)

#### Benchmarks

| testset                 | BLEU | chr-F |
|-------------------------|------|-------|
| newssyscomb2009.en.de   | 23.5 | 0.540 |
| news-test2008.en.de     | 23.5 | 0.529 |
| newstest2009.en.de      | 22.3 | 0.530 |
| newstest2010.en.de      | 24.9 | 0.544 |
| newstest2011.en.de      | 22.5 | 0.524 |
| newstest2012.en.de      | 23.0 | 0.525 |
| newstest2013.en.de      | 26.9 | 0.553 |
| newstest2015-ende.en.de | 31.1 | 0.594 |
| newstest2016-ende.en.de | 37.0 | 0.636 |
| newstest2017-ende.en.de | 29.9 | 0.586 |
| newstest2018-ende.en.de | 45.2 | 0.690 |
| newstest2019-ende.en.de | 40.9 | 0.654 |
| Tatoeba.en.de           | 47.3 | 0.664 |

## Citation Information

```bibtex
@InProceedings{TiedemannThottingal:EAMT2020,
  author = {J{\"o}rg Tiedemann and Santhosh Thottingal},
  title = {{OPUS-MT} -- {B}uilding open translation services for the {W}orld},
  booktitle = {Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)},
  year = {2020},
  address = {Lisbon, Portugal}
}
```

## How to Get Started With the Model
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the SentencePiece-based tokenizer that matches the checkpoint
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")

# Load the English-to-German sequence-to-sequence translation model
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-de")
```
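
The snippet above only loads the tokenizer and model. As a minimal sketch of how they might then be used to translate text, the example below batches English sentences, runs generation, and decodes the German output. The helper names `batched`, `load_en_de`, and `translate`, as well as the `batch_size` and `max_new_tokens` defaults, are illustrative assumptions and not part of the original model card. It also assumes that `torch` and `sentencepiece` are installed alongside `transformers`.

```python
from typing import Iterator, List


def batched(sentences: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` sentences."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(sentences), batch_size):
        yield sentences[start:start + batch_size]


def load_en_de(model_name: str = "Helsinki-NLP/opus-mt-en-de"):
    """Load the tokenizer and model (hypothetical convenience wrapper)."""
    # Imported here so the helpers above and below can be used or tested
    # without downloading the checkpoint.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model


def translate(
    sentences: List[str],
    tokenizer,
    model,
    batch_size: int = 8,        # assumed default, tune for your hardware
    max_new_tokens: int = 128,  # assumed cap on the generated length
) -> List[str]:
    """Translate English sentences to German, one batch at a time."""
    translations: List[str] = []
    for batch in batched(sentences, batch_size):
        # Pad to the longest sentence in the batch and return PyTorch tensors
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        translations.extend(
            tokenizer.batch_decode(output_ids, skip_special_tokens=True)
        )
    return translations


# Example usage (downloads the checkpoint on first run):
#   tokenizer, model = load_en_de()
#   print(translate(["The weather is nice today."], tokenizer, model))
```

Passing the tokenizer and model in as arguments keeps the checkpoint loaded once and reused across calls. Batching bounds memory use when translating many sentences.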