---
language: en
datasets:
- squad
widget:
- text: "Which name is also used to describe the Amazon rainforest in English?"
  context: "The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \"Amazonas\" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."
- text: "How many square kilometers of rainforest is covered in the basin?"
  context: "The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \"Amazonas\" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."
license: apache-2.0
---

# DistilBERT base uncased distilled SQuAD

## Table of Contents
- [Model Details](#model-details)
- [How To Get Started With the Model](#how-to-get-started-with-the-model)
- [Uses](#uses)
- [Risks, Limitations and Biases](#risks-limitations-and-biases)
- [Training](#training)
- [Evaluation](#evaluation)
- [Environmental Impact](#environmental-impact)
- [Technical Specifications](#technical-specifications)
- [Citation Information](#citation-information)
- [Model Card Authors](#model-card-authors)

## Model Details

**Model Description:** The DistilBERT model was proposed in the blog post [Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT](https://medium.com/huggingface/distilbert-8cf3380435b5), and the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108). DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than *bert-base-uncased* and runs 60% faster while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.

This model is a fine-tuned checkpoint of [DistilBERT-base-uncased](https://huggingface.co/distilbert-base-uncased), fine-tuned using (a second step of) knowledge distillation on [SQuAD v1.1](https://huggingface.co/datasets/squad).

- **Developed by:** Hugging Face
- **Model Type:** Transformer-based language model
- **Language(s):** English
- **License:** Apache 2.0
- **Related Models:** [DistilBERT-base-uncased](https://huggingface.co/distilbert-base-uncased)
- **Resources for more information:**
  - See [this repository](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) for more about Distil\* (a class of compressed models including this model)
  - See [Sanh et al. (2019)](https://arxiv.org/abs/1910.01108) for more information about knowledge distillation and the training procedure
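
For intuition about the knowledge distillation mentioned above, here is a minimal, illustrative sketch of a soft-target distillation loss in PyTorch. This is **not** the actual training code: the function name, the `temperature`, and the `alpha` weighting are assumptions made for illustration only (see Sanh et al. (2019) and the repository linked above for the real procedure).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Hypothetical combined loss: soft teacher targets plus hard gold labels."""
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 as is conventional in distillation.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against the gold labels
    # (e.g. answer start/end positions when fine-tuning on SQuAD).
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```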

## How to Get Started with the Model

Use the code below to get started with the model.

```python
>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-uncased-distilled-squad')

>>> context = r"""
... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
... """

>>> result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
>>> print(
... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
...)

Answer: 'SQuAD dataset', score: 0.4704, start: 147, end: 160
```

Here is how to use this model in PyTorch:

```python
from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering
import torch

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-distilled-squad')
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad')

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

# Encode the question/context pair and run a forward pass without tracking gradients.
inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Most likely start and end token positions of the answer span.
answer_start_index = torch.argmax(outputs.start_logits)
answer_end_index = torch.argmax(outputs.end_logits)

# Decode the predicted span back into text.
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens)
```

And in TensorFlow:

```python
from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
import tensorflow as tf

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased-distilled-squad")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

# Encode the question/context pair and run a forward pass.
inputs = tokenizer(question, text, return_tensors="tf")
outputs = model(**inputs)

# Most likely start and end token positions of the answer span.
answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

# Decode the predicted span back into text.
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens)
```

## Uses

This model can be used for question answering.
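
For example, beyond the single-answer usage shown above, the question-answering pipeline can return several candidate answer spans. A small sketch (the `top_k` argument is available in recent versions of Transformers; older releases used `topk`):

```python
from transformers import pipeline

question_answerer = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

context = "The Amazon rainforest covers most of the Amazon basin of South America."

# Request the three highest-scoring candidate spans instead of only the best one.
candidates = question_answerer(
    question="What does the Amazon rainforest cover?",
    context=context,
    top_k=3,
)
for candidate in candidates:
    print(candidate["answer"], round(candidate["score"], 4))
```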

#### Misuse and Out-of-scope Use

The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

## Risks, Limitations and Biases

**CONTENT WARNING: Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.**

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:

```python
>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-uncased-distilled-squad')

>>> context = r"""
... Alice is sitting on the bench. Bob is sitting next to her.
... """

>>> result = question_answerer(question="Who is the CEO?", context=context)
>>> print(
... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
...)

Answer: 'Bob', score: 0.4183, start: 32, end: 35
```

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

## Training

#### Training Data

The [distilbert-base-uncased model card](https://huggingface.co/distilbert-base-uncased) describes its training data as:

> DistilBERT pretrained on the same data as BERT, which is [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books and [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers).

To learn more about the SQuAD v1.1 dataset, see the [SQuAD v1.1 data card](https://huggingface.co/datasets/squad).
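
To inspect the fine-tuning data directly, the dataset can be loaded with the Hugging Face Datasets library (a minimal sketch, assuming the `datasets` package is installed):

```python
from datasets import load_dataset

# SQuAD v1.1 as hosted on the Hugging Face Hub (train and validation splits).
squad = load_dataset("squad")

print(squad["train"].column_names)  # ['id', 'title', 'context', 'question', 'answers']
print(squad)                        # split names, features, and number of rows
```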

#### Training Procedure

##### Preprocessing

See the [distilbert-base-uncased model card](https://huggingface.co/distilbert-base-uncased) for further details.

##### Pretraining

See the [distilbert-base-uncased model card](https://huggingface.co/distilbert-base-uncased) for further details.

## Evaluation

As discussed in the [model repository](https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md):

> This model reaches a F1 score of 86.9 on the [SQuAD v1.1] dev set (for comparison, Bert bert-base-uncased version reaches a F1 score of 88.5).
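
The evaluation itself was run with the scripts in the repository linked above. As a rough sketch of how a SQuAD-style exact-match/F1 score can be computed with the Hugging Face Evaluate library (the ID and answer text below are illustrative, not taken from the actual run):

```python
import evaluate

squad_metric = evaluate.load("squad")

# Predictions and references must follow the SQuAD format expected by the metric.
predictions = [{"id": "example-1", "prediction_text": "Denver Broncos"}]
references = [{
    "id": "example-1",
    "answers": {"text": ["Denver Broncos"], "answer_start": [177]},
}]

print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}
```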

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). We present the hardware type and hours used based on the [associated paper](https://arxiv.org/pdf/1910.01108.pdf). Note that these figures cover only the training of DistilBERT itself and do not include the fine-tuning on SQuAD. A rough, purely illustrative estimate based on these figures follows the list below.

- **Hardware Type:** 8 16GB V100 GPUs
- **Hours used:** 90 hours
- **Cloud Provider:** Unknown
- **Compute Region:** Unknown
- **Carbon Emitted:** Unknown
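
Since the cloud provider, region, and grid carbon intensity are unknown, the emissions cannot be stated precisely. As a purely hypothetical back-of-the-envelope sketch in the spirit of Lacoste et al. (2019), assuming roughly 300 W per V100 and an illustrative grid intensity of 0.4 kgCO2eq/kWh:

```python
# Illustrative estimate only; the real power draw, PUE, and grid carbon intensity are unknown.
num_gpus = 8
power_kw_per_gpu = 0.3       # assumed ~300 W per 16GB V100
hours = 90
carbon_intensity = 0.4       # assumed kgCO2eq per kWh (hypothetical grid average)

energy_kwh = num_gpus * power_kw_per_gpu * hours
emissions_kg = energy_kwh * carbon_intensity
print(f"~{energy_kwh:.0f} kWh, ~{emissions_kg:.0f} kg CO2eq")  # ~216 kWh, ~86 kg CO2eq
```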

## Technical Specifications

See the [associated paper](https://arxiv.org/abs/1910.01108) for details on the modeling architecture, objective, compute infrastructure, and training details.

## Citation Information

```bibtex
@inproceedings{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  booktitle={NeurIPS EMC^2 Workshop},
  year={2019}
}
```

APA:
- Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

## Model Card Authors

This model card was written by the Hugging Face team.