---
language: en
license: apache-2.0
datasets:
- squad
metrics:
- squad
model-index:
- name: distilbert-base-cased-distilled-squad
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad
      type: squad
      config: plain_text
      split: validation
    metrics:
    - type: exact_match
      value: 79.5998
      name: Exact Match
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTViZDA2Y2E2NjUyMjNjYjkzNTUzODc5OTk2OTNkYjQxMDRmMDhlYjdmYWJjYWQ2N2RlNzY1YmI3OWY1NmRhOSIsInZlcnNpb24iOjF9.ZJHhboAMwsi3pqU-B-XKRCYP_tzpCRb8pEjGr2Oc-TteZeoWHI8CXcpDxugfC3f7d_oBcKWLzh3CClQxBW1iAQ
    - type: f1
      value: 86.9965
      name: F1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZWZlMzY2MmE1NDNhOGNjNWRmODg0YjQ2Zjk5MjUzZDQ2MDYxOTBlMTNhNzQ4NTA2NjRmNDU3MGIzMTYwMmUyOSIsInZlcnNpb24iOjF9.z0ZDir87aT7UEmUeDm8Uw0oUdAqzlBz343gwnsQP3YLfGsaHe-jGlhco0Z7ISUd9NokyCiJCRc4NNxJQ83IuCw
---

# DistilBERT base cased distilled SQuAD

## Table of Contents
- [Model Details](#model-details)
- [How To Get Started With the Model](#how-to-get-started-with-the-model)
- [Uses](#uses)
- [Risks, Limitations and Biases](#risks-limitations-and-biases)
- [Training](#training)
- [Evaluation](#evaluation)
- [Environmental Impact](#environmental-impact)
- [Technical Specifications](#technical-specifications)
- [Citation Information](#citation-information)
- [Model Card Authors](#model-card-authors)

## Model Details

**Model Description:** The DistilBERT model was proposed in the blog post [Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT](https://medium.com/huggingface/distilbert-8cf3380435b5) and the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108). DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than *bert-base-uncased* and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.

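As a quick, illustrative sanity check of the size claim (this snippet is not part of the original card), you can compare parameter counts directly:

```python
from transformers import AutoModel

def count_params(model):
    # Total number of parameters across all weight tensors.
    return sum(p.numel() for p in model.parameters())

# DistilBERT base has roughly 60% as many parameters as BERT base.
distilbert = AutoModel.from_pretrained("distilbert-base-cased")
bert = AutoModel.from_pretrained("bert-base-cased")

print(f"DistilBERT: {count_params(distilbert) / 1e6:.0f}M parameters")
print(f"BERT base:  {count_params(bert) / 1e6:.0f}M parameters")
```
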
This model is a checkpoint of [DistilBERT-base-cased](https://huggingface.co/distilbert-base-cased), fine-tuned using (a second step of) knowledge distillation on [SQuAD v1.1](https://huggingface.co/datasets/squad).

- **Developed by:** Hugging Face
- **Model Type:** Transformer-based language model
- **Language(s):** English
- **License:** Apache 2.0
- **Related Models:** [DistilBERT-base-cased](https://huggingface.co/distilbert-base-cased)
- **Resources for more information:**
  - See [this repository](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) for more about Distil\* (a class of compressed models including this model)
  - See [Sanh et al. (2019)](https://arxiv.org/abs/1910.01108) for more information about knowledge distillation and the training procedure

## How to Get Started with the Model

Use the code below to get started with the model.

```python
>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

>>> context = r"""
... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
... """

>>> result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
>>> print(
...     f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
... )

Answer: 'SQuAD dataset', score: 0.5152, start: 147, end: 160
```

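Recent versions of the pipeline also accept a `top_k` argument to return several candidate spans instead of only the best one (an illustrative sketch, continuing the example above):

```python
>>> results = question_answerer(
...     question="What is a good example of a question answering dataset?",
...     context=context,
...     top_k=3,  # return the 3 highest-scoring candidate answers
... )
>>> for candidate in results:
...     print(candidate["answer"], round(candidate["score"], 4))
```
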
Here is how to use this model in PyTorch:

```python
from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering
import torch

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased-distilled-squad')
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-cased-distilled-squad')

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start and end of the answer span, then decode it.
answer_start_index = torch.argmax(outputs.start_logits)
answer_end_index = torch.argmax(outputs.end_logits)

predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens)
```

And in TensorFlow:

```python
from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
import tensorflow as tf

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="tf")
outputs = model(**inputs)

answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens)
```

## Uses

This model can be used for question answering.

#### Misuse and Out-of-scope Use

The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to produce factual or true representations of people or events, so using the model to generate such content is out of scope for its abilities.

## Risks, Limitations and Biases

**CONTENT WARNING: Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.**

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:

```python
>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

>>> context = r"""
... Alice is sitting on the bench. Bob is sitting next to her.
... """

>>> result = question_answerer(question="Who is the CEO?", context=context)
>>> print(
...     f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
... )

Answer: 'Bob', score: 0.7527, start: 32, end: 35
```

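A simple way to probe this behavior (an illustrative sketch, not an official evaluation) is a counterfactual test: swap the names and ask the same question. Neither context states who the CEO is, so any confident answer reflects the model's priors rather than the text:

```python
from transformers import pipeline

question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

# Swap the order of the names to see whether the predicted "CEO" follows the name or the position.
contexts = [
    "Alice is sitting on the bench. Bob is sitting next to her.",
    "Bob is sitting on the bench. Alice is sitting next to him.",
]
for context in contexts:
    result = question_answerer(question="Who is the CEO?", context=context)
    print(f"{result['answer']} (score: {round(result['score'], 4)})")
```
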
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

## Training

#### Training Data

The [distilbert-base-cased model](https://huggingface.co/distilbert-base-cased) was trained using the same data as the [distilbert-base-uncased model](https://huggingface.co/distilbert-base-uncased), which describes its training data as:

> DistilBERT pretrained on the same data as BERT, which is [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books and [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers).

To learn more about the SQuAD v1.1 dataset, see the [SQuAD v1.1 data card](https://huggingface.co/datasets/squad).

#### Training Procedure

##### Preprocessing

See the [distilbert-base-cased model card](https://huggingface.co/distilbert-base-cased) for further details.

##### Pretraining

See the [distilbert-base-cased model card](https://huggingface.co/distilbert-base-cased) for further details.

## Evaluation

As discussed in the [model repository](https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md):

> This model reaches a F1 score of 87.1 on the [SQuAD v1.1] dev set (for comparison, BERT bert-base-cased version reaches a F1 score of 88.7).

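A minimal sketch of how one might reproduce the dev-set numbers with the `datasets` and `evaluate` libraries (assumed to be installed; exact scores vary with library versions and decoding details):

```python
from datasets import load_dataset
from transformers import pipeline
import evaluate

squad = load_dataset("squad", split="validation")
question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')
squad_metric = evaluate.load("squad")

predictions, references = [], []
for example in squad.select(range(100)):  # subsample for a quick check; drop for the full dev set
    output = question_answerer(question=example["question"], context=example["context"])
    predictions.append({"id": example["id"], "prediction_text": output["answer"]})
    references.append({"id": example["id"], "answers": example["answers"]})

print(squad_metric.compute(predictions=predictions, references=references))  # exact_match and f1
```
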
## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). We present the hardware type and hours used based on the [associated paper](https://arxiv.org/pdf/1910.01108.pdf). Note that these details cover only the pretraining of DistilBERT, not the fine-tuning on SQuAD.

- **Hardware Type:** 8 16GB V100 GPUs
- **Hours used:** 90 hours
- **Cloud Provider:** Unknown
- **Compute Region:** Unknown
- **Carbon Emitted:** Unknown

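For a rough sense of scale, here is a back-of-the-envelope energy estimate using assumed values (the ~300 W draw per V100 and the PUE are assumptions, not reported figures):

```python
# 8 GPUs for 90 hours, as reported in the paper.
gpu_hours = 8 * 90                  # 720 GPU-hours
watts_per_gpu = 300                 # assumed peak draw of a V100
pue = 1.5                           # assumed datacenter power usage effectiveness

energy_kwh = gpu_hours * watts_per_gpu / 1000 * pue
print(f"~{energy_kwh:.0f} kWh")     # ≈ 324 kWh; emissions then depend on the unknown grid's carbon intensity
```
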
## Technical Specifications

See the [associated paper](https://arxiv.org/abs/1910.01108) for details on the modeling architecture, objective, compute infrastructure, and training details.

## Citation Information

```bibtex
@inproceedings{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  booktitle={NeurIPS EMC^2 Workshop},
  year={2019}
}
```

APA:
- Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

## Model Card Authors

This model card was written by the Hugging Face team.