---
language: en
license: apache-2.0
datasets:
- squad
metrics:
- squad
model-index:
- name: distilbert-base-cased-distilled-squad
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad
      type: squad
      config: plain_text
      split: validation
    metrics:
    - type: exact_match
      value: 79.5998
      name: Exact Match
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTViZDA2Y2E2NjUyMjNjYjkzNTUzODc5OTk2OTNkYjQxMDRmMDhlYjdmYWJjYWQ2N2RlNzY1YmI3OWY1NmRhOSIsInZlcnNpb24iOjF9.ZJHhboAMwsi3pqU-B-XKRCYP_tzpCRb8pEjGr2Oc-TteZeoWHI8CXcpDxugfC3f7d_oBcKWLzh3CClQxBW1iAQ
    - type: f1
      value: 86.9965
      name: F1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZWZlMzY2MmE1NDNhOGNjNWRmODg0YjQ2Zjk5MjUzZDQ2MDYxOTBlMTNhNzQ4NTA2NjRmNDU3MGIzMTYwMmUyOSIsInZlcnNpb24iOjF9.z0ZDir87aT7UEmUeDm8Uw0oUdAqzlBz343gwnsQP3YLfGsaHe-jGlhco0Z7ISUd9NokyCiJCRc4NNxJQ83IuCw
---

# DistilBERT base cased distilled SQuAD

## Table of Contents
- [Model Details](#model-details)
- [How To Get Started With the Model](#how-to-get-started-with-the-model)
- [Uses](#uses)
- [Risks, Limitations and Biases](#risks-limitations-and-biases)
- [Training](#training)
- [Evaluation](#evaluation)
- [Environmental Impact](#environmental-impact)
- [Technical Specifications](#technical-specifications)
- [Citation Information](#citation-information)
- [Model Card Authors](#model-card-authors)

## Model Details

**Model Description:** The DistilBERT model was proposed in the blog post [Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT](https://medium.com/huggingface/distilbert-8cf3380435b5) and the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108). DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than *bert-base-uncased* and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.

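As a quick, illustrative sanity check of the size claim (this snippet is not part of the original card), you can compare parameter counts directly:

```python
from transformers import AutoModel

def count_params(model):
    # Total number of parameters across all weight tensors.
    return sum(p.numel() for p in model.parameters())

# DistilBERT base has roughly 60% as many parameters as BERT base.
distilbert = AutoModel.from_pretrained("distilbert-base-cased")
bert = AutoModel.from_pretrained("bert-base-cased")

print(f"DistilBERT: {count_params(distilbert) / 1e6:.0f}M parameters")
print(f"BERT base:  {count_params(bert) / 1e6:.0f}M parameters")
```
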
This model is a checkpoint of [DistilBERT-base-cased](https://huggingface.co/distilbert-base-cased), fine-tuned using (a second step of) knowledge distillation on [SQuAD v1.1](https://huggingface.co/datasets/squad).

- **Developed by:** Hugging Face
- **Model Type:** Transformer-based language model
- **Language(s):** English
- **License:** Apache 2.0
- **Related Models:** [DistilBERT-base-cased](https://huggingface.co/distilbert-base-cased)
- **Resources for more information:**
  - See [this repository](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) for more about Distil\* (a class of compressed models including this model)
  - See [Sanh et al. (2019)](https://arxiv.org/abs/1910.01108) for more information about knowledge distillation and the training procedure

## How to Get Started with the Model

Use the code below to get started with the model.

```python
>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

>>> context = r"""
... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
... """

>>> result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
>>> print(
...     f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
... )

Answer: 'SQuAD dataset', score: 0.5152, start: 147, end: 160
```

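Recent versions of the pipeline also accept a `top_k` argument to return several candidate spans instead of only the best one (an illustrative sketch, continuing the example above):

```python
>>> results = question_answerer(
...     question="What is a good example of a question answering dataset?",
...     context=context,
...     top_k=3,  # return the 3 highest-scoring candidate answers
... )
>>> for candidate in results:
...     print(candidate["answer"], round(candidate["score"], 4))
```
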
Here is how to use this model in PyTorch:

```python
from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering
import torch

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased-distilled-squad')
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-cased-distilled-squad')

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start and end of the answer span, then decode it.
answer_start_index = torch.argmax(outputs.start_logits)
answer_end_index = torch.argmax(outputs.end_logits)

predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens)
```

And in TensorFlow:

```python
from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
import tensorflow as tf

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="tf")
outputs = model(**inputs)

answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens)
```

## Uses

This model can be used for question answering.

#### Misuse and Out-of-scope Use

The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to produce factual or true representations of people or events, so using the model to generate such content is out of scope for its abilities.

## Risks, Limitations and Biases

**CONTENT WARNING: Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.**

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:

```python
>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

>>> context = r"""
... Alice is sitting on the bench. Bob is sitting next to her.
... """

>>> result = question_answerer(question="Who is the CEO?", context=context)
>>> print(
...     f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
... )

Answer: 'Bob', score: 0.7527, start: 32, end: 35
```

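A simple way to probe this behavior (an illustrative sketch, not an official evaluation) is a counterfactual test: swap the names and ask the same question. Neither context states who the CEO is, so any confident answer reflects the model's priors rather than the text:

```python
from transformers import pipeline

question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

# Swap the order of the names to see whether the predicted "CEO" follows the name or the position.
contexts = [
    "Alice is sitting on the bench. Bob is sitting next to her.",
    "Bob is sitting on the bench. Alice is sitting next to him.",
]
for context in contexts:
    result = question_answerer(question="Who is the CEO?", context=context)
    print(f"{result['answer']} (score: {round(result['score'], 4)})")
```
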
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

## Training

#### Training Data

The [distilbert-base-cased model](https://huggingface.co/distilbert-base-cased) was trained using the same data as the [distilbert-base-uncased model](https://huggingface.co/distilbert-base-uncased), which describes its training data as:

> DistilBERT pretrained on the same data as BERT, which is [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books and [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers).

To learn more about the SQuAD v1.1 dataset, see the [SQuAD v1.1 data card](https://huggingface.co/datasets/squad).

#### Training Procedure

##### Preprocessing

See the [distilbert-base-cased model card](https://huggingface.co/distilbert-base-cased) for further details.

##### Pretraining

See the [distilbert-base-cased model card](https://huggingface.co/distilbert-base-cased) for further details.

## Evaluation

As discussed in the [model repository](https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md):

> This model reaches a F1 score of 87.1 on the [SQuAD v1.1] dev set (for comparison, BERT bert-base-cased version reaches a F1 score of 88.7).

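A minimal sketch of how one might reproduce the dev-set numbers with the `datasets` and `evaluate` libraries (assumed to be installed; exact scores vary with library versions and decoding details):

```python
from datasets import load_dataset
from transformers import pipeline
import evaluate

squad = load_dataset("squad", split="validation")
question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')
squad_metric = evaluate.load("squad")

predictions, references = [], []
for example in squad.select(range(100)):  # subsample for a quick check; drop for the full dev set
    output = question_answerer(question=example["question"], context=example["context"])
    predictions.append({"id": example["id"], "prediction_text": output["answer"]})
    references.append({"id": example["id"], "answers": example["answers"]})

print(squad_metric.compute(predictions=predictions, references=references))  # exact_match and f1
```
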
## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). We present the hardware type and hours used based on the [associated paper](https://arxiv.org/pdf/1910.01108.pdf). Note that these details cover only the pretraining of DistilBERT, not the fine-tuning on SQuAD.

- **Hardware Type:** 8 16GB V100 GPUs
- **Hours used:** 90 hours
- **Cloud Provider:** Unknown
- **Compute Region:** Unknown
- **Carbon Emitted:** Unknown

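For a rough sense of scale, here is a back-of-the-envelope energy estimate using assumed values (the ~300 W draw per V100 and the PUE are assumptions, not reported figures):

```python
# 8 GPUs for 90 hours, as reported in the paper.
gpu_hours = 8 * 90                  # 720 GPU-hours
watts_per_gpu = 300                 # assumed peak draw of a V100
pue = 1.5                           # assumed datacenter power usage effectiveness

energy_kwh = gpu_hours * watts_per_gpu / 1000 * pue
print(f"~{energy_kwh:.0f} kWh")     # ≈ 324 kWh; emissions then depend on the unknown grid's carbon intensity
```
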
## Technical Specifications

See the [associated paper](https://arxiv.org/abs/1910.01108) for details on the modeling architecture, objective, compute infrastructure, and training details.

## Citation Information

```bibtex
@inproceedings{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  booktitle={NeurIPS EMC^2 Workshop},
  year={2019}
}
```

APA:
- Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

## Model Card Authors

This model card was written by the Hugging Face team.