---
language: en
datasets:
- conll2003
license: mit
model-index:
- name: dslim/bert-base-NER
  results:
  - task:
      type: token-classification
      name: Token Classification
    dataset:
      name: conll2003
      type: conll2003
      config: conll2003
      split: test
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9118041001560013
      verified: true
    - name: Precision
      type: precision
      value: 0.9211550382257732
      verified: true
    - name: Recall
      type: recall
      value: 0.9306415698281261
      verified: true
    - name: F1
      type: f1
      value: 0.9258740048459675
      verified: true
    - name: loss
      type: loss
      value: 0.48325642943382263
      verified: true
---
# bert-base-NER

If my open source models have been useful to you, please consider supporting me in building small, useful AI models for everyone (and help me afford med school / help out my parents financially). Thanks!

<a href="https://www.buymeacoffee.com/dslim" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/arial-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>

## Model description

**bert-base-NER** is a fine-tuned BERT model that is ready to use for **Named Entity Recognition** and achieves **state-of-the-art performance** for the NER task. It has been trained to recognize four types of entities: location (LOC), organization (ORG), person (PER), and miscellaneous (MISC).

Specifically, this model is a *bert-base-cased* model that was fine-tuned on the English version of the standard [CoNLL-2003 Named Entity Recognition](https://www.aclweb.org/anthology/W03-0419.pdf) dataset.

If you'd like a model with a larger BERT backbone fine-tuned on the same dataset, a [**bert-large-NER**](https://huggingface.co/dslim/bert-large-NER/) version is also available.

### Available NER models

| Model Name | Description | Parameters |
|------------|-------------|------------|
| [distilbert-NER](https://huggingface.co/dslim/distilbert-NER) **(NEW!)** | Fine-tuned DistilBERT, a smaller, faster, lighter version of BERT | 66M |
| [bert-large-NER](https://huggingface.co/dslim/bert-large-NER/) | Fine-tuned bert-large-cased, a larger model with slightly better performance | 340M |
| [bert-base-NER](https://huggingface.co/dslim/bert-base-NER) ([uncased](https://huggingface.co/dslim/bert-base-NER-uncased)) | Fine-tuned bert-base, available in both cased and uncased versions | 110M |

## Intended uses & limitations

#### How to use

You can use this model with the Transformers *pipeline* for NER.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "My name is Wolfgang and I live in Berlin"

ner_results = nlp(example)
print(ner_results)
```

#### Limitations and bias

This model is limited by its training dataset of entity-annotated news articles from a specific span of time, so it may not generalize well to all use cases in other domains. Furthermore, the model occasionally tags subword tokens as entities, and post-processing of results may be necessary to handle those cases.

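In practice, recent versions of Transformers can merge subword fragments for you by passing `aggregation_strategy="simple"` to the `pipeline` call. The sketch below shows the equivalent post-processing by hand, applied to a *hypothetical* raw pipeline output (the token dicts here are illustrative, not produced by actually running the model):

```python
# Hypothetical raw NER pipeline output for "My name is Wolfgang Amadeus":
# rare words may be split into WordPiece fragments such as "##eus".
raw = [
    {"word": "Wolfgang", "entity": "B-PER", "start": 11, "end": 19},
    {"word": "Amad",     "entity": "B-PER", "start": 20, "end": 24},
    {"word": "##eus",    "entity": "I-PER", "start": 24, "end": 27},
]

def merge_subwords(tokens):
    """Merge '##'-prefixed WordPiece fragments back into whole words,
    keeping the entity label of the first fragment."""
    merged = []
    for tok in tokens:
        if tok["word"].startswith("##") and merged:
            prev = merged[-1]
            prev["word"] += tok["word"][2:]  # drop the '##' marker
            prev["end"] = tok["end"]         # extend the character span
        else:
            merged.append(dict(tok))
    return merged

print(merge_subwords(raw))  # "Amad" + "##eus" becomes one "Amadeus" entry
```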
## Training data

This model was fine-tuned on the English version of the standard [CoNLL-2003 Named Entity Recognition](https://www.aclweb.org/anthology/W03-0419.pdf) dataset.

The training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. As in the dataset, each token is classified as one of the following classes:

| Abbreviation | Description |
|--------------|-------------|
| O | Outside of a named entity |
| B-MISC | Beginning of a miscellaneous entity right after another miscellaneous entity |
| I-MISC | Miscellaneous entity |
| B-PER | Beginning of a person's name right after another person's name |
| I-PER | Person's name |
| B-ORG | Beginning of an organization right after another organization |
| I-ORG | Organization |
| B-LOC | Beginning of a location right after another location |
| I-LOC | Location |

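To make the tagging scheme concrete, here is a small, self-contained sketch (not part of the original card) that groups a tagged token sequence into entity spans under this convention, where I- continues or starts an entity and B- marks a new entity directly after one of the same type:

```python
def extract_entities(tokens, tags):
    """Group IOB tags into (entity_type, text) spans.
    A B- tag, or a change of entity type, starts a new entity."""
    entities = []
    current = None  # (entity_type, list_of_tokens) for the open span
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current = None
            continue
        prefix, etype = tag.split("-", 1)
        if prefix == "B" or current is None or current[0] != etype:
            current = (etype, [token])
            entities.append(current)
        else:
            current[1].append(token)
    return [(etype, " ".join(words)) for etype, words in entities]

tokens = ["Wolfgang", "lives", "in", "New",   "York"]
tags   = ["I-PER",    "O",     "O",  "I-LOC", "I-LOC"]
print(extract_entities(tokens, tags))  # [('PER', 'Wolfgang'), ('LOC', 'New York')]
```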
### CoNLL-2003 English Dataset Statistics

This dataset was derived from the Reuters corpus, which consists of Reuters news stories. You can read more about how this dataset was created in the CoNLL-2003 paper.

#### Number of training examples per entity type

| Dataset | LOC | MISC | ORG | PER |
|---------|-----|------|-----|-----|
| Train | 7140 | 3438 | 6321 | 6600 |
| Dev | 1837 | 922 | 1341 | 1842 |
| Test | 1668 | 702 | 1661 | 1617 |

#### Number of articles/sentences/tokens per dataset

| Dataset | Articles | Sentences | Tokens |
|---------|----------|-----------|--------|
| Train | 946 | 14,987 | 203,621 |
| Dev | 216 | 3,466 | 51,362 |
| Test | 231 | 3,684 | 46,435 |

## Training procedure

This model was trained on a single NVIDIA V100 GPU with the recommended hyperparameters from the [original BERT paper](https://arxiv.org/pdf/1810.04805), which trained and evaluated the model on the CoNLL-2003 NER task.

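For reference, the BERT paper's recommended fine-tuning grid is small. The configuration below is a hypothetical pick from those published ranges; the exact values used to produce this checkpoint are not documented in this card:

```python
# Hypothetical fine-tuning settings drawn from the ranges the original
# BERT paper recommends for downstream tasks; the exact values used for
# this checkpoint are not stated in the card.
hyperparameters = {
    "learning_rate": 3e-5,   # paper's grid: 5e-5, 3e-5, 2e-5
    "train_batch_size": 32,  # paper's grid: 16, 32
    "num_train_epochs": 3,   # paper's grid: 2, 3, 4
}
```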
## Eval results

| metric | dev | test |
|--------|-----|------|
| f1 | 95.1 | 91.3 |
| precision | 95.0 | 90.7 |
| recall | 95.3 | 91.9 |

The test metrics are slightly lower than the official Google BERT results, which encoded document context and experimented with a CRF. More on replicating the original results [here](https://github.com/google-research/bert/issues/223).

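These are entity-level (CoNLL-style) scores: a predicted entity counts as correct only if both its span boundaries and its type exactly match the gold annotation. A minimal, self-contained sketch of that scoring (an illustration, not the official conlleval script):

```python
def spans(tags):
    """Convert an IOB tag sequence into a set of (start, end, type) spans."""
    out, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # "O" sentinel flushes the last span
        prefix, _, t = tag.partition("-")
        if start is not None and (tag == "O" or prefix == "B" or t != etype):
            out.add((start, i, etype))
            start = None
        if tag != "O" and start is None:
            start, etype = i, t
    return out

def entity_f1(gold_tags, pred_tags):
    """Entity-level precision, recall, and F1 over one tag sequence pair."""
    gold, pred = spans(gold_tags), spans(pred_tags)
    tp = len(gold & pred)  # exact span-and-type matches
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]
print(entity_f1(gold, pred))  # -> (0.5, 0.5, 0.5): the LOC span is mis-typed
```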
### BibTeX entry and citation info

```bibtex
@article{DBLP:journals/corr/abs-1810-04805,
  author    = {Jacob Devlin and
               Ming{-}Wei Chang and
               Kenton Lee and
               Kristina Toutanova},
  title     = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
               Understanding},
  journal   = {CoRR},
  volume    = {abs/1810.04805},
  year      = {2018},
  url       = {http://arxiv.org/abs/1810.04805},
  archivePrefix = {arXiv},
  eprint    = {1810.04805},
  timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```

```bibtex
@inproceedings{tjong-kim-sang-de-meulder-2003-introduction,
  title     = "Introduction to the {C}o{NLL}-2003 Shared Task: Language-Independent Named Entity Recognition",
  author    = "Tjong Kim Sang, Erik F. and
               De Meulder, Fien",
  booktitle = "Proceedings of the Seventh Conference on Natural Language Learning at {HLT}-{NAACL} 2003",
  year      = "2003",
  url       = "https://www.aclweb.org/anthology/W03-0419",
  pages     = "142--147",
}
```