---
language: en
datasets:
- conll2003
license: mit
model-index:
- name: dslim/bert-base-NER
  results:
  - task:
      type: token-classification
      name: Token Classification
    dataset:
      name: conll2003
      type: conll2003
      config: conll2003
      split: test
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9118041001560013
      verified: true
    - name: Precision
      type: precision
      value: 0.9211550382257732
      verified: true
    - name: Recall
      type: recall
      value: 0.9306415698281261
      verified: true
    - name: F1
      type: f1
      value: 0.9258740048459675
      verified: true
    - name: loss
      type: loss
      value: 0.48325642943382263
      verified: true
---
# bert-base-NER

If my open source models have been useful to you, please consider supporting me in building small, useful AI models for everyone (and help me afford med school / help out my parents financially). Thanks!

<a href="https://www.buymeacoffee.com/dslim" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/arial-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>

## Model description

bert-base-NER is a fine-tuned BERT model that is ready to use for Named Entity Recognition and achieves state-of-the-art performance for the NER task. It has been trained to recognize four types of entities: location (LOC), organization (ORG), person (PER), and miscellaneous (MISC).

Specifically, this model is a bert-base-cased model that was fine-tuned on the English version of the standard [CoNLL-2003 Named Entity Recognition](https://www.aclweb.org/anthology/W03-0419.pdf) dataset.

If you'd like a larger model fine-tuned on the same dataset, a [bert-large-NER](https://huggingface.co/dslim/bert-large-NER/) version is also available.

### Available NER models

| Model Name | Description | Parameters |
|-------------------|-------------|------------------|
| [distilbert-NER](https://huggingface.co/dslim/distilbert-NER) (NEW!) | Fine-tuned DistilBERT, a smaller, faster, lighter version of BERT | 66M |
| [bert-large-NER](https://huggingface.co/dslim/bert-large-NER/) | Fine-tuned bert-large-cased, a larger model with slightly better performance | 340M |
| [bert-base-NER](https://huggingface.co/dslim/bert-base-NER) ([uncased](https://huggingface.co/dslim/bert-base-NER-uncased)) | Fine-tuned bert-base, available in both cased and uncased versions | 110M |

## Intended uses & limitations

#### How to use

You can use this model with the Transformers `pipeline` for NER:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the fine-tuned tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

# Build an NER pipeline; by default it returns one dict per token not tagged "O"
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "My name is Wolfgang and I live in Berlin"

ner_results = nlp(example)
print(ner_results)
```

#### Limitations and bias

This model is limited by its training dataset of entity-annotated news articles from a specific span of time, so it may not generalize well to all use cases or to other domains. Furthermore, the model occasionally tags subword tokens as entities, and post-processing of results may be necessary to handle those cases.
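
A minimal post-processing sketch for the subword issue follows. It assumes the ungrouped output format of the `ner` pipeline from the usage example, where each prediction is a dict with `entity`, `word`, `start`, and `end` keys and WordPiece continuation pieces begin with `##`. The helper `merge_subwords` and the sample predictions are hypothetical, not real model output. The Transformers pipeline can also group tokens for you through its `aggregation_strategy` argument (for example `aggregation_strategy="simple"`), which is usually the easier route.

```python
from typing import Any


def merge_subwords(predictions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Glue WordPiece continuation pieces ("##...") onto the preceding prediction.

    Hypothetical helper: assumes dicts shaped like the ungrouped "ner" pipeline
    output, with "entity", "word", "start" and "end" keys.
    """
    merged: list[dict[str, Any]] = []
    for pred in predictions:
        is_piece = pred["word"].startswith("##")
        if is_piece and merged and merged[-1]["end"] == pred["start"]:
            # Continuation of the previous word: extend its text and character span.
            # The merged entry keeps the label of the word's first piece.
            merged[-1]["word"] += pred["word"][2:]
            merged[-1]["end"] = pred["end"]
        elif is_piece:
            # Orphan piece (the start of its word was tagged "O"): drop it.
            continue
        else:
            # Copy so the caller's dicts are not modified.
            merged.append(dict(pred))
    return merged


# Hypothetical ungrouped predictions for "My name is Wolfgang and I live in Berlin".
predictions = [
    {"entity": "B-PER", "word": "Wolf", "start": 11, "end": 15},
    {"entity": "I-PER", "word": "##gang", "start": 15, "end": 19},
    {"entity": "B-LOC", "word": "Berlin", "start": 34, "end": 40},
]
print([(p["entity"], p["word"]) for p in merge_subwords(predictions)])
# [('B-PER', 'Wolfgang'), ('B-LOC', 'Berlin')]
```

Dropping orphan pieces is only one possible policy; keeping them, or relabeling the whole word from its first piece, may suit other applications better.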

## Training data

This model was fine-tuned on the English version of the standard [CoNLL-2003 Named Entity Recognition](https://www.aclweb.org/anthology/W03-0419.pdf) dataset.

The training dataset distinguishes between the beginning and continuation of an entity, so that if there are back-to-back entities of the same type, the model can output where the second entity begins. As in the dataset, each token is classified as one of the following classes:

Abbreviation|Description
-|-
O|Outside of a named entity
B-MISC|Beginning of a miscellaneous entity right after another miscellaneous entity
I-MISC|Miscellaneous entity
B-PER|Beginning of a person’s name right after another person’s name
I-PER|Person’s name
B-ORG|Beginning of an organization right after another organization
I-ORG|Organization
B-LOC|Beginning of a location right after another location
I-LOC|Location
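
The tag scheme above can be decoded into entity spans with a few lines of code. The sketch below is illustrative rather than part of this model's tooling: it assumes word-level tokens and tags are already aligned, and the function name `decode_entities` is hypothetical. A `B-` tag always opens a new entity, while an `I-` tag opens one only when it follows `O` or a different type, so the decoder also accepts input where every entity starts with `B-`.

```python
from typing import Optional


def decode_entities(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group word-level CoNLL-2003 tags into (entity_type, text) spans (hypothetical helper)."""
    entities: list[tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: list[str] = []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            prefix, ent_type = "O", None
        else:
            prefix, ent_type = tag.split("-", 1)  # e.g. "B-PER" -> ("B", "PER")

        # B- always opens a new entity; I- opens one when the type changes.
        starts_new = ent_type is not None and (prefix == "B" or ent_type != current_type)

        # Close the open entity on "O" or when a new entity starts.
        if current_type is not None and (ent_type is None or starts_new):
            entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []

        if starts_new:
            current_type, current_words = ent_type, [token]
        elif ent_type is not None:
            current_words.append(token)  # I- continuing the same type

    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


print(decode_entities(["Germany", "France"], ["I-LOC", "B-LOC"]))
# [('LOC', 'Germany'), ('LOC', 'France')]
```

The call shows the case the `B-` prefix exists for: two adjacent locations stay separate instead of being merged into one span.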

### CoNLL-2003 English Dataset Statistics

This dataset was derived from the Reuters corpus, which consists of Reuters news stories. You can read more about how this dataset was created in the [CoNLL-2003 paper](https://www.aclweb.org/anthology/W03-0419.pdf).

#### Number of entities per type

Dataset|LOC|MISC|ORG|PER
-|-|-|-|-
Train|7140|3438|6321|6600
Dev|1837|922|1341|1842
Test|1668|702|1661|1617

#### Number of articles, sentences, and tokens per split

Dataset|Articles|Sentences|Tokens
-|-|-|-
Train|946|14,987|203,621
Dev|216|3,466|51,362
Test|231|3,684|46,435
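
The tables can be summarized with a little arithmetic, for example to compare the entity-type balance and average sentence length across splits. The snippet only re-derives numbers from the two tables above; the names `ENTITY_COUNTS`, `SENTENCES`, `TOKENS`, and `summarize` are illustrative.

```python
# Entity counts per split, copied from the first table above.
ENTITY_COUNTS = {
    "Train": {"LOC": 7140, "MISC": 3438, "ORG": 6321, "PER": 6600},
    "Dev": {"LOC": 1837, "MISC": 922, "ORG": 1341, "PER": 1842},
    "Test": {"LOC": 1668, "MISC": 702, "ORG": 1661, "PER": 1617},
}
# Sentence and token counts per split, copied from the second table above.
SENTENCES = {"Train": 14987, "Dev": 3466, "Test": 3684}
TOKENS = {"Train": 203621, "Dev": 51362, "Test": 46435}


def summarize(split: str) -> tuple[int, dict[str, float], float]:
    """Return total entities, each type's share of entities, and mean tokens per sentence."""
    counts = ENTITY_COUNTS[split]
    total = sum(counts.values())
    shares = {etype: n / total for etype, n in counts.items()}
    return total, shares, TOKENS[split] / SENTENCES[split]


for split in ENTITY_COUNTS:
    total, shares, tokens_per_sentence = summarize(split)
    share_text = ", ".join(f"{etype} {share:.1%}" for etype, share in shares.items())
    print(f"{split}: {total} entities ({share_text}), {tokens_per_sentence:.1f} tokens/sentence")
```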

## Training procedure

This model was trained on a single NVIDIA V100 GPU using the hyperparameters recommended in the [original BERT paper](https://arxiv.org/pdf/1810.04805), which also trained and evaluated BERT on the CoNLL-2003 NER task.
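
For reference, the fine-tuning appendix of the BERT paper (Appendix A.3) recommends searching over batch sizes of 16 or 32, Adam learning rates of 5e-5, 3e-5, or 2e-5, and 2, 3, or 4 epochs. The sketch below simply enumerates that grid. This card does not state which combination was used for bert-base-NER, so treat it as a hypothetical search space rather than the model's actual configuration.

```python
from itertools import product

# Fine-tuning ranges recommended in the BERT paper (Devlin et al., 2018, Appendix A.3).
# Assumption: the combination actually used for bert-base-NER is not stated in this card.
BATCH_SIZES = [16, 32]
LEARNING_RATES = [5e-5, 3e-5, 2e-5]
NUM_EPOCHS = [2, 3, 4]

grid = [
    {"batch_size": bs, "learning_rate": lr, "num_epochs": ep}
    for bs, lr, ep in product(BATCH_SIZES, LEARNING_RATES, NUM_EPOCHS)
]
print(len(grid))  # 18
print(grid[0])  # {'batch_size': 16, 'learning_rate': 5e-05, 'num_epochs': 2}
```

Each configuration would then be fine-tuned and compared on the dev split.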

## Eval results

metric|dev (%)|test (%)
-|-|-
f1|95.1|91.3
precision|95.0|90.7
recall|95.3|91.9

The test metrics are a little lower than the official Google BERT results, which encoded document context and experimented with a CRF. More on replicating the original results [here](https://github.com/google-research/bert/issues/223).
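
F1 is the harmonic mean of precision and recall, `F1 = 2PR / (P + R)`, so the F1 row of the eval table can be cross-checked against the other two rows. The sketch below does this for the dev and test columns and for the precision and recall values in the card's YAML metadata; the helper name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in %) from the eval table.
print(round(f1_score(95.0, 95.3), 1))  # dev: 95.1
print(round(f1_score(90.7, 91.9), 1))  # test: 91.3

# Precision and recall from the model-index metadata at the top of the card.
print(round(f1_score(0.9211550382257732, 0.9306415698281261), 4))  # 0.9259
```

Both table columns reproduce the reported F1 to one decimal place.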

### BibTeX entry and citation info

```
@article{DBLP:journals/corr/abs-1810-04805,
  author    = {Jacob Devlin and
               Ming{-}Wei Chang and
               Kenton Lee and
               Kristina Toutanova},
  title     = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
               Understanding},
  journal   = {CoRR},
  volume    = {abs/1810.04805},
  year      = {2018},
  url       = {http://arxiv.org/abs/1810.04805},
  archivePrefix = {arXiv},
  eprint    = {1810.04805},
  timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```

```
@inproceedings{tjong-kim-sang-de-meulder-2003-introduction,
    title = "Introduction to the {C}o{NLL}-2003 Shared Task: Language-Independent Named Entity Recognition",
    author = "Tjong Kim Sang, Erik F. and
      De Meulder, Fien",
    booktitle = "Proceedings of the Seventh Conference on Natural Language Learning at {HLT}-{NAACL} 2003",
    year = "2003",
    url = "https://www.aclweb.org/anthology/W03-0419",
    pages = "142--147",
}
```