---
tags:
- flair
- token-classification
- sequence-tagger-model
language: en
datasets:
- conll2003
widget:
- text: "George Washington went to Washington"
---

## English NER in Flair (fast model)

This is the fast 4-class NER model for English that ships with [Flair](https://github.com/flairNLP/flair/).

F1-Score: 92.92 (corrected CoNLL-03)
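
The score is reported as a percentage and is a span-level F1: a predicted entity counts as correct only if both its boundaries and its type match a gold entity. A minimal sketch of that computation in plain Python, assuming entities are represented as hypothetical `(first_token, last_token, label)` tuples. This illustrates the metric and is not Flair's evaluation code:

```python
from typing import Set, Tuple

# (first_token, last_token, label); an illustrative representation, not Flair's
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Micro-averaged exact-match F1 over entity spans."""
    true_positives = len(gold & predicted)  # boundaries AND label must match
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {(1, 2, "PER"), (5, 5, "LOC")}
predicted = {(1, 2, "PER"), (5, 5, "ORG")}  # wrong type for the second "Washington"
print(span_f1(gold, predicted))  # -> 0.5
```

A span with the right boundaries but the wrong label earns no credit, which is why one mistyped entity out of two halves both precision and recall here.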

Predicts 4 tags:

| tag  | meaning           |
|------|-------------------|
| PER  | person name       |
| LOC  | location name     |
| ORG  | organization name |
| MISC | other name        |

Based on [Flair embeddings](https://www.aclweb.org/anthology/C18-1139/) and LSTM-CRF.
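
In an LSTM-CRF, the (Bi)LSTM assigns a score to every (token, tag) pair and the CRF layer adds tag-to-tag transition scores; at prediction time the highest-scoring tag sequence is found with Viterbi decoding. The following is a generic, minimal Viterbi sketch in plain Python. The BIO-style tag set and all scores are made up for illustration, and this is not Flair's implementation:

```python
from typing import Dict, List, Tuple


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the tag sequence maximizing the sum of emission and transition scores."""
    # best[t] = score of the best path ending in tag t at the current token
    best = {t: emissions[0][t] for t in tags}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # pick the previous tag that gives the highest score into `cur`
            prev = max(tags, key=lambda p: best[p] + transitions[(p, cur)])
            new_best[cur] = best[prev] + transitions[(prev, cur)] + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # start from the best final tag and follow the back-pointers
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))


tags = ["O", "B-PER", "I-PER"]
# all transitions neutral except O -> I-PER, which is heavily penalized
transitions = {(p, c): 0.0 for p in tags for c in tags}
transitions[("O", "I-PER")] = -10.0

# made-up per-token scores for "George Washington"
emissions = [
    {"O": 1.0, "B-PER": 0.8, "I-PER": 0.0},  # "George"
    {"O": 0.2, "B-PER": 0.5, "I-PER": 2.0},  # "Washington"
]

greedy = [max(scores, key=scores.get) for scores in emissions]
print(greedy)                                 # -> ['O', 'I-PER'] (invalid sequence)
print(viterbi(emissions, transitions, tags))  # -> ['B-PER', 'I-PER']
```

Picking the best tag per token independently yields an invalid `O` → `I-PER` sequence; the transition scores let the decoder trade a slightly worse score on "George" for a consistent sequence overall.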

---

### Demo: How to use in Flair

Requires: [Flair](https://github.com/flairNLP/flair/) (`pip install flair`)

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/ner-english-fast")

# make example sentence
sentence = Sentence("George Washington went to Washington")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)
```

The loop at the end prints the predicted entity spans:

```
Span [1,2]: "George Washington" [− Labels: PER (0.9515)]
Span [5]: "Washington" [− Labels: LOC (0.992)]
```

So the tagger finds the entities "George Washington" (labeled as a person) and "Washington" (labeled as a location) in the sentence "George Washington went to Washington".
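
Each printed span groups one or more tokens under a single label, with 1-based token positions: `Span [1,2]` covers tokens 1 to 2. Sequence taggers like this one typically predict one tag per token and then merge those tags into spans. A minimal plain-Python sketch of that merging step, assuming a BIOES tag scheme (`B`egin, `I`nside, `E`nd, `S`ingle, `O`utside); the function name `bioes_to_spans` is hypothetical and this is not Flair's code:

```python
from typing import List, Optional, Tuple

# (first_token, last_token, label) with 1-based token positions
Span = Tuple[int, int, str]


def bioes_to_spans(tags: List[str]) -> List[Span]:
    """Merge token-level BIOES tags (e.g. "B-PER", "E-PER", "S-LOC", "O") into spans."""
    spans: List[Span] = []
    start: Optional[int] = None  # where the currently open span began
    for i, tag in enumerate(tags, start=1):
        prefix, _, label = tag.partition("-")
        if prefix == "S":  # single-token entity
            spans.append((i, i, label))
            start = None
        elif prefix == "B":  # open a multi-token entity
            start = i
        elif prefix == "E" and start is not None:  # close it
            spans.append((start, i, label))
            start = None
        elif prefix == "O":
            start = None
        # "I" continues an open span; malformed sequences are not repaired
    return spans


tokens = ["George", "Washington", "went", "to", "Washington"]
tags = ["B-PER", "E-PER", "O", "O", "S-LOC"]
for first, last, label in bioes_to_spans(tags):
    index = str(first) if first == last else f"{first},{last}"
    text = " ".join(tokens[first - 1 : last])
    print(f'Span [{index}]: "{text}" -> {label}')
# Span [1,2]: "George Washington" -> PER
# Span [5]: "Washington" -> LOC
```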

---

### Training: Script to train this model

The following Flair script was used to train this model:

```python
from flair.data import Corpus
from flair.datasets import CONLL_03
from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus: Corpus = CONLL_03()

# 2. what tag do we want to predict?
tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize each embedding we use
embedding_types = [

    # GloVe embeddings
    WordEmbeddings('glove'),

    # contextual string embeddings, forward
    FlairEmbeddings('news-forward-fast'),

    # contextual string embeddings, backward
    FlairEmbeddings('news-backward-fast'),
]

# embedding stack consists of Flair and GloVe embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize trainer
trainer = ModelTrainer(tagger, corpus)

# 7. run training
trainer.train('resources/taggers/ner-english',
              train_with_dev=True,
              max_epochs=150)
```
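
`StackedEmbeddings` combines the GloVe word vectors with the forward and backward Flair string embeddings by concatenating the per-token vectors, so the tagger receives one longer vector per token. A toy illustration of that idea in plain Python: the dimensions are made up (the real ones differ), and the hypothetical `fake_embedding` function returns pseudo-random vectors in place of real lookups. GloVe is static (same word, same vector), while Flair embeddings are contextual, which the toy mimics by seeding the "Flair" vectors with the sentence and position:

```python
import random
from typing import Dict, List

# Made-up vector sizes, NOT the real dimensions of GloVe or 'news-*-fast'
DIMS: Dict[str, int] = {"glove": 4, "flair-forward": 3, "flair-backward": 3}
ORDER = ("glove", "flair-forward", "flair-backward")


def fake_embedding(name: str, tokens: List[str], i: int) -> List[float]:
    """Stand-in for a real embedding: a deterministic pseudo-random vector."""
    if name == "glove":
        seed = f"{name}:{tokens[i]}"  # static: depends on the word only
    else:
        seed = f"{name}:{' '.join(tokens)}:{i}"  # contextual: sentence + position
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIMS[name])]


def stack(tokens: List[str]) -> List[List[float]]:
    """One vector per token: the concatenation of all embeddings in ORDER."""
    matrix: List[List[float]] = []
    for i in range(len(tokens)):
        vector: List[float] = []
        for name in ORDER:
            vector.extend(fake_embedding(name, tokens, i))
        matrix.append(vector)
    return matrix


sentence = ["George", "Washington", "went", "to", "Washington"]
matrix = stack(sentence)
print(len(matrix), len(matrix[0]))     # -> 5 10
print(matrix[1][:4] == matrix[4][:4])  # -> True  (same GloVe part for both "Washington")
print(matrix[1][4:] == matrix[4][4:])  # -> False (different contextual parts)
```

Concatenation keeps each embedding's information intact and leaves it to the LSTM to learn how to weigh the static and contextual parts; the contextual part is what lets the tagger label the two occurrences of "Washington" differently.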

---

### Cite

Please cite the following paper when using this model.

```
@inproceedings{akbik2018coling,
  title={Contextual String Embeddings for Sequence Labeling},
  author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages = {1638--1649},
  year = {2018}
}
```

---

### Issues?

The Flair issue tracker is available [here](https://github.com/flairNLP/flair/issues/).