---
tags:
- flair
- token-classification
- sequence-tagger-model
language: en
datasets:
- conll2003
widget:
- text: "George Washington went to Washington"
---

## English NER in Flair (fast model)

This is the fast 4-class NER model for English that ships with [Flair](https://github.com/flairNLP/flair/).

F1-Score: **92.92** (corrected CoNLL-03)

Predicts 4 tags:

| **tag** | **meaning** |
|---------|-------------|
| PER | person name |
| LOC | location name |
| ORG | organization name |
| MISC | other name |

Based on [Flair embeddings](https://www.aclweb.org/anthology/C18-1139/) and LSTM-CRF.
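
For context on the F1 score quoted above: NER results on CoNLL-03 are conventionally reported as span-level F1, where a predicted entity counts as correct only if both its boundaries and its tag match the gold annotation. The sketch below illustrates that computation in plain Python. It assumes this convention applies here; the `span_f1` helper and the example spans are hypothetical and are not part of Flair or of the evaluation that produced the reported score.

```python
from typing import Set, Tuple

# An entity is identified by (first_token, last_token, tag).
# Token positions are 1-based, matching the span positions Flair prints.
Entity = Tuple[int, int, str]


def span_f1(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """F1 over exact-match entity spans (boundaries and tag must both agree)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: gold spans for "George Washington went to Washington"
gold = {(1, 2, "PER"), (5, 5, "LOC")}
# A hypothetical prediction that gets the second entity's tag wrong
predicted = {(1, 2, "PER"), (5, 5, "ORG")}

print(f"{100 * span_f1(gold, predicted):.2f}")  # prints 50.00
```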

---

### Demo: How to use in Flair

Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/ner-english-fast")

# make example sentence
sentence = Sentence("George Washington went to Washington")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)
```

This yields the following output:
```
Span [1,2]: "George Washington" [− Labels: PER (0.9515)]
Span [5]: "Washington" [− Labels: LOC (0.992)]
```

So, the entities "*George Washington*" (labeled as a **person**) and "*Washington*" (labeled as a **location**) are found in the sentence "*George Washington went to Washington*".
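
For downstream processing it is often handier to work with plain dictionaries than with printed spans. The following is a minimal sketch, assuming each span exposes `text`, `tag` and `score` attributes (attribute names can differ between Flair versions, so check the release you have installed). `spans_to_records` is a hypothetical helper, and the stand-in objects simply mirror the demo output above so that the block runs without downloading the model.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def spans_to_records(spans: Iterable[Any], min_score: float = 0.0) -> List[Dict[str, Any]]:
    """Convert span-like objects to dicts, keeping those scored at or above min_score.

    Assumption: every span has `.text`, `.tag` and `.score` attributes.
    """
    return [
        {"text": span.text, "tag": span.tag, "score": span.score}
        for span in spans
        if span.score >= min_score
    ]


# Stand-ins that mirror the demo output; with a real tagged sentence you would
# pass sentence.get_spans('ner') instead.
demo_spans = [
    SimpleNamespace(text="George Washington", tag="PER", score=0.9515),
    SimpleNamespace(text="Washington", tag="LOC", score=0.992),
]

# Keep only high-confidence entities
records = spans_to_records(demo_spans, min_score=0.99)
print(records)  # [{'text': 'Washington', 'tag': 'LOC', 'score': 0.992}]
```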

---

### Training: Script to train this model

The following Flair script was used to train this model:

```python
from flair.data import Corpus
from flair.datasets import CONLL_03
from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings

# 1. get the corpus
corpus: Corpus = CONLL_03()

# 2. what tag do we want to predict?
tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize each embedding we use
embedding_types = [

    # GloVe embeddings
    WordEmbeddings('glove'),

    # contextual string embeddings, forward
    FlairEmbeddings('news-forward-fast'),

    # contextual string embeddings, backward
    FlairEmbeddings('news-backward-fast'),
]

# embedding stack consists of Flair and GloVe embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
from flair.models import SequenceTagger

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# 7. run training
trainer.train('resources/taggers/ner-english',
              train_with_dev=True,
              max_epochs=150)
```

---

### Cite

Please cite the following paper when using this model.

```
@inproceedings{akbik2018coling,
  title     = {Contextual String Embeddings for Sequence Labeling},
  author    = {Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages     = {1638--1649},
  year      = {2018}
}
```

---

### Issues?

The Flair issue tracker is available [here](https://github.com/flairNLP/flair/issues/).