README.md · distilbart-mnli-12-1

---
datasets:
- mnli
tags:
- distilbart
- distilbart-mnli
pipeline_tag: zero-shot-classification
---

# DistilBart-MNLI

`distilbart-mnli` is the distilled version of `bart-large-mnli`, created using the No Teacher Distillation technique that Hugging Face proposed for BART summarisation, described [here](https://github.com/huggingface/transformers/tree/master/examples/seq2seq#distilbart).

We simply copy alternating layers from `bart-large-mnli` and fine-tune further on the same data.

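
In the model names, `12-N` means 12 encoder layers and N decoder layers (the teacher is `12-12`), so only the decoder is shrunk. To make "copying alternating layers" concrete, the sketch below prints which of the teacher's 12 decoder layers an N-layer student would keep under one plausible scheme: keep the first and last layers and space the rest evenly. The `pick_layers` helper is hypothetical, and `create_student.py` may use a different mapping.

```bash
# Hypothetical helper (not part of the distillbart-mnli repo): print the
# 0-based teacher layer indices kept when shrinking a stack of T layers to S.
# Assumed scheme: keep the first and last layer, spread the rest evenly,
# i.e. index_i = round(i * (T - 1) / (S - 1)) for i = 0 .. S-1.
pick_layers() {
  local teacher=$1 student=$2 i
  local -a kept=()
  if (( student == 1 )); then
    echo 0            # a single-layer student keeps only the first layer
    return
  fi
  for (( i = 0; i < student; i++ )); do
    # Round half up with integer arithmetic: floor(x + 1/2).
    kept+=( "$(( (2 * i * (teacher - 1) + (student - 1)) / (2 * (student - 1)) ))" )
  done
  echo "${kept[*]}"
}

# Decoder layers kept for each student in the table (encoder stays at 12).
for n in 1 3 6 9; do
  printf '12-%d: %s\n' "$n" "$(pick_layers 12 "$n")"
done
# 12-1: 0
# 12-3: 0 6 11
# 12-6: 0 2 4 7 9 11
# 12-9: 0 1 3 4 6 7 8 10 11
```

For 6 decoder layers this is close to, but not exactly, taking every other layer (0, 2, 4, ..., 10), because this scheme also keeps the final layer.
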
| Model | matched acc (%) | mismatched acc (%) |
| ------------------------------------------------------------------------------------ | ----------- | -------------- |
| [bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) (baseline, 12-12) | 89.9 | 90.01 |
| [distilbart-mnli-12-1](https://huggingface.co/valhalla/distilbart-mnli-12-1) | 87.08 | 87.5 |
| [distilbart-mnli-12-3](https://huggingface.co/valhalla/distilbart-mnli-12-3) | 88.1 | 88.19 |
| [distilbart-mnli-12-6](https://huggingface.co/valhalla/distilbart-mnli-12-6) | 89.19 | 89.01 |
| [distilbart-mnli-12-9](https://huggingface.co/valhalla/distilbart-mnli-12-9) | 89.56 | 89.52 |

This is a very simple and effective technique: as the table shows, the accuracy drop is small, under 3 points even for the 12-1 model and about 1 point or less for 12-6 and 12-9.

Detailed performance trade-offs will be posted in this [sheet](https://docs.google.com/spreadsheets/d/1dQeUvAKpScLuhDV1afaPJRRAE55s2LpIzDVA5xfqxvk/edit?usp=sharing).

## Fine-tuning
If you want to train these models yourself, clone the [distillbart-mnli repo](https://github.com/patil-suraj/distillbart-mnli) and follow the steps below.

Clone and install `transformers` from source:
35	```bash
git clone https://github.com/huggingface/transformers.git
pip install -qqq -U ./transformers
38	```

Download the MNLI data:
41	```bash
python transformers/utils/download_glue_data.py --data_dir glue_data --tasks MNLI
43	```

Create the student model. This example builds the 12-6 variant (12 encoder layers, 6 decoder layers):
46	```bash
python create_student.py \
  --teacher_model_name_or_path facebook/bart-large-mnli \
  --student_encoder_layers 12 \
  --student_decoder_layers 6 \
  --save_path student-bart-mnli-12-6
52	```

Start fine-tuning:
55	```bash
python run_glue.py args.json
57	```

You can find the logs of these trained models in this [wandb project](https://wandb.ai/psuraj/distilbart-mnli).