---
license: mit
---

# LLMLingua-2-Bert-base-Multilingual-Cased-MeetingBank

This model was introduced in the paper [**LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression** (Pan et al., 2024)](https://arxiv.org/abs/2403.12968). It is a [BERT multilingual base model (cased)](https://huggingface.co/google-bert/bert-base-multilingual-cased) finetuned to perform token classification for task-agnostic prompt compression. The predicted probability $p_{\text{preserve}}$ of each token $x_i$ is used as the metric for compression. The model was trained on [an extractive text compression dataset](https://huggingface.co/datasets/microsoft/MeetingBank-LLMCompressed) constructed with the methodology proposed in [**LLMLingua-2**](https://arxiv.org/abs/2403.12968), using training examples from [MeetingBank (Hu et al., 2023)](https://meetingbank.github.io/) as the seed data.

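As a minimal illustration of this metric (a sketch with made-up probabilities, not the library's actual implementation), compression amounts to keeping the fraction of tokens with the highest $p_{\text{preserve}}$ that matches the target rate, in their original order:

```python
# Sketch of compression driven by per-token preserve probabilities.
# The probabilities below are made up; in practice they come from the
# token-classification head of this model.

def compress_by_preserve_prob(tokens, probs, rate):
    """Keep the top `rate` fraction of tokens by preserve probability,
    keeping the survivors in their original order."""
    n_keep = max(1, int(len(tokens) * rate))
    # Indices of the n_keep highest-probability tokens, restored to document order
    top = sorted(range(len(tokens)), key=lambda i: probs[i], reverse=True)[:n_keep]
    return [tokens[i] for i in sorted(top)]

tokens = ["So", ",", "um", ",", "we", "should", "revise", "the", "timeline", "."]
probs = [0.2, 0.6, 0.1, 0.5, 0.4, 0.7, 0.9, 0.3, 0.95, 0.8]
print(compress_by_preserve_prob(tokens, probs, 0.5))
# → [',', 'should', 'revise', 'timeline', '.']
```

Filler words ("So", "um") get low preserve probabilities and are dropped first, while content-bearing tokens survive.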
You can evaluate the model on downstream tasks such as question answering (QA) and summarization over compressed meeting transcripts using [this dataset](https://huggingface.co/datasets/microsoft/MeetingBank-QA-Summary).

For more details, please check the home pages of [LLMLingua-2](https://llmlingua.com/llmlingua2.html) and the [LLMLingua Series](https://llmlingua.com/).

## Usage
```python
from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
    use_llmlingua2=True,
)

original_prompt = """John: So, um, I've been thinking about the project, you know, and I believe we need to, uh, make some changes. I mean, we want the project to succeed, right? So, like, I think we should consider maybe revising the timeline.
Sarah: I totally agree, John. I mean, we have to be realistic, you know. The timeline is, like, too tight. You know what I mean? We should definitely extend it.
"""
results = compressor.compress_prompt_llmlingua2(
    original_prompt,
    rate=0.6,
    force_tokens=['\n', '.', '!', '?', ','],
    chunk_end_tokens=['.', '\n'],
    return_word_label=True,
    drop_consecutive=True,
)

print(results.keys())
print(f"Compressed prompt: {results['compressed_prompt']}")
print(f"Original tokens: {results['origin_tokens']}")
print(f"Compressed tokens: {results['compressed_tokens']}")
print(f"Compression rate: {results['rate']}")

# Recover the word-level preserve/drop labels over the original prompt
word_sep = "\t\t|\t\t"
label_sep = " "
lines = results["fn_labeled_original_prompt"].split(word_sep)
annotated_results = []
for line in lines:
    word, label = line.split(label_sep)
    # list of tuples: (word, '+') if preserved, (word, '-') if dropped
    annotated_results.append((word, '+') if label == '1' else (word, '-'))
print("Annotated results:")
for word, label in annotated_results[:10]:
    print(f"{word} {label}")
```

## Citation
```
@article{wu2024llmlingua2,
    title = "{LLML}ingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression",
    author = "Zhuoshi Pan and Qianhui Wu and Huiqiang Jiang and Menglin Xia and Xufang Luo and Jue Zhang and Qingwei Lin and Victor Ruhle and Yuqing Yang and Chin-Yew Lin and H. Vicky Zhao and Lili Qiu and Dongmei Zhang",
    url = "https://arxiv.org/abs/2403.12968",
    journal = "ArXiv preprint",
    volume = "abs/2403.12968",
    year = "2024",
}
```