---
license: mit
---

# LLMLingua-2-Bert-base-Multilingual-Cased-MeetingBank

This model was introduced in the paper [**LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression** (Pan et al., 2024)](https://arxiv.org/abs/2403.12968). It is a [BERT multilingual base model (cased)](https://huggingface.co/google-bert/bert-base-multilingual-cased) finetuned to perform token classification for task-agnostic prompt compression. The probability $p_{preserve}$ of each token $x_i$ serves as the metric for compression. The model is trained on [an extractive text compression dataset](https://huggingface.co/datasets/microsoft/MeetingBank-LLMCompressed) constructed with the methodology proposed in the [**LLMLingua-2** paper](https://arxiv.org/abs/2403.12968), using training examples from [MeetingBank (Hu et al., 2023)](https://meetingbank.github.io/) as the seed data.

You can evaluate the model on downstream tasks such as question answering (QA) and summarization over compressed meeting transcripts using [this dataset](https://huggingface.co/datasets/microsoft/MeetingBank-QA-Summary).

For more details, please check the home page of [LLMLingua-2](https://llmlingua.com/llmlingua2.html) and the [LLMLingua Series](https://llmlingua.com/).
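
Conceptually, compression keeps the fraction of tokens with the highest $p_{preserve}$ determined by the target rate. That selection step can be sketched in a few lines of plain Python; note the probabilities below are invented placeholders standing in for the classifier's actual outputs:

```python
# Toy illustration of rate-based token selection.
# The p_preserve values are hypothetical; in practice they come from the
# token-classification head of this model.
tokens = ["So", ",", "um", ",", "I've", "been", "thinking", "about", "the", "project"]
p_preserve = [0.41, 0.88, 0.07, 0.85, 0.52, 0.33, 0.91, 0.64, 0.12, 0.91]

rate = 0.6  # keep roughly 60% of the tokens
n_keep = int(len(tokens) * rate)

# Keep the n_keep tokens with the highest preservation probability,
# then sort the surviving indices to restore the original word order.
keep_idx = sorted(
    sorted(range(len(tokens)), key=lambda i: p_preserve[i], reverse=True)[:n_keep]
)
compressed = [tokens[i] for i in keep_idx]
print(" ".join(compressed))
```

Sorting the surviving indices back into document order is what keeps the compressed prompt readable rather than a bag of high-scoring tokens.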

## Usage
```python
from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
    use_llmlingua2=True,
)
| 21 | |
| 22 | original_prompt = """John: So, um, I've been thinking about the project, you know, and I believe we need to, uh, make some changes. I mean, we want the project to succeed, right? So, like, I think we should consider maybe revising the timeline. |
| 23 | Sarah: I totally agree, John. I mean, we have to be realistic, you know. The timeline is, like, too tight. You know what I mean? We should definitely extend it. |
| 24 | """ |
results = compressor.compress_prompt_llmlingua2(
    original_prompt,
    rate=0.6,
    force_tokens=['\n', '.', '!', '?', ','],
    chunk_end_tokens=['.', '\n'],
    return_word_label=True,
    drop_consecutive=True,
)

print(results.keys())
print(f"Compressed prompt: {results['compressed_prompt']}")
print(f"Original tokens: {results['origin_tokens']}")
print(f"Compressed tokens: {results['compressed_tokens']}")
print(f"Compression rate: {results['rate']}")

# Recover the word-level annotations over the original prompt:
# '+' marks words kept in the compressed prompt, '-' marks dropped ones.
word_sep = "\t\t|\t\t"
label_sep = " "
lines = results["fn_labeled_original_prompt"].split(word_sep)
annotated_results = []  # list of (word, label) tuples
for line in lines:
    word, label = line.rsplit(label_sep, 1)
    annotated_results.append((word, '+' if label == '1' else '-'))
print("Annotated results:")
for word, label in annotated_results[:10]:
    print(f"{word} {label}")
```
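
The word/label format can also be read in the other direction: keeping only the `1`-labeled words approximates the compressed prompt. A self-contained sketch with a short hypothetical labeled string (the real one is returned under `results["fn_labeled_original_prompt"]`):

```python
# Hypothetical stand-in for results["fn_labeled_original_prompt"]:
# words joined by word_sep, each entry formatted as "word<label_sep>label".
word_sep = "\t\t|\t\t"
label_sep = " "
labeled = word_sep.join(
    ["John: 1", "So, 0", "um, 0", "revising 1", "the 1", "timeline. 1"]
)

# Keep only the words the classifier marked as preserved (label "1").
kept = [
    entry.rsplit(label_sep, 1)[0]
    for entry in labeled.split(word_sep)
    if entry.rsplit(label_sep, 1)[1] == "1"
]
print(" ".join(kept))
```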

## Citation
```
@article{wu2024llmlingua2,
  title = "{LLML}ingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression",
  author = "Zhuoshi Pan and Qianhui Wu and Huiqiang Jiang and Menglin Xia and Xufang Luo and Jue Zhang and Qingwei Lin and Victor Ruhle and Yuqing Yang and Chin-Yew Lin and H. Vicky Zhao and Lili Qiu and Dongmei Zhang",
  url = "https://arxiv.org/abs/2403.12968",
  journal = "ArXiv preprint",
  volume = "abs/2403.12968",
  year = "2024",
}
```