---
language:
- en
tags:
- summarization
datasets:
- xsum
metrics:
- rouge
widget:
- text: National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed
    to buy rival Samba Financial Group for $15 billion in the biggest banking takeover
    this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to
    a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer
    0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio
    the banks set when they signed an initial framework agreement in June.The offer
    is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24%
    higher than the level the shares traded at before the talks were made public.
    Bloomberg News first reported the merger discussions.The new bank will have total
    assets of more than $220 billion, creating the Gulf region’s third-largest lender.
    The entity’s $46 billion market capitalization nearly matches that of Qatar National
    Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion
    of assets.
model-index:
- name: human-centered-summarization/financial-summarization-pegasus
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: xsum
      type: xsum
      config: default
      split: test
    metrics:
    - type: rouge
      value: 35.2055
      name: ROUGE-1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTA5OTZkY2YxMDU1YzE3NGJlMmE1OTg1NjlmNzcxOTg4YzY2OThlOTlkNGFhMGFjZWY4YjdiMjU5NDdmMWYzNSIsInZlcnNpb24iOjF9.ufBRoV2JoX4UlEfAUOYq7F3tZougwngdpKlnaC37tYXJU3omsR5hTsWM69hSdYO-k0cKUbAWCAMzjmoGwIaPAw
    - type: rouge
      value: 16.5689
      name: ROUGE-2
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOWQwMmM2NjJjNzM1N2Y3NjZmMmE5NzNlNjRjNjEwNzNhNjcyZTRiMGRlODY3NWUyMGQ0YzZmMGFhODYzOTRmOSIsInZlcnNpb24iOjF9.AZZkbaYBZG6rw6-QHYjRlSl-p0gBT2EtJxwjIP7QYH5XIQjeoiQsTnDPIq25dSMDbmQLSZnpHC104ZctX0f_Dg
    - type: rouge
      value: 30.1285
      name: ROUGE-L
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTRjYThlMTllZjI4MGFiMDZhZTVkYmRjMTNhZDUzNTQ0OWQyNDQxMmQ5ODJiMmJiNGI3OTAzYjhiMzc2MTI4NCIsInZlcnNpb24iOjF9.zTHd3F4ZlgS-azl-ZVjOckcTrtrJmDOGWVaC3qQsvvn2UW9TnseNkmo7KBc3DJU7_NmlxWZArl1BdSetED0NCg
    - type: rouge
      value: 30.1706
      name: ROUGE-LSUM
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZGMzZGFjNzVkYWI0NTJkMmZjZDQ0YjhiYjIxN2VkNmJjMTgwZTk1NjFlOGU2NjNjM2VjYTNlYTBhNTQ5MGZkNSIsInZlcnNpb24iOjF9.xQ2LoI3PwlEiXo1OT2o4Pq9o2thYCd9lSCKCWlLmZdxI5GxdsjcASBKmHKopzUcwCGBPR7zF95MHSAPyszOODA
    - type: loss
      value: 2.7092134952545166
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzQzODE0NDc5YTYzYjJlMWU2YTVjOGRjN2JmYWVkOWNkNTRlMTZlOWIyN2NiODJkMDljMjI3YzZmYzM3N2JjYSIsInZlcnNpb24iOjF9.Vv_pdeFuRMoKK3cPr5P6n7D6_18ChJX-2qcT0y4is3XX3mS98fk3U1AYEuy9nBHOwYR3o0U8WBgQ-Ya_FqefBg
    - type: gen_len
      value: 15.1414
      name: gen_len
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYjk5OTk3NWRiNjZlZmQzMmYwOTU2MmQwOWE1MDNlNTg3YWVkOTgwOTc2ZTQ0MTBiZjliOWMyZTYwMDI2MDUzYiIsInZlcnNpb24iOjF9.Zvj84JzIhM50rWTQ2GrEeOU7HrS8KsILH-8ApTcSWSI6kVnucY0MyW2ODxvRAa_zHeCygFW6Q13TFGrT5kLNAA
---

### PEGASUS for Financial Summarization

This model was fine-tuned on a novel financial news dataset of 2K articles from [Bloomberg](https://www.bloomberg.com/europe), covering topics such as stocks, markets, currencies, rates, and cryptocurrencies.

It is based on [PEGASUS](https://huggingface.co/transformers/model_doc/pegasus.html), specifically the checkpoint fine-tuned on the Extreme Summarization (XSum) dataset: [google/pegasus-xsum](https://huggingface.co/google/pegasus-xsum). PEGASUS was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu in [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf).

*Note: This model serves as a base version. For a more advanced model with significantly better performance, see our [advanced version](https://rapidapi.com/medoid-ai-medoid-ai-default/api/financial-summarization-advanced) on Rapid API. The advanced model offers more than a 16% increase in ROUGE scores (similarity to a human-generated summary) compared to the base model. It is also available under several plans tailored to different use cases and workloads, for both personal and enterprise access.*

### How to use
The snippet below shows how to use this model for financial summarization in PyTorch.

```Python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Load the tokenizer and the model.
# For TensorFlow, import and use TFPegasusForConditionalGeneration instead.
model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# Text to summarize
text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."

# Tokenize the text.
# For TensorFlow, pass return_tensors="tf" instead.
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids

# Generate the summary (beam search here, but any other decoding strategy also works)
output = model.generate(
    input_ids,
    max_length=32,
    num_beams=5,
    early_stopping=True
)

# Print the generated summary
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Generated Output: Saudi bank to pay a 3.5% premium to Samba share price. Gulf region’s third-largest lender will have total assets of $220 billion
```
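
To summarize several articles at once, the same calls can be batched. The helper below is a minimal sketch rather than part of the released code: the name `summarize_batch` and its parameters are hypothetical, and the default `max_input_tokens=512` assumes the input limit of the underlying `google/pegasus-xsum` checkpoint, so check the tokenizer's `model_max_length` before relying on it.

```python
def summarize_batch(texts, tokenizer, model, max_input_tokens=512,
                    max_summary_tokens=32, num_beams=5):
    """Summarize a list of articles with a single generate() call.

    `tokenizer` and `model` are the objects loaded in the snippet above.
    """
    # Pad to the longest article in the batch and truncate over-long ones.
    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_input_tokens,  # assumed input limit, see the note above
        return_tensors="pt",
    )
    # Same beam-search settings as the single-article snippet.
    output_ids = model.generate(
        **batch,
        max_length=max_summary_tokens,
        num_beams=num_beams,
        early_stopping=True,
    )
    # One decoded summary per input article, in the same order.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as above, `summarize_batch([text_to_summarize, another_article], tokenizer, model)` returns a list with one summary per article (`another_article` stands for any second input string).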

## Evaluation Results
The results before and after fine-tuning on our dataset are shown below:

| Fine-tuning | R-1   | R-2   | R-L    | R-S   |
|:-----------:|:-----:|:-----:|:------:|:-----:|
| Yes         | 23.55 | 6.99  | 18.14  | 21.36 |
| No          | 13.8  | 2.4   | 10.63  | 12.03 |
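
For intuition about what the R-1 column measures, here is a simplified sketch of ROUGE-1 F1 as clipped unigram overlap between a generated summary and a human-written reference, scaled to 0-100 like the table. It is an illustration only: it assumes lowercasing and whitespace tokenization with no stemming, the function name `rouge1_f1` and the two example strings are hypothetical, and the scores in the table were not produced with this code.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1 on lowercased, whitespace-split tokens."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


generated = "the bank will buy its rival"    # hypothetical model output
human = "the bank agreed to buy its rival"   # hypothetical reference summary
score = rouge1_f1(generated, human)
print(f"ROUGE-1 F1: {100 * score:.2f}")  # prints "ROUGE-1 F1: 76.92"
```

Five of the six generated words also appear in the seven-word reference, giving precision 5/6, recall 5/7, and F1 of 10/13. Standard ROUGE scorers use their own tokenization and optional stemming, so their numbers can differ from this sketch.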

## Citation

You can find more details about this work in the following workshop paper. If you use our model in your research, please consider citing it:

> T. Passali, A. Gidiotis, E. Chatzikyriakidis and G. Tsoumakas. 2021.
> Towards Human-Centered Summarization: A Case Study on Financial News.
> In Proceedings of the First Workshop on Bridging Human-Computer Interaction and Natural Language Processing (pp. 21–27). Association for Computational Linguistics.

BibTeX entry:

```
@inproceedings{passali-etal-2021-towards,
    title = "Towards Human-Centered Summarization: A Case Study on Financial News",
    author = "Passali, Tatiana and Gidiotis, Alexios and Chatzikyriakidis, Efstathios and Tsoumakas, Grigorios",
    booktitle = "Proceedings of the First Workshop on Bridging Human{--}Computer Interaction and Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.hcinlp-1.4",
    pages = "21--27",
}
```

## Support

Contact us at [info@medoid.ai](mailto:info@medoid.ai) if you are interested in a more sophisticated version of the model, trained on more articles and adapted to your needs!

More information about Medoid AI:
- Website: [https://www.medoid.ai](https://www.medoid.ai)
- LinkedIn: [https://www.linkedin.com/company/medoid-ai/](https://www.linkedin.com/company/medoid-ai/)