---
language:
- en
tags:
- summarization
datasets:
- xsum
metrics:
- rouge
widget:
- text: National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed
    to buy rival Samba Financial Group for $15 billion in the biggest banking takeover
    this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to
    a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer
    0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio
    the banks set when they signed an initial framework agreement in June.The offer
    is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24%
    higher than the level the shares traded at before the talks were made public.
    Bloomberg News first reported the merger discussions.The new bank will have total
    assets of more than $220 billion, creating the Gulf region’s third-largest lender.
    The entity’s $46 billion market capitalization nearly matches that of Qatar National
    Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion
    of assets.
model-index:
- name: human-centered-summarization/financial-summarization-pegasus
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: xsum
      type: xsum
      config: default
      split: test
    metrics:
    - type: rouge
      value: 35.2055
      name: ROUGE-1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTA5OTZkY2YxMDU1YzE3NGJlMmE1OTg1NjlmNzcxOTg4YzY2OThlOTlkNGFhMGFjZWY4YjdiMjU5NDdmMWYzNSIsInZlcnNpb24iOjF9.ufBRoV2JoX4UlEfAUOYq7F3tZougwngdpKlnaC37tYXJU3omsR5hTsWM69hSdYO-k0cKUbAWCAMzjmoGwIaPAw
    - type: rouge
      value: 16.5689
      name: ROUGE-2
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOWQwMmM2NjJjNzM1N2Y3NjZmMmE5NzNlNjRjNjEwNzNhNjcyZTRiMGRlODY3NWUyMGQ0YzZmMGFhODYzOTRmOSIsInZlcnNpb24iOjF9.AZZkbaYBZG6rw6-QHYjRlSl-p0gBT2EtJxwjIP7QYH5XIQjeoiQsTnDPIq25dSMDbmQLSZnpHC104ZctX0f_Dg
    - type: rouge
      value: 30.1285
      name: ROUGE-L
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTRjYThlMTllZjI4MGFiMDZhZTVkYmRjMTNhZDUzNTQ0OWQyNDQxMmQ5ODJiMmJiNGI3OTAzYjhiMzc2MTI4NCIsInZlcnNpb24iOjF9.zTHd3F4ZlgS-azl-ZVjOckcTrtrJmDOGWVaC3qQsvvn2UW9TnseNkmo7KBc3DJU7_NmlxWZArl1BdSetED0NCg
    - type: rouge
      value: 30.1706
      name: ROUGE-LSUM
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZGMzZGFjNzVkYWI0NTJkMmZjZDQ0YjhiYjIxN2VkNmJjMTgwZTk1NjFlOGU2NjNjM2VjYTNlYTBhNTQ5MGZkNSIsInZlcnNpb24iOjF9.xQ2LoI3PwlEiXo1OT2o4Pq9o2thYCd9lSCKCWlLmZdxI5GxdsjcASBKmHKopzUcwCGBPR7zF95MHSAPyszOODA
    - type: loss
      value: 2.7092134952545166
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzQzODE0NDc5YTYzYjJlMWU2YTVjOGRjN2JmYWVkOWNkNTRlMTZlOWIyN2NiODJkMDljMjI3YzZmYzM3N2JjYSIsInZlcnNpb24iOjF9.Vv_pdeFuRMoKK3cPr5P6n7D6_18ChJX-2qcT0y4is3XX3mS98fk3U1AYEuy9nBHOwYR3o0U8WBgQ-Ya_FqefBg
    - type: gen_len
      value: 15.1414
      name: gen_len
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYjk5OTk3NWRiNjZlZmQzMmYwOTU2MmQwOWE1MDNlNTg3YWVkOTgwOTc2ZTQ0MTBiZjliOWMyZTYwMDI2MDUzYiIsInZlcnNpb24iOjF9.Zvj84JzIhM50rWTQ2GrEeOU7HrS8KsILH-8ApTcSWSI6kVnucY0MyW2ODxvRAa_zHeCygFW6Q13TFGrT5kLNAA
---

### PEGASUS for Financial Summarization

This model was fine-tuned on a novel financial news dataset consisting of 2K articles from [Bloomberg](https://www.bloomberg.com/europe) on topics such as stocks, markets, currencies, rates, and cryptocurrencies.

It is based on the [PEGASUS](https://huggingface.co/transformers/model_doc/pegasus.html) model, specifically the checkpoint fine-tuned on the Extreme Summarization (XSum) dataset: [google/pegasus-xsum](https://huggingface.co/google/pegasus-xsum). PEGASUS was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter J. Liu in [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf).

*Note: This model is the base version. For a more advanced model, see our [advanced version](https://rapidapi.com/medoid-ai-medoid-ai-default/api/financial-summarization-advanced) on Rapid API. The advanced model offers more than a 16% increase in ROUGE scores (similarity to a human-generated summary) compared to the base model. It also comes with several plans tailored to different use cases and workloads, for both personal and enterprise access.*
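For readers unfamiliar with ROUGE, the sketch below illustrates the idea behind ROUGE-1: it measures unigram overlap between a generated summary and a human-written reference. This is a simplified illustration, not the scorer used to produce the numbers in this card. The function name `rouge1_f1` and the example sentences are made up for the example, and real ROUGE implementations typically apply their own tokenization and optional stemming rather than plain whitespace splitting.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary.

    Simplified: lowercases and splits on whitespace only (an assumption
    of this sketch, not how official ROUGE scorers tokenize).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per word (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical summaries, 4 of 5 words shared in each direction.
score = rouge1_f1("bank agrees to buy rival", "bank to buy rival lender")
print(f"ROUGE-1 F1: {score:.2f}")  # prints "ROUGE-1 F1: 0.80"
```

ROUGE-2 follows the same pattern with bigrams instead of single words, and ROUGE-L replaces n-gram overlap with the longest common subsequence.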

### How to use
The snippet below shows how to use this model for financial summarization in PyTorch.

```python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Load the model and the tokenizer.
# To use the TensorFlow model, import TFPegasusForConditionalGeneration
# from transformers and use it in place of PegasusForConditionalGeneration.
model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# Some text to summarize
text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."

# Tokenize the text.
# To run the code in TensorFlow, request TensorFlow tensors with return_tensors="tf".
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids

# Generate the summary (beam search here, but any other decoding strategy works too)
output = model.generate(
    input_ids,
    max_length=32,
    num_beams=5,
    early_stopping=True
)

# Print the generated summary
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Generated Output: Saudi bank to pay a 3.5% premium to Samba share price. Gulf region’s third-largest lender will have total assets of $220 billion
```
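To summarize several articles at once, the same calls can be wrapped in a small helper. The following is a minimal sketch rather than part of the model's official API: `summarize_batch` is a hypothetical name, and it assumes the `tokenizer` and `model` objects are loaded as in the snippet above. It defaults to the same beam-search settings and lets the caller override them through keyword arguments.

```python
from typing import Any, List


def summarize_batch(
    texts: List[str], tokenizer: Any, model: Any, **generate_kwargs: Any
) -> List[str]:
    """Summarize several articles with a single generate() call (illustrative helper)."""
    # Pad to the longest article in the batch and truncate any article
    # that exceeds the model's maximum input length.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

    # Start from the beam-search settings used in the snippet above;
    # any keyword argument passed by the caller overrides them.
    settings = {"max_length": 32, "num_beams": 5, "early_stopping": True}
    settings.update(generate_kwargs)

    output = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # tells the model to ignore padding
        **settings,
    )
    return tokenizer.batch_decode(output, skip_special_tokens=True)
```

With the objects from the snippet above, `summarize_batch([text_to_summarize], tokenizer, model)` returns a one-element list of summaries. Passing, for example, `max_length=64` allows longer summaries, at the cost of slower generation.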

## Evaluation Results
The results before and after fine-tuning on our dataset are shown below:

| Fine-tuning | R-1 | R-2 | R-L | R-S |
|:-----------:|:-----:|:-----:|:------:|:-----:|
| Yes | 23.55 | 6.99 | 18.14 | 21.36 |
| No | 13.8 | 2.4 | 10.63 | 12.03 |
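To put the table in perspective, the absolute and relative gains from fine-tuning can be computed directly from its values. The snippet below is only a convenience sketch: the scores are copied from the table, while the `scores` dictionary and the `gains` helper are illustrative names introduced here.

```python
# Scores copied from the table above: (without fine-tuning, with fine-tuning)
scores = {
    "R-1": (13.8, 23.55),
    "R-2": (2.4, 6.99),
    "R-L": (10.63, 18.14),
    "R-S": (12.03, 21.36),
}


def gains(before: float, after: float) -> tuple:
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


for metric, (before, after) in scores.items():
    absolute, relative = gains(before, after)
    print(f"{metric}: +{absolute:.2f} points ({relative:.1f}% relative)")
```

For R-1, for instance, this works out to a gain of 9.75 points, roughly a 71% relative improvement over the model without fine-tuning.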

## Citation

You can find more details about this work in the following workshop paper. If you use our model in your research, please consider citing our paper:

> T. Passali, A. Gidiotis, E. Chatzikyriakidis and G. Tsoumakas. 2021.
> Towards Human-Centered Summarization: A Case Study on Financial News.
> In Proceedings of the First Workshop on Bridging Human-Computer Interaction and Natural Language Processing (pp. 21–27). Association for Computational Linguistics.

BibTeX entry:

```
@inproceedings{passali-etal-2021-towards,
    title = "Towards Human-Centered Summarization: A Case Study on Financial News",
    author = "Passali, Tatiana and Gidiotis, Alexios and Chatzikyriakidis, Efstathios and Tsoumakas, Grigorios",
    booktitle = "Proceedings of the First Workshop on Bridging Human{--}Computer Interaction and Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.hcinlp-1.4",
    pages = "21--27",
}
```

## Support

Contact us at [info@medoid.ai](mailto:info@medoid.ai) if you are interested in a more sophisticated version of the model, trained on more articles and adapted to your needs!

More information about Medoid AI:
- Website: [https://www.medoid.ai](https://www.medoid.ai)
- LinkedIn: [https://www.linkedin.com/company/medoid-ai/](https://www.linkedin.com/company/medoid-ai/)