Text Summarization Comparison
Comparing the summarization results of TextRank, BART, and Pegasus.
A comparison of the results is available on Kaggle.
https://www.kaggle.com/discussions/general/481901
I had 4000 German speeches summarized with BART, PEGASUS and TextRank and evaluated them myself. The results can be seen below.
| Statistic (All Averages)    | TextRank | BART | PEGASUS | Overall | Description |
| --------------------------- | -------- | ---- | ------- | ------- | ----------- |
| Score                       | 6        | 7    | 5       | 6       | Average summary score on a 1-10 scale |
| Text length (in words)      | 93       | 48   | 52      | 65      | Of the summary |
| Sentence length (in words)  | 24       | 14   | 20      | 19      | Of the summary |
| Compression rate            | 73%      | 82%  | 75%     | 77%     | How much shorter the summary was than the speech |
| Faulty                      | 0%       | 0%   | 10.36%  | 3.45%   | Share of summaries that were complete rubbish |
For my specific use case, summarizing German speeches, even fine-tuned PEGASUS versions performed… mediocre at best. Funny thing is: TextRank was supposed to be the only extractive text summarization technique, but PEGASUS as well as BART generated summaries that were incredibly close to the original wordings.
TextRank's summaries weren't great either, but at least it was consistent. PEGASUS had so many complete rubbish summaries (something like "this this this that that"), it was flabbergasting.
In short, the verdict is that BART comes out best.
https://github.com/facebookresearch/fairseq/tree/main/examples/bart
https://huggingface.co/facebook/bart-large-cnn
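A quick way to try the BART checkpoint linked above is the Hugging Face transformers summarization pipeline. This is a minimal sketch, assuming transformers and PyTorch are installed; the sample text and the generation bounds (max_length, min_length) are placeholder values, not settings from the Kaggle experiment:

```python
# pip install transformers torch
from transformers import pipeline

# Load the CNN/Daily Mail fine-tuned BART checkpoint linked above.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "Insert the document to summarize here. BART is an abstractive model, "
    "so the output may rephrase rather than copy the source sentences."
)

# max_length/min_length bound the summary length in tokens;
# do_sample=False keeps the output deterministic.
result = summarizer(article, max_length=130, min_length=30, do_sample=False)
print(result[0]["summary_text"])
```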
https://huggingface.co/google/pegasus-xsum
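The PEGASUS checkpoint can be driven the same way, or loaded explicitly with its tokenizer and model classes. Another minimal sketch, again assuming transformers, PyTorch, and sentencepiece are installed, with a placeholder input:

```python
# pip install transformers torch sentencepiece
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "google/pegasus-xsum"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

text = "Insert the document to summarize here."

# Tokenize, generate, and decode. pegasus-xsum tends to produce very
# short, single-sentence summaries because it was fine-tuned on XSum.
batch = tokenizer([text], truncation=True, padding="longest", return_tensors="pt")
summary_ids = model.generate(**batch)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```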
TextRank (extractive summarization), BART (abstractive summarization), Pegasus (abstractive summarization). A TextRank sketch follows below.
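Unlike the two neural models, TextRank needs no pretrained checkpoint; it ranks the document's own sentences and returns the top ones verbatim. A minimal sketch using the summa package, which is one of several TextRank implementations; the Kaggle post does not say which implementation was used, and the sample text and ratio here are placeholders:

```python
# pip install summa
from summa.summarizer import summarize

text = (
    "Insert a multi-sentence document here. TextRank builds a graph of "
    "sentences, scores them with PageRank, and extracts the highest-ranked "
    "ones. The result is extractive: sentences are copied, not rewritten."
)

# ratio=0.2 keeps roughly the top 20% of sentences by TextRank score.
print(summarize(text, ratio=0.2))
```

summa also accepts a language argument (e.g. language="german") for stemming and stopword handling, though how well it matches the Kaggle setup for German speeches is an open question.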