
Text Summarization Comparison

도그사운드 2024. 3. 6. 16:48

A comparison of text summarization results from TextRank, BART, and Pegasus

 

A comparison of the three is posted on Kaggle.

https://www.kaggle.com/discussions/general/481901

 

Text Summarization: TextRank vs. BART and PEGASUS | Kaggle


I had 4000 German speeches summarized with BART, PEGASUS and TextRank and evaluated them myself. The results can be seen below.

Statistic (All Averages)   | TextRank | BART | PEGASUS | Overall | Description
---------------------------|----------|------|---------|---------|------------------------------------------------
Score                      | 6        | 7    | 5       | 6       | The average score of the summary between 1-10
Text Length (In Words)     | 93       | 48   | 52      | 65      | Concerning the summary
Sentence Length (In Words) | 24       | 14   | 20      | 19      | Concerning the summary
Compression Rate           | 73%      | 82%  | 75%     | 77%     | How much shorter was the summary over the speech
Faulty                     | 0%       | 0%   | 10.36%  | 3.45%   | How many summaries were complete rubbish
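
The compression rate in the table can be read as one minus the ratio of summary length to speech length, both counted in words. The sketch below assumes that definition and plain whitespace tokenization; the Kaggle post does not say how the figure was actually computed, and `word_count` and `compression_rate` are hypothetical helper names.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed, simplistic tokenizer)."""
    return len(text.split())


def compression_rate(original: str, summary: str) -> float:
    """Return how much shorter the summary is than the original, as a fraction.

    Assumed definition: 1 - (summary words / original words).
    """
    original_words = word_count(original)
    if original_words == 0:
        raise ValueError("original text must contain at least one word")
    return 1.0 - word_count(summary) / original_words


if __name__ == "__main__":
    speech = " ".join(["word"] * 200)  # stand-in for a 200-word speech
    summary = " ".join(["word"] * 48)  # stand-in for a 48-word summary
    print(f"{compression_rate(speech, summary):.0%}")
```

Under this reading, a higher rate means a shorter summary relative to the speech, which matches BART having both the shortest summaries and the highest compression rate in the table.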


For my specific use case, summarizing German speeches, even fine-tuned PEGASUS versions performed… mediocre at best. Funny thing is: TextRank was supposed to be the only extractive text summarization technique, but PEGASUS as well as BART generated summaries that were incredibly close to the original wordings.

TextRank's summaries weren't great either, but at least it was consistent. PEGASUS had so many complete rubbish summaries (something like "this this this that that"), it was flabbergasting.
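
Degenerate output like "this this this that that" can be flagged automatically with a simple repetition heuristic. The sketch below is only an illustration and not the Kaggle author's method (the summaries there were evaluated by hand); `unique_token_ratio`, `looks_degenerate`, and the 0.5 threshold are assumptions that would need tuning on real data.

```python
def unique_token_ratio(text: str) -> float:
    """Fraction of distinct tokens among all whitespace-separated tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def looks_degenerate(summary: str, min_ratio: float = 0.5) -> bool:
    """Flag a summary whose tokens are mostly repeats (assumed threshold)."""
    return unique_token_ratio(summary) < min_ratio


if __name__ == "__main__":
    for candidate in ["this this this that that",
                      "The minister outlined the new budget plan"]:
        print(candidate, "->", looks_degenerate(candidate))
```

Longer texts naturally repeat function words, so a fixed threshold like this is best treated as a first filter before manual review.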

 

In conclusion, the assessment is that BART appears to be the best of the three.

 

 

 

https://github.com/facebookresearch/fairseq/tree/main/examples/bart

 

https://huggingface.co/facebook/bart-large-cnn

 

facebook/bart-large-cnn · Hugging Face


 

https://huggingface.co/google/pegasus-xsum

 

google/pegasus-xsum · Hugging Face


 

TextRank (extractive summarization), BART (abstractive summarization), Pegasus (abstractive summarization)
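
To make the extractive side concrete, here is a minimal TextRank-style sketch in pure Python: sentences are graph nodes, edges are weighted by word overlap, and a weighted PageRank iteration ranks them. It is an illustration under simplifying assumptions (naive sentence splitting, no stop-word handling), not the implementation used in the Kaggle comparison, and all function names are hypothetical.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter on ., ! and ? followed by whitespace (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def similarity(a: str, b: str) -> float:
    """Shared-word count normalized by log sizes of each sentence's word set."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom == 0:
        return 0.0
    return len(words_a & words_b) / denom


def textrank(text: str, top_n: int = 2, damping: float = 0.85,
             iterations: int = 50) -> list[str]:
    """Return the top_n highest-ranked sentences, kept in original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences
    # Symmetric weight matrix with no self-loops.
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update over all neighbors with outgoing weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    sample = ("The cat sat on the mat. The cat likes the mat a lot. "
              "Bananas are yellow. The cat sleeps on the mat daily.")
    print(textrank(sample, top_n=2))
```

Every returned sentence is copied verbatim from the input, which is what distinguishes this extractive approach from BART and Pegasus, which generate new text.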