Generating subtitles by transcribing video audio to text with Whisper


도그사운드 2023. 10. 17. 17:14

Whisper is OpenAI's open-source speech recognition library.

GitHub - openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision


Extracting the audio from a video for Whisper

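ffmpeg must be installed.

How should ffmpeg be pronounced, "eff-eff-empeg"?

It is an open-source project that aims to encode and decode every kind of multimedia content.

Install it first; it can be downloaded from the link below, which offers builds for Linux, Windows, and other operating systems.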

https://ffmpeg.org/download.html


For a Windows build, go to the address below.

https://www.gyan.dev/ffmpeg/builds/


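Extract the archive and add the bin folder inside the ffmpeg folder to your PATH.

You can then test it: ffmpeg -i <input file> <output file>

Example: ffmpeg -i demo.mp4 demo.wav

For a 2-minute, 40 MB video, the audio is extracted in about 1 second. Because the output is WAV, it comes to 27 MB.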
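The same extraction can be driven from Python. The following is only a minimal sketch that wraps the command above with subprocess; the function names build_extract_command and extract_audio are made up for this illustration, and the -y flag (overwrite the output without asking) is an addition that the command above does not use.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_extract_command(video: str, audio: str) -> List[str]:
    """Build the argument list for `ffmpeg -y -i <video> <audio>`."""
    # -y overwrites an existing output file instead of prompting,
    # which would otherwise block a non-interactive script.
    return ["ffmpeg", "-y", "-i", video, audio]


def extract_audio(video: str, audio: Optional[str] = None) -> Path:
    """Run ffmpeg and return the path of the extracted audio file.

    If no output name is given, the video's extension is replaced by .wav.
    Requires ffmpeg to be reachable through PATH.
    """
    target = Path(audio) if audio else Path(video).with_suffix(".wav")
    # check=True raises CalledProcessError when ffmpeg exits with an error.
    subprocess.run(build_extract_command(video, str(target)), check=True)
    return target
```

Calling extract_audio("demo.mp4") would then be the scripted equivalent of the demo.mp4 to demo.wav example.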

Generating subtitles with the web interface

https://github.com/abdeladim-s/subsai/tree/main


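Subtitles can also be generated through the web interface.

In subsai, upload a file or specify a local path, then click Transcribe.

There is one problem, though: uploads are limited to 200 MB. To get around this...

The details are at the link below.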

https://docs.streamlit.io/library/advanced-features/configuration


In short:

Go to the Users\<your user name>\.streamlit folder.

Create a text file named config.toml and add

[server]

maxUploadSize = 600 # allow uploads up to 600 MB

and that is all.
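If you would rather create the file from a script, something like the sketch below should work. It assumes that the limit in question is Streamlit's server.maxUploadSize option (documented as the file-uploader size limit in MB, 200 by default); the function name write_streamlit_config is made up for this example, and it overwrites any config.toml that already exists.

```python
from pathlib import Path


def write_streamlit_config(home: Path, max_upload_mb: int = 600) -> Path:
    """Write <home>/.streamlit/config.toml with a larger upload limit.

    Assumption: server.maxUploadSize is the option behind the 200 MB limit.
    Note that an existing config.toml is overwritten.
    """
    config_dir = home / ".streamlit"
    config_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    config_path = config_dir / "config.toml"
    config_path.write_text(
        f"[server]\nmaxUploadSize = {max_upload_mb}\n", encoding="utf-8"
    )
    return config_path
```

Passing Path.home() as the first argument targets the per-user folder described above.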

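Speech-to-text from Python with subsai

If you plan to drive subsai from Python, ffmpeg also has to be installed by another route (Chocolatey).

Start PowerShell. It must be run as administrator.

If Get-ExecutionPolicy returns Restricted, change it with Set-ExecutionPolicy RemoteSigned.

Otherwise the installation fails and only an empty folder is created each time.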

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((Invoke-WebRequest -UseBasicParsing -Uri https://chocolatey.org/install.ps1).Content)

Run the command above. Chocolatey will be installed in the ProgramData folder.

choco -v

Confirm that the choco command runs,

then install ffmpeg.

choco install ffmpeg

Once the installation finishes, add

C:\ProgramData\chocolatey\bin

C:\ProgramData\chocolatey\lib\ffmpeg\tools\ffmpeg\bin

to your PATH.
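Before running subsai it may be worth confirming that Python can actually see these tools through PATH. A minimal sketch using only the standard library; the helper name missing_tools is made up for this example.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the tool names that cannot be found through PATH."""
    # shutil.which returns the full path of an executable, or None if absent.
    return [name for name in names if shutil.which(name) is None]


# An empty list means both commands are reachable from this Python process.
print(missing_tools(["ffmpeg", "choco"]))
```

If a name is still reported as missing right after editing PATH, opening a new terminal usually picks up the change.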

For the Python code, refer to the subsai GitHub repository.

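import os
from subsai import SubsAI

# Show the working directory that the relative paths below resolve against
print(os.getcwd())
#os.chdir('../../study/xxx/xxx')
file = './xxxx.mp4'
if os.path.exists(file):
    print("File found.")
else:
    # Stop here instead of failing later inside transcribe()
    raise FileNotFoundError(file)

subs_ai = SubsAI()
# 'base' selects one of the smaller Whisper models
model = subs_ai.create_model('openai/whisper', {'model_type': 'base'})
subs = subs_ai.transcribe(file, model)
# Save the subtitles in the current directory
subs.save('./xxx.srt')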

Even with a 30-minute, 500 MB video, GPU usage was lower than expected. When working from Python there is no file-size limit.

Using the translation module

If a firewall blocks the automatic download and the process cannot continue, download the files below from Hugging Face manually.

Type	File role	File name
Model and config	Model file	pytorch_model.bin
Model and config	Config file	config.json
Tokenizer	Vocabulary	vocab.json
Tokenizer	Merges	merges.txt
Tokenizer	Tokenizer config	tokenizer_config.json
Tokenizer	Special-token config	special_tokens_map.json
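After a manual download it is easy to miss one of these files. The sketch below checks a local folder against a list of expected names; the helper name missing_model_files and the folder layout are assumptions for illustration, and the list of names is passed in by the caller because the exact set depends on the model repository.

```python
from pathlib import Path
from typing import Iterable, List


def missing_model_files(model_dir: str, required: Iterable[str]) -> List[str]:
    """Return the names from `required` that are not files inside `model_dir`."""
    folder = Path(model_dir)
    return [name for name in required if not (folder / name).is_file()]
```

For example, missing_model_files("./my_model", ["pytorch_model.bin", "config.json"]) returns the names that still need to be downloaded, where ./my_model is a hypothetical download folder.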

Use srt as the subtitle format whenever possible (format = 'srt'); it causes fewer errors. With sub, extra work such as setting up the timing may be needed.

import os
from pathlib import Path
import pysubs2
from subsai import Tools

print(os.getcwd())
os.chdir('../../study/xxx1')

# Show which translation models subsai offers
print(Tools.available_translation_models())
translation_model = 'facebook/m2m100_1.2B'
source_language = 'English'
target_language = 'Korean'
format = 'srt'

for i in os.listdir('./xxx/'):
    # Swap the video extension for .srt; the subtitle file is looked up in the working directory
    subtitles_file = Path(i).with_suffix('.srt').name
    if not os.path.exists(subtitles_file):
        print(f"no subtitle file for {i}, skipping")
        continue
    subs = pysubs2.load(subtitles_file)
    translated_file = f"{subtitles_file}-{source_language}-{target_language}.{format}"

    translated_subs = Tools.translate(subs, source_language=source_language, target_language=target_language, model=translation_model)
    translated_subs.save(translated_file)
    print(f"translated file saved to {translated_file}")
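The file-name handling in the loop can also be separated from the translation itself, which makes it easy to check which files would be read and written before starting a long run. The sketch below is one possible variation, not the script's actual behavior: it assumes the .srt files sit next to the videos, filters by a hypothetical set of video extensions, and names the result <video name>-<source>-<target>.srt. The helper name plan_translations is made up for this example.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: which extensions count as videos; adjust as needed.
VIDEO_SUFFIXES = {".mp4", ".mkv", ".avi", ".mov"}


def plan_translations(
    video_dir: str, source_language: str, target_language: str, fmt: str = "srt"
) -> List[Tuple[Path, Path]]:
    """Return (subtitle file, translated file) pairs for each video in a folder."""
    jobs = []
    for video in sorted(Path(video_dir).iterdir()):
        if video.suffix.lower() not in VIDEO_SUFFIXES:
            continue  # skip subtitles, notes, and other non-video files
        subtitle = video.with_suffix(".srt")
        translated = video.with_name(
            f"{video.stem}-{source_language}-{target_language}.{fmt}"
        )
        jobs.append((subtitle, translated))
    return jobs
```

Each pair can then be fed to pysubs2.load and Tools.translate as in the loop above.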