Generate subtitles for video file
Using the paid Google Cloud Speech-To-Text API
This program requires a Google account and an API key: create a project on Google Cloud at https://console.cloud.google.com/projectcreate
$ ./srtgen.py
usage
srtgen.py --apikey path/to/keyfile.json path/to/input-video.mp4
environment variables
GOOGLE_APPLICATION_CREDENTIALS=path/to/keyfile.json srtgen.py path/to/input-video.mp4
keyfile
This program requires a Google account and an API key
https://console.cloud.google.com/projectcreate
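The usage text above gives two ways to pass the key file. A minimal sketch of how both could be resolved, assuming that `--apikey` takes precedence over the environment variable (the precedence and the helper name `resolve_keyfile` are assumptions, not taken from srtgen.py):

```python
import argparse


def resolve_keyfile(argv, environ):
    """Return (keyfile, video) from the arguments, falling back to the environment."""
    parser = argparse.ArgumentParser(prog="srtgen.py")
    parser.add_argument("--apikey", help="path to the JSON key file")
    parser.add_argument("video", help="path to the input video")
    args = parser.parse_args(argv)
    # assumption: the command line option wins over the environment variable
    keyfile = args.apikey or environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not keyfile:
        parser.error("no key file: pass --apikey or set GOOGLE_APPLICATION_CREDENTIALS")
    return keyfile, args.video
```

In a script this would be called as `resolve_keyfile(sys.argv[1:], os.environ)`.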
subtitle is written to stdout and output/xxxxxx-input-video.mp4/output_file.srt
where xxxxxx is the sha1 hash of the input video file
temporary files are stored in output/xxxxxx-input-video.mp4/
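The output path described above can be derived roughly like this. This is a sketch only: it assumes that xxxxxx is the full hex digest of the file contents, and `output_dir_for` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def output_dir_for(video_path, root="output"):
    """Build output/<sha1>-<file name> for the given video file."""
    sha1 = hashlib.sha1()
    with open(video_path, "rb") as f:
        # read in 1 MiB blocks so large videos are not loaded into memory at once
        for block in iter(lambda: f.read(1 << 20), b""):
            sha1.update(block)
    return Path(root) / f"{sha1.hexdigest()}-{Path(video_path).name}"
```

Keying the directory on the content hash means the same video maps to the same directory even after it is moved, so temporary files can be reused between runs.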
- works around the size limits of the Google API
- no need for Google Cloud Storage (gs protocol)
- duration is limited to 60 seconds
- file size is limited to 10485760 bytes
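One way to stay under both limits is to cut the audio into fixed-size chunks before sending them. The sketch below only plans the chunk boundaries in milliseconds; it assumes uncompressed PCM audio, and the defaults (16 kHz, 16 bit, mono, 10 percent safety margin) and the name `plan_chunks` are illustrative assumptions, not what srtgen.py does.

```python
MAX_SECONDS = 60        # duration limit from the list above
MAX_BYTES = 10485760    # file size limit from the list above


def plan_chunks(duration_ms, frame_rate=16000, sample_width=2, channels=1, margin_percent=90):
    """Return (start_ms, end_ms) pairs that each respect both limits."""
    bytes_per_second = frame_rate * sample_width * channels
    # longest chunk allowed by the byte limit, in milliseconds
    size_limit_ms = MAX_BYTES * 1000 // bytes_per_second
    limit_ms = min(MAX_SECONDS * 1000, size_limit_ms)
    # stay a little below the hard limit
    chunk_ms = limit_ms * margin_percent // 100
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]
```

With pydub, each pair could then be used to slice an `AudioSegment` by milliseconds (`audio[start:end]`). Fixed boundaries can cut a word in half, so splitting at silence is likely the better choice in practice.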
- ffmpeg
- python
- pydub
- google.cloud.speech
- API key
- pricing
- speech recognition needs lots of space and time = there is no free lunch
- https://cloud.google.com/speech-to-text/pricing#pricing_table
- first hour is free
- TODO one hour per month or one hour per google account?
- Speech Recognition without Data Logging: $0.006 / 15 seconds = $0.024 / 1 minute = about $1.50 / 1 hour
- Speech Recognition with Data Logging: $0.004 / 15 seconds = $0.016 / 1 minute = about $1.00 / 1 hour
- Data Logging = feedback of manually corrected text to improve quality of service
- TODO implement upload of corrected text
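The prices above can be turned into a rough cost estimate. This sketch assumes billing in 15 second increments rounded up, and one free hour whose scope (per month or per account) is still an open TODO above; `estimate_cost` is a hypothetical helper.

```python
import math
from decimal import Decimal


def estimate_cost(duration_seconds, price_per_15s, free_seconds=3600):
    """Estimate the cost in dollars for the given amount of audio."""
    # assumption: the first hour is free, as listed above
    billable = max(0, duration_seconds - free_seconds)
    # assumption: usage is rounded up to whole 15 second increments
    increments = math.ceil(billable / 15)
    return increments * Decimal(price_per_15s)
```

For example, `estimate_cost(2 * 3600, "0.006")` covers one billable hour without data logging, which is 240 increments or $1.44, and `estimate_cost(2 * 3600, "0.004")` gives $0.96 with data logging.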
- TODO Automatic punctuation
- https://github.com/plutowang/generate-video-subtitle
- https://cloud.google.com/speech-to-text/docs/basics
- edit subtitles
- translate subtitles
- https://github.com/BingLingGroup/autosub
- online speech recognition
- Xfyun
- Baidu
- https://github.com/abhirooptalasila/AutoSub
- offline speech recognition
- Mozilla DeepSpeech
- lower quality than google speech
- limited by user hardware: space, time, CPU instruction set (binaries don't run on weak CPUs)
- https://github.com/topics/subtitles-generator
- use speech_recognition module, so srtgen can use multiple backend services
- support offline speech recognition
- mozilla deepspeech
- vosk
- Picovoice https://picovoice.ai/docs/picovoice/
- we need a service that returns timestamps for every word
- google cloud speech: enable_word_time_offsets=True
- alternative: synchronize words and audio waveform
- https://github.com/otsaloma?tab=stars&q=subtitle
- https://github.com/smacke/ffsubsync Automagically synchronize subtitles with video.
- https://github.com/kaegi/alass "Automatic Language-Agnostic Subtitle Synchronization"
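Per-word timestamps matter because they are enough to build subtitle cues directly. A minimal sketch, assuming the backend result has already been reduced to `(word, start_seconds, end_seconds)` tuples; the tuple shape, the grouping rule, and the thresholds are assumptions for illustration.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm as used in SRT files."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=7, max_gap=0.8):
    """Group timed words into cues and render them as SRT text."""
    cues, current = [], []
    for word in words:
        # start a new cue when the current one is full or after a long pause
        if current and (len(current) >= max_words or word[1] - current[-1][2] > max_gap):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)
    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w[0] for w in cue)
        times = f"{srt_timestamp(cue[0][1])} --> {srt_timestamp(cue[-1][2])}"
        blocks.append(f"{index}\n{times}\n{text}\n")
    return "\n".join(blocks)
```

A backend that only returns whole sentences without times cannot feed this directly, which is why the synchronization tools are listed as the alternative.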
- hybrid of offline and online speech recognition
- deepspeech for offline speech recognition
- google for online speech recognition
- can deepspeech return confidence values?
- run deepspeech with different models? (and manually select the best result?)
- automatic postprocessing
- reduce manual work
- split long sentences
- merge short sentences
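The two postprocessing steps above could look roughly like this. Cues are assumed to be `(start_seconds, end_seconds, text)` tuples; the thresholds and function names are hypothetical, and splitting the time in proportion to text length is only an approximation when no word timestamps are available.

```python
def merge_short(cues, min_chars=15, max_gap=0.5, max_chars=42):
    """Merge a cue into the previous one when either is short and they are close in time."""
    merged = []
    for start, end, text in cues:
        if merged:
            p_start, p_end, p_text = merged[-1]
            short = len(p_text) < min_chars or len(text) < min_chars
            fits = len(p_text) + 1 + len(text) <= max_chars
            if short and fits and start - p_end <= max_gap:
                merged[-1] = (p_start, end, p_text + " " + text)
                continue
        merged.append((start, end, text))
    return merged


def split_long(cue, max_chars=42):
    """Split a cue at the word boundary nearest the middle until every part fits."""
    start, end, text = cue
    words = text.split()
    if len(text) <= max_chars or len(words) < 2:
        return [cue]
    half = len(text) / 2
    pos, best_i, best_dist = 0, 1, None
    for i in range(1, len(words)):
        pos += len(words[i - 1]) + 1
        dist = abs(pos - half)
        if best_dist is None or dist < best_dist:
            best_i, best_dist = i, dist
    left = " ".join(words[:best_i])
    right = " ".join(words[best_i:])
    # approximate the split time by the share of characters on each side
    mid = start + (end - start) * len(left) / (len(left) + len(right))
    return split_long((start, mid, left), max_chars) + split_long((mid, end, right), max_chars)
```

Running `merge_short` first and then `split_long` on each cue keeps every cue within `max_chars` wherever a word boundary allows it.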