Using OpenAI's whisper or faster-whisper together with ffmpeg, this project takes a list of video and audio files and produces subtitles for them. The subtitle files can then be fed into an LLM to build a knowledge database that a user can query.
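As a rough illustration of the "feed it into an LLM" step, here is a minimal sketch that turns a subtitle file into plain-text passages. It assumes the subtitles are in SRT format; the function names `parse_srt` and `chunk_records` and the 500-character default are hypothetical and not part of this project.

```python
import re
from typing import Dict, List

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,200".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(text: str) -> List[Dict[str, str]]:
    """Parse SRT text into a list of {"start", "end", "text"} records."""
    records = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                # Everything after the timing line is the spoken text.
                records.append({
                    "start": match.group(1),
                    "end": match.group(2),
                    "text": " ".join(lines[i + 1:]),
                })
                break
    return records


def chunk_records(records: List[Dict[str, str]], max_chars: int = 500) -> List[str]:
    """Group consecutive cue texts into passages of roughly max_chars characters.

    A single cue longer than max_chars is kept whole rather than split.
    """
    chunks, current = [], ""
    for rec in records:
        candidate = (current + " " + rec["text"]).strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = rec["text"]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The resulting passages could then be embedded or indexed by whatever knowledge-base tooling you prefer; that part is outside the scope of this sketch.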
- Windows 10 64-bit, 19045.3693
- Python 3.9.13 64-bit
- ffmpeg v4.4-full_build
- OpenAI's whisper 20231117 release (CPU-based computation)
- faster-whisper 1.0.3 with support for the large-v3 model (GPU-based computation)
- CUDA 12.5
- cuDNN v8.9.6 11.x release | note: cuDNN 9 is not supported yet | more info:
- NVIDIA GTX 1080
Let's get subtitles for the first 10 videos from MrBeast's YouTube channel using CUDA:
python -f --playlist_end 10 -od mrbeast
input_dir None
audio_filter None
output_name None
output_dir mrbeast
language en
beam_size 5
precision auto
device cuda
model_size small
nproc 8
keep False
verbose False
quiet False
playlist_start None
playlist_end 10
More Output
Parameters ['', '-f', 'bestaudio[ext=m4a]/best[ext=aac]', '--merge-output-format', 'mkv', '--yes-overwrites', '--playlist-end', '10']
[youtube:tab] Extracting URL:
[youtube:tab] @MrBeast/videos: Downloading webpage
[download] Downloading playlist: MrBeast - Videos
[youtube:tab] Playlist MrBeast - Videos: Downloading 10 items
[download] Downloading item 1 of 10
[youtube] Extracting URL:
[youtube] erLbbextvlY: Downloading webpage
[youtube] erLbbextvlY: Downloading ios player API JSON
[youtube] erLbbextvlY: Downloading android player API JSON
[youtube] erLbbextvlY: Downloading m3u8 information
[info] erLbbextvlY: Downloading 1 format(s): 140-15
[download] Destination: 7 Days Stranded On An Island [erLbbextvlY].m4a
[download] 100% of 20.77MiB in 00:00:00 at 21.64MiB/s00
[FixupM4a] Correcting container of "7 Days Stranded On An Island [erLbbextvlY].m4a"
[download] Downloading item 2 of 10
[youtube] Extracting URL:
[youtube] mKdjycj-7eE: Downloading webpage
[youtube] mKdjycj-7eE: Downloading ios player API JSON
[youtube] mKdjycj-7eE: Downloading android player API JSON
[youtube] mKdjycj-7eE: Downloading m3u8 information
[info] mKdjycj-7eE: Downloading 1 format(s): 140-14
[download] Destination: Stop This Train, Win a Lamborghini [mKdjycj-7eE].m4a
[download] 100% of 17.49MiB in 00:00:00 at 18.90MiB/s00
[FixupM4a] Correcting container of "Stop This Train, Win a Lamborghini [mKdjycj-7eE].m4a"
[download] Downloading item 3 of 10
[youtube] Extracting URL:
[youtube] tWYsfOSY9vY: Downloading webpage
[youtube] tWYsfOSY9vY: Downloading ios player API JSON
[youtube] tWYsfOSY9vY: Downloading android player API JSON
[youtube] tWYsfOSY9vY: Downloading m3u8 information
[info] tWYsfOSY9vY: Downloading 1 format(s): 140-15
[download] Destination: I Survived 7 Days In An Abandoned City [tWYsfOSY9vY].m4a
[download] 100% of 16.10MiB in 00:00:00 at 21.02MiB/s00
[FixupM4a] Correcting container of "I Survived 7 Days In An Abandoned City [tWYsfOSY9vY].m4a"
[download] Downloading item 4 of 10
[youtube] Extracting URL:
[youtube] KOEfDvr4DcQ: Downloading webpage
[youtube] KOEfDvr4DcQ: Downloading ios player API JSON
[youtube] KOEfDvr4DcQ: Downloading android player API JSON
[youtube] KOEfDvr4DcQ: Downloading m3u8 information
[info] KOEfDvr4DcQ: Downloading 1 format(s): 140-14
[download] Destination: Face Your Biggest Fear To Win $800,000 [KOEfDvr4DcQ].m4a
[download] 100% of 20.42MiB in 00:00:01 at 15.06MiB/s00
[FixupM4a] Correcting container of "Face Your Biggest Fear To Win $800,000 [KOEfDvr4DcQ].m4a"
[download] Downloading item 5 of 10
[youtube] Extracting URL:
[youtube] krsBRQbOPQ4: Downloading webpage
[youtube] krsBRQbOPQ4: Downloading ios player API JSON
[youtube] krsBRQbOPQ4: Downloading android player API JSON
[youtube] krsBRQbOPQ4: Downloading m3u8 information
[info] krsBRQbOPQ4: Downloading 1 format(s): 140-14
[download] Destination: $1 vs $250,000,000 Private Island! [krsBRQbOPQ4].m4a
[download] 100% of 15.72MiB in 00:00:00 at 18.04MiB/s00
[FixupM4a] Correcting container of "$1 vs $250,000,000 Private Island! [krsBRQbOPQ4].m4a"
[download] Downloading item 6 of 10
[youtube] Extracting URL:
[youtube] 7ESeQBeikKs: Downloading webpage
[youtube] 7ESeQBeikKs: Downloading ios player API JSON
[youtube] 7ESeQBeikKs: Downloading android player API JSON
[youtube] 7ESeQBeikKs: Downloading m3u8 information
[info] 7ESeQBeikKs: Downloading 1 format(s): 140-13
[download] Destination: Protect $500,000 Keep It! [7ESeQBeikKs].m4a
[download] 100% of 14.42MiB in 00:00:00 at 19.46MiB/s00
[FixupM4a] Correcting container of "Protect $500,000 Keep It! [7ESeQBeikKs].m4a"
[download] Downloading item 7 of 10
[youtube] Extracting URL:
[youtube] K_CbgLpvH9E: Downloading webpage
[youtube] K_CbgLpvH9E: Downloading ios player API JSON
[youtube] K_CbgLpvH9E: Downloading android player API JSON
[youtube] K_CbgLpvH9E: Downloading m3u8 information
[info] K_CbgLpvH9E: Downloading 1 format(s): 140-13
[download] Destination: I Spent 7 Days In Solitary Confinement [K_CbgLpvH9E].m4a
[download] 100% of 18.77MiB in 00:00:00 at 21.75MiB/s00
[FixupM4a] Correcting container of "I Spent 7 Days In Solitary Confinement [K_CbgLpvH9E].m4a"
[download] Downloading item 8 of 10
[youtube] Extracting URL:
[youtube] lOKASgtr6kU: Downloading webpage
[youtube] lOKASgtr6kU: Downloading ios player API JSON
[youtube] lOKASgtr6kU: Downloading android player API JSON
[youtube] lOKASgtr6kU: Downloading m3u8 information
[info] lOKASgtr6kU: Downloading 1 format(s): 140-13
[download] Destination: I Saved 100 Dogs From Dying [lOKASgtr6kU].m4a
[download] 100% of 13.93MiB in 00:00:00 at 20.23MiB/s00
[FixupM4a] Correcting container of "I Saved 100 Dogs From Dying [lOKASgtr6kU].m4a"
[download] Downloading item 9 of 10
[youtube] Extracting URL:
[youtube] 9RhWXPcKBI8: Downloading webpage
[youtube] 9RhWXPcKBI8: Downloading ios player API JSON
[youtube] 9RhWXPcKBI8: Downloading android player API JSON
[youtube] 9RhWXPcKBI8: Downloading m3u8 information
[info] 9RhWXPcKBI8: Downloading 1 format(s): 140-13
[download] Destination: Survive 100 Days Trapped, Win $500,000 [9RhWXPcKBI8].m4a
[download] 100% of 25.12MiB in 00:00:01 at 21.34MiB/s00
[FixupM4a] Correcting container of "Survive 100 Days Trapped, Win $500,000 [9RhWXPcKBI8].m4a"
[download] Downloading item 10 of 10
[youtube] Extracting URL:
[youtube] tnTPaLOaHz8: Downloading webpage
[youtube] tnTPaLOaHz8: Downloading ios player API JSON
[youtube] tnTPaLOaHz8: Downloading android player API JSON
[youtube] tnTPaLOaHz8: Downloading m3u8 information
[info] tnTPaLOaHz8: Downloading 1 format(s): 140-13
[download] Destination: $10,000 Every Day You Survive In A Grocery Store [tnTPaLOaHz8].m4a
[download] 100% of 19.94MiB in 00:00:00 at 20.84MiB/s00
[FixupM4a] Correcting container of "$10,000 Every Day You Survive In A Grocery Store [tnTPaLOaHz8].m4a"
[download] Finished downloading playlist: MrBeast - Videos
WARNING: [youtube] Skipping player responses from android clients (got player responses for video "aQvGIIdgFDM" instead of "erLbbextvlY")
Downloaded file C:\projects\mrbeast\7 Days Stranded On An Island [erLbbextvlY].m4a
Downloaded file C:\projects\mrbeast\Stop This Train, Win a Lamborghini [mKdjycj-7eE].m4a
Downloaded file C:\projects\mrbeast\I Survived 7 Days In An Abandoned City [tWYsfOSY9vY].m4a
Downloaded file C:\projects\mrbeast\Face Your Biggest Fear To Win $800,000 [KOEfDvr4DcQ].m4a
Downloaded file C:\projects\mrbeast\$1 vs $250,000,000 Private Island! [krsBRQbOPQ4].m4a
Downloaded file C:\projects\mrbeast\Protect $500,000 Keep It! [7ESeQBeikKs].m4a
Downloaded file C:\projects\mrbeast\I Spent 7 Days In Solitary Confinement [K_CbgLpvH9E].m4a
Downloaded file C:\projects\mrbeast\I Saved 100 Dogs From Dying [lOKASgtr6kU].m4a
Downloaded file C:\projects\mrbeast\Survive 100 Days Trapped, Win $500,000 [9RhWXPcKBI8].m4a
Downloaded file C:\projects\mrbeast\$10,000 Every Day You Survive In A Grocery Store [tnTPaLOaHz8].m4a
Returned code 0
Media files found 10
Processing file: C:\projects\mrbeast\7 Days Stranded On An Island [erLbbextvlY].m4a
Detected language 'en' with probability 1.000000
Begin transcription and creating subtitle file:
C:\projects\mrbeast\7 Days Stranded On An Island [erLbbextvlY].m4a took 127.4 seconds
Processing file: C:\projects\mrbeast\Stop This Train, Win a Lamborghini [mKdjycj-7eE].m4a
Detected language 'en' with probability 1.000000
Begin transcription and creating subtitle file:
C:\projects\mrbeast\Stop This Train, Win a Lamborghini [mKdjycj-7eE].m4a took 109.4 seconds
Processing file: C:\projects\mrbeast\I Survived 7 Days In An Abandoned City [tWYsfOSY9vY].m4a
Detected language 'en' with probability 1.000000
Begin transcription and creating subtitle file:
C:\projects\mrbeast\I Survived 7 Days In An Abandoned City [tWYsfOSY9vY].m4a took 92.5 seconds
Processing file: C:\projects\mrbeast\Face Your Biggest Fear To Win $800,000 [KOEfDvr4DcQ].m4a
Detected language 'en' with probability 1.000000
Begin transcription and creating subtitle file:
C:\projects\mrbeast\Face Your Biggest Fear To Win $800,000 [KOEfDvr4DcQ].m4a took 112.3 seconds
Processing file: C:\projects\mrbeast\$1 vs $250,000,000 Private Island! [krsBRQbOPQ4].m4a
Detected language 'en' with probability 1.000000
Begin transcription and creating subtitle file:
C:\projects\mrbeast\$1 vs $250,000,000 Private Island! [krsBRQbOPQ4].m4a took 84.6 seconds
Processing file: C:\projects\mrbeast\Protect $500,000 Keep It! [7ESeQBeikKs].m4a
Detected language 'en' with probability 1.000000
Begin transcription and creating subtitle file:
C:\projects\mrbeast\Protect $500,000 Keep It! [7ESeQBeikKs].m4a took 81.1 seconds
Processing file: C:\projects\mrbeast\I Spent 7 Days In Solitary Confinement [K_CbgLpvH9E].m4a
Detected language 'en' with probability 1.000000
Begin transcription and creating subtitle file:
C:\projects\mrbeast\I Spent 7 Days In Solitary Confinement [K_CbgLpvH9E].m4a took 97.5 seconds
Processing file: C:\projects\mrbeast\I Saved 100 Dogs From Dying [lOKASgtr6kU].m4a
Detected language 'en' with probability 1.000000
Begin transcription and creating subtitle file:
C:\projects\mrbeast\I Saved 100 Dogs From Dying [lOKASgtr6kU].m4a took 96.0 seconds
Processing file: C:\projects\mrbeast\Survive 100 Days Trapped, Win $500,000 [9RhWXPcKBI8].m4a
Detected language 'en' with probability 1.000000
Begin transcription and creating subtitle file:
C:\projects\mrbeast\Survive 100 Days Trapped, Win $500,000 [9RhWXPcKBI8].m4a took 121.3 seconds
Processing file: C:\projects\mrbeast\$10,000 Every Day You Survive In A Grocery Store [tnTPaLOaHz8].m4a
Detected language 'en' with probability 1.000000
Begin transcription and creating subtitle file:
C:\projects\mrbeast\$10,000 Every Day You Survive In A Grocery Store [tnTPaLOaHz8].m4a took 77.1 seconds
Now check the results in the output directory and, if necessary, verify that the speech in the audio matches the subtitles. Experiment with beam size, precision and model size.
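For reference, the transcription step shown in the log above can be sketched with the faster-whisper API roughly as follows. This is a minimal sketch, not the script's actual code: the function names are hypothetical, the defaults mirror the settings printed above (`small`, `cuda`, `auto`, beam size 5, `en`), and writing SRT next to the media file is an assumption.

```python
from pathlib import Path


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3661.5 -> 01:01:01,500."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def transcribe_to_srt(media_path, model_size="small", device="cuda",
                      precision="auto", beam_size=5, language="en"):
    """Transcribe one media file and write an .srt file next to it (assumed layout)."""
    # Imported lazily so srt_timestamp works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=precision)
    segments, info = model.transcribe(
        str(media_path), beam_size=beam_size, language=language
    )
    print(f"Detected language '{info.language}' "
          f"with probability {info.language_probability:f}")

    srt_path = Path(media_path).with_suffix(".srt")
    with open(srt_path, "w", encoding="utf-8") as handle:
        # segments is a generator; the transcription runs as it is consumed.
        for index, seg in enumerate(segments, start=1):
            handle.write(f"{index}\n")
            handle.write(f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n")
            handle.write(f"{seg.text.strip()}\n\n")
    return srt_path
```

Changing `beam_size`, `precision` or `model_size` in a call to `transcribe_to_srt` is one way to run the experiments suggested above.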
Usage on GPUs with defaults
# default device is CUDA, which requires an NVIDIA GPU with CUDA and cuDNN installed
python -f path-to-a-media-file
# Test with CPU in case you have no CUDA device
python -f path-to-a-media-file -d cpu
# Test using the original whisper package from OpenAI
python -f path-to-a-media-file
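If you are unsure whether a CUDA device is available, a small check like the one below can choose the device for you. This is a sketch under assumptions: `pick_device` is a hypothetical helper, not something the script provides, and it relies on `ctranslate2` (the backend faster-whisper is built on) being importable.

```python
def pick_device(requested: str = "cuda") -> str:
    """Return "cuda" only if it was requested and a CUDA device is visible, else "cpu"."""
    if requested != "cuda":
        return "cpu"
    try:
        import ctranslate2  # backend used by faster-whisper

        return "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"
    except Exception:
        # ctranslate2 missing or the CUDA runtime could not be queried.
        return "cpu"
```

The returned string can be passed as the value of the `-d` option shown above.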
If you want to transcribe all videos inside a specific directory, provide an input path like so:
# Subtitle outputs will be placed in the same directory as the input directory
python -i C:\projects\Videos
Found 18 files
Settings as follows:
input_dir C:\projects\Videos
filename None
output_dir C:\projects\Videos
language en
beam_size 5
precision auto
device cuda
model_size small
nproc 8
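Collecting the media files from an input directory might look like the sketch below. The extension list and the name `find_media_files` are assumptions for illustration; the script may recognise a different set of file types.

```python
from pathlib import Path

# Assumed set of media extensions; adjust to whatever ffmpeg can decode for you.
MEDIA_EXTENSIONS = {".mp4", ".mkv", ".mov", ".m4a", ".mp3", ".wav", ".aac", ".opus"}


def find_media_files(input_dir):
    """Return the media files directly inside input_dir, sorted by path."""
    root = Path(input_dir)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS
    )
```

Each returned path can then be transcribed in turn, with the subtitle file written next to it.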
If you want to transcribe videos and audio from the internet, you can also provide a URL, as long as yt-dlp (the YouTube downloader) is able to download them as intermediate files:
python -f
All the YouTube videos from a particular channel can also be downloaded and the entire set of files transcribed by the script:
python -f
For testing, and to save time, I usually cap the number of videos to transcribe:
python -f --playlist_end 10
Above, only 10 videos will be downloaded in audio format and transcribed.
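The yt-dlp argument list printed in the log earlier (`-f`, `--merge-output-format`, `--yes-overwrites`, `--playlist-end`) can be assembled along these lines. This is a sketch, not the script's code: `build_ytdlp_args` is a hypothetical name, and passing the output directory via yt-dlp's `-P` option is my assumption.

```python
def build_ytdlp_args(url, playlist_start=None, playlist_end=None, output_dir=None):
    """Build a yt-dlp command line that downloads audio for a URL or playlist."""
    args = [
        "yt-dlp",
        "-f", "bestaudio[ext=m4a]/best[ext=aac]",  # audio-only format selector
        "--merge-output-format", "mkv",
        "--yes-overwrites",
    ]
    if playlist_start is not None:
        args += ["--playlist-start", str(playlist_start)]
    if playlist_end is not None:
        args += ["--playlist-end", str(playlist_end)]
    if output_dir is not None:
        args += ["-P", str(output_dir)]  # assumption: download into this directory
    args.append(url)
    return args
```

The list can be handed to `subprocess.run(args, check=True)`; a return code of 0 corresponds to the `Returned code 0` line in the log.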
But wait! There's MORE
And it doesn't have to be from YouTube. The YouTube downloader (yt-dlp) gets better over time and has its own repository. Here it uses a SoundCloud link to download a song:
python -f
In case you want to test the download script standalone before using the whisper package, here's the command:
# not using the whisper AI or CUDA script here
# download manually the necessary file(s) and put them into a directory if needed
# -o is output filename
python -l -o projects/sick_beat
# then specify the local path into the rest of the scripts
python -f projects/sick_beat.m4a
Check the available formats and quality levels before downloading, since the clarity of the speech in the chosen file can affect transcription accuracy.
For example, heavy artifacts in the audio of an old recording may degrade the transcription.
python -l -F
More Output
Settings as follows:
list True
verbose False
keep False
format bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best
output_dir C:\Users\<username>\Videos
audio_only False
video_only False
bin yt-dlp.exe
merge False
merge_format mkv
quiet False
overwrite False
Starting process <Popen: returncode: None args: ['yt-dlp.exe', '-F', '>
[soundcloud] Extracting URL:
[soundcloud] prznt/prznt-x-2scratch-stay: Downloading info JSON
[soundcloud] 715263370: Downloading JSON metadata
[soundcloud] 715263370: Downloading JSON metadata
[soundcloud] 715263370: Downloading JSON metadata
[info] Available formats for 715263370:
hls_opus_64 opus audio only | ~1.39MiB 64k m3u8 | audio only unknown 64k
hls_mp3_128 mp3 audio only | ~2.78MiB 128k m3u8 | audio only unknown 128k
http_mp3_128 mp3 audio only | ~2.78MiB 128k http | audio only unknown 128k
Video file: '' returned code 0
And when downloading an intermediate file before the transcription, the output looks like:
# here I choose the last option which seems to have a higher TBR
python -l -f http_mp3_128
Starting process <Popen: returncode: None args: ['yt-dlp.exe', '>
[soundcloud] Extracting URL:
[soundcloud] prznt/prznt-x-2scratch-stay: Downloading info JSON
[soundcloud] 715263370: Downloading JSON metadata
[soundcloud] 715263370: Downloading JSON metadata
[soundcloud] 715263370: Downloading JSON metadata
[info] 715263370: Downloading 1 format(s): http_mp3_128
[download] Destination: Prznt x 2Scratch - Stay [715263370].mp3
[download] 100% of 2.71MiB in 00:00:00 at 6.74MiB/s:00
Media file: 'Prznt x 2Scratch - Stay [715263370].mp3' returned code 0
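Choosing a format by bitrate, as done by hand above, could be automated with something like the following sketch. It assumes format entries shaped like the `formats` list in yt-dlp's JSON output (keys `format_id`, `abr`, `tbr`, `protocol`); `pick_audio_format` is a hypothetical helper and the tie-break in favour of direct HTTP downloads is my own choice.

```python
def pick_audio_format(formats):
    """Return the format_id with the highest bitrate.

    On ties, a direct http(s) download is preferred over a streaming playlist.
    Raises ValueError if formats is empty.
    """
    def score(fmt):
        bitrate = fmt.get("abr") or fmt.get("tbr") or 0
        direct = 1 if fmt.get("protocol") in ("http", "https") else 0
        return (bitrate, direct)

    return max(formats, key=score)["format_id"]
```

With the three SoundCloud formats listed above, this would select `http_mp3_128`, which can then be passed to the `-f` option.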
# transcribe this song or a recording
python -f 'Prznt x 2Scratch - Stay [715263370].mp3'
# try different settings
python -f 'Prznt x 2Scratch - Stay [715263370].mp3' --model_size large
Consider using a larger beam_size hyper-parameter, which, as you may be aware, increases the time needed to complete the transcription. This will probably rectify issues with accented speech that the tiny model struggles with. To get the model sizes supported by faster-whisper, use the option -s or --model_size:
# Call the faster-whisper API to get supported sizes in that version number
python -s
Supported size in faster-whisper
* tiny.en
* tiny
* base.en
* base
* small.en
* small
* medium.en
* medium
* large-v1
* large-v2
* large-v3
* large
* distil-large-v2
* distil-medium.en
* distil-small.en
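A model size can be validated against this list before any model is loaded. The sketch below uses `available_models()` from faster-whisper when no list is supplied; `resolve_model_size` itself is a hypothetical helper, not part of the script.

```python
def resolve_model_size(name, supported=None):
    """Return name if it is a supported model size, otherwise raise ValueError."""
    if supported is None:
        # Imported lazily; asks the installed faster-whisper for its model names.
        from faster_whisper import available_models

        supported = available_models()
    if name in supported:
        return name
    raise ValueError(
        f"Unsupported model size {name!r}; choose one of: {', '.join(supported)}"
    )
```

Failing early this way avoids starting a long batch run with a mistyped model name.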
More information can be found at
To install dependencies for whisper-gpu:
pip install -r requirements.txt
More information can be found at
To install dependencies for whisper-og:
pip install git+ ffmpeg