Code for using ElevenLabs voice cloning for fun with friends
-
/speak
- Choose a voice, provide text and receive an audio clip in a message (embeds natively on desktop, but on mobile you may need an app like VLC to open it)
-
/conversation
- Provide a script in the format
---Voice Name: hi how are you? --Voice Name: Pretty good!
and have the bot generate audio and splice it together into a conversation
-
Config for specifying
-
Source (local or YouTube URL)
-
Audio start and end time
-
Voice isolation using an open source ML model (Demucs)
-
Set clip sizes to automatically clip into smaller parts
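As a rough illustration of the /conversation script format above, here is a minimal Python sketch that splits a script into (voice, text) turns. It assumes each turn starts with two or more dashes followed by the voice name and a colon. The function name parse_script is hypothetical and this is not necessarily how the bot parses scripts.

```python
import re

# Assumption: a turn starts with two or more dashes, then a voice name, then a colon.
TURN_START = re.compile(r"-{2,}\s*([^:\-]+?)\s*:")


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split a conversation script into (voice_name, text) pairs, in order."""
    matches = list(TURN_START.finditer(script))
    turns = []
    for index, match in enumerate(matches):
        # The text of a turn runs until the next turn marker (or the end of the script).
        end = matches[index + 1].start() if index + 1 < len(matches) else len(script)
        text = script[match.end():end].strip()
        if text:
            turns.append((match.group(1), text))
    return turns


if __name__ == "__main__":
    example = "---Voice Name: hi how are you? --Voice Name: Pretty good!"
    for voice, text in parse_script(example):
        print(f"{voice} says: {text}")
```

Each pair could then be sent to the text-to-speech step one at a time, with the resulting clips joined in the same order.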
-
- Clone or fork this repo
git clone https://github.com/safurrier/deepfake-radio/
- Set up the bot
-
Visit the Discord Developer portal to create a new bot application
-
In the top right, select
New Application
and give the app a name -
If you want to update the bot avatar, description, etc., do so here
-
On the left, click
Bot
-
Scroll down to find the section
MESSAGE CONTENT INTENT
and turn this on. Save Changes. -
Scroll back up and click
Reset Token
. Make sure to copy this information down as you'll need it to run your bot -
Create a file in the top level directory called
.env
. You can do this by making a copy of the file example.env
-
In the
.env
file, set BOT_TOKEN=$YOUR_TOKEN
on its own line. E.g. BOT_TOKEN=WWKDEQX8JOoxW61WPb6dXIzklNjaMHJf2zuHGk
-
Join ElevenLabs and sign up for an account with API access. Try the promo code
BETA11
which may still be active for a free month of Creator tier access. -
Copy your ElevenLabs API token (Top Right -> Profile -> API Key Section) to the
.env
file under ELEVEN_API_KEY
like ELEVEN_API_KEY=asdfaskjd123498sasw
-
Create some voice clones using ElevenLabs.
-
See section Add Custom Voices for how to do this programmatically, or clone them on the ElevenLabs platform
-
Branch
voice_catalog
contains some custom voices ready to go that you can copy into the voices
directory if you so desire. -
This is a good short guide with tips for creating good voice clones
-
In general, quality > quantity. (It's possible ElevenLabs doesn't use anything longer than 2 minutes for samples)
-
Diversity of voice samples and short clips can help
-
Consider adding many short clips and on the web UI editing the voice clones to select only the best ones
-
The best samples are of a clear voice with no interruptions or background noise (like a monologue or audio book).
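Before moving on, it can help to confirm that the .env file from the steps above defines both keys. The following is a small sketch using only the standard library; the helper names parse_env and missing_keys are hypothetical, and the simple KEY=value parsing is an assumption rather than the loader the bot itself uses.

```python
REQUIRED_KEYS = ("BOT_TOKEN", "ELEVEN_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    # Placeholder content for illustration; read your real .env file instead.
    sample = "BOT_TOKEN=your-token-here\n# ELEVEN_API_KEY not set yet\n"
    print("Missing keys:", missing_keys(parse_env(sample)))
```

To check a real file, pass its contents in, for example parse_env(open(".env").read()).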
- Get a python environment
-
This was written with a Python 3.11 environment but may work with other Python 3 versions
-
I would suggest using Conda as an environment manager as it's fairly simple to use
-
Installing conda is outside of the scope of this tutorial (some instructions here), but if it's already installed you can run this command to set up a Python env
conda create --name deepfake_radio python=3.11 -y
or use the make command make create-conda-env
And then activate the env
conda activate deepfake_radio
- Install the dependencies
make requirements or manually run:
pip install -r requirements.txt
chmod +x install_dependencies.sh
sh install_dependencies.sh
- Run the bot
python main.py
- Invite the bot to your server
-
Return to the Discord Developer portal
-
Select
OAuth2
-> URL Generator -
Under Scopes, select
bot
-
Under Bot Permissions select at least
Send Messages
, Send Messages in Threads
, Embed Links
, Attach Files
, and Use Slash Commands
-
Copy the generated URL underneath (you may want to save this link somewhere if you plan on inviting the bot to multiple servers)
-
Open the URL, select a server that you're a mod or admin on, and authorize the bot to add it to that server.
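For reference, the URL Generator is combining the selected permissions into a single integer. The sketch below shows that arithmetic under a few assumptions: the bit positions are the ones listed in Discord's permission documentation as I understand them (with Attach Files being Discord's name for the file-upload permission), the base URL is the standard OAuth2 authorize endpoint, and the client ID is a made-up placeholder. Treat it as an illustration, and prefer the URL the portal generates.

```python
from urllib.parse import urlencode

# Assumed bit positions for the permissions this bot needs.
PERMISSION_BITS = {
    "Send Messages": 1 << 11,
    "Embed Links": 1 << 14,
    "Attach Files": 1 << 15,
    "Use Slash Commands": 1 << 31,
    "Send Messages in Threads": 1 << 38,
}


def permissions_value(names: list[str]) -> int:
    """OR together the bit flags for the named permissions."""
    value = 0
    for name in names:
        value |= PERMISSION_BITS[name]
    return value


def invite_url(client_id: str, permissions: int) -> str:
    """Build an invite link for the bot scope with the given permission integer."""
    query = urlencode({"client_id": client_id, "permissions": permissions, "scope": "bot"})
    return f"https://discord.com/oauth2/authorize?{query}"


if __name__ == "__main__":
    perms = permissions_value(list(PERMISSION_BITS))
    # "123456789012345678" is a placeholder, not a real application ID.
    print(invite_url("123456789012345678", perms))
```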
- Use the bot! Use the
/speak
command to select a voice, add text and generate audio. If no voices are present in the options, try running /update-voices
first. Have fun!
-
Go to Railway
-
Sign up using GitHub
-
Verify your account
-
Create a new project
-
Deploy from a GitHub Repo -> select your copy of this repo.
-
Add Variables -> Add the env variables for
BOT_TOKEN
and ELEVEN_API_KEY
(optionally set UPLOAD_VOICES
if you want to upload voices before startup. Do not set PROCESS_VOICES
as it breaks the Railway deployment as of now)
- You will need audio samples to create voice clones in ElevenLabs. You can do this through the platform, or alternatively there are automated tools to do so included here. By default, the bot will process and upload custom voices to the ElevenLabs API before startup. Previously added voices are skipped.
There may or may not be a voice catalog
branch with many voice samples already set up. All you would have to do is copy over the voices you'd like into the voices
directory and turn on the env variable UPLOAD_VOICES
. You didn't hear it from me.
-
All voice clones belong in the
voices
directory and require a config.yaml
file. The two accepted sources for samples are a local mp3 file and a YouTube video link -
Each config file requires a
name
, an optional description
and, at minimum, a specified source
and location
setting. -
There is an example config under
voices/example
:
name: Joe Biden
description: President Joe Biden
1:
source:
location: input/biden.mp3
type: file
2:
source:
location: https://www.youtube.com/watch?v=KADpsS8fbg8
type: youtube
# Note: start_time and end_time are optional
# but must be in format "HH:MM:SS"
start_time: "00:00:01"
end_time: "00:10:01"
isolate: True
clip_size: 60
Each number is a separate source. location is a URL for YouTube sources and a file path for file sources (relative to the config file). In general this should be input/your_sample.mp3
There are other optional settings to help create good samples, including start and end time, voice isolation and clip size.
Voice isolation uses the open source model Demucs. This can be helpful if there is background noise, but does not always work well. It will also increase the processing time substantially.
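To make the start_time, end_time and clip_size options more concrete, here is a sketch of how a trimmed source could be divided into clips. It assumes clip_size is measured in seconds, which this README does not state, and the function names are hypothetical. The bot's own processing may differ.

```python
def to_seconds(timestamp: str) -> int:
    """Convert an "HH:MM:SS" timestamp into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def clip_ranges(start_time: str, end_time: str, clip_size: int) -> list[tuple[int, int]]:
    """Return (start, end) second offsets for consecutive clips of at most clip_size seconds."""
    start, end = to_seconds(start_time), to_seconds(end_time)
    if clip_size <= 0:
        raise ValueError("clip_size must be positive")
    if end <= start:
        raise ValueError("end_time must be after start_time")
    # The final clip is shortened if the range does not divide evenly.
    return [(offset, min(offset + clip_size, end)) for offset in range(start, end, clip_size)]


if __name__ == "__main__":
    # Values taken from the example config above.
    ranges = clip_ranges("00:00:01", "00:10:01", 60)
    print(len(ranges), "clips, first:", ranges[0], "last:", ranges[-1])
```

With the example values, the ten-minute range is covered by ten 60-second clips.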
-
Make sure in the .env file
PROCESS_VOICES
and UPLOAD_VOICES
are set to a value. Remove these env vars if you don't want custom voices to be processed and uploaded on bot startup. -
NOTE: Remove the example voice otherwise it will be uploaded as a voice clone to your ElevenLabs account
-
Run the bot with env vars set for
PROCESS_VOICES
(if adding new custom voices that need processing) and/or UPLOAD_VOICES
(if processed files are present and no processing is needed, just set this to upload to ElevenLabs) -
If you're looking for voice clone samples, there may or may not be a branch on this repo with available files.
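The PROCESS_VOICES and UPLOAD_VOICES switches described above can be pictured with the sketch below. It assumes that any non-empty value turns a switch on and that removing the variable turns it off, which is how the instructions read. The function names are hypothetical and the real startup code may work differently.

```python
import os
from collections.abc import Mapping


def flag_enabled(name: str, environ: Mapping[str, str] = os.environ) -> bool:
    """Treat a variable as enabled when it is set to any non-empty value."""
    return bool(environ.get(name, "").strip())


def startup_steps(environ: Mapping[str, str] = os.environ) -> list[str]:
    """List what would happen at startup, in order, for the given environment."""
    steps = []
    if flag_enabled("PROCESS_VOICES", environ):
        steps.append("process voices")
    if flag_enabled("UPLOAD_VOICES", environ):
        steps.append("upload voices")
    steps.append("run bot")
    return steps


if __name__ == "__main__":
    print(startup_steps({"UPLOAD_VOICES": "1"}))
```

Under this assumption, a value such as false would still count as set, so removing the variable is the reliable way to switch a step off.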
[] Add CLI tool for processing + uploading voices
[] Add tag support for voice config
[] Improve voice selection option to sort by tags
[] Have the bot set the API key using a command
[] Add tests
Where we're going, we don't need tests. If the bot breaks on your server and your friends complain, tell them to kick rocks or fix it themselves.
Original inspiration of the speak command from ElevenLabsBot