Audio conditioning #14

Open
generalsvr opened this issue Feb 11, 2025 · 1 comment
Comments

@generalsvr

Great model! Can you explain how to use the audio prefix described in the README? Should the prefix be added directly in the text prompt? Can you provide some examples?

@spaghettiSystems
Contributor

You upload a prefix speech sample into the Gradio interface (roughly 3-5 s is recommended, but 20 s or more works too if you like), transcribe it, and put the transcript into the text box. Then enter the speech you want it to continue with.
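As a rough sketch of preparing those two inputs, the snippet below trims a WAV file to the recommended length and joins the transcript with the continuation text. It uses only Python's standard `wave` module. The helper names `trim_wav` and `build_text_box` are hypothetical, and placing the transcript and the continuation in the same text box separated by a space is an assumption about the interface, not something confirmed in this thread.

```python
import wave


def trim_wav(src_path, dst_path, max_seconds=5.0):
    """Copy at most max_seconds of audio from src_path to dst_path.

    Hypothetical helper: keeps the prefix clip near the 3-5 s range.
    Returns the duration of the written clip in seconds.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        n_frames = min(src.getnframes(), int(max_seconds * src.getframerate()))
        frames = src.readframes(n_frames)
    with wave.open(dst_path, "wb") as dst:
        # The frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return n_frames / params.framerate


def build_text_box(prefix_transcript, continuation):
    """Assumed layout: transcript of the prefix audio, then the new text."""
    return f"{prefix_transcript.strip()} {continuation.strip()}"
```

The trimmed file would be the one uploaded as the prefix audio, and the string from `build_text_box` is what would go into the text box.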

This is typically not necessary, but it helps if you want a very high level of control over the generation or even more speaker-cloning fidelity.

We typically recommend just putting a few ms of pure silence there to condition the model toward high-quality output.
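If you need a silence clip to upload, one way to make it is with Python's standard `wave` module, as sketched below. The function name `write_silence_wav` is hypothetical, and the 100 ms duration, 44.1 kHz sample rate, and 16-bit mono format are assumptions; the thread does not say which values the model expects.

```python
import wave


def write_silence_wav(path, duration_ms=100, sample_rate=44100):
    """Write a mono 16-bit PCM WAV file containing only silence.

    duration_ms and sample_rate are assumed defaults, not documented values.
    Returns the number of frames written.
    """
    n_frames = int(sample_rate * duration_ms / 1000)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_frames)  # zero-valued samples are silence
    return n_frames
```

Calling `write_silence_wav("silence.wav")` produces a file that can be uploaded in place of a speech prefix.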

If you do put a speech sample there instead of silence, it can help to set the other conditioning inputs to match your prefix audio, or just set them to unconditional (uncond) and let the model figure it out.
