Audio conditioning #14

generalsvr · 2025-02-11T04:50:14Z

Great model! Can you explain how to use the audio prefix described in the readme? Should the prefix be added directly in the text prompt? Can you provide some examples?

spaghettiSystems · 2025-02-11T08:06:43Z

Upload a prefix speech sample into the Gradio interface (roughly 3-5 s is recommended, but 20 s or more also works if you like), transcribe it, and put that transcription into the text box. Then enter the text you want the model to continue with.
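As a rough sketch of the text side of this, the snippet below builds the string for the text box from the prefix transcription followed by the continuation. The helper name `build_prompt` is hypothetical, and joining the two parts with a single space is an assumption, not something the interface is confirmed to require.

```python
def build_prompt(prefix_transcript: str, continuation: str) -> str:
    """Return the text-box contents: prefix transcription, then the new text.

    Assumption: the two parts are separated by a single space.
    """
    return f"{prefix_transcript.strip()} {continuation.strip()}"


# Hypothetical example: the first sentence is what the uploaded prefix audio says,
# the second is what the model should go on to speak.
prompt = build_prompt("Hello there. ", " How are you today?")
print(prompt)
```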

This is typically not necessary, but it helps if you want a very high level of control over the generation or even more speaker-cloning fidelity.

We typically recommend just putting a few ms of pure silence there, which conditions the model to produce high-quality output.
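If you need a silent clip to upload, here is a minimal sketch that writes one using only the Python standard library. The function name `write_silence_wav`, the 100 ms default duration, the 44100 Hz sample rate, and the mono 16-bit format are all assumptions chosen for illustration; adjust them to whatever the interface accepts.

```python
import wave


def write_silence_wav(path: str, duration_ms: int = 100, sample_rate: int = 44100) -> int:
    """Write a mono 16-bit PCM WAV file containing only silence.

    Returns the number of frames written.
    """
    n_frames = int(sample_rate * duration_ms / 1000)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono (assumption)
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # all-zero samples are silence
    return n_frames


# Writes silence_prefix.wav in the current directory, ready to upload as the prefix.
n = write_silence_wav("silence_prefix.wav")
```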

If you do put a speech sample there instead of silence, it can help to set the other conditioning inputs to match your prefix audio, or to set them to uncond and let the model figure it out.

coezbek mentioned this issue Feb 27, 2025

Optional Prefix Audio #156

Open
