Replies: 3 comments
-
Hey @Gnomsons.
-
I would also like an explanation of the settings: generally what each one does and why you would use it. That would be very helpful.
-
AngryBearr gave a good answer. If you want a little more verbiage, go to https://docs.coqui.ai/en/latest/models/tortoise.html and scroll down to the Tortoise Model heading. If you are like me, you'll only be slightly more knowledgeable after reading it. All I can add is that some of the settings under GENERATE interact with the model in unexpected ways. For example, if you have prepared two dramatically different TTS models, changing a setting like 'warmth' or Top P may produce one kind of change in the generated audio with the first model. If you make the same setting changes with the second model, they may produce a different type of result, or no change at all. If you want to dive into a rabbit hole, you can ask Gemini or ChatGPT for an answer. Here is part of one answer to a settings question from Gemini. Question: What is the "P" variable in Tortoise TTS? Answer: It's a parameter used in nucleus sampling, a technique employed in the model to generate text. What does Top P do? Unfortunately, the only way I've dealt with this is lots and lots of tedious experimentation.
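For anyone who wants to see what nucleus (Top P) sampling means in general, here is a minimal sketch in plain Python. It is not Tortoise's actual code, and the function names `top_p_filter` and `sample_top_p` are made up for illustration. The idea is to sort the candidate tokens by probability, keep the smallest group whose probabilities add up to at least Top P, and sample only from that group. A lower Top P therefore keeps fewer candidates, and the output becomes more predictable.

```python
import random


def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p.

    Returns a list of (token_index, renormalized_probability) pairs.
    """
    # Rank token indices from most to least likely.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept = []
    cumulative = 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1 again.
    total = sum(p for _, p in kept)
    return [(index, p / total) for index, p in kept]


def sample_top_p(probs, top_p, rng=random):
    """Draw one token index from the nucleus selected by top_p."""
    kept = top_p_filter(probs, top_p)
    indices = [index for index, _ in kept]
    weights = [p for _, p in kept]
    return rng.choices(indices, weights=weights, k=1)[0]


# Hypothetical distribution over four candidate tokens.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.7))  # only the two most likely tokens survive
print(sample_top_p(probs, 0.7))  # one index drawn from those survivors
```

With a Top P close to 1.0, almost every candidate stays in play, which is one reason the same setting can sound very different from one model to the next: the effect depends on how each model spreads its probabilities.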
-
Hi and thanks for the work you've done!
I was wondering if it would be possible to create a detailed text guide on how to use this project: what each setting is responsible for and how it works? Or at least a README with links to basic information. As a beginner, I found the learning curve too steep, and checking how each setting affects the result without basic knowledge is confusing. It seems like you can test everything on a small dataset and evaluate the result, but the relationship is not always linear. There are a million questions like "what is Validation Text Length Threshold and why is it 12?" or "what is Use DeepSpeed for Speed Bump?" etc. I have carefully watched the videos on the YouTube channel. Great videos, thanks again! But a lot of them seem to be no longer relevant because the interface has changed or something else, and gathering information bit by bit doesn't give me confidence that "this button still works and does the same thing as it did six months ago".
I realise that my request may sound a bit ridiculous, as GitHub is not the most appropriate platform for school- or Wikipedia-style teaching. But I would be extremely grateful (and I think I'm not the only one) if you could spare some time to write a description of what each setting is responsible for and why.