Added API support for local Zonos. #73
base: main
Conversation
I would maybe separate that into a different API file without Gradio, and have an API-consuming Gradio UI, as a refactor, if that is the goal. Also, as a request: maybe try to keep it in alignment with OpenAI's TTS API. This would allow easy integration for 3rd-party systems without much hassle and with sane defaults.
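As a rough illustration of what "alignment with OpenAI's TTS API" buys: an existing client written against OpenAI's /v1/audio/speech route could simply be pointed at a local server. The base URL, port, and voice name below are illustrative assumptions, not part of this PR.

```python
import requests

# Hypothetical local server exposing an OpenAI-compatible TTS route.
# Base URL, port, and voice name are assumptions for illustration.
BASE_URL = "http://localhost:8000"

resp = requests.post(
    f"{BASE_URL}/v1/audio/speech",
    json={
        "model": "Zyphra/Zonos-v0.1-transformer",   # assumed model identifier
        "input": "Hello from a local Zonos server.",
        "voice": "default",                          # assumed voice name
        "response_format": "wav",
    },
    timeout=120,
)
resp.raise_for_status()

with open("speech.wav", "wb") as f:
    f.write(resp.content)
```

Because the payload mirrors the shape of OpenAI's speech endpoint (model / input / voice / response_format), third-party tools that already speak that API would only need a base-URL change.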
Thank you @PhialsBasement, you are a lifesaver.
That's more akin to what I'm proposing (mind you, uploading a voice file for every request to a remote machine may be suboptimal).

We may even want to avoid loading the transformer and hybrid models in isolation, so there is no need to swap over; the models are small enough to fit even in peanut cards (model loading time would hurt throughput). Optional pinning, or fully overridable, but I would make that the default behaviour for any load-bearing API. In an API scenario, a batch processor with a queue could be prefixed with just which model to take, as both are present in VRAM (I'll work on that once we get a go-ahead, or at least an LGTM, from the team).

Voices could be embedded as tensors on voice upload, and on usage we just pull in the tensor to save computation. At the moment I support mp3/wav while always converting to wav as a baseline.

Happy to help out, but I think the API and Gradio should be clearly separated. Can someone from Zyphra chip in here?
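A minimal sketch of the "pin both models in VRAM and pick per request" idea. The checkpoint names are the public Zyphra repos; the registry dict and selection helper are hypothetical, not this PR's code.

```python
import torch
from zonos.model import Zonos  # loader from the upstream Zonos repo

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load both variants once at startup so requests never trigger a checkpoint swap.
# The dict itself is illustrative; only the repo names come from the upstream project.
MODELS = {
    "transformer": Zonos.from_pretrained("Zyphra/Zonos-v0.1-transformer", device=device),
    "hybrid": Zonos.from_pretrained("Zyphra/Zonos-v0.1-hybrid", device=device),
}

def pick_model(name: str) -> Zonos:
    """Select a pinned model per request instead of reloading between requests."""
    try:
        return MODELS[name]
    except KeyError:
        raise ValueError(f"unknown model '{name}', expected one of {list(MODELS)}")
```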
Just want to mention this thread as relevant for when a teammate comes around to see this PR: #37.
Agreed, but that is different, as their API has different sampling. That should be compensatable once we know what they use.
With an OAI endpoint, and speakers from a folder as the returned voices, it would work straight away in SillyTavern. Unconditional emotions and it would be good "as-is".
Pretty much why I proposed it that way. Integration into hundreds of systems would work without any extra work.
@darkacorn just threw in some of your suggestions, check it out and tell me if it's what you were thinking.
Amazing, thanks for pulling that in. Good baseline.
I'm currently testing the OpenAI endpoint, will report back if I run into any issues!
Has anyone been able to create embeddings? I'm running into this error:
@ther3zz Fixed. The issue was in api.py: I was trying to use .query() on a CUDA stream handle; now it's just a normal UNIX timestamp instead.
Looks like it's working!
Another issue I noticed is that MODEL_CACHE_DIR=/app/models doesn't seem to work. I'm not seeing the models cached there; I see them going here: /root/.cache/huggingface/hub/
Whack, I'll look into it and see what's going on there.
Why can't we just load models from a folder we manually saved? I get that Hugging Face Hub is used for Docker, but not all of us are doing that.
I don't think there is anything that prevents it. You can even use it offline with the HF client.
I've had to change loading to from_local in Gradio and all. The from_pretrained is hijacked away from torch.
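For anyone hitting the same thing, loading from a manually saved folder would look roughly like the sketch below. This assumes from_local takes a config path and a checkpoint path, as the comment above implies; the paths are placeholders, not the repo's documented layout.

```python
import torch
from zonos.model import Zonos

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: from_local(config_path, model_path, device=...) loads a manually
# cloned/saved model directory. Paths below are placeholders.
model = Zonos.from_local(
    "/models/Zonos-v0.1-transformer/config.json",
    "/models/Zonos-v0.1-transformer/model.safetensors",
    device=device,
)
```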
Hope this helps: HF hub config:
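The config itself wasn't captured above. As a general pointer, huggingface_hub reads its cache location from environment variables, so something along these lines (set before anything imports huggingface_hub) redirects downloads to the MODEL_CACHE_DIR location mentioned earlier; the /app/models path is just the value from that comment.

```python
import os

# Redirect the Hugging Face cache before huggingface_hub is imported.
# "/app/models" stands in for the MODEL_CACHE_DIR value mentioned above.
os.environ["HF_HOME"] = "/app/models"
os.environ["HF_HUB_CACHE"] = "/app/models/hub"

from huggingface_hub import snapshot_download

# Downloads now land under /app/models/hub instead of ~/.cache/huggingface/hub.
path = snapshot_download("Zyphra/Zonos-v0.1-transformer")
print(path)
```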
@ther3zz can you move this to the issues tab over on my fork?
But if we manually clone it, SillyTavern would be supporting one specific branch of Zonos that may or may not continue to have its other features or be maintained. We would have to tell everyone: "No, you can't use its latest update, you have to go git switch and then use the API from that particular branch!"
Welcome to open source: you patch it yourself. If you don't want to, or can't do that, use 11labs.
I'll look into it soon.
My bad for not including instructions, you need to do
After this, wait until the API is up and running; it will first download the models and, once done, it will open the endpoint.
FYI, I was on a bit of a break since the last comment I left here. I'll pick up working on it now, so any suggestions should be reiterated in case I missed them. For this I have enabled the issues and discussions tabs on the fork; please add issues and suggestions over there. Thank you.
I agree, this will become an issue sooner or later if I'm ever unable to continue maintaining this.
- Implement file-based storage for voice embeddings and metadata
- Add support for custom voice naming during creation
- Enable voice lookup by either name or ID
- Create new /v1/audio/voices endpoint to list saved voices
- Improve reliability with UUID-based voice ID generation
- Enhance error handling with descriptive messages
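A hedged example of exercising the voice-management routes listed above. The port and the creation route's field names ("name", "file") are guesses based on the bullet list, not confirmed against the fork's code.

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed host/port

# List saved voices (endpoint name taken from the bullet list above).
voices = requests.get(f"{BASE_URL}/v1/audio/voices", timeout=30).json()
print(voices)

# Create a named voice from a reference clip; route and field names are
# illustrative assumptions about the fork's API.
with open("reference.wav", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/v1/audio/voices",
        files={"file": f},
        data={"name": "narrator"},
        timeout=120,
    )
print(resp.json())  # expected to include a UUID-based voice ID
```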
@ther3zz just implemented your suggestions from #73 (comment)
Again, until the API has been merged, SillyTavern cannot support Zonos TTS. You seem to be misunderstanding me. It has to be merged.
You are wrong on that. (a) The API has no splitting of long context, so anything over 30 sec will fail. (b) You don't need a custom integration: it's an OpenAI-TTS-compatible endpoint, so just link the URL of the API. No custom integration needed; pull the PR, run the API, and you are off to the races BY DEFAULT. No custom stuff needed.
So one of the concatenation methods has to go into the API. Currently they're targeting the Gradio interface.
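One possible shape for handling the 30-second limit inside the API rather than the Gradio layer: split the text into sentence-sized chunks, synthesize each, and concatenate the audio. This is a sketch of the idea, not the concatenation method the Gradio UI actually uses; `synth` stands in for whatever per-chunk generation call the API exposes.

```python
import re
import torch

def split_text(text: str, max_chars: int = 300) -> list[str]:
    """Greedy sentence packing so each chunk stays under the rough 30 s limit."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

def synthesize_long(text: str, synth) -> torch.Tensor:
    """`synth` is a hypothetical per-chunk synthesis callable returning a
    (channels, samples) tensor; the pieces are concatenated along time."""
    pieces = [synth(chunk) for chunk in split_text(text)]
    return torch.cat(pieces, dim=-1)
```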
As far as I know, this is only supported on Linux. What about Windows users?
If you manage to run Zonos on Windows, this will run on Windows too; there are no exotic dependencies for the API.
Taken from the description: Installation
And if you look lower, there is an experimental link for Windows installation, but I would not recommend it, albeit some run it on Windows just fine.
Docker? I am talking about Windows without any other software. How would you install this repo?
Read the documentation, and maybe don't spam a PR; open an issue, and if someone cares enough they will answer. This is the wrong thread for that.
@Ph0rk0z I'll look into this when I get back from work tonight. @Sturmgewehr444 Windows should work just fine, but this PR focuses primarily on adding API support. If I have extra time I'll look into streamlining it for Windows, but it will not be a main focus.
Why do we have an endpoint for creating speaker embeddings?
Could be seen as such, and be optional, sure. But generally you use the API for yourself, or you just change that and pass it over. Even with user auth, you fence that off to a different S3 bucket and fetch from there, as it's faster in throughput than having the customer send the voices he uses frequently every time.

Also, this only stores the torch tensor, not the audio files. It's to mimic the OAI TTS API as closely as possible, and that's with fixed voices.
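A sketch of the "store the tensor, not the audio" point: compute the speaker embedding once on upload, persist it with torch.save, and reload it per request. The make_speaker_embedding call follows the upstream Zonos README; the storage directory and UUID naming are illustrative, not necessarily what this PR does.

```python
import uuid
import torch
import torchaudio

VOICE_DIR = "voices"  # illustrative storage location

def create_voice(model, audio_path: str) -> str:
    """Embed the uploaded clip once and store only the tensor on disk."""
    wav, sr = torchaudio.load(audio_path)
    speaker = model.make_speaker_embedding(wav, sr)  # as in the upstream README
    voice_id = str(uuid.uuid4())
    torch.save(speaker.cpu(), f"{VOICE_DIR}/{voice_id}.pt")
    return voice_id

def load_voice(voice_id: str, device: str = "cuda") -> torch.Tensor:
    """Per-request lookup: pull the saved tensor, no re-embedding needed."""
    return torch.load(f"{VOICE_DIR}/{voice_id}.pt", map_location=device)
```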
Sorry for the delay! Just tested this and it's working perfectly!
Would someone mind telling me how I set up the voice IDs with the JSON?
voice^^
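For reference, a request payload along these lines is presumably what's meant; the field names follow the OpenAI-style shape assumed earlier in the thread, and "voice" takes either the saved voice's custom name or its UUID.

```python
# Example request payload (field names are assumptions, not the PR's documented schema).
payload = {
    "model": "Zyphra/Zonos-v0.1-transformer",
    "input": "Testing a saved voice.",
    "voice": "narrator",  # or the UUID-style voice ID returned at creation
}
```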
Thank you! And I was thinking a good way to improve this overall would be to incorporate the 100 ms silence that's included in the Zonos assets folder as the default value for the "prefix_audio" argument in the speech request, instead of none, because I read somewhere in the docs or a commit that it increases the quality and stability of generations.
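A sketch of that suggestion: fall back to the bundled silence clip whenever the client doesn't supply a prefix. The file name is the one shipped in the upstream repo's assets folder as I understand it; the request field and defaulting logic are assumptions about this PR's code.

```python
import torchaudio

# Assumption: the speech request has an optional prefix_audio path; when the
# client doesn't supply one, default to the 100 ms silence clip bundled with
# the upstream Zonos repo.
DEFAULT_PREFIX_AUDIO = "assets/silence_100ms.wav"

def load_prefix(prefix_audio: str | None):
    path = prefix_audio or DEFAULT_PREFIX_AUDIO
    wav, sr = torchaudio.load(path)
    return wav, sr
```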
Any luck in getting this merged?
Still interested in Zonos having a built-in API like this. Any idea if it's possible to merge?
Add REST API Endpoints
This PR adds FastAPI endpoints to Zonos, allowing programmatic access to the model's functionality alongside the existing Gradio interface.
Added Features
/models endpoint to list available models
/generate endpoint for text-to-speech generation
/speaker_embedding endpoint for creating speaker embeddings
Changes
Testing
Tested with curl commands:
The implementation reuses existing model management code and runs alongside the Gradio interface on a different port.
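The curl commands referenced above aren't reproduced in this thread. As a rough Python equivalent of such a test, with the host, port, and payload fields guessed from the endpoint list rather than taken from the PR's documented interface:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed host/port for the FastAPI side

# List available models.
print(requests.get(f"{BASE_URL}/models", timeout=30).json())

# Generate speech; the payload field names are illustrative guesses for /generate.
resp = requests.post(
    f"{BASE_URL}/generate",
    json={"model": "Zyphra/Zonos-v0.1-transformer", "text": "Hello world."},
    timeout=300,
)
resp.raise_for_status()
with open("out.wav", "wb") as f:
    f.write(resp.content)
```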