Added API support for local Zonos. #73

PhialsBasement · 2025-02-14T14:09:38Z

Add REST API Endpoints

This PR adds FastAPI endpoints to Zonos, allowing programmatic access to the model's functionality alongside the existing Gradio interface.

Added Features

/models endpoint to list available models
/generate endpoint for text-to-speech generation
/speaker_embedding endpoint for creating speaker embeddings

Changes

Added FastAPI integration
Model responses are streamed as WAV files
Added Pydantic models for request validation

Testing

Tested with curl commands:

GET /models works as expected
POST /generate successfully generates audio
POST /speaker_embedding successfully creates embeddings

The implementation reuses existing model management code and runs alongside the Gradio interface on a different port.
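For readers wondering what "streamed as WAV files" involves on the server side, here is a minimal standard-library sketch of encoding raw samples as an in-memory WAV payload. It is not the code from this PR: the function name `samples_to_wav_bytes` and the mono, 16-bit, 44.1 kHz format are assumptions chosen for illustration.

```python
import io
import struct
import wave


def samples_to_wav_bytes(samples, sample_rate=44100):
    """Encode float samples in [-1.0, 1.0] as a mono 16-bit PCM WAV payload."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono (assumption)
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        pcm = bytearray()
        for sample in samples:
            # Clip to the valid range before scaling to a signed 16-bit integer.
            clipped = max(-1.0, min(1.0, sample))
            pcm += struct.pack("<h", int(clipped * 32767))
        wav_file.writeframes(bytes(pcm))
    # The wave module leaves a caller-supplied buffer open, so it can be read here.
    return buffer.getvalue()
```

The resulting bytes could then be returned from an HTTP handler with an `audio/wav` content type.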

darkacorn · 2025-02-14T18:36:58Z

i would maybe separate that into a different api file without gradio -
as you most likely use one or the other, not both at the same time

and have an api-consuming gradio ui - as a refactor - if that is the goal

also as a request -

maybe try to keep in alignment with openai's tts api
that is very much integrated and supported everywhere, with optional features as separate parameters

this would allow easy integration for 3rd party systems without much hassle and with sane defaults
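A rough sketch of what "aligned with OpenAI's TTS API, optional features as separate parameters, sane defaults" could mean for the request body. The core field names (`model`, `input`, `voice`, `response_format`, `speed`) follow the OpenAI speech request; the class name, the default values, and the `language` and `emotion` extras are assumptions for illustration, not the schema of this PR.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class SpeechRequest:
    # Core fields mirror the OpenAI text-to-speech request body.
    input: str
    model: str = "Zyphra/Zonos-v0.1-transformer"  # assumed default
    voice: str = "default"                        # hypothetical voice name
    response_format: str = "mp3"
    speed: float = 1.0
    # Optional, model-specific extras kept as separate parameters (assumed names).
    language: Optional[str] = None
    emotion: Optional[Dict[str, float]] = None

    def to_payload(self):
        """Return a JSON-ready dict, leaving out optional fields that were not set."""
        return {key: value for key, value in asdict(self).items() if value is not None}
```

A client that only knows the OpenAI fields can send `SpeechRequest(input="Hello").to_payload()` unchanged, while the extras stay opt-in.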

Steveboy123 · 2025-02-14T22:08:51Z

Thank you @PhialsBasement , you are a lifesaver.

darkacorn · 2025-02-15T10:33:09Z

thats more akin to what im proposing .. ( mind you, uploading a voice file for every request to a remote machine may be suboptimal )

we may even want to isolate loading transformer and hybrid at the same time so there is no need to swap over .. models are small enough to fit even in peanut cards ( model loading time would hurt throughput ) ( optionally pinned or fully overridable, but i would make that the default behaviour for any load-bearing api )

in an api scenario a batch processor with a queue could just prefix each job with which model to use, as both are present in vram ( i can work on that once we get a go-ahead or at least an LGTM from the team )

voice could be embedded as tensors on voice upload - and on usage we just pull in the tensor to save computation

atm i support mp3/wav while always converting to wav as a baseline

happy to help out .. but i think api and gradio should be clearly separated .. can someone from zyphra chip in here ?
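The idea of keeping both models resident and tagging each queued job with the model it should use could look roughly like the sketch below. The two `fake_*` functions are stand-ins for loaded models, and `LOADED_MODELS` and `drain` are hypothetical names; none of this is taken from the PR.

```python
import queue


# Stand-ins for the two resident models; a real server would hold model objects here.
def fake_transformer(text):
    return f"transformer:{text}"


def fake_hybrid(text):
    return f"hybrid:{text}"


# Both entries are created once at startup, so no model swap happens per request.
LOADED_MODELS = {"transformer": fake_transformer, "hybrid": fake_hybrid}


def drain(jobs):
    """Process queued (model_key, text) jobs in order and return their results."""
    results = []
    while True:
        try:
            model_key, text = jobs.get_nowait()
        except queue.Empty:
            break
        # The job prefix alone selects the model; nothing is loaded or unloaded here.
        results.append(LOADED_MODELS[model_key](text))
    return results
```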

zaydek · 2025-02-15T10:59:22Z

Just want to mention this thread as relevant for when a teammate comes around to see this PR: #37.

darkacorn · 2025-02-15T11:14:27Z

agreed but that is different as their api has different sampling .. that should be possible to compensate for once we know what they use
the model cond. has params for min_p / top_k / top_p / temp and rep_pen .. which are not exposed or used atm in oss, only min_p for the time being
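For reference, this is roughly what the sampling knobs named above do to a token distribution. It is a plain-Python sketch of the general techniques (temperature, min-p, top-k, top-p), not the Zonos sampler; repetition penalty is left out, and applying the three filters independently and intersecting the results is a simplification chosen here.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature flattens the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def filter_probs(probs, min_p=0.0, top_k=0, top_p=1.0):
    """Zero out filtered tokens and renormalize the rest.

    Each filter is evaluated on the original distribution and the surviving
    sets are intersected; the most likely token always survives.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked)
    if min_p > 0.0:
        # min-p: drop tokens below a fraction of the top token's probability.
        threshold = min_p * probs[ranked[0]]
        keep &= {i for i in ranked if probs[i] >= threshold}
    if top_k > 0:
        # top-k: keep only the k most likely tokens.
        keep &= set(ranked[:top_k])
    if top_p < 1.0:
        # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
        cumulative, nucleus = 0.0, set()
        for i in ranked:
            nucleus.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        keep &= nucleus
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]
```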

Ph0rk0z · 2025-02-15T15:11:05Z

With OAI endpoint and speakers from folder as returned voices it would work straight away in sillytavern. Unconditional emotions and it would be good "as-is".

darkacorn · 2025-02-15T15:50:09Z

> With OAI endpoint and speakers from folder as returned voices it would work straight away in sillytavern. Unconditional emotions and it would be good "as-is".

pretty much why i proposed it that way .. integration in hundreds of systems would work w/o any extra work

PhialsBasement · 2025-02-16T02:21:27Z

@darkacorn just threw in some of your suggestions, check it out and tell me if it's what you were thinking

darkacorn · 2025-02-16T06:10:15Z

amazing thanks for pulling that in, good baseline

ther3zz · 2025-02-16T13:49:31Z

I'm currently testing the openai endpoint, will report back if I run into any issues!
That being said, it makes sense to include a swagger docs endpoint as well (or at least some variable to enable/disable the docs page)
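One way the docs toggle could be wired up, sketched without importing FastAPI so it runs anywhere. `docs_url`, `redoc_url` and `openapi_url` are real FastAPI constructor arguments, and setting them to `None` disables the corresponding pages; the `ENABLE_DOCS` variable name and the helper itself are hypothetical.

```python
import os


def docs_settings(environ=None):
    """Build keyword arguments for FastAPI(...) that enable or disable the docs pages."""
    environ = os.environ if environ is None else environ
    # ENABLE_DOCS is a hypothetical variable name; docs stay on unless it is falsy.
    enabled = environ.get("ENABLE_DOCS", "true").strip().lower() in {"1", "true", "yes"}
    if enabled:
        return {"docs_url": "/docs", "redoc_url": "/redoc", "openapi_url": "/openapi.json"}
    # Passing None for these arguments turns the Swagger, ReDoc and schema routes off.
    return {"docs_url": None, "redoc_url": None, "openapi_url": None}


# Usage (requires fastapi to be installed):
#     app = FastAPI(**docs_settings())
```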

ther3zz · 2025-02-16T17:05:34Z

Has anyone been able to create embeddings? I'm running into this error:

{
    "detail": "'int' object has no attribute 'query'"
}

PhialsBasement · 2025-02-17T00:05:03Z

@ther3zz Fixed. The issue was in api.py: i was trying to call .query() on a CUDA stream handle, now it's just a normal UNIX timestamp instead.

ther3zz · 2025-02-17T01:08:11Z

> @ther3zz Fixed. Issue was in api.py, i was tryina use .query() on a CUDA stream handle, now its just a normal UNIX timestamp instead.

Looks like it's working!

ther3zz · 2025-02-17T01:16:30Z

Another issue I noticed is that MODEL_CACHE_DIR=/app/models doesn't seem to work. The models are not cached there; they end up in /root/.cache/huggingface/hub/ instead.

PhialsBasement · 2025-02-17T03:10:18Z

Whack, ill look into it and see whats going on there

Ph0rk0z · 2025-02-17T13:36:57Z

Why can't we just load models from a folder we manually saved? I get that huggingface hub is used for docker, but not all of us are doing that.

darkacorn · 2025-02-17T14:02:26Z

i dont think there is anything that prevents it .. you can even use it offline with the hf client

Ph0rk0z · 2025-02-17T22:12:27Z

I've had to change loading to from_local in gradio and all. The from_pretrained is hijacked away from torch.

mathematicalmichael · 2025-02-18T00:24:00Z

hope this helps: HF hub config:
https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#hfhubcache
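Following that link, one possible way to make `MODEL_CACHE_DIR` take effect is to map it onto `HF_HUB_CACHE` before anything imports `huggingface_hub`. This is only a sketch: `HF_HUB_CACHE` is the documented Hugging Face variable, but the helper name is hypothetical and whether the PR should bridge the two variables this way is an assumption.

```python
import os


def configure_hf_cache(environ=None, fallback="/app/models"):
    """Point the Hugging Face hub cache at MODEL_CACHE_DIR unless HF_HUB_CACHE is already set."""
    environ = os.environ if environ is None else environ
    target = environ.get("MODEL_CACHE_DIR", fallback)
    # setdefault keeps an explicit HF_HUB_CACHE chosen by the user untouched.
    environ.setdefault("HF_HUB_CACHE", target)
    return environ["HF_HUB_CACHE"]


# Call configure_hf_cache() before importing huggingface_hub (or anything that
# imports it), because the library reads its cache location from the environment
# when it is first imported.
```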

PhialsBasement · 2025-02-18T09:31:24Z

@ther3zz can you move this to issues tab over on my fork?

Sturmgewehr444 · 2025-03-01T23:16:36Z

But if we manually clone it, Sillytavern would be supporting one specific branch of Zonos that may or may not continue to have its other features or be maintained. We would have to tell everyone : "No, you can't use its latest update, you have to go git switch and then use the API from that particular branch!"

darkacorn · 2025-03-01T23:39:27Z

welcome to opensource - you patch it yourself - if you dont want to or cant do that - use 11labs

PhialsBasement · 2025-03-02T03:23:49Z

> @PhialsBasement Any chance on getting these suggestions implemented in your PR? #73 (comment) #73 (comment)

ill look into it soon

PhialsBasement · 2025-03-02T03:31:46Z

> Are there beginner-friendly instructions for the API setup? I got the UI to work but I can't get the API part set up.

> Is the API container running at all or do you mean you're trying to send API requests but that is not working?

my bad for not including instructions, you need to do

docker compose build
docker compose up zonos-api

after this, wait until the api is up and running: it first downloads the models and opens the endpoint once that is done.
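Since the first start can take a while because of the model download, a client script may want to poll until the API answers. A small standard-library sketch; the function name and the URL in the usage comment (port 8000, a voices listing route) are assumptions and should be adjusted to the actual deployment.

```python
import http.client
import time
import urllib.request


def wait_for_api(url, timeout_s=600.0, interval_s=5.0, request_timeout_s=5.0):
    """Poll url until it answers with HTTP 200; return False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout_s) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts and HTTP errors all mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example (assumed URL):
#     wait_for_api("http://localhost:8000/v1/audio/voices")
```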

PhialsBasement · 2025-03-02T03:33:03Z

FYI, i was on a bit of a break since the last comment i left here. Ill pick up working on it now, so any suggestions should be reiterated in case i missed them. For this i have enabled the issues and discussions tabs on the fork; please add issues and suggestions over there. Thank you.

PhialsBasement · 2025-03-02T03:49:42Z

> But if we manually clone it, Sillytavern would be supporting one specific branch of Zonos that may or may not continue to have its other features or be maintained. We would have to tell everyone : "No, you can't use its latest update, you have to go git switch and then use the API from that particular branch!"

I agree this will become an issue sooner or later if i ever am unable to continue maintaining this.

- Implement file-based storage for voice embeddings and metadata
- Add support for custom voice naming during creation
- Enable voice lookup by either name or ID
- Create new /v1/audio/voices endpoint to list saved voices
- Improve reliability with UUID-based voice ID generation
- Enhance error handling with descriptive messages
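To make the change list above concrete, here is a minimal sketch of a file-backed voice store with UUID-based IDs and lookup by name or ID. It is not the PR's implementation: the class name, the `voices.json` index file and the JSON-list embedding format are assumptions (a real store would more likely save tensors to disk).

```python
import json
import uuid
from pathlib import Path


class VoiceStore:
    """Keep voice metadata and embeddings in a single JSON index on disk."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.index_path = self.root / "voices.json"  # hypothetical file name

    def _load(self):
        if self.index_path.exists():
            return json.loads(self.index_path.read_text(encoding="utf-8"))
        return {}

    def add(self, name, embedding):
        """Store an embedding under a custom name and return its generated ID."""
        voices = self._load()
        voice_id = f"voice_{uuid.uuid4().hex}"  # UUID-based, so IDs do not collide
        voices[voice_id] = {"name": name, "embedding": list(embedding)}
        self.index_path.write_text(json.dumps(voices), encoding="utf-8")
        return voice_id

    def get(self, name_or_id):
        """Look a voice up by ID first, then by name; raise a descriptive error otherwise."""
        voices = self._load()
        if name_or_id in voices:
            return voices[name_or_id]
        for record in voices.values():
            if record["name"] == name_or_id:
                return record
        raise KeyError(f"Unknown voice: {name_or_id!r}")

    def list_voices(self):
        """Return the data a voices listing endpoint could serve."""
        return [{"id": voice_id, "name": record["name"]} for voice_id, record in self._load().items()]
```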

PhialsBasement · 2025-03-02T04:34:17Z

@ther3zz just implemented your suggestions from #73 (comment)

Sturmgewehr444 · 2025-03-02T15:51:52Z

> you patch it your self

Again, until the API has been merged, Sillytavern can not support Zonos TTS. You seem to be misunderstanding me. It has to be merged.

darkacorn · 2025-03-02T16:07:40Z

> > you patch it your self
>
> Again, until the API has been merged, Sillytavern can not support Zonos TTS. You seem to be misunderstanding me. It has to be merged.

you are wrong on that - the api has no splitting of long ctx so anything over 30 sec will fail - you dont need a custom integration - openai tts compatible endpoint and just link the url of the api .. no custom integration needed

pull the pr run the api and you are off to the races BY DEFAULT .. no custom stuff needed

Ph0rk0z · 2025-03-02T17:28:59Z

So one of the concatenation methods has to go into the API. Currently they're targeting the gradio.
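As an illustration of the kind of splitting the API would need before the generated audio pieces can be concatenated, here is a sentence-based chunker in plain Python. It is not one of the existing Gradio concatenation methods; the function name is hypothetical and `max_chars` is only an assumed stand-in for the roughly 30 second limit mentioned above.

```python
import re


def split_text(text, max_chars=200):
    """Split text into chunks of at most max_chars, breaking at sentence ends where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            # The sentence still fits into the chunk being built.
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit is cut at fixed offsets,
        # which may split a word; a real implementation might cut at spaces.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would be synthesized separately and the resulting audio joined in order.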

Sturmgewehr444 · 2025-03-02T17:34:12Z

> pull the pr run the api and you are off to the races BY DEFAULT .. no custom stuff needed

As far as I know, this here is only supported by Linux. What about Windows users?

darkacorn · 2025-03-02T18:08:50Z

> > pull the pr run the api and you are off to the races BY DEFAULT .. no custom stuff needed
>
> As far as I know, this here is only supported by Linux. What about Windows users?

if you manage to run zonos on windows that will run on windows too - there are no exotic dependencies for the api

Sturmgewehr444 · 2025-03-02T18:15:11Z

> if you manage to run zonos on windows that will run on windows too - there are no exotic dependencies for the api

Taken from the description:

Installation
At the moment this repository only supports Linux systems (preferably Ubuntu 22.04/24.04) with recent NVIDIA GPUs (3000-series or newer, 6GB+ VRAM).

darkacorn · 2025-03-02T18:17:11Z

and if you look lower there is an experimental link for windows installations. but i would not recommend it .. albeit some run it on windows just fine

Sturmgewehr444 · 2025-03-02T18:18:06Z

> and if you look lower there is a experimental link for windows installations . but i would not recommend it .. albeit some run it on windows just fine

Docker? I am talking about windows without any other software. How would you install this repo?

darkacorn · 2025-03-02T18:39:54Z

> > and if you look lower there is a experimental link for windows installations . but i would not recommend it .. albeit some run it on windows just fine
>
> Docker? I am talking about windows without any other software. How would you install this rep?

read the documentation and maybe dont spam a PR - open an issue .. and if someone cares enough they will answer - this is the wrong thread for that

PhialsBasement · 2025-03-03T02:22:26Z

> So one of the concatenation methods has to go into the API. Currently they're targeting the gradio.

@Ph0rk0z ill look into this when i get back from work tonight

@Sturmgewehr444 windows should work just fine but this PR primarily focuses on adding API support. If i have extra time ill look into streamlining it for windows but it will not be a main focus.

Napolitain · 2025-03-03T08:07:51Z

why do we have an endpoint for creating speaker embeddings?
from my understanding, it makes the service stateful. Should we instead have a stateless service, where the embeddings are generated as part of the audio generation and, if the same voice is repeated, served from a cache? It seems a better design to let the backend handle its resources
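The stateless variant described here, where the embedding is computed during generation and reused when the same audio comes in again, could be sketched as a content-addressed cache. `EmbeddingCache` and the injected `compute` callable are hypothetical names; the cache is unbounded, which a real service would want to limit.

```python
import hashlib


class EmbeddingCache:
    """Cache speaker embeddings keyed by a hash of the uploaded audio bytes."""

    def __init__(self, compute):
        self._compute = compute  # callable that turns audio bytes into an embedding
        self._cache = {}
        self.misses = 0

    def get(self, audio_bytes):
        # Identical audio always hashes to the same key, so no client-side ID is needed.
        key = hashlib.sha256(audio_bytes).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._compute(audio_bytes)
        return self._cache[key]
```

The trade-off raised in the replies still applies: the client has to upload the reference audio with every request, even when the embedding itself is served from the cache.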

darkacorn · 2025-03-03T08:53:28Z

could be seen as such and be optional sure - but generally you use the api for yourself or just change that and pass that over - even with user auth - you fence that off to a different s3 bucket and fetch from there, as it's faster in throughput than the customer having to send the voices he uses frequently all the time

there are many ways to rome - you can most certainly change that part
however .. it's a convenience factor for most people who use the api for a local integration - which is the majority of the customer base

darkacorn · 2025-03-03T08:54:27Z

also this only stores the torch tensor, not the audio files / it's to mimic the oai tts api as closely as possible, and that's with fixed voices

ther3zz · 2025-03-03T16:13:55Z

> @ther3zz just implemented your suggestions from #73 (comment)

Sorry for the delay!

Just tested this and it's working perfectly!

YellowRoseCx · 2025-03-08T16:29:05Z

would someone mind telling me how I set up the voice IDs with the JSON?

darkacorn · 2025-03-08T16:32:31Z

> would someone mind telling me how I setup the voice IDs with the JSON?

voice^^

```
curl -X POST "http://localhost:8000/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Zyphra/Zonos-v0.1-transformer",
    "input": "Hello, this is a test of the Zonos API.",
    "voice": "voice_12345_0",
    "speed": 1.0,
    "language": "en-us",
    "emotion": {
      "happiness": 1.0
    },
    "response_format": "mp3"
  }'
```

YellowRoseCx · 2025-03-08T16:43:38Z

thank you! And I was thinking a good way to improve this overall would be to use the 100ms silence that's included in the Zonos/Asset folder as the default value for the "prefix_audio" argument in the speechrequest instead of none, because I read somewhere in the file docs or a commit that it increases the quality and stability of generations

ther3zz · 2025-03-20T13:34:09Z

any luck in getting this merged?
I've been using it and it's working really well

kleineluka · 2025-03-31T06:03:48Z

Still interested in Zonos having a built-in API like this - any idea if it's possible for merge?

Update gradio_interface.py

cf0824f

PhialsBasement mentioned this pull request Feb 14, 2025

No API? #72

Open

PhialsBasement added 2 commits February 15, 2025 02:02

Fixed spawn error

36386ef

Added some Stability improvements. Ready for merge when needed.

438a749

PhialsBasement added 3 commits February 16, 2025 13:18

Create api.py

21d38dc

Split API and Gradio

86e3f94

Split into API and UI

d7e5668

PhialsBasement added 2 commits February 16, 2025 13:44

Update api.py

1bff777

Update Dockerfile to accommodate MP3

53e1239

Update api.py

eedcfa8

PhialsBasement added 2 commits March 2, 2025 14:42

Update README.md

c672911

Update README.md

70b47dc

Added Top P, Top K and Min P control for sampling

b60f3bd
