Feature Request: Add support for Kokoro TTS #11050

broke-end-dev · 2025-01-03T05:28:06Z

Prerequisites

I am running the latest code. Mention the version if possible as well.
I carefully followed the README.md.
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Devs, can you add support for Kokoro TTS? It's awesome in terms of accents and natural tone, considering its size. It is currently one of the most popular models in Pandroker's TTS arena space on Hugging Face. Thanks!
https://huggingface.co/hexgrad/Kokoro-82M

Motivation

Many, including me, want to deploy it on CPU/edge devices.

Possible Implementation

No response

darkzbaron · 2025-01-04T09:32:43Z

+1

scalar27 · 2025-01-05T20:32:56Z

+1. The claim is that it's faster than realtime on the Mac.

logikstate · 2025-01-09T20:36:31Z

+1

ggerganov · 2025-01-09T20:55:33Z

+1

stopthinking102 · 2025-01-10T21:43:20Z

+1

OXKSA1 · 2025-01-13T19:59:12Z

+1

frankai · 2025-01-13T20:38:54Z

+1

verioussmith · 2025-01-13T21:26:32Z

+1 🎯

therealtimex · 2025-01-13T22:41:16Z

+1

razorback16 · 2025-01-14T09:06:41Z

+1

KonstantinSelyuk · 2025-01-14T09:33:07Z

+1

apepkuss · 2025-01-14T14:07:13Z

+1

logikstate · 2025-01-15T15:45:17Z

+2

signalstop · 2025-01-18T04:28:23Z

+1

yoshuzx · 2025-01-18T15:35:39Z

+1

henk717 · 2025-01-18T16:00:36Z

+1

YorkieDev · 2025-01-19T09:36:42Z

+1. Would be cool to see more TTS options in llama.cpp.

hexgrad · 2025-02-01T02:28:57Z

These can be reproduced at https://hf.co/spaces/hexgrad/Kokoro-TTS without installing anything.

I'm sorry Dave, I'm afraid I can't do that.
#10784 (comment)
ˌIm sˈɔɹi dˈAv, ˌIm əfɹˈAd ˌI kˈænt dˈu ðˈæt.

sorry.mp4

TTS requires 2 models to be provided: an LLM and a Vocoder. The first one generates audio codes (tokens) from the provided input text, based on some voice settings. The second one converts the audio codes into a spectrogram. The spectrogram is then converted back to audio with inverse FFT.
#10784 (comment)
tˌitˌiˈɛs ɹəkwˈIəɹz tˈu mˈɑdᵊlz tə bi pɹəvˈIdᵻd: ɐn ˌɛlˌɛlˈɛm ænd ɐ vˈOkˌOdəɹ. ðə fˈɜɹst wˈʌn ʤˈɛnəɹˌAts ˈɔdiO kˈOdz (tˈOkᵊnz) fɹʌm ðə pɹəvˈIdᵻd ˈɪnpˌʊt tˈɛkst, bˈAst ˌɔn sˌʌm vˈYs sˈɛTɪŋz. ðə sˈɛkənd wˈʌn kənvˈɜɹts ði ˈɔdiO kˈOdz ˈɪntu ɐ spˈɛktɹəɡɹˌæm. ðə spˈɛktɹəɡɹˌæm ɪz ðˈɛn kənvˈɜɹTᵻd bˈæk tʊ ˈɔdiO wɪð ˈɪnvˌɜɹs ˌɛfˌɛftˈi.

longer.mp4

Not sure how to pass punctuation yet. Or even if this model supports it.
#10784 (comment)
nˌɑt ʃˈʊɹ hˌW tə pˈæs pˌʌŋkʧəwˈAʃən jˈɛt. ˌɔɹ ˈivən ɪf ðɪs mˈɑdᵊl səpˈɔɹts ɪt.

punctuation.mp4
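
For anyone wondering what the text/phoneme pairs above imply for a port: the model takes the phoneme string as input, and producing that string (G2P) is a separate step. Below is a minimal sketch of that boundary, not Kokoro's actual G2P. The `LEXICON` table, `TOKEN_RE`, and `toy_g2p` are hypothetical names made up for illustration; the phoneme strings are copied from the first example in this comment. A real G2P handles unknown words, context, and stress, which a lookup table does not.

```python
import re

# Hypothetical toy lexicon: lowercase word -> phoneme string.
# Entries are copied from the "I'm sorry Dave" example in this thread.
LEXICON = {
    "i'm": "ˌIm",
    "sorry": "sˈɔɹi",
    "dave": "dˈAv",
    "afraid": "əfɹˈAd",
    "i": "ˌI",
    "can't": "kˈænt",
    "do": "dˈu",
    "that": "ðˈæt",
}

# A token is either a run of letters/apostrophes (a word)
# or a single non-space, non-word character (punctuation).
TOKEN_RE = re.compile(r"[A-Za-z']+|[^\sA-Za-z']")


def toy_g2p(text: str) -> str:
    """Map text to a phoneme string by dictionary lookup (illustration only)."""
    pieces = []
    for token in TOKEN_RE.findall(text):
        if token[0].isalpha() or token[0] == "'":
            phonemes = LEXICON.get(token.lower())
            if phonemes is None:
                # A real G2P would fall back to rules or a model here.
                raise KeyError(f"word not in toy lexicon: {token!r}")
            if pieces:
                pieces.append(" ")  # single space between words
            pieces.append(phonemes)
        else:
            pieces.append(token)  # punctuation attaches to the previous word
    return "".join(pieces)


if __name__ == "__main__":
    # The resulting string is what would be handed to the model code.
    print(toy_g2p("I'm sorry Dave, I'm afraid I can't do that."))
```

Punctuation is passed through unchanged here, which matches how the commas and periods appear in the phoneme strings above.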

namhkoh · 2025-02-10T21:39:19Z

@hexgrad are those reprods with a C++ implementation?

hexgrad · 2025-02-10T22:56:09Z

@namhkoh No, it's Python & PyTorch, as I mentioned in #11050 (comment):

These can be reproduced at https://hf.co/spaces/hexgrad/Kokoro-TTS without installing anything.

logikstate · 2025-02-14T15:25:20Z

There is an ONNX/C# implementation of Kokoro here: https://github.com/Lyrcaxis/KokoroSharp

But I think (not sure) it's using espeak as the phonemiser, whereas the Python & PyTorch version uses G2P?

Am I correct here, @hexgrad?

namhkoh · 2025-02-14T16:29:07Z

I am currently seeking a c++ implementation.

hexgrad · 2025-02-14T18:20:48Z

You need G2P to make the whole thing work, but llama.cpp can probably disregard that piece for now—the c++ scope for llama.cpp would likely just be porting the modeling code in these 3 files:

csukuangfj · 2025-02-17T08:44:33Z

I am currently seeking a c++ implementation.

@namhkoh

We supported kokoro in sherpa-onnx a long time ago.

It provides not only C++ APIs for Kokoro v0.19 and Kokoro 1.0, but it also supports 11 other programming languages, e.g.,
C, Java, Kotlin, Swift, Dart, C#, Go, JavaScript, Object Pascal, Python.

You can find the usage doc at
https://k2-fsa.github.io/sherpa/onnx/tts/pretrained_models/kokoro.html

broke-end-dev added the enhancement New feature or request label Jan 3, 2025
