Feature Request: Add support for Kokoro TTS #11050

Open
4 tasks done
broke-end-dev opened this issue Jan 3, 2025 · 24 comments
Labels
enhancement New feature or request

Comments

@broke-end-dev

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Devs, can you add support for Kokoro TTS? It's awesome in terms of accents and natural tone, considering its size. It is currently one of the most popular models in Pendrokar's TTS Arena space on Hugging Face. Thanks!
https://huggingface.co/hexgrad/Kokoro-82M

Motivation

Many people, including me, want to deploy it on CPU/edge devices.

Possible Implementation

No response

@broke-end-dev broke-end-dev added the enhancement New feature or request label Jan 3, 2025
@darkzbaron

+1

@scalar27

scalar27 commented Jan 5, 2025

+1. The claim is that it's faster than realtime on the Mac.

@logikstate

+1

@ggerganov
Member

+1

@stopthinking102

+1

@OXKSA1

OXKSA1 commented Jan 13, 2025

+1

@frankai

frankai commented Jan 13, 2025

+1

@verioussmith

+1 🎯

@therealtimex

+1

@razorback16

+1

@KonstantinSelyuk

+1

@apepkuss

+1

@logikstate

+2

@signalstop

+1

@yoshuzx

yoshuzx commented Jan 18, 2025

+1

@henk717

henk717 commented Jan 18, 2025

+1

@YorkieDev

+1. It would be cool to see more TTS options in llama.cpp.

@hexgrad

hexgrad commented Feb 1, 2025

These can be reproduced at https://hf.co/spaces/hexgrad/Kokoro-TTS without installing anything.

I'm sorry Dave, I'm afraid I can't do that.
#10784 (comment)
ˌIm sˈɔɹi dˈAv, ˌIm əfɹˈAd ˌI kˈænt dˈu ðˈæt.

sorry.mp4

TTS requires 2 models to be provided: an LLM and a Vocoder. The first one generates audio codes (tokens) from the provided input text, based on some voice settings. The second one converts the audio codes into a spectrogram. The spectrogram is then converted back to audio with inverse FFT.
#10784 (comment)
tˌitˌiˈɛs ɹəkwˈIəɹz tˈu mˈɑdᵊlz tə bi pɹəvˈIdᵻd: ɐn ˌɛlˌɛlˈɛm ænd ɐ vˈOkˌOdəɹ. ðə fˈɜɹst wˈʌn ʤˈɛnəɹˌAts ˈɔdiO kˈOdz (tˈOkᵊnz) fɹʌm ðə pɹəvˈIdᵻd ˈɪnpˌʊt tˈɛkst, bˈAst ˌɔn sˌʌm vˈYs sˈɛTɪŋz. ðə sˈɛkənd wˈʌn kənvˈɜɹts ði ˈɔdiO kˈOdz ˈɪntu ɐ spˈɛktɹəɡɹˌæm. ðə spˈɛktɹəɡɹˌæm ɪz ðˈɛn kənvˈɜɹTᵻd bˈæk tʊ ˈɔdiO wɪð ˈɪnvˌɜɹs ˌɛfˌɛftˈi.

longer.mp4

Not sure how to pass punctuation yet. Or even if this model supports it.
#10784 (comment)
nˌɑt ʃˈʊɹ hˌW tə pˈæs pˌʌŋkʧəwˈAʃən jˈɛt. ˌɔɹ ˈivən ɪf ðɪs mˈɑdᵊl səpˈɔɹts ɪt.

punctuation.mp4
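
Editorial sketch: the longer sample text above ends with a spectrogram being converted back to audio with an inverse FFT. As a rough illustration of that last step only, the Python below round-trips a single frame through a naive DFT and inverse DFT. This is an assumption-laden toy, not code from Kokoro or llama.cpp: the function names `dft` and `idft` are made up here, the naive O(n²) transform stands in for a real FFT (same result, just slower), and a real vocoder would apply the inverse transform per frame with windowing and overlap-add.

```python
import cmath
import math
from typing import List


def dft(frame: List[float]) -> List[complex]:
    """Naive forward DFT of one audio frame (stand-in for an FFT)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum: List[complex]) -> List[float]:
    """Naive inverse DFT: turns one complex spectrum frame back into samples."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


if __name__ == "__main__":
    # One cycle of a sine wave as a toy 8-sample "audio frame".
    frame = [math.sin(2 * math.pi * t / 8) for t in range(8)]
    recovered = idft(dft(frame))
    # The round trip should reproduce the frame up to floating-point error.
    print(max(abs(a - b) for a, b in zip(frame, recovered)) < 1e-9)
```

The point of the round trip is only to show that the frequency-domain frame carries enough information to rebuild the samples; how a particular model produces that frame is outside this sketch.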

@namhkoh

namhkoh commented Feb 10, 2025

@hexgrad are those repros with a C++ implementation?

@hexgrad

hexgrad commented Feb 10, 2025

@namhkoh No, it's Python & PyTorch, as I mentioned in #11050 (comment):

These can be reproduced at https://hf.co/spaces/hexgrad/Kokoro-TTS without installing anything.

@logikstate

There is an ONNX/C# implementation of Kokoro here: https://github.com/Lyrcaxis/KokoroSharp

But I think (not sure) it's using espeak as the phonemiser, which is different from how the Python & PyTorch version works, since that one uses G2P?

Am I correct here, @hexgrad?

@namhkoh

namhkoh commented Feb 14, 2025

I am currently seeking a C++ implementation.

@hexgrad

hexgrad commented Feb 14, 2025

You need G2P to make the whole thing work, but llama.cpp can probably disregard that piece for now. The C++ scope for llama.cpp would likely just be porting the modeling code in these 3 files:

  1. https://github.com/hexgrad/kokoro/blob/1145c0b7f6f3c781d35b1b67a283a32580bc5acd/kokoro/model.py
  2. https://github.com/hexgrad/kokoro/blob/1145c0b7f6f3c781d35b1b67a283a32580bc5acd/kokoro/modules.py
  3. https://github.com/hexgrad/kokoro/blob/1145c0b7f6f3c781d35b1b67a283a32580bc5acd/kokoro/istftnet.py
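
Editorial sketch of the split described above: G2P turns text into a phoneme string, and the modeling code only needs token IDs derived from that string. The Python below shows that hand-off under stated assumptions. `TOY_VOCAB` and `phonemes_to_ids` are hypothetical names for illustration, the real phoneme table is much larger and is not reproduced here, and dropping unknown symbols plus padding with ID 0 on both ends is an assumption about the preprocessing, not a confirmed detail of the linked files.

```python
from typing import Dict, List

# Hypothetical phoneme-to-ID table, for illustration only.
TOY_VOCAB: Dict[str, int] = {" ": 1, ".": 2, "s": 3, "ˈ": 4, "ɔ": 5, "ɹ": 6, "i": 7}


def phonemes_to_ids(phonemes: str, vocab: Dict[str, int], pad_id: int = 0) -> List[int]:
    """Map each phoneme symbol to its ID, skipping symbols missing from the vocab.

    The result is wrapped with pad_id at both ends (an assumption of this sketch).
    """
    ids = [vocab[p] for p in phonemes if p in vocab]
    return [pad_id, *ids, pad_id]


if __name__ == "__main__":
    # "sˈɔɹi." is the phoneme string for "sorry." taken from the samples in this thread.
    print(phonemes_to_ids("sˈɔɹi.", TOY_VOCAB))  # prints [0, 3, 4, 5, 6, 7, 2, 0]
```

Under this split, a C++ port could accept phoneme strings (or IDs) as input and leave G2P to an external tool until it is needed.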

@csukuangfj

I am currently seeking a C++ implementation.

@namhkoh

We have supported Kokoro in sherpa-onnx for a long time.

It provides C++ APIs for Kokoro v0.19 and Kokoro 1.0, and it also supports 11 other programming languages, e.g.,
C, Java, Kotlin, Swift, Dart, C#, Go, JavaScript, Object Pascal, Python.

You can find the usage doc at
https://k2-fsa.github.io/sherpa/onnx/tts/pretrained_models/kokoro.html
