Welcome to Talk To Me Goose! This tool streams your LLM's (Large Language Model) output directly into your TTS (Text-to-Speech) engine, delivering real-time responses without the long waits. Perfect for enhancing your Home Assistant Voice devices with long text-to-audio outputs.
Demos: see the demo videos in the repository.
What's inside:
- What is the problem?
- Our approach
- Supported LLM/TTS engines
- Getting started
- Usage
- Known issues
- Change log
- [For nerds] Endpoints
- [For nerds] General flow
Home Assistant Voice Preview Edition (HAVPE) devices are great, but they're not so good at handling long responses. In fact, they might just fail silently when asked for anything too lengthy. If your HAVPE device sometimes goes quiet after your command (but works fine on repeat or with other commands), this is probably why.
Imagine you ask your LLM to tell you a story about Home Assistant:
- You send the request to the LLM.
- The LLM takes its sweet time - let's say 10 seconds - to generate a response.
- Silence... because there's a 5-second timeout in HAVPE devices, and no response ever comes. In your HAVPE device log you will see `HTTP_CLIENT: Connection timed out before data was ready!`.
Even if you manually increase the timeout, you'll still be left waiting for 15 seconds: 10 for the LLM, plus an extra 5 for TTS processing.
Talk To Me Goose streams your LLM’s output directly to the TTS engine. This means no more waiting around for the full response to be generated before your device starts talking back. With this approach, responses are streamed token-by-token, so you get audio while the text is still being generated. It looks like this:
- You send a request to the LLM.
- The LLM responds token by token; each completed sentence is passed to the TTS engine as soon as it appears.
- The audio begins almost immediately, and as more sentences arrive, they are added in real-time.
The result: fast responses with no waiting, even for long stories! Tool calls, message history and local conversation agent's responses are supported, too!
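To make the flow concrete, here is a minimal sketch of the idea in Python - not TTMG's actual code. The regex splitter is a stand-in for TTMG's real sentence parser, and the `print` call marks where each finished sentence would be handed to the TTS engine:

```python
# A minimal sketch of the streaming idea (not TTMG's actual code). Assumes the
# official `openai` Python client and an OPENAI_API_KEY in the environment.
import re

from openai import OpenAI

client = OpenAI()

# Naive sentence boundary: whitespace after ., ! or ?. TTMG's real parser also
# handles titles and abbreviations such as "Mr." (see the changelog).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def stream_sentences(prompt: str):
    """Yield complete sentences as soon as the model has produced them."""
    buffer = ""
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue
        buffer += chunk.choices[0].delta.content or ""
        # Everything before the last fragment is a finished sentence.
        *done, buffer = SENTENCE_END.split(buffer)
        yield from done
    if buffer.strip():
        yield buffer  # flush the tail once the stream ends

for sentence in stream_sentences("Tell me a story about Home Assistant"):
    # This is where each sentence would be handed to the TTS engine.
    print(sentence)
```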
LLMs:
- OpenAI
TTS engines:
- OpenAI
- Google Cloud
- ElevenLabs
- Wyoming-piper
- Check the Known issues.
- Install ffmpeg - required for audio stream conversion.
- Install ESPHome Device Builder - you can do this as a Home Assistant addon or separately, whichever you prefer.
- Adopt your HAVPE device in ESPHome Device Builder (note: it will break your automatic update notifications).
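For reference, on Debian/Ubuntu-based systems installing and verifying ffmpeg looks like this (package names and commands vary by platform):

```sh
sudo apt install ffmpeg
ffmpeg -version   # confirm it is on your PATH
```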
Now you can set everything up in 5 easy(-ish) steps! Follow the steps below to:
- Install TTMG Server
- Configure TTMG Server
- Install Home Assistant integration
- Configure Home Assistant to work with TTMG
- Configure your Home Assistant Voice Preview Edition (HAVPE) device(s) to work with TTMG
- Clone the repo.
- Run the setup script with `./setup.sh` for Unix-based systems or `start.bat` for Windows. It will create a virtual environment, install the required dependencies and create `configuration.json` to store your settings.
- Edit `configuration.json` and add your OpenAI API key to the `main` section (get it here). This is the only required parameter, but there are additional optional settings you can configure - see the next step.
General settings
General settings go under `"main"` in the `configuration.json`. You need to provide only the `"openai_api_key"`; the rest have default values.
```
{
  "main": {
    "openai_api_key": <your-openai-api-key>,
    "llm_model": <model-to-use>, # https://platform.openai.com/docs/models, TTMG Conversation will overwrite it
    "llm_system_prompt": <system-prompt>, # Default system prompt applied to all requests. TTMG Conversation will overwrite it, so you usually don't need it
    "max_completion_tokens": 400, # Additional model setting, TTMG Conversation will overwrite it
    "temperature": 1.0, # Additional model setting, TTMG Conversation will overwrite it
    "top_p": 1.0, # Additional model setting, TTMG Conversation will overwrite it
    "tts_engine": <selected-tts-engine>, # Selected TTS engine
    "host": <service-host>, # Server host
    "port": <service-port> # Server port
  }
}
```
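Since everything except the API key has a default, a minimal working `configuration.json` can be as small as this (the key value is a placeholder):

```json
{
  "main": {
    "openai_api_key": "sk-..."
  }
}
```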
TTS settings - OpenAI (Default)
This engine is enabled by default. If you have already provided your OpenAI API key in the `"main"` section, you are good to go!
```
{
  "main": {
    "tts_engine": "openai",
    ...
  },
  "openai": {
    <optional parameters>
  }
}
```
You can pass additional parameters in your `configuration.json`; see `configuration_examples/configuration_openai.json` for all supported options.
TTS settings - Google Cloud
- First you need to obtain a service account file from Google Cloud. Follow these instructions; you only need the Text-to-Speech part.
- Change `tts_engine` to `google_cloud` in your `configuration.json`.
- Add Google Cloud settings to the `configuration.json`. Only the `"credentials_path"` is required; the rest have default values:
```
{
  "main": {
    "tts_engine": "google_cloud",
    ...
  },
  "google_cloud": {
    "credentials_path": "creds.json"
  }
}
```
You can pass additional parameters in your `configuration.json`; see `configuration_examples/configuration_google_cloud.json` for all supported options.
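For reference, once you have created a service account with Text-to-Speech access, one way to download its key file is the `gcloud` CLI (the account and project names below are placeholders):

```sh
gcloud iam service-accounts keys create creds.json \
  --iam-account=ttmg-tts@your-project-id.iam.gserviceaccount.com
```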
TTS settings - ElevenLabs
- First you need to obtain an API key from ElevenLabs. Get it here.
- Change `tts_engine` to `elevenlabs` in your `configuration.json`.
- Add ElevenLabs settings to the `configuration.json`. Only the `"api_key"` is required; the rest have default values:
```
{
  "main": {
    "tts_engine": "elevenlabs",
    ...
  },
  "elevenlabs": {
    "api_key": "<your-elevenlabs-api-key>"
  }
}
```
You can pass additional parameters in your `configuration.json`; see `configuration_examples/configuration_elevenlabs.json` for all supported options.
TTS settings - Wyoming-piper
- Change `tts_engine` to `piper` in your `configuration.json`.
- Add wyoming-piper settings to the `configuration.json`. All parameters have default values (see below):
```
{
  "main": {
    "tts_engine": "piper",
    ...
  },
  "piper": {
    "host": "127.0.0.1",
    "port": "10300",
    "voice_name": "en_GB-alan-medium"
  }
}
```
You can pass additional parameters in your `configuration.json`; see `configuration_examples/configuration_piper.json` for all supported options.
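If you don't have a wyoming-piper server running yet, one common option is the `rhasspy/wyoming-piper` Docker image (the data path below is a placeholder; any Wyoming-compatible piper instance reachable at the configured host/port will work):

```sh
docker run -d -p 10300:10300 \
  -v /path/to/piper-data:/data \
  rhasspy/wyoming-piper --voice en_GB-alan-medium
```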
Follow the instructions to install the TTMG Conversation integration for Home Assistant. You will need to provide the URL of your TTMG Server in the format `http://<TTMG-Server-ip>:<port>`.
Once you have installed the TTMG Conversation integration, in Home Assistant:
- Go to Settings => Voice Assistant
- Select the assistant you want to use with TTMG
- Set the Conversation agent and Text-to-speech to TTMG components
Now this assistant is talking to the TTMG Server! If you start a conversation with it in HASS, you will get back "Processing your request, please wait..." - this is OK! Unfortunately, integrating real-time streaming breaks this, but no worries: your HAVPE devices will work great - set them up in the next step!
The final step: let's get your HAVPE devices talking to the TTMG Server.
Each HAVPE device needs its own configuration. Don't worry, it's a one-time thing:
- Get your `device_id` from Home Assistant. Go to Automations, create a new one, select "device" as a trigger, choose your HAVPE device and then switch to the YAML view to read it (see the example after these steps). Don't save the automation; you just need to find out the `device_id`.
- Run `python generate_esphome_config.py`. It will ask you for the `device_id` and the TTMG Server's host and port. It will also ask you for the preferred audio format - flac (default) or mp3 (faster, experimental). It will then grab the latest HAVPE config from the official repo and apply two changes:
  - Increase the default timeout to 15s (not strictly needed, but a good safety net in case the LLM takes a long time to start generating a response)
  - Make it always fetch `/play/{client_id}.flac` (or `/play/{client_id}.mp3`) from the TTMG Server instead of TTS responses. WARNING: this means that using any other assistant with this HAVPE device will fail, since the device will only talk to the TTMG Server.
- The script will output the path to a folder with patched components for your particular HAVPE device.
- Add the following code block to your device's YAML configuration in ESPHome Device Builder to apply the changes. To revert, simply remove or comment out this block:

```
external_components:
  - source:
      type: local
      path: <output of tools/generate_esphome_config.py>
```
If you want to use the mp3 audio format, you also need to add this block to your config:
```
media_player:
  - platform: speaker
    id: !extend external_media_player
    announcement_pipeline:
      speaker: announcement_resampling_speaker
      format: MP3
```
- Install the modified config and you are done!
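For step 1, the YAML view of the device trigger looks roughly like this (schema details vary between Home Assistant versions, and the hex id here is made up); the `device_id` value is all you need:

```yaml
trigger:
  - platform: device
    device_id: 0123456789abcdef0123456789abcdef  # copy this value
    domain: ...
    type: ...
```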
Run the main script with `./start.sh` for Unix-based systems or `start.bat` for Windows. It will start a small API server (at `http://0.0.0.0:8888` using the default settings). See an example of a Unix systemd service to make it run automatically.
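For orientation, a minimal unit file could look like the sketch below; the clone path and user are assumptions, so adjust them to your setup:

```ini
[Unit]
Description=Talk To Me Goose (TTMG) server
After=network-online.target
Wants=network-online.target

[Service]
# Assumed clone location and user - change both to match your machine.
WorkingDirectory=/opt/talk-to-me-goose
ExecStart=/opt/talk-to-me-goose/start.sh
User=ttmg
Restart=on-failure

[Install]
WantedBy=multi-user.target
```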
Now ask your HAVPE device to tell you a long story and see how it goes!
- Breaking Changes
  - The custom firmware overwrites TTS URLs, so you can't use other TTS engines with the HAVPE voice assistants. Using the `tts.speak` action will still work with other TTS engines, though. See here for more details. Solution: Use the TTMG TTS for voice assistant settings; use native TTS integrations for things like `tts.speak`.
  - The TTMG integration doesn't play well with HASS web conversations (not fixable until HASS natively supports text and audio streaming). Solution: Do not use it with the web assistant! ¯\\\_(ツ)\_/¯
  - Debugging in HASS won't show meaningful data since everything is handled by the TTMG Server. Solution: Check the TTMG Server logs for some basic debug info.
- Sentence Parsing
  - The sentence-splitting logic isn't perfect. It might split "What a nice day, Mr. Smith!" into "What a nice day, Mr." and "Smith!", causing awkward pauses. Solution: Fixed in v1.0.3.
- Context-Awareness
  - TTS engines have no idea about the context because we send one sentence at a time. This sometimes affects intonation and/or pauses between sentences, but in my testing (mostly with Google Cloud TTS) the difference is negligible.
Added
- You can now choose to skip re-encoding to flac and stream MP3 data from the TTS engines directly into your HAVPE devices. To use MP3, you will have to generate & install a new HAVPE config with the updated `tools/generate_esphome_config.py`. Upside: the responses now start streaming a bit faster - I get 3-4 seconds on average before the audio starts playing (gpt-4o-mini + Google Cloud TTS). Downside: HAVPE uses FLAC by default since it is less demanding, so you tell me if you notice any issues!
Added
- New sentence parser! Just as fast as the old one, but handles titles, abbreviations and other tricky stuff correctly. No more awkward pauses!
- Updated `tools/generate_esphome_config.py` to support the latest HAVPE firmware and esphome 2025.2.0.
Fixed
- Windows compatibility (no longer uses `/dev/null`)
Added
- Local conversation agent's responses are now working too! Make sure to update your TTMG Home Assistant integration!
Changed
- Minor code refactoring for readability
- Updated README.md
Changed
- Complete rewrite of the app to support native streaming to HAVPE devices, see the new README.md!
Added
- Support for Wyoming-piper TTS engine
Added
- Client store so we can support multiple HAVPE devices
- Per-client event blockers
Added
- Flac support so we can play the output natively on HAVPE
Added
- `/tts` endpoint to generate audio from the TTS engine based on the provided text (skipping the LLM step)
- Better logs
Added
- ElevenLabs TTS engine
- Initial commit
Most of the endpoints are used by TTMG components, and you usually don't need to call them directly (they are marked as [Internal]). Use the [External] ones instead for the best experience.
- [Internal] `/preload/{client_id}` (POST) - Accepts JSON {"messages", "tools", "model", "max_completion_tokens", "top_p" and "temperature"}. Used by TTMG Conversation and by internal TTMG Server logic. It is not recommended to call it unless you know what you are doing.
- [Internal] `/history/{client_id}` (GET) - Returns the LLM conversation history for a client.
- [Internal] `/write_history/{client_id}` (POST) - Writes the LLM conversation history for a client.
- [External] `/preload-text/{client_id}` (POST) - Accepts JSON {"text": "..."}. Passes the text to the `/tts_say` endpoint. Use it if you want to send a long text directly to the TTS engine.
- [External] `/tts_say/{client_id}.flac` (GET) - Triggers TTS and returns flac audio for a text preloaded via `/preload-text/{client_id}`. Will not call your LLM; it will just announce the text.
- [Internal/External] `/play/{client_id}.flac` (GET) - Triggers LLM+TTS and returns flac audio.
  - When the prompt and model settings were preloaded via `/preload/{client_id}`, it acts in [Internal] mode and runs the LLM-TTS pipeline.
  - When the text was preloaded via `/preload-text/ttmg_tts`, it acts in [Internal] mode and runs the TTS pipeline directly, skipping the LLM step (used for announcing local agent responses, for example).
  - When called directly with a prompt like `/play/{does-not-matter}.flac?prompt=Tell+me+a+story+about+home+assistant`, it uses that prompt and the LLM settings from your `configuration.json`.
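For example, with the server on its default port 8888, the [External] endpoints can be exercised with curl (`my-client` is an arbitrary client id chosen for this example):

```sh
# Preload a long text for a client, then fetch the synthesized audio.
curl -X POST http://127.0.0.1:8888/preload-text/my-client \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from Talk To Me Goose!"}'
curl -o hello.flac http://127.0.0.1:8888/tts_say/my-client.flac

# Or skip preloading and pass a prompt straight to /play.
curl -o story.flac "http://127.0.0.1:8888/play/my-client.flac?prompt=Tell+me+a+story+about+home+assistant"
```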