Exploring Gemini's Multimodal Live API.
- `script.py`: A script to interact with the API.
- `app.py`: A Flask app to interact with the API.
- Requirements.txt file
- API Key
The model automatically performs voice activity detection (VAD); it is always enabled and cannot be configured. This allows a natural, free-flowing conversation, but in practice it is problematic when using a speaker and a microphone together: the model's own audio feeds back into the microphone and cuts the model off. It also appears to struggle in noisy environments. For now, wearing headphones seems like the only viable approach.
As a workaround, I changed the script so that no input audio is sent while the model is speaking (see the sketch below).
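A minimal sketch of that gating pattern, assuming an asyncio-based capture loop; `model_is_speaking`, `session.send_audio`, and `mic_stream` are hypothetical placeholders for whatever playback-state flag, send call, and audio stream the script actually uses:

```python
import asyncio

# Hypothetical flag flipped by the playback task: set while model audio is
# playing, cleared once the model's turn has finished playing back.
model_is_speaking = asyncio.Event()

async def send_microphone_audio(session, mic_stream, chunk_size=1024):
    """Read microphone chunks and forward them, dropping chunks while the
    model is speaking so its own output is not fed back into the mic."""
    while True:
        chunk = await asyncio.to_thread(mic_stream.read, chunk_size)
        if model_is_speaking.is_set():
            # Model is talking: discard this chunk instead of sending it,
            # which prevents the speaker output from interrupting the model.
            continue
        await session.send_audio(chunk)  # hypothetical send call
```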
Session duration is limited to 15 minutes for audio. When a session exceeds this limit, the connection is terminated. Up to 3 concurrent sessions are allowed per API key.
Sessions with both audio and video input are limited to 2 minutes.
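Since the connection is simply dropped when a limit is hit, the client should be prepared to reconnect. A minimal sketch of that, where `connect_session()` and `run_conversation()` are hypothetical stand-ins for the actual connection and streaming code (the exception type will depend on the transport library in use):

```python
import asyncio

async def run_with_reconnect(max_retries=3):
    """Re-open a session when the server closes it (e.g. at the 15-minute
    audio limit), up to a small number of retries."""
    for attempt in range(max_retries):
        try:
            session = await connect_session()   # hypothetical: opens the Live API session
            await run_conversation(session)     # hypothetical: streams audio until the server disconnects
        except ConnectionError as exc:
            print(f"Session ended ({exc}), reconnecting ({attempt + 1}/{max_retries})...")
            await asyncio.sleep(1)  # brief backoff before opening a new session
        else:
            break  # conversation finished normally, no reconnect needed
```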
- How to handle a keyboard interrupt nicely? (one possible approach is sketched below)
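One possible approach, assuming the script's main loop is an asyncio coroutine and that `open_microphone()`, `connect_session()`, and `run_conversation()` are hypothetical stand-ins for the real setup and streaming code: catch the interrupt at the top level and release resources in a `finally` block, which also runs when `asyncio.run()` cancels the task.

```python
import asyncio

async def main():
    mic_stream = open_microphone()      # hypothetical: opens the audio input stream
    session = await connect_session()   # hypothetical: opens the Live API session
    try:
        await run_conversation(session, mic_stream)  # hypothetical streaming loop
    finally:
        # Runs on normal exit, errors, and cancellation alike,
        # so Ctrl+C still leaves the audio device and session closed.
        mic_stream.close()
        await session.close()

if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        # asyncio.run() cancels main() and lets its finally block run as part
        # of shutdown, so a plain message here is enough.
        print("Interrupted, exiting cleanly.")
```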