Add vram flushing support #256

Merged on Dec 16, 2024 (7 commits)

@aidancrowther (Contributor) commented Dec 9, 2024

This change adds a monitoring thread that unloads the Whisper model after a user-configurable idle timeout. The timeout is disabled by default and can be set with the `IDLE_TIMEOUT` environment variable.
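For readers who want to see how such a setting might be read, here is a minimal sketch. It assumes the value is given in seconds and that `0` means "disabled"; neither detail is stated above. The helper name `read_idle_timeout` is hypothetical and is not taken from this PR.

```python
import os


def read_idle_timeout(env=None, name="IDLE_TIMEOUT"):
    """Return the idle timeout in seconds; 0 means the feature is disabled.

    Assumptions (not confirmed by the PR): the unit is seconds, and an
    unset, negative, or unparsable value is treated as "disabled".
    """
    env = os.environ if env is None else env
    raw = env.get(name, "0")
    try:
        value = int(raw)
    except ValueError:
        return 0  # unparsable input falls back to "disabled"
    return max(value, 0)  # clamp negative values to "disabled"
```

Passing a plain dict as `env` keeps the function easy to exercise without touching the real process environment, for example `read_idle_timeout({"IDLE_TIMEOUT": "300"})`.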

After the model is unloaded, a small residual VRAM allocation (~0.25 GB) appears to remain. Memory usage stays consistent across reloads, however, which leads me to believe this is a limitation of Docker.

This closes #216 and closes #196

AngelOnFira and others added 2 commits December 9, 2024 11:25
Implement automatic VRAM clearing after a specified period of idleness.

* Add a mechanism to track the last activity time and implement a background thread to monitor idleness and clear VRAM after five minutes of inactivity in `app/faster_whisper/core.py` and `app/openai_whisper/core.py`.
* Update the `transcribe` and `language_detection` functions in both core files to reset the last activity time upon invocation.
* Add a function to fully release the model from memory using `del`, `torch.cuda.empty_cache()`, and `gc.collect()` in both core files.
* Add configuration options for the idleness timeout period and enabled/disabled state in the environment variables in `app/webservice.py`.
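The mechanism described in the commit message above (track the last activity time, check it from a background thread, then release the model) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: the class `IdleModelHolder`, its method names, and the `loader` callable are all hypothetical, and the `torch` import is optional so the sketch also runs on machines without PyTorch.

```python
import gc
import threading
import time


class IdleModelHolder:
    """Holds a lazily loaded model and drops it after a period of inactivity."""

    def __init__(self, loader, idle_timeout):
        self._loader = loader              # hypothetical callable that builds the model
        self._idle_timeout = idle_timeout  # seconds; 0 or less disables unloading
        self._lock = threading.Lock()
        self._model = None
        self._last_activity = time.monotonic()

    def get(self):
        """Return the model, loading it if needed, and reset the idle clock."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            self._last_activity = time.monotonic()
            return self._model

    def is_loaded(self):
        with self._lock:
            return self._model is not None

    def unload_if_idle(self, now=None):
        """Release the model if it has been idle too long. Returns True if unloaded."""
        now = time.monotonic() if now is None else now
        with self._lock:
            if self._idle_timeout <= 0 or self._model is None:
                return False
            if now - self._last_activity < self._idle_timeout:
                return False
            self._model = None  # drop our reference so the object can be collected
        gc.collect()
        try:
            import torch  # optional: only relevant when the model lives on a GPU
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # return cached blocks to the driver
        except ImportError:
            pass
        return True

    def start_monitor(self, poll_interval=1.0):
        """Start a daemon thread that periodically checks for idleness."""
        def loop():
            while True:
                time.sleep(poll_interval)
                self.unload_if_idle()

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread
```

In this sketch a request handler would call `holder.get()` at the start of each transcription or language-detection call, which both resets the idle clock and reloads the model if it was dropped. The holder only clears its own reference, so a request that is still using the model keeps it alive until that request finishes.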
@ahmetoner ahmetoner merged commit 7d3e887 into ahmetoner:main Dec 16, 2024
Successfully merging this pull request may close these issues.

Flush VRAM when idle
Possibility to unload/reload model from VRAM/RAM after IDLE timeout
3 participants