feat: expose openai api endpoints from vllm #112
Conversation
Force-pushed from 80a3a80 to 4a4396b
vllm_server_path = os.environ.get('VLLM_SERVER_PATH', 'vllm.entrypoints.openai.api_server')
openai_api_server_port = int(os.environ.get('OPENAI_API_SERVER_PORT', 8003))
openai_api_base_url = os.environ.get('OPENAI_API_BASE_URL', f'http://localhost:{openai_api_server_port}')
openai_api_server_port = int(os.environ.get('OPENAI_API_SERVER_PORT', app_port if use_vllm else 8003))
Shall we get rid of this while we're here, and use the URL instead, defaulting to Ollama's default port when not using vLLM?
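For illustration, a minimal sketch of what that suggestion could look like: the base URL becomes the single source of truth and the port is derived from it. The USE_VLLM flag and the *_DEFAULT_PORT names here are hypothetical, not from the diff; 11434 is Ollama's default API port.

import os
from urllib.parse import urlparse

# Illustrative sketch only, not the repository's actual code.
OLLAMA_DEFAULT_PORT = 11434  # Ollama's default API port
VLLM_DEFAULT_PORT = 8003

use_vllm = os.environ.get('USE_VLLM', 'false').lower() == 'true'  # assumed flag
default_port = VLLM_DEFAULT_PORT if use_vllm else OLLAMA_DEFAULT_PORT

# Single source of truth: the base URL. If other code still needs the bare
# port, derive it from the URL instead of keeping a second env var in sync.
openai_api_base_url = os.environ.get(
    'OPENAI_API_BASE_URL', f'http://localhost:{default_port}')
openai_api_server_port = urlparse(openai_api_base_url).port or default_port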
sys.exit(1)

log.info(f'Enabled modules: {modules}')

if device == 'cuda' or is_mac:
    log.info('Using GPU')
I like how you solved this!
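For context, a small sketch of how the values used in that check are commonly computed; the diff itself does not show how device and is_mac are derived, so this is an assumption, not the PR's implementation.

import platform

import torch

# Illustrative only: one common way to derive `device` and `is_mac`
# before a check like `if device == 'cuda' or is_mac:`.
is_mac = platform.system() == 'Darwin'
device = 'cuda' if torch.cuda.is_available() else 'cpu'

if device == 'cuda' or is_mac:
    print('Using GPU')  # the diff logs this via log.info
else:
    print('Using CPU')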
LGTM! We can get rid of the llama-cpp-server thing in a separate PR...
* master: (61 commits)
  deps: use pypi provided silero vad, upgrade to latest
  fix: remove public key validation (jitsi#123)
  fix: downgrade vllm (jitsi#122)
  feat: add fallback folder when looking up public keys (jitsi#119)
  fix: add ffmpeg dependency for pytorch
  ref: bypass queueing jobs with invalid payload (jitsi#121)
  fix: replace examplar usage with label for app_id
  feat: add instrumentation for app_id (jitsi#118)
  fix: re-enable vLLM multiprocessing (jitsi#116)
  fix: update incorrect prompt example
  fix: healthchecks failing due to missing internal id (jitsi#115)
  feat(openai-api) use Ollama for local development
  feat: expose openai api endpoints from vllm (jitsi#112)
  feat: update text hint type prompting (jitsi#111)
  feat: add meeting hint type and use it as default (jitsi#110)
  feat: enable requests batching (jitsi#109)
  metrics: add full duration metric
  metrics: add a skipped job status which will not count towards duration metrics
  fix: catch exceptions when echoing fails
  feat: add support for echoing requests (jitsi#107)
  ...

# Conflicts:
#   Dockerfile
#   Makefile
#   requirements.txt