Include in Readme how to Pass Custom Arguments to llama_cpp.server in Docker #1029
Comments
You could try my container: I implemented all supported options as environment variables.
Thanks for your attention. I tried the Docker image, but the GPU isn't being activated even though the uvicorn server starts. This is my Docker run:
Does the Docker image run CUDA acceleration by default, or do I have to do something else? Also, would you know which parameter to adjust if I want to handle many concurrent requests through the server? I understand that for the llama.cpp server it's done via ngl: ggerganov/llama.cpp#3228. Thanks for your advice!
Unfortunately, this Alpine-based image is built with `CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF"`. An image optimized for GPUs with CUDA would need a different base image anyway. Unfortunately, I don't have an NVIDIA GPU myself and therefore can't test or deploy anything. But maybe I can have a look at activating GPU support (without CUDA); llama-cpp-python would then need to be recompiled after image creation, or I could deploy it under another tag. I need to think about that. I also haven't yet dealt with your question about parallel requests, but I would be very interested in that too. Sorry.
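For reference, recompiling llama-cpp-python with different build flags after image creation usually amounts to reinstalling the package with `CMAKE_ARGS` set. A rough sketch, using the same flags quoted above (the exact `-D` options needed for a given backend are an assumption and depend on the llama.cpp version):

```bash
# Sketch: reinstall llama-cpp-python inside the container with different
# build flags. The -D options shown are the BLAS ones quoted above, minus
# the AVX/FMA disables; adjust for the backend you actually want.
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \
  pip install --force-reinstall --no-cache-dir llama-cpp-python
```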
Thanks! Perhaps that's true with
@jaredquekjz there are two options, really.
The benefit of using the default entrypoint and environment variables with the official image is that it includes a compiler and will rebuild the package for any CPU architecture you deploy it to, ensuring that it's going to be as fast as or faster than pre-built binaries.
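For illustration, passing settings through environment variables with the official image might look roughly like this; the uppercase variable names mirror the server's CLI options, but treat the exact names as assumptions to be checked against the README:

```bash
# Minimal sketch, assuming the server reads its settings from uppercase
# environment variables (MODEL, N_GPU_LAYERS, CHAT_FORMAT are assumptions
# here, not verified against a specific release). Paths are placeholders.
docker run --rm -p 8000:8000 \
  -v /path/to/models:/models \
  -e MODEL=/models/model.gguf \
  -e N_GPU_LAYERS=81 \
  -e CHAT_FORMAT=chatml \
  ghcr.io/abetlen/llama-cpp-python:v0.2.24
```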
This is pretty cool. Can all the server arguments be set via ENV variables (all capitalized)?
Title: Issue with Passing Custom Arguments to `llama_cpp.server` in Docker

Issue Description:
Hello @abetlen, I've been trying to use your Docker image `ghcr.io/abetlen/llama-cpp-python:v0.2.24` for `llama_cpp.server`, and I encountered some difficulties when attempting to pass custom arguments (`--n_gpu_layers 81`, `--chat_format chatml`, `--use_mlock False`) to the server through Docker.

Steps to Reproduce:
1. Pull the Docker image: `docker pull ghcr.io/abetlen/llama-cpp-python:v0.2.24`
2. Run the container with custom arguments. This results in an error: `Error: No such option: --n_gpu_layers`.
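For illustration, a hypothetical command of roughly this shape produces that error (model path and port are placeholders; only the three custom options are taken from the report):

```bash
# Hypothetical example, not the original command: options appended after
# the image name are handed to the image's default entrypoint, which,
# judging by the error message, parses them as uvicorn options rather
# than llama_cpp.server options.
docker run --rm -p 8000:8000 -v /path/to/models:/models \
  -e MODEL=/models/model.gguf \
  ghcr.io/abetlen/llama-cpp-python:v0.2.24 \
  --n_gpu_layers 81 --chat_format chatml --use_mlock False
```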
Expected Behavior:
I expected to be able to pass these arguments to the `llama_cpp.server` application inside the Docker container.

Actual Behavior:
The `uvicorn` command does not recognize these arguments, as it is designed for the ASGI server, not the `llama_cpp.server` application.

Potential Solutions:
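One possible direction, sketched under the assumption that the image ships a Python interpreter with llama-cpp-python installed, is to bypass the default entrypoint and invoke `llama_cpp.server` directly so that its own argument parser receives the options:

```bash
# Sketch of a workaround, not a documented interface: override the
# entrypoint so the arguments reach llama_cpp.server's parser instead
# of uvicorn's. Paths are placeholders.
docker run --rm -p 8000:8000 -v /path/to/models:/models \
  --entrypoint python3 \
  ghcr.io/abetlen/llama-cpp-python:v0.2.24 \
  -m llama_cpp.server --model /models/model.gguf \
  --n_gpu_layers 81 --chat_format chatml --use_mlock False
```

Alternatively, as noted in the comments above, the same settings can be supplied as environment variables while keeping the default entrypoint.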
I would appreciate any assistance or guidance you could provide on this issue.
Thank you for your time and for maintaining this project.
Best regards.