llama-bench : add support for the RPC backend #7435
Conversation
You can run separate tests against each server with `$ bin/llama-bench -m ../../models/tinyllama-1b/ggml-model-f16.gguf --rpc localhost:50052 --rpc localhost:50053`; it won't run two separate tests with `$ bin/llama-bench -m ../../models/tinyllama-1b/ggml-model-f16.gguf --rpc localhost:50052,localhost:50053`, where the comma-separated list is used as a single configuration. Another thing that should be noted is that we re-load the model on every single run when using RPC. This is because we cannot free the RPC backend while we still have allocated RPC buffers.
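For illustration, a comma-separated `--rpc` value could be split into individual endpoints roughly like this. This is a hypothetical helper sketched for this discussion, not the PR's actual parsing code:

```cpp
#include <sstream>
#include <string>
#include <vector>

// split "localhost:50052,localhost:50053" into individual endpoints
// used together in a single test configuration
static std::vector<std::string> split_rpc_servers(const std::string & arg) {
    std::vector<std::string> endpoints;
    std::stringstream ss(arg);
    std::string endpoint;
    while (std::getline(ss, endpoint, ',')) {
        if (!endpoint.empty()) {
            endpoints.push_back(endpoint); // e.g. "localhost:50052"
        }
    }
    return endpoints;
}
```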
I don't think this is ok; it is preferable to take each …
This really should be fixed in the RPC backend rather than requiring specific workarounds in applications that are just using the llama.cpp API.
Fair enough, will fix this.
Resource management becomes tricky if we allow buffer objects to outlive the backend which created them. One would expect that all resources allocated by the backend (like sockets) should be freed/closed when the backend is freed.
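For context, here is a minimal sketch of the ordering constraint being described, assuming the ggml-backend / ggml-rpc API as it existed around this PR; the exact headers and signatures may differ between versions:

```cpp
#include "ggml-backend.h"
#include "ggml-rpc.h"

int main() {
    // connect to a single RPC server (the endpoint is an example value)
    ggml_backend_t backend = ggml_backend_rpc_init("localhost:50052");

    // buffers allocated through the backend live on the remote server
    ggml_backend_buffer_t buf = ggml_backend_alloc_buffer(backend, 16 * 1024 * 1024);

    // ... run the benchmark using buf ...

    // the ordering matters: freeing the backend while `buf` is still alive
    // would tear down the connection that the remote buffer still needs,
    // which is why llama-bench currently re-loads the model on every run
    ggml_backend_buffer_free(buf);
    ggml_backend_free(backend);
    return 0;
}
```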
In ggml-backend, the buffers are not tied to a backend instance, so it cannot be said that a backend created these objects; that's not what is happening. My suggestion would be to keep an internal pool of connections in the RPC backend.
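Such a pool could look roughly like the sketch below: one shared connection per endpoint, reference-counted so that buffers and backend instances can hold it independently. The names (`socket_t`, `get_socket`) are illustrative assumptions, not the actual ggml-rpc code:

```cpp
#include <map>
#include <memory>
#include <string>

// placeholder for the backend's socket wrapper; the real type would own a
// connected descriptor and close it in its destructor
struct socket_t {
    int fd = -1;
};

// one shared connection per endpoint, reused across backends and buffers,
// so a remote buffer can keep its connection alive independently of any
// particular backend instance
static std::map<std::string, std::weak_ptr<socket_t>> g_socket_pool;

static std::shared_ptr<socket_t> get_socket(const std::string & endpoint) {
    auto it = g_socket_pool.find(endpoint);
    if (it != g_socket_pool.end()) {
        if (auto sock = it->second.lock()) {
            return sock; // reuse the existing connection
        }
    }
    auto sock = std::make_shared<socket_t>(); // connect to `endpoint` here
    g_socket_pool[endpoint] = sock;
    return sock;
}
```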
OK, I will implement this in a separate PR and update this one when ready.
This one is ready for review.