server: allow to override threads server pool with --threads-http #5794
@ggerganov Hi, since I got one approval, should I wait for yours? Thanks.
If a PR touches just the [...] Btw, isn't it better to reuse the [...]
Noted, thanks.
No, they have different meanings, [...]
Motivation
The default httplib thread pool size is set to `std::thread::hardware_concurrency()`, which is NOT always suited to completions.
Because completion tasks can be deferred, the HTTP server thread pool can quickly fill up once the number of in-flight requests reaches the number of available host CPU cores, even though continuous batching could accept more requests on the GPU device.
This fix lets the user choose a balanced number of threads for the HTTP server pool.
Changes
Introduce `--threads-http N` to override the default configuration.
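
As a rough illustration of the override described above, here is a minimal sketch of how a pool size could be resolved from the command line. It is not the code from this PR: the helper names `default_http_threads` and `resolve_http_threads` are hypothetical, and the fallback simply mirrors the `std::thread::hardware_concurrency()` default mentioned in the motivation.

```cpp
#include <cstdlib>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not from the PR): the hardware-based default pool size.
// hardware_concurrency() may return 0 when the value is unknown, so clamp to 1.
static unsigned default_http_threads() {
    const unsigned n = std::thread::hardware_concurrency();
    return n > 0 ? n : 1;
}

// Hypothetical helper (not from the PR): scan the arguments for
// "--threads-http N" and return N when it is a positive integer.
// Otherwise return the hardware-based default.
static unsigned resolve_http_threads(const std::vector<std::string> & args) {
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == "--threads-http") {
            const std::string & value = args[i + 1];
            char * end = nullptr;
            const long v = std::strtol(value.c_str(), &end, 10);
            // Accept only a fully parsed, strictly positive number.
            if (end != value.c_str() && *end == '\0' && v > 0) {
                return static_cast<unsigned>(v);
            }
        }
    }
    return default_http_threads();
}
```

With this sketch, `resolve_http_threads({"--threads-http", "32"})` yields 32 regardless of the core count, so the HTTP pool can be larger than the number of host cores when continuous batching has spare capacity. In cpp-httplib, a custom pool size is typically installed through the server's `new_task_queue` hook with an `httplib::ThreadPool`; the exact wiring used by this PR is not shown here.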