server: allow to override threads server pool with --threads-http #5794

phymbert · 2024-02-29T10:25:44Z

Motivation

The default httplib thread pool size is set to std::thread::hardware_concurrency(), which is NOT always suitable for completions.
Since completion tasks can be deferred, the HTTP server thread pool can quickly fill up once the number of in-flight requests reaches the number of available host CPU cores, even though continuous batching could actually accept more requests on the GPU device.

This fix allows the user to choose a balanced number of threads for the HTTP server pool.

Changes

Introduce --threads-http N to override the default configuration.

examples/server/server.cpp
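A minimal sketch of how a flag like this could be parsed and resolved, assuming (hypothetically) that a value of 0 or less means "not set" and that the fallback is the std::thread::hardware_concurrency() default described in the motivation. The function names parse_threads_http and resolve_http_threads are illustrative only and are not taken from server.cpp.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <thread>

// Hypothetical helper (not from server.cpp): returns the integer that follows
// "--threads-http" on the command line, or -1 if the flag is absent.
int parse_threads_http(int argc, const char* const argv[]) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::string(argv[i]) == "--threads-http") {
            return std::atoi(argv[i + 1]);  // no input validation: sketch only
        }
    }
    return -1;
}

// Hypothetical helper: picks the HTTP pool size. A positive user value wins;
// otherwise fall back to the number of hardware threads. The standard allows
// hardware_concurrency() to return 0 when unknown, so clamp to at least 1.
int resolve_http_threads(int requested) {
    int n = requested;
    if (n <= 0) {
        const unsigned hw = std::thread::hardware_concurrency();
        n = hw > 0 ? static_cast<int>(hw) : 1;
    }
    assert(n >= 1);
    return n;
}
```

With this sketch, "server --threads-http 32" would yield a 32-worker pool regardless of the core count. In the cpp-httplib versions we are aware of, a chosen size can be applied through the server's new_task_queue hook (returning a new httplib::ThreadPool of that size); this should be checked against the httplib version vendored in the repository.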

phymbert · 2024-03-01T08:13:57Z

@ggerganov Hi, since I got one approval, should I wait for yours? Thanks.

ggerganov · 2024-03-01T08:39:57Z

If a PR touches just the server code, you don't need to wait for my approval.

Btw, isn't it better to reuse the --parallel parameter instead of introducing a new one?

phymbert · 2024-03-01T09:08:02Z

If a PR touches just the server code, you don't need to wait for my approval.

Noted, thanks.

Btw, isn't it better to reuse the --parallel parameter instead of introducing a new one?

No, they have different meanings: --parallel sets the number of sequences/slots on the llama backend, while --threads-http sets the number of concurrent HTTP requests handled by httplib (which can generate deferred tasks if all slots are busy processing).
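To make the distinction concrete, here is a deliberately simplified model (an assumption, not the server's actual scheduling code) in which every accepted completion request occupies one HTTP worker thread for its whole lifetime and is either processed in a free slot or deferred until one frees up. The names split_requests and RequestSplit are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of the two independent limits:
//   n_parallel      -> llama backend slots (--parallel)
//   n_threads_http  -> httplib worker threads (--threads-http)
struct RequestSplit {
    int processing;    // requests holding both an HTTP thread and a slot
    int deferred;      // requests holding an HTTP thread, waiting for a free slot
    int waiting_http;  // requests not yet picked up by any HTTP thread
};

RequestSplit split_requests(int n_requests, int n_parallel, int n_threads_http) {
    assert(n_requests >= 0 && n_parallel > 0 && n_threads_http > 0);
    // Only requests accepted by an HTTP worker can reach the backend at all.
    const int accepted = std::min(n_requests, n_threads_http);
    // Of those, at most n_parallel can be in a slot; the rest are deferred.
    const int processing = std::min(accepted, n_parallel);
    return RequestSplit{processing, accepted - processing, n_requests - accepted};
}
```

For example, on a host with 8 hardware threads and --parallel 16, split_requests(20, 16, 8) gives 8 processing and 12 still waiting for an HTTP thread, so half of the slots sit idle. Raising --threads-http to 32 gives 16 processing and 4 deferred, with nothing stuck in front of the HTTP pool.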


phymbert requested review from ggerganov and ngxson February 29, 2024 10:25

ngxson reviewed Feb 29, 2024

examples/server/server.cpp

ngxson reviewed Feb 29, 2024

examples/server/server.cpp

server: allow to override threads server pool with --threads-http

a55e8fc

phymbert force-pushed the feature/server-http-threads branch from ccad425 to a55e8fc February 29, 2024 10:36

ngxson self-requested a review February 29, 2024 11:29

ngxson approved these changes Feb 29, 2024


phymbert changed the title from "server: allow to override threads server pool with --threads-server" to "server: allow to override threads server pool with --threads-http" Feb 29, 2024

phymbert merged commit 5cb02b4 into master Mar 1, 2024
61 checks passed

phymbert deleted the feature/server-http-threads branch March 1, 2024 09:08

phymbert mentioned this pull request Mar 2, 2024

server: init server http requests threads pool with --parallel if set #5836

Merged

hazelnutcloud pushed a commit to hazelnutcloud/llama.cpp that referenced this pull request Mar 10, 2024

server: allow to override threads server pool with --threads-http (gg…ml-org#5794)

996a2ff

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024

server: allow to override threads server pool with --threads-http (gg…ml-org#5794)

13568a0

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024

server: allow to override threads server pool with --threads-http (gg…ml-org#5794)

fd89833

HanClinto mentioned this pull request Jun 5, 2024

Feature Request: Multi session chat support #7758

Closed

4 tasks
