[misc] Add Torch profiler support #7451
Conversation
/ready
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
🚀
Otherwise LGTM. This is super useful!
LGTM!
added docs
Profiling vLLM
=================================

We support tracing vLLM workers using the ``torch.profiler`` module. You can enable tracing by setting the ``VLLM_TORCH_PROFILER_DIR`` environment variable to the directory where you want to save the traces: ``VLLM_TORCH_PROFILER_DIR=/mnt/traces/``
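For readers unfamiliar with `torch.profiler`, the worker-side hook amounts to roughly the sketch below. This is illustrative only and not the exact code in this PR; the trace directory is read from the environment variable described above.

```python
import os

from torch.profiler import ProfilerActivity, profile, tensorboard_trace_handler

# Trace output directory, taken from the env var described in the docs above.
trace_dir = os.getenv("VLLM_TORCH_PROFILER_DIR", "/tmp/traces")

# One long-lived profiler per worker: started when profiling is requested and
# stopped (which also flushes the trace files to trace_dir) when it is done.
profiler = profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    with_stack=True,
    on_trace_ready=tensorboard_trace_handler(trace_dir, use_gzip=True),
)

profiler.start()
# ... run model forward passes here ...
profiler.stop()  # flushing large traces at this point can take a while
```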
I'd suggest using a more common path as the example, such as $HOME/traces/ or /tmp/traces.
The OpenAI server also needs to be started with the ``VLLM_TORCH_PROFILER_DIR`` environment variable set.
I'd suggest covering offline batching as well. It should be even easier: just set the environment variable before creating the engine.
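A minimal sketch of that offline flow. The env var must be set before the engine is created; whether `start_profile()`/`stop_profile()` are exposed on the `LLM` class is an assumption here (in this PR they live on the worker), and the model name and path are just placeholders.

```python
import os

# Must be set before the engine is created; the path is only an illustration.
os.environ["VLLM_TORCH_PROFILER_DIR"] = os.path.expanduser("~/traces")

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any model; opt-125m is an example

# start_profile()/stop_profile() mirror the worker-level hooks added in this PR.
llm.start_profile()
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
llm.stop_profile()

for out in outputs:
    print(out.outputs[0].text)
```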
Right now we have to manually trigger start/stop profile.
Actually, why don't we add an env var like TORCH_PROFILER_NUM_REQUESTS or something? It would be -1 by default (meaning infinite). If it is N, we trigger stop_profile after N requests. This could be useful with the throughput benchmark (e.g., you can profile just the first N batches of requests).
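A rough sketch of what that gate could look like. `TORCH_PROFILER_NUM_REQUESTS` is the hypothetical knob from the comment above, not something this PR defines, and `worker` stands for any object exposing the start/stop hooks.

```python
import os

# Hypothetical knob from the suggestion above; -1 (default) means "never auto-stop".
PROFILER_NUM_REQUESTS = int(os.getenv("TORCH_PROFILER_NUM_REQUESTS", "-1"))


class ProfilerGate:
    """Stops profiling automatically once N requests have finished."""

    def __init__(self, worker, limit: int = PROFILER_NUM_REQUESTS):
        # `worker` is anything exposing start_profile()/stop_profile().
        self.worker = worker
        self.limit = limit
        self.finished = 0
        self.active = False

    def start(self) -> None:
        self.worker.start_profile()
        self.active = True
        self.finished = 0

    def on_request_finished(self) -> None:
        if not self.active or self.limit < 0:
            return
        self.finished += 1
        if self.finished >= self.limit:
            self.worker.stop_profile()
            self.active = False
```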
@@ -224,9 +224,6 @@ async def async_request_openai_completions(
    pbar: Optional[tqdm] = None,
) -> RequestFuncOutput:
    api_url = request_func_input.api_url
    assert api_url.endswith(
maybe instead of deleting it we add another condition for start_profile?
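For concreteness, one way to phrase that condition (a sketch only; it relies on `str.endswith` accepting a tuple of suffixes):

```python
# Allow both the completions endpoint and the profile endpoints to pass
# through the same request helper instead of dropping the check entirely.
assert api_url.endswith(("completions", "profile")), (
    "OpenAI Completions API URL must end with 'completions' or 'profile'.")
```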
I feel the naming of this function should be changed instead.
This might be a late review, but do you think it is possible to make this a plugin?
It might be a good idea. Is there a related document/resource?
The plugin system just landed in #7426. Docs will come later, but you can be an early user.
@SolitaryThinker What is your plan on this? Do you already have this set up as a plugin in a different PR? I was looking to build on this, hence checking.
I'll take a look at the plugin system and update this PR today.
After trying the plugin system, I feel this is not a good time to turn this feature into a plugin, because the plugin system currently doesn't support the following:
- This feature needs to extend the API server with new APIs.
- This feature needs to extend engine APIs.
With these limitations, the plugin version of this PR would be more like a hacky patch, which I don't think is good practice for "plugins". Ideally we should improve the plugin design to offer a variety of plugin APIs, such as plugin.plugin_api_server(...), to clearly define the plugin scope with proper "plugin" behaviors.
Accordingly, this PR should be good to go. cc @rkooo567 @sfc-gh-mkeralapura @youkaichao
Force-pushed from 5969fd4 to 18799de. Commits: format; remove print; Update vllm/worker/worker.py; comments; add docs; Update vllm/worker/worker.py; Update vllm/entrypoints/openai/api_server.py; Update benchmarks/benchmark_serving.py; Update vllm/envs.py (several co-authored by Cody Yu <hao.yu.cody@gmail.com>).
Force-pushed from 18799de to d754eda.
Hi @SolitaryThinker, thanks for the great work. How can we start/stop profiling for a real use case (not through benchmark_serving.py)?
Here is the follow-up for CPU-only devices: #7806.
I encountered this error, do you know why? INFO 08-27 03:15:55 api_server.py:322] Stopping profiler...
When you stop the profiler, it flushes all of the profile trace files to the directory. This takes time: for example, about 100 requests' worth of data for a Llama 70B takes roughly 10 minutes to flush on an H100. Set the env variable VLLM_RPC_GET_DATA_TIMEOUT_MS to a large value before you start the server, say something like 30 minutes. This theoretically affects regular requests too, but profiling works with it.
@sfc-gh-mkeralapura thanks, super useful tip, will add it to the docs. @DamonFool You would need a client that can ping the start_profile/stop_profile endpoints on the server.
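For example, a small client along these lines works against the OpenAI-compatible server. The host/port and model name are assumptions, the route names follow the profile endpoints this PR adds to the API server, and the server must have been launched with `VLLM_TORCH_PROFILER_DIR` set.

```python
import requests
from openai import OpenAI

BASE = "http://localhost:8000"  # assumed server address

# Start collecting a trace (the profile routes only work when the server was
# started with VLLM_TORCH_PROFILER_DIR set).
requests.post(f"{BASE}/start_profile").raise_for_status()

# Issue whatever traffic you want profiled.
client = OpenAI(base_url=f"{BASE}/v1", api_key="EMPTY")
client.completions.create(
    model="facebook/opt-125m",  # assumed model name
    prompt="San Francisco is a",
    max_tokens=32,
)

# Stop the profiler and flush the trace files to the configured directory.
requests.post(f"{BASE}/stop_profile").raise_for_status()
```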
Utility PR to add torch profiler support to vllm worker, openai client, and benchmark_serving.py.
Enabled and configured through env vars:
VLLM_TORCH_PROFILER_DIR=/mnt/traces/
Will be useful for performance tuning and saves me from always manually patching it in for #7000.
For an example of what a trace looks like, see #6854.
Traces can be viewed at https://ui.perfetto.dev/ with no need to untar them.
Example commands:
cc @Yard1 @comaniac