[misc] Add Torch profiler support #7451

Merged
merged 5 commits into from
Aug 21, 2024

Conversation

SolitaryThinker
Contributor

@SolitaryThinker SolitaryThinker commented Aug 13, 2024

Utility PR to add torch profiler support to the vLLM worker, the OpenAI client, and benchmark_serving.py.
Enabled and configured through env vars:
VLLM_TORCH_PROFILER_DIR=/mnt/traces/

Will be useful for performance tuning, and saves me from always manually patching it in for #7000.

For an example of what a trace looks like, see #6854.

Traces can be viewed at https://ui.perfetto.dev/; there is no need to untar them.

Example commands:

VLLM_TORCH_PROFILER_DIR=/mnt/traces/ python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-70B 
python benchmarks/benchmark_serving.py --backend vllm --model meta-llama/Meta-Llama-3-70B --dataset-name sharegpt --dataset-path sharegpt.json --profile --num-prompts 2

cc @Yard1 @comaniac

@SolitaryThinker
Contributor Author

/ready


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI, as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 13, 2024
@rkooo567 rkooo567 self-requested a review August 13, 2024 00:49
Collaborator

@comaniac comaniac left a comment


Otherwise LGTM. This is super useful!

Collaborator

@comaniac comaniac left a comment


LGTM!

@SolitaryThinker
Contributor Author

added docs

Profiling vLLM
=================================

We support tracing vLLM workers using the ``torch.profiler`` module. You can enable tracing by setting the ``VLLM_TORCH_PROFILER_DIR`` environment variable to the directory where you want to save the traces: ``VLLM_TORCH_PROFILER_DIR=/mnt/traces/``
Collaborator

I suggest using a more common path as an example, such as $HOME/traces/ or /tmp/traces.


We support tracing vLLM workers using the ``torch.profiler`` module. You can enable tracing by setting the ``VLLM_TORCH_PROFILER_DIR`` environment variable to the directory where you want to save the traces: ``VLLM_TORCH_PROFILER_DIR=/mnt/traces/``

The OpenAI server also needs to be started with the ``VLLM_TORCH_PROFILER_DIR`` environment variable set.
Collaborator

I'd suggest covering offline batching as well. It should be even easier, since you only need to set the environment variable before creating the engine.
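
For instance, a minimal offline sketch could look like the following (hypothetical: the start_profile()/stop_profile() calls on the LLM object are illustrative and not added by this PR, which only exposes the worker-level methods through the OpenAI server):

import os

# Set before the engine is created so the workers pick it up (assumption).
os.environ["VLLM_TORCH_PROFILER_DIR"] = "/tmp/traces"

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-70B")
llm.start_profile()   # hypothetical pass-through to the worker's start_profile()
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
llm.stop_profile()    # hypothetical; traces are written to VLLM_TORCH_PROFILER_DIR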

Collaborator

@rkooo567 rkooo567 left a comment

Right now, we have to manually trigger start/stop profile.

Actually, why don't we add an env var like TORCH_PROFILER_NUM_REQUESTS or something, defaulting to -1 (meaning infinite)? If it is N, we trigger stop_profile after N requests. This could be useful with the throughput benchmark (e.g., you can profile just the first N batches of requests).
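
A hypothetical sketch of that counter (the env var name and the hook are purely from this suggestion, not something the PR implements):

import os

# Hypothetical env var from this suggestion; -1 means "never auto-stop".
PROFILER_NUM_REQUESTS = int(os.getenv("TORCH_PROFILER_NUM_REQUESTS", "-1"))

class ProfilerAutoStop:
    """Stops profiling automatically once N requests have finished."""

    def __init__(self, worker):
        self.worker = worker  # anything exposing stop_profile(), e.g. the vLLM worker
        self.finished = 0

    def on_request_finished(self) -> None:
        self.finished += 1
        if 0 < PROFILER_NUM_REQUESTS <= self.finished:
            self.worker.stop_profile()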

@@ -224,9 +224,6 @@ async def async_request_openai_completions(
pbar: Optional[tqdm] = None,
) -> RequestFuncOutput:
api_url = request_func_input.api_url
assert api_url.endswith(
Collaborator

Maybe instead of deleting it, we add another condition for start_profile?

Collaborator

I feel the naming of this function should be changed instead.

@youkaichao
Member

This might be a late review, but do you think it is possible to make this a plugin?

@comaniac
Collaborator

This might be a late review, but do you think it is possible to make this a plugin?

It might be a good idea. Is there a related document/resource?

@youkaichao
Member

The plugin system just landed in #7426. Docs will come later, but you can be an early user.

@sfc-gh-mkeralapura
Contributor

@SolitaryThinker What is your plan on this? Do you already have this set up as a plugin in a different PR? I was looking to build on this, hence checking.

@comaniac
Collaborator

I'll take a look at the plugin system and update this PR today.

Collaborator

@comaniac comaniac left a comment

After trying the plugin system, I feel this is not a good time to turn this feature into a plugin, because of the following limitations in the current plugin system:

  1. This feature needs to extend the API server with new APIs.
  2. This feature needs to extend engine APIs.

With these limitations, the plugin version of this PR would be more like a hacky patch, which I don't think is good practice for "plugins". Ideally, we should improve the plugin design to offer various plugin APIs, such as plugin.plugin_api_server(...), to clearly define the plugin scope with proper "plugin" behaviors.

Accordingly, this PR should be good to go. cc @rkooo567 @sfc-gh-mkeralapura @youkaichao


@SolitaryThinker SolitaryThinker force-pushed the torch_profiler branch 2 times, most recently from 5969fd4 to 18799de on August 21, 2024 03:27
SolitaryThinker and others added 5 commits August 21, 2024 11:30
format

remove print

Update vllm/worker/worker.py

Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>

comments

add docs

Update vllm/worker/worker.py

Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>

Update vllm/entrypoints/openai/api_server.py

Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>

Update benchmarks/benchmark_serving.py

Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>

Update vllm/envs.py

Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
@comaniac comaniac merged commit dd53c4b into vllm-project:main Aug 21, 2024
47 checks passed
@SolitaryThinker SolitaryThinker deleted the torch_profiler branch August 21, 2024 23:02
@DamonFool
Contributor

Hi @SolitaryThinker, thanks for the great work.

How can we start/stop profiling for a real use case (not through benchmark_serving.py)?
Can you give us an example?
Thanks.

@DamonFool
Contributor

Here is the follow-up for CPU-only devices: #7806.
Could you please review it?
Thanks.

omrishiv pushed a commit to omrishiv/vllm that referenced this pull request Aug 26, 2024
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
@Shawn314

I encountered the following error; do you know why?

INFO 08-27 03:15:55 api_server.py:322] Stopping profiler...
INFO 08-27 03:15:55 server.py:138] Stopping profiler...
INFO: ::1:56200 - "POST /stop_profile HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app( # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/mnt/project/skyllm/fangxiao/vllm/vllm/entrypoints/openai/api_server.py", line 323, in stop_profile
    await async_engine_client.stop_profile()
  File "/mnt/project/skyllm/fangxiao/vllm/vllm/entrypoints/openai/rpc/client.py", line 451, in stop_profile
    await self._send_one_way_rpc_request(
  File "/mnt/project/skyllm/fangxiao/vllm/vllm/entrypoints/openai/rpc/client.py", line 257, in _send_one_way_rpc_request
    response = await do_rpc_call(socket, request)
  File "/mnt/project/skyllm/fangxiao/vllm/vllm/entrypoints/openai/rpc/client.py", line 249, in do_rpc_call
    raise TimeoutError("Server didn't reply within "
TimeoutError: Server didn't reply within 5000 ms

@sfc-gh-mkeralapura
Contributor

Stopping the profiler flushes all of the profile trace files to the directory, which takes time. For example, about 100 requests' worth of data for a Llama 70B model takes roughly 10 minutes to flush out on an H100.

Set the env variable VLLM_RPC_GET_DATA_TIMEOUT_MS to a big number before you start the server, say something like 30 minutes. This theoretically affects regular requests too, but profiling works with it.
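
For example (an illustrative command; the value is in milliseconds, so 30 minutes is 1800000, and the model/path are the ones used earlier in this thread):

VLLM_RPC_GET_DATA_TIMEOUT_MS=1800000 VLLM_TORCH_PROFILER_DIR=/mnt/traces/ python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-70B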

@SolitaryThinker
Contributor Author

@sfc-gh-mkeralapura thanks, super useful tip, will add to the docs.

@DamonFool You would need a client that can hit the /start_profile and /stop_profile paths. I will be working on a PR for the offline engine, and that should be easier to use.
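
For example, against a server started with VLLM_TORCH_PROFILER_DIR set (the /start_profile and /stop_profile endpoints come from this PR; host and port are assumptions for a default local deployment):

curl -X POST http://localhost:8000/start_profile
# ... send the requests you want captured in the trace ...
curl -X POST http://localhost:8000/stop_profile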

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Signed-off-by: Alvant <alvasian@yandex.ru>
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>