[misc] Add Torch profiler support #7451
Conversation
/ready
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
🚀
Otherwise LGTM. This is super useful!
LGTM!
added docs
Profiling vLLM
=================================

We support tracing vLLM workers using the ``torch.profiler`` module. You can enable tracing by setting the ``VLLM_TORCH_PROFILER_DIR`` environment variable to the directory where you want to save the traces: ``VLLM_TORCH_PROFILER_DIR=/mnt/traces/``
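For readers unfamiliar with `torch.profiler`, the worker-side hook amounts to roughly the sketch below. This is illustrative only and not the exact code in this PR; the trace directory is read from the environment variable described above.

```python
import os

from torch.profiler import ProfilerActivity, profile, tensorboard_trace_handler

# Trace output directory, taken from the env var described in the docs above.
trace_dir = os.getenv("VLLM_TORCH_PROFILER_DIR", "/tmp/traces")

# One long-lived profiler per worker: started when profiling is requested and
# stopped (which also flushes the trace files to trace_dir) when it is done.
profiler = profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    with_stack=True,
    on_trace_ready=tensorboard_trace_handler(trace_dir, use_gzip=True),
)

profiler.start()
# ... run model forward passes here ...
profiler.stop()  # flushing large traces at this point can take a while
```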
I'd suggest using a more common path as the example, such as $HOME/traces/ or /tmp/traces.
The OpenAI server also needs to be started with the ``VLLM_TORCH_PROFILER_DIR`` environment variable set.
I'd suggest covering offline batching as well. It should be even easier: just set the environment variable before creating the engine.
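A minimal sketch of that offline flow. The env var must be set before the engine is created; whether `start_profile()`/`stop_profile()` are exposed on the `LLM` class is an assumption here (in this PR they live on the worker), and the model name and path are just placeholders.

```python
import os

# Must be set before the engine is created; the path is only an illustration.
os.environ["VLLM_TORCH_PROFILER_DIR"] = os.path.expanduser("~/traces")

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any model; opt-125m is an example

# start_profile()/stop_profile() mirror the worker-level hooks added in this PR.
llm.start_profile()
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
llm.stop_profile()

for out in outputs:
    print(out.outputs[0].text)
```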
Right now we have to manually trigger start/stop profile.
Actually, why don't we add an env var like TORCH_PROFILER_NUM_REQUESTS or something? It would be -1 by default (meaning infinite). If it is N, we trigger stop_profile after N requests. This could be useful with the throughput benchmark (e.g., you can profile just the first N batches of requests).
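A rough sketch of what that gate could look like. `TORCH_PROFILER_NUM_REQUESTS` is the hypothetical knob from the comment above, not something this PR defines, and `worker` stands for any object exposing the start/stop hooks.

```python
import os

# Hypothetical knob from the suggestion above; -1 (default) means "never auto-stop".
PROFILER_NUM_REQUESTS = int(os.getenv("TORCH_PROFILER_NUM_REQUESTS", "-1"))


class ProfilerGate:
    """Stops profiling automatically once N requests have finished."""

    def __init__(self, worker, limit: int = PROFILER_NUM_REQUESTS):
        # `worker` is anything exposing start_profile()/stop_profile().
        self.worker = worker
        self.limit = limit
        self.finished = 0
        self.active = False

    def start(self) -> None:
        self.worker.start_profile()
        self.active = True
        self.finished = 0

    def on_request_finished(self) -> None:
        if not self.active or self.limit < 0:
            return
        self.finished += 1
        if self.finished >= self.limit:
            self.worker.stop_profile()
            self.active = False
```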
@@ -224,9 +224,6 @@ async def async_request_openai_completions(
    pbar: Optional[tqdm] = None,
) -> RequestFuncOutput:
    api_url = request_func_input.api_url
    assert api_url.endswith(
maybe instead of deleting it we add another condition for start_profile?
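For concreteness, one way to phrase that condition (a sketch only; it relies on `str.endswith` accepting a tuple of suffixes):

```python
# Allow both the completions endpoint and the profile endpoints to pass
# through the same request helper instead of dropping the check entirely.
assert api_url.endswith(("completions", "profile")), (
    "OpenAI Completions API URL must end with 'completions' or 'profile'.")
```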
I feel the naming of this function should be changed instead.
This might be a late review, but do you think it is possible to make this a plugin?
It might be a good idea. Is there a related document/resource?
The plugin system just landed in #7426. Docs will come later, but you can be an early user.
@SolitaryThinker What is your plan on this? Do you already have this set up as a plugin in a different PR? I was looking to build on this, hence checking.
I'll take a look at the plugin system and update this PR today.
After trying the plugin system, I feel this is not a good time to turn this feature into a plugin, because the plugin system currently doesn't support the following:
- This feature needs to extend the API server with new APIs.
- This feature needs to extend engine APIs.
With these limitations, the plugin version of this PR would be more like a hacky patch, which I don't think is good practice for "plugins". Ideally we should improve the plugin design to offer a variety of plugin APIs, such as plugin.plugin_api_server(...), to clearly define the plugin scope with proper "plugin" behaviors.
Accordingly, this PR should be good to go. cc @rkooo567 @sfc-gh-mkeralapura @youkaichao
Force-pushed from 5969fd4 to 18799de. Commits: format; remove print; Update vllm/worker/worker.py; comments; add docs; Update vllm/worker/worker.py; Update vllm/entrypoints/openai/api_server.py; Update benchmarks/benchmark_serving.py; Update vllm/envs.py (several co-authored by Cody Yu <hao.yu.cody@gmail.com>).
Force-pushed from 18799de to d754eda.
Hi @SolitaryThinker, thanks for the great work. How can we start/stop profiling for a real use case (not through benchmark_serving.py)?
Here is the follow-up for CPU-only devices: #7806.
I encountered this error, do you know why? INFO 08-27 03:15:55 api_server.py:322] Stopping profiler...
When you stop the profiler, it flushes all of the profile trace files to the directory. This takes time: for example, about 100 requests' worth of data for a Llama 70B takes roughly 10 minutes to flush on an H100. Set the env variable VLLM_RPC_GET_DATA_TIMEOUT_MS to a large value before you start the server, say something like 30 minutes. This theoretically affects regular requests too, but profiling works with it.
@sfc-gh-mkeralapura thanks, super useful tip, will add it to the docs. @DamonFool You would need a client that can ping the start_profile/stop_profile endpoints on the server.
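For example, a small client along these lines works against the OpenAI-compatible server. The host/port and model name are assumptions, the route names follow the profile endpoints this PR adds to the API server, and the server must have been launched with `VLLM_TORCH_PROFILER_DIR` set.

```python
import requests
from openai import OpenAI

BASE = "http://localhost:8000"  # assumed server address

# Start collecting a trace (the profile routes only work when the server was
# started with VLLM_TORCH_PROFILER_DIR set).
requests.post(f"{BASE}/start_profile").raise_for_status()

# Issue whatever traffic you want profiled.
client = OpenAI(base_url=f"{BASE}/v1", api_key="EMPTY")
client.completions.create(
    model="facebook/opt-125m",  # assumed model name
    prompt="San Francisco is a",
    max_tokens=32,
)

# Stop the profiler and flush the trace files to the configured directory.
requests.post(f"{BASE}/stop_profile").raise_for_status()
```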
Utility PR to add torch profiler support to vllm worker, openai client, and benchmark_serving.py.
Enabled and configured through env vars:
VLLM_TORCH_PROFILER_DIR=/mnt/traces/
Will be useful for performance tuning and saves me from always manually patching it in for #7000.
For an example of what a trace looks like, see #6854.
Traces can be viewed at https://ui.perfetto.dev/ with no need to untar them.
Example commands:
cc @Yard1 @comaniac