[BUG] triton 22.04 crashes under heavy load from Morpheus+Kafka #259
Comments
@pdmack Can you test whether reducing the
Reproduced with triton 22.06-py3 also.
Reducing the number of threads from 4 to 2 seems to stabilize it. Dropping the batch size alone from 8192 to 2048 brought no improvement, though the inference stage only infrequently updates when it is the bottleneck.
Update: triton crashed with 2 threads, batch size = 2048, use_cpp=False
out of date
Describe the bug
In testing #257, a large volume of jsonlines messages sent via Kafka can trigger an abort in Triton on a 4xT4 (16 GB) system, possibly due to contention or exhaustion of GPU memory.
triton:
Morpheus CLI:
Steps/Code to reproduce bug
Launch CLI:
Then load a 25x replica of the current pcap_dump.jsonlines into a single Kafka input topic (e.g., morpheus-input) for consumption by Morpheus.
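The replication step above can be sketched as follows. The file name pcap_dump.jsonlines and the topic name morpheus-input come from this report, but the helper itself is only an illustration, not part of the Morpheus tooling; producing to Kafka is shown as a comment because it requires a running broker.

```python
def replicate_jsonlines(src: str, dst: str, copies: int = 25) -> int:
    """Concatenate `copies` copies of a jsonlines file into `dst`.

    Returns the total number of lines written.
    """
    total = 0
    with open(dst, "w") as out:
        for _ in range(copies):
            with open(src) as f:
                for line in f:
                    out.write(line)
                    total += 1
    return total

# Example (paths are illustrative):
# n = replicate_jsonlines("pcap_dump.jsonlines", "pcap_dump_25x.jsonlines")
#
# The replica can then be fed into the input topic, e.g. with the stock
# Kafka console producer:
#   kafka-console-producer.sh --bootstrap-server localhost:9092 \
#       --topic morpheus-input < pcap_dump_25x.jsonlines
```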
Expected behavior
CLI and Triton are able to sustain Kafka stream load for inference.
Environment overview
Environment details
https://gist.github.com/pdmack/5ff438cc99105577b41f4c0c41f7131a
Additional context
nvcr.io/nvidia/tritonserver:22.04-py3