[BUG]: C++ impl for Triton inference can incorrectly split inference inputs #680
Closed
Labels
bug
Something isn't working
Version
23.03
Which installation method(s) does this occur on?
Docker, Conda, Source
Describe the bug.
The Triton inference stage often needs to split up its input based on the model's max batch size, which is frequently much smaller than the number of rows in the message (`pipeline_batch_size`); the input is broken up into what we call a "mini-batch". We can also have large input fields (typically variable-length fields such as text) which are themselves larger than the model can accept and need to be split into multiple inference inputs, after which we perform a reduction on the multiple outputs to produce a single output for the row.
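For illustration, here is a minimal standalone sketch of the two splitting paths described above. The names (`infer_chunk`, `make_mini_batches`, `infer_wide_row`) and the element-wise max reduction are hypothetical and chosen only to make the example self-contained; they are not the Morpheus or Triton client API.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for a single Triton request; the real stage would call the Triton
// client here. Returns one value per input element purely for illustration.
static std::vector<float> infer_chunk(const std::vector<float>& chunk)
{
    std::vector<float> out(chunk.size());
    std::transform(chunk.begin(), chunk.end(), out.begin(), [](float v) { return v * 0.5f; });
    return out;
}

// Case 1: split the message's rows into mini-batches no larger than the
// model's max batch size. Returns [start, stop) row ranges.
static std::vector<std::pair<std::size_t, std::size_t>> make_mini_batches(std::size_t num_rows,
                                                                          std::size_t max_batch_size)
{
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t start = 0; start < num_rows; start += max_batch_size)
    {
        ranges.emplace_back(start, std::min(start + max_batch_size, num_rows));
    }
    return ranges;
}

// Case 2: a single row's input is wider than the model accepts, so it is
// split into multiple inference inputs and the partial outputs are reduced
// (element-wise max here, as one possible reduction) into one output per row.
static std::vector<float> infer_wide_row(const std::vector<float>& row_input, std::size_t max_width)
{
    std::vector<float> reduced;
    for (std::size_t offset = 0; offset < row_input.size(); offset += max_width)
    {
        const std::size_t stop = std::min(offset + max_width, row_input.size());
        std::vector<float> partial =
            infer_chunk(std::vector<float>(row_input.begin() + offset, row_input.begin() + stop));

        if (reduced.empty())
        {
            reduced = std::move(partial);
        }
        else
        {
            for (std::size_t i = 0; i < reduced.size() && i < partial.size(); ++i)
                reduced[i] = std::max(reduced[i], partial[i]);
        }
    }
    return reduced;
}

int main()
{
    // e.g. pipeline_batch_size = 1024 rows, model max batch size = 256
    const auto batches = make_mini_batches(1024, 256);                        // 4 ranges of 256 rows
    const auto row_out = infer_wide_row(std::vector<float>(600, 1.0f), 256);  // 3 chunks reduced
    return (batches.size() == 4 && !row_out.empty()) ? 0 : 1;
}
```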
There are currently two related bugs, the first being common:
Minimum reproducible example
Relevant log output
No response
Full env printout
No response
Other/Misc.
No response
Code of Conduct