- `GptSession` without OpenMPI (Run GptSession without openmpi? #1220)
- `executor` API; see the documentation and examples in `examples/bindings`
- `examples/gpt/README.md` for the latest commands
- `examples/qwen/README.md` for the latest commands.
- `trtllm-build` command, to better generalize the feature to more models.
- `trtllm-build --max_prompt_embedding_table_size` instead.
- `trtllm-build --world_size` flag to `--auto_parallel` flag; the option is used by the auto parallel planner only.
- `AsyncLLMEngine` is removed. The `tensorrt_llm.GenerationExecutor` class is refactored to work both when the application is explicitly launched with `mpirun` and when it is given an MPI communicator created by `mpi4py`.
- `examples/server` is removed; see `examples/app` instead.
- `SamplingConfig` tensors in `ModelRunnerCpp` (`ModelRunnerCpp` does not transfer `SamplingConfig` Tensor fields correctly #1183)
- `examples/run.py` only loads one line from `--input_file`
- `benchmarks/cpp/README.md`
- `nvcr.io/nvidia/pytorch:24.02-py3`
- `nvcr.io/nvidia/tritonserver:24.02-py3`
- `executor` API; see `docs/source/executor.md`