Update TensorRT-LLM #1315

Merged 1 commit into main on Mar 19, 2024
Conversation


@kaiyux kaiyux commented Mar 19, 2024

  • Features
    • Support running GptSession without OpenMPI (#1220)
    • Add Python bindings for new C++ executor API, see documentation and examples in examples/bindings
    • [BREAKING CHANGE] TopP sampling optimization with deterministic AIR TopP algorithm is enabled by default
  • API
    • [BREAKING CHANGE] Refactored the GPT model to the unified build workflow, see examples/gpt/README.md for the latest commands.
    • [BREAKING CHANGE] Refactored Qwen model to the unified build workflow, see examples/qwen/README.md for the latest commands.
    • [BREAKING CHANGE] Moved all LoRA-related flags from the convert_checkpoint.py script and the checkpoint content to the trtllm-build command, to better generalize the feature to more models.
    • [BREAKING CHANGE] Removed the use_prompt_tuning flag and options from the convert_checkpoint.py script and the checkpoint content, to better generalize the feature to more models. Use trtllm-build --max_prompt_embedding_table_size instead.
    • [BREAKING CHANGE] Changed the trtllm-build --world_size flag to --auto_parallel; the option is now used by the auto parallel planner only.
    • [BREAKING CHANGE] AsyncLLMEngine is removed; the tensorrt_llm.GenerationExecutor class is refactored to work both when launched explicitly with mpirun at the application level and when given an MPI communicator created by mpi4py.
    • [BREAKING CHANGE] examples/server is removed; see examples/app instead.
  • Bug fixes
  • Benchmark
    • Support arbitrary dataset from HuggingFace for C++ benchmarks, see “Prepare dataset” section in benchmarks/cpp/README.md
  • Infra
    • Base Docker image for TensorRT-LLM is updated to nvcr.io/nvidia/pytorch:24.02-py3
      • The dependent PyTorch version is updated to 2.2.
    • Base Docker image for TensorRT-LLM backend is updated to nvcr.io/nvidia/tritonserver:24.02-py3
    • The dependent CUDA version is updated to 12.3.2 (a.k.a. 12.3 Update 2)
  • Documentation
    • Add documents for new C++ executor API, see docs/source/executor.md
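Two of the breaking trtllm-build changes above lend themselves to a quick illustration. The sketch below is illustrative only: the checkpoint and output directory names are hypothetical placeholders, and only the --max_prompt_embedding_table_size and --auto_parallel flags are taken from these notes.

```shell
# Prompt tuning is now configured at engine build time instead of via
# convert_checkpoint.py (directory names are hypothetical):
trtllm-build --checkpoint_dir ./gpt_ckpt \
    --output_dir ./gpt_engine \
    --max_prompt_embedding_table_size 1024

# --world_size was replaced by --auto_parallel, which is consumed only
# by the auto parallel planner:
trtllm-build --checkpoint_dir ./gpt_ckpt \
    --output_dir ./gpt_engine \
    --auto_parallel 2
```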

@kaiyux kaiyux merged commit 66ca337 into main Mar 19, 2024
@Shixiaowei02 Shixiaowei02 deleted the kaiyu/update branch March 19, 2024 09:39