fix: Device casting issues with certain aten operators #1416
Conversation
%false: bool = prim::Constant[value=0]()
%mask_cuda: Tensor = aten::to(%mask, %device, %dtype, %false, %false)
%self_cuda: Tensor = aten::to(%self, %device, %dtype, %false, %false)
%out: Tensor = aten::masked_fill_(%self_cuda, %mask_cuda, %value)
This could be replaced by aten::masked_fill, for which a converter exists. However, a few bugs arise in edge cases, for example when %value is a float but %self_cuda is an int, and similar scenarios that the converter does not handle directly and that can cause errors in TRT. As a result, I opted for the unimplemented aten::masked_fill_ version with casted tensors in the meantime.
// should be casted to CUDA to avoid device mismatch errors
std::string unpacked_pattern = R"IR(
  graph(%self, %mask, %value):
    %device: Device = prim::Constant[value="cuda"]()
What happens in the case of multi-gpu systems?
Potentially could add an argument to take the target device.
Take a look at snprintf and at modifying core::CompileSpec::lower_info to add a device field that replicates the device info from the external API; then you should be able to determine the target device at lowering time.
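A minimal sketch of what that could look like, assuming a LowerInfo struct in core/lowering and reusing the shared Device struct; the field, helper, and header names here are assumptions, not the final implementation:

```cpp
// core/lowering/lowering.h (sketch only; field and helper names are assumptions)
#include <string>
#include "core/ir/ir.h"  // assumed location of the shared Device struct

namespace torch_tensorrt {
namespace core {
namespace lowering {

struct LowerInfo {
  // ... existing lowering options ...

  // Hypothetical field mirroring the device passed in through the external API,
  // so lowering passes can build device strings such as "cuda:1" at lower time.
  ir::Device target_device;

  std::string getGPUDeviceString() const {
    return "cuda:" + std::to_string(target_device.gpu_id);
  }
};

} // namespace lowering
} // namespace core
} // namespace torch_tensorrt
```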
%false: bool = prim::Constant[value=0]()
%mask_cuda: Tensor = aten::to(%mask, %device, %dtype, %false, %false)
%self_cuda: Tensor = aten::to(%self, %device, %dtype, %false, %false)
%out: Tensor = aten::masked_fill_(%self_cuda, %mask_cuda, %value)
Right now this would handle the in-place case. Can we handle the functional case here too?
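For reference, a sketch of what the same device-casting approach might look like for the functional variant (the surrounding pass structure is assumed):

```cpp
// Sketch only: the functional op handled with the same device-casting pattern
std::string unpacked_functional_pattern = R"IR(
  graph(%self, %mask, %value):
    %device: Device = prim::Constant[value="cuda"]()
    %dtype: NoneType = prim::Constant()
    %false: bool = prim::Constant[value=0]()
    %mask_cuda: Tensor = aten::to(%mask, %device, %dtype, %false, %false)
    %self_cuda: Tensor = aten::to(%self, %device, %dtype, %false, %false)
    %out: Tensor = aten::masked_fill(%self_cuda, %mask_cuda, %value)
    return (%out))IR";
```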
nvinfer1::DataType promote_types(nvinfer1::DataType type_a, nvinfer1::DataType type_b) {
Thanks for the suggestion - this actually turns out to be a bug in the aten::masked_fill converter, as it behaves differently than Torch when the types of the input and value are different. Specifically, the converter throws an error whereas Torch just inherits the type of the first argument. I will make a new PR + test cases for this, as it is an unrelated bug.
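A quick libtorch check of the behavior described above might look like this (a sketch; the expected result is taken from the comment, not verified here):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Int tensor filled with a float value: per the comment above, Torch keeps the
  // dtype of the first argument, whereas the converter currently throws an error.
  auto self = torch::zeros({3}, torch::kInt32);
  auto mask = torch::tensor({1, 0, 1}).to(torch::kBool);
  auto out = self.masked_fill(mask, 1.5);
  std::cout << out.dtype() << std::endl;  // expected (per the comment): Int, matching self
  return 0;
}
```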
Fix in PR #1430
std::string num_to_tensor_clean_pattern = R"IR(
  graph(%1: Scalar):
    %2: Tensor = prim::NumToTensor(%1)
    %device: Device = prim::Constant[value="cuda"]()
See above about correct device
// to avoid device mismatch issues
std::string full_clean_pattern = R"IR(
  graph(%1, %2, %3, %4, %5, %6):
    %cuda: Device = prim::Constant[value="cuda"]()
""
std::string clean_pattern_part_1 = R"IR(
  graph(%1: Scalar):
    %2: Tensor = prim::NumToTensor(%1)
    %device: Device = prim::Constant[value=")IR";

std::string clean_pattern_part_2 = R"IR("]()
    %dtype: NoneType = prim::Constant()
    %false: bool = prim::Constant[value=0]()
    %3: Tensor = aten::to(%2, %device, %dtype, %false, %false)
    return (%3))IR";

auto num_to_tensor_clean_pattern = clean_pattern_part_1 + target_device_name + clean_pattern_part_2;
Had to use this paradigm instead of snprintf because the % symbols in the IR are interpreted as format specifiers by snprintf, which made it difficult to insert the device string.
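For illustration, a sketch of the escaping snprintf would have required: every literal % in the IR has to be doubled so that only the final %s is treated as a conversion specifier (example only, not code from this PR):

```cpp
#include <cstdio>
#include <string>

int main() {
  char buf[256];
  // Each literal '%' in the IR text is written as "%%"; the lone "%s" receives
  // the target device string.
  std::snprintf(buf, sizeof(buf),
                "graph(%%1: Scalar):\n"
                "  %%2: Tensor = prim::NumToTensor(%%1)\n"
                "  %%device: Device = prim::Constant[value=\"%s\"]()",
                "cuda:0");
  std::string pattern(buf);
  return 0;
}
```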
  for (auto& in : inputs) {
    in = in.to(torch::Device(target_device));
  }
} else {
Doesn't need to be an else, could just be a second check.
Updated the else block to just assign the cuda target device name, and the runtime device check is now applied as a second check.
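A rough sketch of the restructured control flow described here, with assumed variable and function names:

```cpp
#include <string>
#include <vector>
#include <torch/torch.h>

// Sketch: pick a target device name (defaulting to "cuda" when none was recorded),
// then apply the runtime device check unconditionally as a second step.
void ensure_inputs_on_target_device(std::vector<torch::Tensor>& inputs,
                                    const std::string& recorded_device_name) {
  std::string target_device = "cuda";
  if (!recorded_device_name.empty()) {
    target_device = recorded_device_name;  // e.g. "cuda:1"
  }
  // Second check: move any input that is not already on the target device.
  for (auto& in : inputs) {
    if (in.device() != torch::Device(target_device)) {
      in = in.to(torch::Device(target_device));
    }
  }
}
```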
fixes: #1446
- Investigated an issue arising with the BART-base model (https://huggingface.co/facebook/bart-base) where certain tensor inputs to TensorRT were on the CPU, despite users explicitly casting all inputs properly
- Traced the issue to internally-generated 0D tensors, mask tensors, and operations returning CPU tensors passed between Torch and Torch-TensorRT engines
- Added lowering passes to ensure these function edge cases are handled appropriately and tensors are located on the proper device at runtime, and added a runtime validation check to avoid models crashing due to device mismatches
- Added testing for the lowering passes to ensure output values are accurate
…evice
- Added field to LowerInfo to hold device information
- Update internal Device struct location to allow streamlined imports
- Update BUILD files
- Build strings in lowering phase using user-specified target device
- Update CMakeLists to reflect IR dependency in lowering
- Update runtime device location code to run regardless of whether a switch is required or not
Description

Type of change
- aten functions and casting

Checklist: