
[fix]: fix bug in aten::to, when network only have aten::to layer wil… #1108

Merged

2 commits merged into pytorch:master on Jul 22, 2022

Conversation

inocsin
Contributor

@inocsin inocsin commented Jun 11, 2022

…l change input name

Signed-off-by: inocsin vcheungyi@163.com

Description

When (1) the network has only an aten::to layer, or (2) the output of aten::to is the same tensor as its input and that input is a network input, the input tensor's name is changed, which causes an error.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

    def forward(self, data, index):
        index = index.to(torch.int64)  # in TRT, output == input
        src = 1
        data = data.scatter_(1, index, src)  # runs in Torch
        return data
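The failure mode can be sketched in plain Python (a hypothetical simulation of the renaming behavior, not the actual Torch-TensorRT code): when the converter for aten::to returns its input tensor unchanged and the conversion context then unconditionally renames that tensor to the JIT output value's debug name, the engine's input binding name is lost.

```python
# Hypothetical simulation of the renaming bug (not actual Torch-TensorRT code).

class FakeITensor:
    """Stands in for nvinfer1::ITensor, which carries a single mutable name."""
    def __init__(self, name):
        self.name = name

def convert_aten_to(tensor, same_dtype=True):
    # When the requested dtype matches, the converter returns the input
    # tensor itself instead of adding a cast layer.
    return tensor if same_dtype else FakeITensor("casted")

# Network input registered under the Torch-TensorRT binding convention.
inp = FakeITensor("input_0")

out = convert_aten_to(inp)
# Buggy behavior: unconditionally rename the converter output to the
# JIT output value's debug name ("4" in the debug log).
out.name = "4"

# The output is the same object as the input, so the binding "input_0"
# no longer exists and engine binding lookup later fails.
print(inp.name)  # prints "4"
```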

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

…l change input name

Signed-off-by: inocsin <vcheungyi@163.com>
@github-actions github-actions bot added component: conversion Issues re: Conversion stage component: converters Issues re: Specific op converters component: core Issues re: The core compiler component: tests Issues re: Tests labels Jun 11, 2022
@inocsin
Contributor Author

inocsin commented Jun 11, 2022

@narendasan please review this change

@narendasan
Collaborator

This seems fine to me, but I think it should be part of a more comprehensive change to catch this class of error. cc: @bowang007

@bowang007
Collaborator

Looks like this issue is related to #982. Both are triggered by changing the names of ITensors.
I'm wondering whether other converters have similar issues, as we discussed @narendasan .
If we introduce some kind of detection mechanism to prevent renaming ITensors, then this change would be unnecessary.
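One way such a detection mechanism could look (a hedged Python sketch under my own assumptions, not the actual Torch-TensorRT implementation): refuse to rename any ITensor that is already registered as a network input.

```python
# Hypothetical sketch of a rename guard (not actual Torch-TensorRT code).

class FakeITensor:
    def __init__(self, name):
        self.name = name

class ConversionCtx:
    def __init__(self):
        self.input_ids = set()  # identities of registered network inputs

    def add_input(self, tensor):
        self.input_ids.add(id(tensor))

    def record_output_name(self, tensor, name):
        # Guard: never rename a tensor that is a registered network input.
        # (A real fix could instead insert an identity layer here.)
        if id(tensor) in self.input_ids:
            return tensor.name
        tensor.name = name
        return name

ctx = ConversionCtx()
inp = FakeITensor("input_0")
ctx.add_input(inp)

# aten::to with a matching dtype returns its input unchanged; the guard
# keeps the rename from clobbering the input binding name.
ctx.record_output_name(inp, "4")
print(inp.name)  # prints "input_0"
```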

@bowang007
Collaborator

bowang007 commented Jun 22, 2022

what's the error message that you have now? @inocsin
I'm seeing a segmentation fault.

@inocsin
Contributor Author

inocsin commented Jun 26, 2022

what's the error message that you have now? @inocsin I'm seeing a segmentation fault.

The error message is below. The input named input_0 is renamed to the output value's name, 4, so the engine binding lookup fails.

DEBUG: [Torch-TensorRT - Debug Build] - Running JIT version
DEBUG: [Torch-TensorRT - Debug Build] - Running TRT version
DEBUG: [Torch-TensorRT - Debug Build] - Pairing 0: y.1 : Input(shape: [3], dtype: Float32, format: NCHW\Contiguous\Linear)
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] Init CUDA: CPU +318, GPU +0, now: CPU 3018, GPU 1632 (MiB)
INFO: [Torch-TensorRT - Debug Build] - Settings requested for TensorRT engine:
    Enabled Precisions: Float32
    TF32 Floating Point Computation Enabled: 1
    Truncate Long and Double: 0
    Make Refittable Engine: 0
    Debuggable Engine: 0
    GPU ID: 0
    Allow GPU Fallback (if running on DLA): 0
    Min Timing Iterations: 2
    Avg Timing Iterations: 1
    Max Workspace Size: 1073741824
    Device Type: GPU
    GPU ID: 0
    Engine Capability: standard
    Calibrator Created: 0
INFO: [Torch-TensorRT TorchScript Conversion Context] - Converting Block
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - graph(%y.1 : Tensor):
  %1 : int = prim::Constant[value=6]()
  %2 : bool = prim::Constant[value=0]()
  %3 : NoneType = prim::Constant()
  %4 : Tensor = aten::to(%y.1, %1, %2, %2, %3)
  return (%4)

DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Input Dimension Specs: {
    y.1 : Input(shape: [3], dtype: Float32, format: NCHW\Contiguous\Linear),}
INFO: [Torch-TensorRT TorchScript Conversion Context] - Adding Input y.1 (named: input_0): Input(shape: [3], dtype: Float32, format: NCHW\Contiguous\Linear) in engine (conversion.AddInputs)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Evaluating %1 : int = prim::Constant[value=6]()
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Found the value to be: 6
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Evaluating %2 : bool = prim::Constant[value=0]()
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Found the value to be: False
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Evaluating %3 : NoneType = prim::Constant()
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Found the value to be: None
INFO: [Torch-TensorRT TorchScript Conversion Context] - Adding Layer %4 : Tensor = aten::to(%y.1, %1, %2, %2, %3) (ctx.AddLayer)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is an already converted tensor
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: [Torch-TensorRT - Debug Build] - ITensor shape: [3]
DEBUG: [Torch-TensorRT - Debug Build] - ITensor type: Float32
DEBUG: [Torch-TensorRT - Debug Build] - [aten::to.dtype] Output tensor shape: [3]
DEBUG: [Torch-TensorRT - Debug Build] - One of the inputs named 4 to the network is marked as an output tensor. Applying an identity layer and marking this tensor as output
INFO: [Torch-TensorRT TorchScript Conversion Context] - Marking Output 4 named output_0 in engine (ctx.MarkOutput)
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageSnapshot] Builder begin: CPU 3018 MiB, GPU 1632 MiB
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Applying generic optimizations to the graph for inference.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Original: 1 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After dead-layer removal: 1 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After Myelin optimization: 1 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After scale fusion: 1 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After vertical fusions: 1 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After dupe layer removal: 1 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After final dead-layer removal: 1 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After tensor merging: 1 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After concat removal: 1 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Graph construction and optimization completed in 0.0130252 seconds.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Using cublasLt a tactic source
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +322, GPU +166, now: CPU 3340, GPU 1798 (MiB)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Using cuDNN as a tactic source
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] Init cuDNN: CPU +454, GPU +204, now: CPU 3794, GPU 2002 (MiB)
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Detected invalid timing cache, setup a local cache instead
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Constructing optimization profile number 0 [1/1].
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - *************** Autotuning format combination: Float(1) -> Float(1) ***************
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - --------------- Timing Runner: (Unnamed Layer* 0) [Identity] (Cast)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Cast has no valid tactics for this config, skipping
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - --------------- Timing Runner: (Unnamed Layer* 0) [Identity] (Reformat)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 1002 Time: 0.011776
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Tactic: 0 Time: 0.006272
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Fastest Tactic: 0 Time: 0.006272
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - >>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Formats and tactics selection completed in 0.00822353 seconds.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - After reformat layers: 1 layers
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Block size 1073741824
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Total Activation Memory: 1073741824
INFO: [Torch-TensorRT TorchScript Conversion Context] - Detected 1 inputs and 1 output network tensors.
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Layer: (Unnamed Layer* 0) [Identity] HostPersistent: 0 DevicePersistent: 0
INFO: [Torch-TensorRT TorchScript Conversion Context] - Total Host Persistent Memory: 0
INFO: [Torch-TensorRT TorchScript Conversion Context] - Total Device Persistent Memory: 0
INFO: [Torch-TensorRT TorchScript Conversion Context] - Total Scratch Memory: 0
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 4 MiB
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Using cublasLt a tactic source
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 3794, GPU 2010 (MiB)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Using cuDNN as a tactic source
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 3794, GPU 2018 (MiB)
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 3794, GPU 2002 (MiB)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Engine generation completed in 1.7908 seconds.
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 3794, GPU 1984 (MiB)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Engine Layer Information:
Layer(Reformat): (Unnamed Layer* 0) [Identity], Tactic: 0, 4[Float(3)] -> output_0[Float(3)]
INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageSnapshot] Builder end: CPU 3794 MiB, GPU 1984 MiB
DEBUG: [Torch-TensorRT - Debug Build] - Running TRT version
DEBUG: [Torch-TensorRT - Debug Build] - Target Device: Device(ID: 0, Name: Tesla T4, SM Capability: 7.5, Type: GPU)
DEBUG: [Torch-TensorRT - Debug Build] - Setting Device(ID: 0, Name: Tesla T4, SM Capability: 7.5, Type: GPU) as active device
INFO: [Torch-TensorRT - Debug Build] - [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 3794, GPU 1984 (MiB)
INFO: [Torch-TensorRT - Debug Build] - Loaded engine size: 0 MB
INFO: [Torch-TensorRT - Debug Build] - [MemUsageSnapshot] deserializeCudaEngine begin: CPU 3794 MiB, GPU 1984 MiB
DEBUG: [Torch-TensorRT - Debug Build] - Using cublasLt a tactic source
WARNING: [Torch-TensorRT - Debug Build] - TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
INFO: [Torch-TensorRT - Debug Build] - [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 3794, GPU 1994 (MiB)
DEBUG: [Torch-TensorRT - Debug Build] - Using cuDNN as a tactic source
INFO: [Torch-TensorRT - Debug Build] - [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 3794, GPU 2002 (MiB)
INFO: [Torch-TensorRT - Debug Build] - [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 3794, GPU 1984 (MiB)
DEBUG: [Torch-TensorRT - Debug Build] - Deserialization required 25742 microseconds.
INFO: [Torch-TensorRT - Debug Build] - [MemUsageSnapshot] deserializeCudaEngine end: CPU 3794 MiB, GPU 1984 MiB
INFO: [Torch-TensorRT - Debug Build] - [MemUsageSnapshot] ExecutionContext creation begin: CPU 3794 MiB, GPU 1984 MiB
DEBUG: [Torch-TensorRT - Debug Build] - Using cublasLt a tactic source
WARNING: [Torch-TensorRT - Debug Build] - TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
INFO: [Torch-TensorRT - Debug Build] - [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 3794, GPU 1994 (MiB)
DEBUG: [Torch-TensorRT - Debug Build] - Using cuDNN as a tactic source
INFO: [Torch-TensorRT - Debug Build] - [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 3794, GPU 2002 (MiB)
DEBUG: [Torch-TensorRT - Debug Build] - Total per-runner device memory is 0
DEBUG: [Torch-TensorRT - Debug Build] - Total per-runner host memory is 0
DEBUG: [Torch-TensorRT - Debug Build] - Allocated activation device memory of size 0
INFO: [Torch-TensorRT - Debug Build] - [MemUsageSnapshot] ExecutionContext creation end: CPU 3794 MiB, GPU 2002 MiB
DEBUG: [Torch-TensorRT - Debug Build] - Binding name: 4
INFO: [Torch-TensorRT - Debug Build] - [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 3794, GPU 1984 (MiB)
unknown file: Failure
C++ exception with description "[Error thrown at core/runtime/TRTEngine.cpp:65] Expected delim != std::string::npos to be true but got false
Unable to determine binding index for input 4
Ensure module was compiled with Torch-TensorRT.ts or follows Torch-TensorRT Runtime conventions
" thrown in the test body.
[  FAILED  ] Converters.ATenToSingleConvertsCorrectly (7865 ms)
[----------] 1 test from Converters (7865 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (7865 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Converters.ATenToSingleConvertsCorrectly

 1 FAILED TEST
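The runtime failure at the end of the log can be sketched as follows (a hypothetical Python reconstruction inferred from the error text, not the actual TRTEngine.cpp code): the runtime expects binding names following the input_N/output_N convention and splits on the underscore delimiter to recover the index, so a bare name like "4" has no delimiter and triggers the check.

```python
# Hypothetical sketch of the runtime binding-name check, inferred from the
# "Expected delim != std::string::npos" error (not actual TRTEngine.cpp code).

def binding_index(name):
    delim = name.rfind("_")
    if delim == -1:  # corresponds to delim == std::string::npos in C++
        raise RuntimeError(
            f"Unable to determine binding index for input {name}\n"
            "Ensure module was compiled with Torch-TensorRT.ts or follows "
            "Torch-TensorRT Runtime conventions"
        )
    return int(name[delim + 1:])

print(binding_index("input_0"))   # prints 0
print(binding_index("output_0"))  # prints 0

try:
    binding_index("4")  # the renamed input from the log has no delimiter
except RuntimeError as e:
    print("fails as in the log:", e)
```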

@bowang007
Collaborator

Same error as #982.

@inocsin
Contributor Author

inocsin commented Jun 29, 2022

@bowang007 Deleting this line would also solve the problem: https://github.com/pytorch/TensorRT/blob/master/core/conversion/conversionctx/ConversionCtx.cpp#L133

@bowang007
Collaborator

Yes, we discussed this WAR in the channel.
However, I'm not sure whether this deletion would trigger other issues.

Collaborator

@bowang007 bowang007 left a comment


LGTM

@ncomly-nvidia ncomly-nvidia added the release: v1.2 Tagged to be included in v1.2 label Jul 15, 2022
@github-actions github-actions bot requested a review from bowang007 July 15, 2022 01:20
@narendasan narendasan added the Story: Binding Names Issues related to binding names, format and uniqueness label Jul 15, 2022
Signed-off-by: inocsin <vcheungyi@163.com>
@inocsin
Contributor Author

inocsin commented Jul 22, 2022

@dheerajperi reverted the change

@peri044 peri044 merged commit fc04d4a into pytorch:master Jul 22, 2022