scatter_add_decomposition #2740

apbose · 2024-04-09T21:12:34Z

This PR has an issue in cases where there is an index collision. An example of such a case:
scatter_add(input_tensor = torch.zeros(3,5), dim=1, index_tensor = torch.tensor([[0, 1, 2, 0]]).cuda(), src_tensor = torch.tensor([[1, 2, 3, 1]], dtype=torch.int32).cuda())
It looks like a straightforward decomposition that stacks the src_tensor would not work, since it would again lead to collisions in some cases.
A better approach is to implement this as a converter and assign the values using advanced indexing.
Working on this now.
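
To make the collision case concrete, here is a minimal pure-Python sketch of the semantics involved, using the index and src values from the example above. The helpers `scatter_add_2d` and `scatter_2d` are hypothetical reference models for 2-D nested lists, not Torch-TensorRT or PyTorch code. The overwrite model assumes last-write-wins for duplicate indices, which is only an illustration since the real `torch.scatter` does not guarantee which write survives.

```python
from copy import deepcopy


def scatter_add_2d(inp, dim, index, src):
    """Reference model of scatter_add on 2-D nested lists (hypothetical helper)."""
    out = deepcopy(inp)
    for i, row in enumerate(index):
        for j, target in enumerate(row):
            if dim == 0:
                out[target][j] += src[i][j]
            else:
                out[i][target] += src[i][j]
    return out


def scatter_2d(inp, dim, index, src):
    """Same traversal, but each write overwrites the slot instead of accumulating."""
    out = deepcopy(inp)
    for i, row in enumerate(index):
        for j, target in enumerate(row):
            if dim == 0:
                out[target][j] = src[i][j]
            else:
                out[i][target] = src[i][j]
    return out


zeros = [[0] * 5 for _ in range(3)]
index = [[0, 1, 2, 0]]  # column 0 is targeted twice: a collision
src = [[1, 2, 3, 1]]

accumulated = scatter_add_2d(zeros, 1, index, src)
overwritten = scatter_2d(zeros, 1, index, src)

print(accumulated[0])  # [2, 2, 3, 0, 0]
print(overwritten[0])  # [1, 2, 3, 0, 0]
```

The two results differ only in the collided slot, which is why a decomposition built on a single overwrite-style scatter loses one of the contributions.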

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/lowering/_decompositions.py	2024-05-14 21:04:24.249027+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/lowering/_decompositions.py	2024-05-14 21:06:14.833958+00:00
@@ -181,13 +181,16 @@
    input_tensor: torch.Tensor,
    src_tensor: torch.Tensor,
    dim: int,
    index: torch.Tensor,
) -> torch.Tensor:
-    input_tensor_to_add = torch.scatter(torch.empty_like(input_tensor), dim, index, src_tensor)
+    input_tensor_to_add = torch.scatter(
+        torch.empty_like(input_tensor), dim, index, src_tensor
+    )
    scatter_add_tensor = torch.add(input_tensor, input_tensor_to_add.cuda())
    return scatter_add_tensor
+

def get_decompositions(
    enable_experimental_decompositions: bool = False,
) -> Dict[OpOverload, Callable[[Any], Any]]:
    if enable_experimental_decompositions:

gs-olive · 2024-05-15T16:44:28Z

py/torch_tensorrt/dynamo/lowering/_decompositions.py

+    dim: int,
+    index: torch.Tensor,
+) -> torch.Tensor:
+    input_tensor_to_add = torch.scatter(torch.empty_like(input_tensor), dim, index, src_tensor)


Could torch.empty_like(input_tensor) instead be just input_tensor, or is it required to be empty?

Yes, it is required. The idea is to apply scatter with the indices to an empty tensor and then add the result to the input tensor.
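
The difference between scattering into the input directly and scattering into a scratch tensor first can be sketched in pure Python. `scatter_dim1` is a hypothetical overwrite model along dim=1, not the PR's code. The scratch buffer here is zero-filled, which is an assumption of this sketch: `torch.empty_like` returns uninitialized memory, so a zero-filled buffer is what makes the final add well defined.

```python
def scatter_dim1(base, index, src):
    """Overwrite model of scatter along dim=1 for nested lists (hypothetical helper)."""
    out = [row[:] for row in base]
    for i, row in enumerate(index):
        for j, target in enumerate(row):
            out[i][target] = src[i][j]
    return out


input_rows = [[10, 10, 10]]
index = [[0, 2]]
src = [[1, 2]]

# Option A: scatter straight into the input. Existing values are replaced.
direct = scatter_dim1(input_rows, index, src)

# Option B: scatter into a zero-filled scratch buffer, then add it to the input.
scratch = scatter_dim1([[0] * len(r) for r in input_rows], index, src)
via_scratch = [[a + b for a, b in zip(r, s)] for r, s in zip(input_rows, scratch)]

print(direct)       # [[1, 10, 2]]
print(via_scratch)  # [[11, 10, 12]]
```

Only option B matches scatter_add, since option A discards the values already stored at the targeted positions.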

gs-olive · 2024-05-15T16:44:43Z

py/torch_tensorrt/dynamo/lowering/_decompositions.py

+    index: torch.Tensor,
+) -> torch.Tensor:
+    input_tensor_to_add = torch.scatter(torch.empty_like(input_tensor), dim, index, src_tensor)
+    scatter_add_tensor = torch.add(input_tensor, input_tensor_to_add.cuda())


Without .cuda() does the decomposition fail?

Yes, in this case it complains that the two tensors are on different devices.

apbose · 2024-05-16T22:52:02Z

@gs-olive Though the test cases pass, I don't think the decomposition is taking place.
It fails in the backend at aot_export_joint_simple and shows me something like:

CRITICAL:torch_tensorrt.dynamo.backend.backends:Halting compilation on build failure since pass_through_build_failures was specified as True. To
return the default Torch implementation and avoid halting compilation on engine build failures, specify pass_through_build_failures=False.
W0516 15:51:10.818862 139669023303488 torch/_dynamo/exc.py:201] [0/0] Backend compiler failed with a fake tensor exception at
W0516 15:51:10.818862 139669023303488 torch/_dynamo/exc.py:201] [0/0]   File "<eval_with_key>.5 from /home/abose/Documents/work/torchTRT_TRT10_scatter_5_2/TensorRT/tests/py/dynamo/lowering/test_decompositions.py:523 in forward", line 9, in forward
W0516 15:51:10.818862 139669023303488 torch/_dynamo/exc.py:201] [0/0]     return scatter_add_default
W0516 15:51:10.818862 139669023303488 torch/_dynamo/exc.py:201] [0/0] Adding a graph break.

Is this due to torch.unique?

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/lowering/_decompositions.py	2024-07-11 20:45:53.887948+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/lowering/_decompositions.py	2024-07-11 20:47:45.424382+00:00
@@ -259,23 +259,36 @@
    select_src_dim = src_copy.shape[dim]
    to_stack_dummy_src = tuple(torch.empty(src_shape) for _ in range(select_src_dim))
    for index_src_dim in range(0, select_src_dim, 1):
        select_tensor_dim = torch.select(src_copy, dim, index_src_dim)
        to_stack_src = to_stack_dummy_src
-        if(index_src_dim == 0):
-            to_stack_src = (select_tensor_dim.cpu(),) + to_stack_dummy_src[index_src_dim+1:] 
-        elif(index_src_dim == select_src_dim - 1 ):
-            to_stack_src = to_stack_dummy_src[:index_src_dim] + (select_tensor_dim.cpu(),)
+        if index_src_dim == 0:
+            to_stack_src = (select_tensor_dim.cpu(),) + to_stack_dummy_src[
+                index_src_dim + 1 :
+            ]
+        elif index_src_dim == select_src_dim - 1:
+            to_stack_src = to_stack_dummy_src[:index_src_dim] + (
+                select_tensor_dim.cpu(),
+            )
        else:
-            to_stack_src = to_stack_dummy_src[:index_src_dim] + (select_tensor_dim.cpu(),) + to_stack_dummy_src[index_src_dim+1:]
+            to_stack_src = (
+                to_stack_dummy_src[:index_src_dim]
+                + (select_tensor_dim.cpu(),)
+                + to_stack_dummy_src[index_src_dim + 1 :]
+            )

        stacked_src = torch.stack(to_stack_src, dim)
-        input_tensor_to_add = torch.scatter(torch.empty_like(input_tensor, dtype= torch.float32), dim, index, stacked_src.cuda())
+        input_tensor_to_add = torch.scatter(
+            torch.empty_like(input_tensor, dtype=torch.float32),
+            dim,
+            index,
+            stacked_src.cuda(),
+        )
        scatter_add_tensor = torch.add(scatter_add_tensor, input_tensor_to_add)
    return scatter_add_tensor

-    
+
def get_decompositions(
    enable_experimental_decompositions: bool = False,
) -> Dict[OpOverload, Callable[[Any], Any]]:
    if enable_experimental_decompositions:
        CORE_ATEN_DECOMPOSITIONS_FILTERED: Dict[OpOverload, Callable[[Any], Any]] = {

Fixing scatter_add test cases. To do: fix the index collision cases. Index collision cases: removing the torch.unique check
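
The diff above walks over src one position at a time along dim and accumulates one scatter result per position. A simplified pure-Python sketch of that idea, restricted to dim=1, is below. `scatter_add_by_slices` is a hypothetical name and this is not the merged implementation. It relies on the observation that, for a fixed position along dim, every write lands in a different row, so each per-slice scatter is collision-free.

```python
def scatter_add_by_slices(inp, index, src):
    """Emulate scatter_add along dim=1 with one collision-free scatter per column.

    Hypothetical helper for nested lists; index and src share the same shape.
    """
    out = [row[:] for row in inp]
    for j in range(len(index[0])):
        # Zero-filled scratch buffer for this slice of src.
        scratch = [[0] * len(row) for row in inp]
        for i in range(len(index)):
            # Fixed j: each row i is written at most once, so no collision.
            scratch[i][index[i][j]] = src[i][j]
        # Accumulate this slice's contribution.
        out = [[a + b for a, b in zip(r, s)] for r, s in zip(out, scratch)]
    return out


result = scatter_add_by_slices(
    [[0] * 5 for _ in range(3)], [[0, 1, 2, 0]], [[1, 2, 3, 1]]
)
print(result[0])  # [2, 2, 3, 0, 0]

result_two_rows = scatter_add_by_slices(
    [[5, 5], [5, 5]], [[0, 0], [1, 1]], [[1, 2], [3, 4]]
)
print(result_two_rows)  # [[8, 5], [5, 12]]
```

The cost is one scatter and one add per position along dim, which trades speed for correctness when indices collide.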

zewenli98 · 2024-07-16T00:34:18Z

py/torch_tensorrt/dynamo/lowering/_decompositions.py

+def scatter_add_decomposition(
+    input_tensor: torch.Tensor,
+    src_tensor: torch.Tensor,
+    dim: int,
+    index: torch.Tensor,
+) -> torch.Tensor:


It seems the order of the args doesn't match the tests, which caused CI to fail.

Thanks for pointing this out! Corrected.
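
A small sketch of why the argument order matters, assuming the caller passes positional arguments in the `aten.scatter_add` schema order (self, dim, index, src). Both functions are hypothetical stand-ins with placeholder string values, and `inspect.signature(...).bind` is used only to show which parameter each positional argument lands on.

```python
import inspect


def decomposition_as_reviewed(input_tensor, src_tensor, dim, index):
    """Parameter order from the snippet under review (stand-in, no body)."""


def decomposition_reordered(input_tensor, dim, index, src_tensor):
    """Parameter order matching the aten.scatter_add schema (stand-in, no body)."""


# Placeholder values in schema order: self, dim, index, src.
call_args = ("INPUT", 1, "INDEX", "SRC")

bound_wrong = inspect.signature(decomposition_as_reviewed).bind(*call_args).arguments
bound_right = inspect.signature(decomposition_reordered).bind(*call_args).arguments

print(bound_wrong["dim"])  # INDEX
print(bound_right["dim"])  # 1
```

With the reviewed order, the index tensor would be bound to `dim` and the integer dimension to `src_tensor`, so the mismatch surfaces as soon as the function body uses them.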

zewenli98

LGTM

apbose self-assigned this Apr 9, 2024

apbose marked this pull request as draft April 9, 2024 21:12

facebook-github-bot added the cla signed label Apr 9, 2024

github-actions bot added component: tests Issues re: Tests component: lowering Issues re: The lowering / preprocessing passes component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Apr 9, 2024

github-actions bot requested a review from peri044 April 9, 2024 21:12

apbose mentioned this pull request Apr 9, 2024

aten.scatter_add #2737

Open

apbose removed the request for review from peri044 April 9, 2024 21:13

apbose force-pushed the scatter_add_decomposition branch from a49c420 to e8c7b50 Compare May 14, 2024 20:58

apbose marked this pull request as ready for review May 14, 2024 21:04

apbose requested a review from gs-olive May 14, 2024 21:04

github-actions bot requested changes May 14, 2024

gs-olive reviewed May 15, 2024

apbose marked this pull request as draft June 6, 2024 23:00

apbose force-pushed the scatter_add_decomposition branch 2 times, most recently from bcc09f9 to 10a384e Compare July 11, 2024 20:45

github-actions bot requested changes Jul 11, 2024

scatter_add_decomposition

10a384e

Fixing scatter_add test cases. To do: fix the index collision cases. Index collision cases: removing the torch.unique check

apbose marked this pull request as ready for review July 11, 2024 22:57

apbose requested review from peri044 and zewenli98 July 11, 2024 22:58

apbose mentioned this pull request Jul 15, 2024

scatter reduce decomposition #3008

Merged

zewenli98 reviewed Jul 16, 2024

apbose force-pushed the scatter_add_decomposition branch from 8027b9d to 2f7221e Compare July 16, 2024 18:00

changing the implementation and adding more test cases

2f7221e

zewenli98 approved these changes Jul 16, 2024

lanluo-nvidia merged commit 225fe3b into main Jul 24, 2024
42 of 61 checks passed
