New and Improved MapFusion (#1629)

This PR introduces a new and improved version of `MapFusion`. A summary of the changes can also be found [here](https://github.com/user-attachments/files/18516299/new_map_fusion_summary_of_changes.pdf), it compares the resulting SDFGs generated by the old and new transformation of some unit tests. #### Fixed Bugs and removed Limitations - The subsets (not the `.subset` member of the Memlet; I mean the concept) of the new intermediate data descriptor were not computed correctly in some cases, especially in presence of offsets. See the `test_offset_correction_range_read()`, `test_offset_correction_scalar_read()` and the `test_offset_correction_empty()` tests. - Upon the propagation of the subsets, due to the changed intermediate, was not handled properly. Essentially, the transformation only updated `.subset` and ignored `.other_subset`. Which is correct in most cases but not always. See the `test_fusion_intrinsic_memlet_direction()` for more. - During the check if two maps could be fused the `.dynamic` property of the Memelts were fully ignored leading to wrong code. - The read-write conflict checks were refined, before all arrays needed to be accessed the wrong way, i.e. before a fusion was rejected when one map accessed `A[i, j]` and the other map was accessing `B[i + 1, j]`. Now this is possible as long as every access is point wise. See the `test_fusion_different_global_accesses()` test for an example. - The shape of the reduced intermediate is cleaned, i.e. unnecessary dimensions of size 1, are removed, except they were present in the original shape. To make an example, the intermediate array, `T`, had shape `(10, 1, 20)` and inside the map was accessed `T[__i, 0, __j]`, then the old transformation would have created an reduced intermediate of shape `(1, 1, 1)`, new its shape is `(1)`. Note if the intermediate has shape `(10, 20)` instead and would be accessed as `T[__i, __j]` then a `Scalar` would have been created. See also the `struct_dataflow` flag below. #### New Flags - `only_toplevel_maps`: If `True` the transformation will only fuse maps that are located at the top level, i.e. maps inside maps will not be merged. - `only_inner_maps`: If `True` then the transformation will only fuse maps that are inside other maps. - assume_always_shared`: If `True` then the transformation will assume that every intermediate is shared, i.e. the referenced data is used somewhere else in the SDFG and has to become an output of the fused maps. This will create dead data flow, but avoids a scan of the full SDFG. - `strict_dataflow`: This flag is enabled by default. It has two effects, first it will disable the cleaning of reduced intermediate storage. The second effect is more important as it will preserve a much stricter data flow. Most importantly, if the intermediate array is used downstream (this is not limited to the case that the array is the output of the second map) then the maps will not be fused together. This is mostly to work around some other bugs in DaCe, where other transformations failed to pink up the dependency. Note that the fused map would be correct, the problem are other transformations. #### `FullMapFusion` This PR also introduced the `FullMapFusion` pass, which makes use of the `FindSingleUseData` pass that was introduced in [PR#1906](#1906). The `FullMapFusion` applies MapFusion as long as possible, i.e. fuses all maps that can be fused. But instead of scanning the SDFG every time an intermediate node has to be classified, i.e. can it be deleted or not, it is done once and then reused which will speed up fusion process as it will remove the need to traverse the full SDFG many times. This new pass also replaced the direct application of MapFusion in `auto_optimizer`. #### References Collection of known issues in other transformation: - [RedundantArrayRemoval (auto optimize)](#1644) - [Bug in `RefineNestedAccess` and `SDFGState._read_and_write_sets()](#1643) - [Error in Composite Fusion (auto optimize)](#1642) --------- Co-authored-by: Philipp Schaad <schaad.phil@gmail.com>
spcl · Feb 27, 2025 · b74e62c · b74e62c
1 parent e843361
commit b74e62c
Show file tree

Hide file tree

Showing 20 changed files with 3,814 additions and 578 deletions.
diff --git a/.gitignore b/.gitignore
@@ -192,3 +192,9 @@ _build/
 
 .*cache/
 /doc/source/config_schema.rst
+
+# Ignoring the test junk
+_all_tests/
+
+
+
diff --git a/dace/subsets.py b/dace/subsets.py
@@ -80,7 +80,7 @@ def covers(self, other):
         # Subsets of different dimensionality can never cover each other.
         if self.dims() != other.dims():
             return ValueError(
-                    f"A subset of dimensionality {self.dim()} cannot test covering a subset of dimensionality {other.dims()}"
+                    f"A subset of dimensionality {self.dims()} cannot test covering a subset of dimensionality {other.dims()}"
             )
 
         if not Config.get('optimizer', 'symbolic_positive'):
@@ -106,7 +106,7 @@ def covers_precise(self, other):
         # Subsets of different dimensionality can never cover each other.
         if self.dims() != other.dims():
             return ValueError(
-                    f"A subset of dimensionality {self.dim()} cannot test covering a subset of dimensionality {other.dims()}"
+                    f"A subset of dimensionality {self.dims()} cannot test covering a subset of dimensionality {other.dims()}"
             )
 
         # If self does not cover other with a bounding box union, return false.

diff --git a/dace/transformation/auto/auto_optimize.py b/dace/transformation/auto/auto_optimize.py
@@ -14,11 +14,12 @@
 import warnings
 
 # Transformations
-from dace.transformation.dataflow import MapCollapse, TrivialMapElimination, MapFusion, ReduceExpansion
+from dace.transformation.passes import FullMapFusion
+from dace.transformation.dataflow import MapCollapse, TrivialMapElimination, ReduceExpansion
 from dace.transformation.interstate import LoopToMap, RefineNestedAccess
 from dace.transformation.subgraph.composite import CompositeFusion
 from dace.transformation.subgraph import helpers as xfsh
-from dace.transformation import helpers as xfh
+from dace.transformation import helpers as xfh, pass_pipeline as ppl
 
 # Environments
 from dace.libraries.blas.environments import intel_mkl as mkl, openblas
@@ -57,8 +58,13 @@ def greedy_fuse(graph_or_subgraph: GraphViewType,
         if isinstance(graph_or_subgraph, SDFG):
             # If we have an SDFG, recurse into graphs
             graph_or_subgraph.simplify(validate_all=validate_all)
-            # MapFusion for trivial cases
-            graph_or_subgraph.apply_transformations_repeated(MapFusion, validate_all=validate_all)
+            # Apply MapFusion for the more trivial cases
+            full_map_fusion_pass = FullMapFusion(
+                    strict_dataflow=True,
+                    validate_all=validate_all,
+            )
+            full_map_fusion_pileline = ppl.Pipeline([full_map_fusion_pass])
+            full_map_fusion_pileline.apply_pass(graph_or_subgraph, {})
 
         # recurse into graphs
         for graph in graph_or_subgraph.nodes():
@@ -76,7 +82,13 @@ def greedy_fuse(graph_or_subgraph: GraphViewType,
         sdfg, graph, subgraph = None, None, None
         if isinstance(graph_or_subgraph, SDFGState):
             sdfg = graph_or_subgraph.parent
-            sdfg.apply_transformations_repeated(MapFusion, validate_all=validate_all)
+            # Apply MapFusion for the more trivial cases
+            full_map_fusion_pass = FullMapFusion(
+                    strict_dataflow=True,
+                    validate_all=validate_all,
+            )
+            full_map_fusion_pileline = ppl.Pipeline([full_map_fusion_pass])
+            full_map_fusion_pileline.apply_pass(sdfg, {})
             graph = graph_or_subgraph
             subgraph = SubgraphView(graph, graph.nodes())
         else:

diff --git a/dace/transformation/dataflow/buffer_tiling.py b/dace/transformation/dataflow/buffer_tiling.py
@@ -98,7 +98,13 @@ def apply(self, graph, sdfg):
 
         # Fuse maps
         some_buffer = next(iter(buffers))  # some dummy to pass to MapFusion.apply_to()
-        MapFusion.apply_to(sdfg, first_map_exit=tile_map1_exit, array=some_buffer, second_map_entry=tile_map2_entry)
+        MapFusion.apply_to(
+                sdfg,
+                first_map_exit=tile_map1_exit,
+                array=some_buffer,
+                second_map_entry=tile_map2_entry,
+                verify=True,
+        )
 
         # Optimize the simple cases
         map1_entry.range.ranges = [