Added `FindSingleUseData` Pass #1906

philip-paul-mueller · 2025-01-27T08:36:59Z

Added FindSingleUseData analysis pass

This new pass scans the SDFG and computes the data that is only used in one single place.
Essentially, this boils down to the check how often a data descriptor is accessed
A data descriptor is classified if the following statements are true:

There is exactly one AccessNode referring to the data descriptor.
The data descriptor is not referred to on an interstate edge, nor in the condition parts of ConditionBlocks or LoopRegions.

This pass will be needed to speed up fusion passes.

Is used to make the MapFusion PR going. Originally I wanted to compute the schared set, i.e. the non exclusive data, but I now ended up computing the exclusive set.

I was using the `symetric_difference()` instead of the `difference()` function to handle the inter state edges. However, this also lead to a clarification and a stricter handling of interstate edges. I.e. as soon as data is accessed on an interstate edge, it can never be exclusive. Furthermore, now not only the read symbols but all free symbols are used.

Instead of scanning the SDFG once it will now scan the SDFG for every intermediate that is processed. However, there is a fallback that allows to use the data that is computed by `FindExclusiveData`, which is not yet pressent, but will be introduced by [PR#1906](spcl#1906). But there are a few things to consider: - The `FindExclusiveData` is not passed as argument but through the `self._pipeline_results` variable. This is done to make it stateless, as it would be the same as before, just computed outside. - Before the cache stored the names of the data that could be bot be removed, now it is the data that can be removed. This makes it a little bit more stable. - Even if we use the `FindExclusiveData` mechanism, we also have to perform some limited checks. This is because the old scan computed some slightly more restrictive. However, the mechanism is not yet introduced.

phschaad

Very nice addition, thank you!

Needs some adaptation to control flow regions, but otherwise looks good

phschaad · 2025-01-27T16:35:47Z

dace/transformation/passes/analysis/analysis.py

+
+    def should_reapply(self, modified: ppl.Modifies) -> bool:
+        # If anything was modified, reapply
+        return modified & ppl.Modifies.AccessNodes & ppl.Modifies.States


This should include interstate edges too, no? I think to be on the safe side, given that we can access things on loop ranges and conditionals as well, this may need to be AccessNodes & CFG

You are right.

phschaad · 2025-01-27T16:39:13Z

dace/transformation/passes/analysis/analysis.py

+
+        # Compute the set of all data that is accessed, i.e. read, by the edges.
+        interstate_read_symbols: Set[str] = set()
+        for edge in sdfg.edges():


Two things here, because of Control flow regions:

You need to check sdfg.all_interstate_edges() to go through all possible interstate edges - sdfg.edges() only checks the top-level edges.

You must also check conditionals/loops - for this you can go over all control flow regions (sdfg.all_control_flow_regions) and query their used symbols with the used_symbols method, passing with_contents=False. That gives you the used symbols in things like conditional branch conditions, loop conditions, etc, without recursing into the graph.

Thanks for the info (and lesson) about them.
I have definitely to dig more into them.
Do you know where I can find more information about that, I checked the documentation, but it is not so verbose in that regard?

tbennun · 2025-01-27T17:08:42Z

I don’t like the name “exclusive”. What is exclusive about them? Maybe Immutable, Removable (or the negative), Shared?

phschaad · 2025-01-27T17:15:26Z

I vote for "Removable"

philip-paul-mueller · 2025-01-28T05:39:24Z

I agree the name is not good.
But I do not like "Shared" or the idea of computing the inverse (although this is the very original name that was used in MapFusion).
I also do not like "Removable" because I think the concept is too broad.

So I propose "SingleUseData", because this is the core, it is used at one single place, i.e. to transmit data between two Maps, nowhere else.

@phschaad @tbennun

This includes - The renaming to `SingleUseData`. - The correct scanning of InterstateEdges, before no nested edges were considered. - Inclusion of the conditions, such as the conditions used by `ConditionalBlock`, in the scan, they are now handled in the same way as interstate edges. There is also a new test for the last case.

phschaad

LGTM, thanks!

phschaad · 2025-01-28T08:07:15Z

dace/transformation/passes/analysis/analysis.py

+        :return: A dictionary mapping SDFGs to a `set` of strings containing the name
+            of the data descriptors that are only used once.
+        """
+        # TODO(pschaad): Should we index on cfg or the SDFG itself.


For something related to data containers and the question of 'can I remove this' I think it makes sense to index on the SDFG level

This PR introduces a new and improved version of `MapFusion`. A summary of the changes can also be found [here](https://github.com/user-attachments/files/18516299/new_map_fusion_summary_of_changes.pdf), it compares the resulting SDFGs generated by the old and new transformation of some unit tests. #### Fixed Bugs and removed Limitations - The subsets (not the `.subset` member of the Memlet; I mean the concept) of the new intermediate data descriptor were not computed correctly in some cases, especially in presence of offsets. See the `test_offset_correction_range_read()`, `test_offset_correction_scalar_read()` and the `test_offset_correction_empty()` tests. - Upon the propagation of the subsets, due to the changed intermediate, was not handled properly. Essentially, the transformation only updated `.subset` and ignored `.other_subset`. Which is correct in most cases but not always. See the `test_fusion_intrinsic_memlet_direction()` for more. - During the check if two maps could be fused the `.dynamic` property of the Memelts were fully ignored leading to wrong code. - The read-write conflict checks were refined, before all arrays needed to be accessed the wrong way, i.e. before a fusion was rejected when one map accessed `A[i, j]` and the other map was accessing `B[i + 1, j]`. Now this is possible as long as every access is point wise. See the `test_fusion_different_global_accesses()` test for an example. - The shape of the reduced intermediate is cleaned, i.e. unnecessary dimensions of size 1, are removed, except they were present in the original shape. To make an example, the intermediate array, `T`, had shape `(10, 1, 20)` and inside the map was accessed `T[__i, 0, __j]`, then the old transformation would have created an reduced intermediate of shape `(1, 1, 1)`, new its shape is `(1)`. Note if the intermediate has shape `(10, 20)` instead and would be accessed as `T[__i, __j]` then a `Scalar` would have been created. See also the `struct_dataflow` flag below. #### New Flags - `only_toplevel_maps`: If `True` the transformation will only fuse maps that are located at the top level, i.e. maps inside maps will not be merged. - `only_inner_maps`: If `True` then the transformation will only fuse maps that are inside other maps. - assume_always_shared`: If `True` then the transformation will assume that every intermediate is shared, i.e. the referenced data is used somewhere else in the SDFG and has to become an output of the fused maps. This will create dead data flow, but avoids a scan of the full SDFG. - `strict_dataflow`: This flag is enabled by default. It has two effects, first it will disable the cleaning of reduced intermediate storage. The second effect is more important as it will preserve a much stricter data flow. Most importantly, if the intermediate array is used downstream (this is not limited to the case that the array is the output of the second map) then the maps will not be fused together. This is mostly to work around some other bugs in DaCe, where other transformations failed to pink up the dependency. Note that the fused map would be correct, the problem are other transformations. #### `FullMapFusion` This PR also introduced the `FullMapFusion` pass, which makes use of the `FindSingleUseData` pass that was introduced in [PR#1906](#1906). The `FullMapFusion` applies MapFusion as long as possible, i.e. fuses all maps that can be fused. But instead of scanning the SDFG every time an intermediate node has to be classified, i.e. can it be deleted or not, it is done once and then reused which will speed up fusion process as it will remove the need to traverse the full SDFG many times. This new pass also replaced the direct application of MapFusion in `auto_optimizer`. #### References Collection of known issues in other transformation: - [RedundantArrayRemoval (auto optimize)](#1644) - [Bug in `RefineNestedAccess` and `SDFGState._read_and_write_sets()](#1643) - [Error in Composite Fusion (auto optimize)](#1642) --------- Co-authored-by: Philipp Schaad <schaad.phil@gmail.com>

philip-paul-mueller added 5 commits January 27, 2025 08:14

Added an analysis pass that computes the exclusively used data.

85cc87d

Is used to make the MapFusion PR going. Originally I wanted to compute the schared set, i.e. the non exclusive data, but I now ended up computing the exclusive set.

Fixed some wrong typing.

6c6a012

Updated and clarified the working of the exclusive pass.

041d535

Added tests for the FindExclusiveData pass.

2ad4b98

philip-paul-mueller requested a review from phschaad January 27, 2025 08:41

phschaad requested changes Jan 27, 2025

View reviewed changes

philip-paul-mueller changed the title ~~Added FindExclusiveData Pass~~ Added FindSingleUseData Pass Jan 28, 2025

philip-paul-mueller requested a review from phschaad January 28, 2025 06:54

phschaad approved these changes Jan 28, 2025

View reviewed changes

philip-paul-mueller enabled auto-merge January 28, 2025 08:44

philip-paul-mueller added this pull request to the merge queue Jan 28, 2025

Merged via the queue into spcl:main with commit e1886a0 Jan 28, 2025
9 checks passed

philip-paul-mueller deleted the new-scan-path branch January 28, 2025 14:41

philip-paul-mueller mentioned this pull request Jan 29, 2025

New and Improved MapFusion #1629

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added `FindSingleUseData` Pass #1906

Added `FindSingleUseData` Pass #1906

philip-paul-mueller commented Jan 27, 2025 •

edited

Loading

phschaad left a comment

phschaad Jan 27, 2025

philip-paul-mueller Jan 28, 2025

phschaad Jan 27, 2025

philip-paul-mueller Jan 28, 2025

tbennun commented Jan 27, 2025

phschaad commented Jan 27, 2025

philip-paul-mueller commented Jan 28, 2025

phschaad left a comment

phschaad Jan 28, 2025

Added FindSingleUseData Pass #1906

Added FindSingleUseData Pass #1906

Conversation

philip-paul-mueller commented Jan 27, 2025 • edited Loading

phschaad left a comment

Choose a reason for hiding this comment

phschaad Jan 27, 2025

Choose a reason for hiding this comment

philip-paul-mueller Jan 28, 2025

Choose a reason for hiding this comment

phschaad Jan 27, 2025

Choose a reason for hiding this comment

philip-paul-mueller Jan 28, 2025

Choose a reason for hiding this comment

tbennun commented Jan 27, 2025

phschaad commented Jan 27, 2025

philip-paul-mueller commented Jan 28, 2025

phschaad left a comment

Choose a reason for hiding this comment

phschaad Jan 28, 2025

Choose a reason for hiding this comment

Added `FindSingleUseData` Pass #1906

Added `FindSingleUseData` Pass #1906

philip-paul-mueller commented Jan 27, 2025 •

edited

Loading