Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in RefineNestedAccess and SDFGState._read_and_write_sets() #1643

Open
philip-paul-mueller opened this issue Sep 6, 2024 · 6 comments
Open

Comments

@philip-paul-mueller
Copy link
Collaborator

philip-paul-mueller commented Sep 6, 2024

During the implementation of the new MapFusion transformation it became necessary to reimplement _read_and_write_sets() there several bugs were discovered and fixed. However, one bug could not be fixed that is related to how the read set is cleaned.

If the following patch is applied

From f61d4e142b2b2097250b77e5b56b636b67ac9219 Mon Sep 17 00:00:00 2001
From: Philip Mueller <philip.mueller@cscs.ch>
Date: Fri, 6 Sep 2024 15:23:59 +0200
Subject: [PATCH] Undo the compatibility mode in _read_and_write_sets

---
 dace/sdfg/state.py | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/dace/sdfg/state.py b/dace/sdfg/state.py
index 23ac2f763..0c912d4d1 100644
--- a/dace/sdfg/state.py
+++ b/dace/sdfg/state.py
@@ -791,16 +791,6 @@ class DataflowGraphView(BlockGraphView, abc.ABC):
                 # Filter out memlets which go out but the same data is written to the AccessNode by another memlet
                 for out_edge in list(out_edges):
                     for in_edge in in_edges:
-                        if out_edge.data.data != in_edge.data.data:
-                            # NOTE: This check does not make any sense, and is in my view wrong.
-                            #   If we consider a memlet between two access nodes, to which access
-                            #   node the `data` attribute of the memlet refers to is arbitrary and
-                            #   does not matter. However, the test will filter _some_ out but not
-                            #   all. See also the tests inside `tests/sdfg/state_test.py` for the
-                            #   wrong behaviour this check induces.
-                            #   This check is is retained for  compatibility with `RefineNestedAccess`,
-                            #   see `tests/numpy/ufunc_support_test.py::test_ufunc_add_accumulate_simple`.
-                            continue
                         if in_subsets[in_edge].covers(out_subsets[out_edge]):
                             out_edges.remove(out_edge)
                             break
-- 
2.46.0

then tests/transformations/move_loop_into_map_test.py::MoveLoopIntoMapTest::test_apply_multiple_times will fail with an invalid SDFG error. The bug is probably located somewhere inside RefineNestedAccess.

Another error that surfaces is in tests/numpy/ufunc_support_test.py::test_ufunc_add_accumulate_simple, however, it only surfaces if the config variables optimizer.automatic_simplification and optimizer.autooptimize are set to True.

Since the regression test (see below) respect the bug it will also fail, in addition tests/transformations/move_loop_into_map_test.py::MoveLoopIntoMapTest::test_more_than_a_map will also fail, because now a transformation, that was previously classified as not applicable, became applicable (actually here I am not sure, but since my regression tests show that the old implementation is not correct, I am pretty positive).

@philip-paul-mueller philip-paul-mueller changed the title Wrong Bhabiour in SDFGState.read_and_write_sets() Wrong Behaviour in SDFGState.read_and_write_sets() Sep 6, 2024
@philip-paul-mueller philip-paul-mueller changed the title Wrong Behaviour in SDFGState.read_and_write_sets() Wrong Behaviour RefineNestedAccess Sep 6, 2024
@philip-paul-mueller philip-paul-mueller changed the title Wrong Behaviour RefineNestedAccess Bug in RefineNestedAccess and SDFGState._read_and_write_sets() Sep 6, 2024
@philip-paul-mueller
Copy link
Collaborator Author

philip-paul-mueller commented Sep 9, 2024

After I merged master into the new-map-fusion branch some tests that were previously failing (if the bug described here is fixed) no longer failed, thus in commit 5c49eee66 I removed the check, the tests should now still pass.

It turned out, it does not run, there was an error in the tests/numpy/ufunc_support_test.py::test_ufunc_add_accumulate_simple test, I forgot about it as it only happens in autoopt mode.

philip-paul-mueller added a commit to philip-paul-mueller/dace that referenced this issue Sep 20, 2024
…t is maintained and when not.

However, this has some consequences.
I added a test, that shows that this leads, depending on the memlet configuration to different outcome of the transformation.
Okay, it is only affects if the transformation can be applied or not, but still.
This is also in line with my [issue#1643](spcl#1643) that shows that this is a problem.
@philip-paul-mueller
Copy link
Collaborator Author

See also commit e1f2bf6 for an example where this behaviour is kicking in.

@acalotoiu
Copy link
Contributor

Prune connectors is one of the transformations that has repeatedly caused problems and I believe is not currently used because of it. Maybe we should deprecate or fix @phschaad ?

@philip-paul-mueller
Copy link
Collaborator Author

philip-paul-mueller commented Sep 25, 2024

Do you mean RefineNestedAccesse or PruneConnectors.
My comment about the commit was that now PC was behaving the old way, i.e. should an input connector be pruned or not.
Furthermore as far as I know PC is not run by default and GT4Py depends on it to make inline nested SDFG working.

@phschaad
Copy link
Collaborator

Prune connectors is one of the transformations that has repeatedly caused problems and I believe is not currently used because of it. Maybe we should deprecate or fix @phschaad ?

If PC is indeed the issue, I would vote for fixing since I understand that it still serves a purpose, but to be honest I am not very familiar with either PC or RNA..

@philip-paul-mueller
Copy link
Collaborator Author

@lukastruemper Has made some fixes to the transformation, he can probably give more insights.

github-merge-queue bot pushed a commit that referenced this issue Oct 23, 2024
During my work on the [new map
fusion](#1643) I discovered a bug in
`SDFGState._read_and_write_set()`.
Originally I solved it there, but it was decided to move it into its own
PR.


Lets look at the first, super silly example, that is not useful on its
own.
The main point here, is that the `data` attribute of the Memlet does not
refer to the source of the connection but of the destination.


![test_1](https://github.com/user-attachments/assets/740ee4fc-cfe5-4844-a999-e316cb8f9c16)


BTW: The Webviewer outputs something like `B[0] -> [0, 0]` however, the
parser of the Memlet constructor does not understand this, it must be
written as `B[0] -> 0, 0`, i.e. the second set of brackets must be
omitted, this should be changed!

From the above we would expect the following sets:
- Reads:
	- `A`: `[Range (0, 0)]`
- `B`: Should not be listed in this set, because it is fully read and
written, thus it is excluded.
- Writes
	- `B`: `[Range (0)]`
	- `C`: `[Range (0, 0), Range (1, 1)]`

However, the current implementation gives us:
- Reads: `{'A': [Range (0)], 'B': [Range (1, 1)]}`
- Write: `{'B': [Range (0)], 'C': [Range (1, 1), Range (0)]}`

The current behaviour is wrong because:
- `A` is a `2x2` array, thus the read set should also have two
dimensions.
- `B` inside the read set, it is a scalar, but the range has two
dimensions, furthermore, it is present at all.
- `C` the first member of the write set (`Range(1, 1)`) is correct,
while the second (`Range(0)`) is horrible wrong.


The second example is even more simple.


![test_2](https://github.com/user-attachments/assets/da3d03af-6f10-411f-952e-ab057ed057c6)


From the SDFG we expect the following sets:
- Reads:
	- `A`: `[Range(0, 0)]`
- Writes:
	- `B`: `[Range(0)]`

It is important that in the above example `other_subset` is `None` and
`data` is set to `A`, so it is not one of these "crazy" non standard
Memlets we have seen in the first test.
However, the current implementation gives us:
- Reads: `{'A': [Range (0, 0)]}`
- Writes: `{'B': [Range (0, 0)]}`

This clearly shows, that whatever the implementation does is not
correct.
github-merge-queue bot pushed a commit that referenced this issue Feb 27, 2025
This PR introduces a new and improved version of `MapFusion`.
A summary of the changes can also be found
[here](https://github.com/user-attachments/files/18516299/new_map_fusion_summary_of_changes.pdf),
it compares the resulting SDFGs generated by the old and new
transformation of some unit tests.

#### Fixed Bugs and removed Limitations
- The subsets (not the `.subset` member of the Memlet; I mean the
concept) of the new intermediate data descriptor were not computed
correctly in some cases, especially in presence of offsets. See the
`test_offset_correction_range_read()`,
`test_offset_correction_scalar_read()` and the
`test_offset_correction_empty()` tests.
- Upon the propagation of the subsets, due to the changed intermediate,
was not handled properly. Essentially, the transformation only updated
`.subset` and ignored `.other_subset`. Which is correct in most cases
but not always. See the `test_fusion_intrinsic_memlet_direction()` for
more.
- During the check if two maps could be fused the `.dynamic` property of
the Memelts were fully ignored leading to wrong code.
- The read-write conflict checks were refined, before all arrays needed
to be accessed the wrong way, i.e. before a fusion was rejected when one
map accessed `A[i, j]` and the other map was accessing `B[i + 1, j]`.
Now this is possible as long as every access is point wise. See the
`test_fusion_different_global_accesses()` test for an example.
- The shape of the reduced intermediate is cleaned, i.e. unnecessary
dimensions of size 1, are removed, except they were present in the
original shape. To make an example, the intermediate array, `T`, had
shape `(10, 1, 20)` and inside the map was accessed `T[__i, 0, __j]`,
then the old transformation would have created an reduced intermediate
of shape `(1, 1, 1)`, new its shape is `(1)`. Note if the intermediate
has shape `(10, 20)` instead and would be accessed as `T[__i, __j]` then
a `Scalar` would have been created. See also the `struct_dataflow` flag
below.

#### New Flags
- `only_toplevel_maps`: If `True` the transformation will only fuse maps
that are located at the top level, i.e. maps inside maps will not be
merged.
- `only_inner_maps`: If `True` then the transformation will only fuse
maps that are inside other maps.
- assume_always_shared`: If `True` then the transformation will assume
that every intermediate is shared, i.e. the referenced data is used
somewhere else in the SDFG and has to become an output of the fused
maps. This will create dead data flow, but avoids a scan of the full
SDFG.
- `strict_dataflow`: This flag is enabled by default. It has two
effects, first it will disable the cleaning of reduced intermediate
storage. The second effect is more important as it will preserve a much
stricter data flow. Most importantly, if the intermediate array is used
downstream (this is not limited to the case that the array is the output
of the second map) then the maps will not be fused together. This is
mostly to work around some other bugs in DaCe, where other
transformations failed to pink up the dependency. Note that the fused
map would be correct, the problem are other transformations.

#### `FullMapFusion`
This PR also introduced the `FullMapFusion` pass, which makes use of the
`FindSingleUseData` pass that was introduced in
[PR#1906](#1906).
The `FullMapFusion` applies MapFusion as long as possible, i.e. fuses
all maps that can be fused.
But instead of scanning the SDFG every time an intermediate node has to
be classified, i.e. can it be deleted or not, it is done once and then
reused which will speed up fusion process as it will remove the need to
traverse the full SDFG many times.
This new pass also replaced the direct application of MapFusion in
`auto_optimizer`.


#### References
Collection of known issues in other transformation:
- [RedundantArrayRemoval (auto
optimize)](#1644)
- [Bug in `RefineNestedAccess` and
`SDFGState._read_and_write_sets()](#1643)
- [Error in Composite Fusion (auto
optimize)](#1642)

---------

Co-authored-by: Philipp Schaad <schaad.phil@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants