Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor of GT4Py-DaCe bridge to expose all control-flow to Dace #53

Open
2 tasks
FlorianDeconinck opened this issue Jul 10, 2024 · 4 comments
Open
2 tasks
Assignees

Comments

@FlorianDeconinck
Copy link
Collaborator

The current bridge between gt4py.cartesian and DaCe uses a concept of expansion for stencils. This means that stencils, once they got through the entire GT4Py workflow, are pickled and shoved down a library node. Then the expansion mechanism turns those into a single map/tasklet per Horizontal Computation. This means that a lot of control-flow (mask, while loop, etc.) is hidden within from DaCe, leading to some optimization power being hidden.

We need to undo this and describe everything back to DaCe.

Expansion code lives around src/gt4py/cartesian/gtc/dace/expansion/*.py in GT4Py

Parent: GEOS-ESM/SMT-Nebulae#31

NB: This is a requirement for DaCe AD future features to kick in


  • Expose more code to DaCe by exposing the mask/while to the SDFG instead of keeping them in the Tasklet
  • ALTERNATIVELY: subtask further if the refactor work is bigger
@FlorianDeconinck
Copy link
Collaborator Author

Below is a small example showing the above problem

from gt4py.cartesian.gtscript import (
    computation,
    interval,
    PARALLEL,
)
from ndsl.boilerplate import get_factories_single_tile_orchestrated_cpu
from ndsl.constants import X_DIM, Y_DIM, Z_DIM
from ndsl.dsl.typing import FloatField
from ndsl import StencilFactory, orchestrate
import numpy as np

domain = (3, 3, 4)

stcil_fctry, ijk_qty_fctry = get_factories_single_tile_orchestrated_cpu(domain[0], domain[1], domain[2], 0)

def small_conditional(
    in_field: FloatField,
    out_field: FloatField
):
    with computation(PARALLEL), interval(...):
        if in_field > 5.0 and in_field < 20:
            out_field = in_field


class DaCeGT4Py_Bridge:
    def __init__(self, stencil_factory: StencilFactory):
        orchestrate(
            obj=self,
            config=stencil_factory.config.dace_config)
        self.stencil = stcil_fctry.from_dims_halo(
            func=small_conditional,
            compute_dims=[X_DIM, Y_DIM, Z_DIM],
        )

    def __call__(self, in_field: FloatField, out_field: FloatField):
        self.stencil(in_field, out_field)



if __name__ == "__main__":
    I = np.arange(domain[0]*domain[1]*domain[2], dtype=np.float64).reshape(domain)
    O = np.zeros(domain)

    bridge = DaCeGT4Py_Bridge(stcil_fctry)
    bridge(I, O)

    print(f"Input:\n{I}\n")
    print(f"Output:\n{O}\n")

This generate the following SDFG (under dace:cpu)

Image

where the tasklet code carries the conditional:

mask_140660478452496_gen_0: dace.bool_
mask_140660478452496_gen_0 = ((gtIN__in_field > dace.float64(5.0)) and (gtIN__in_field < dace.float64(dace.int64(20))))
if mask_140660478452496_gen_0:
    gtOUT__out_field = gtIN__in_field

We want this conditional to be extracted before the tasklet. See dace.sdfg.SDFG:add_loop code to see how to dynamically create an if condition using the guard mechanism

@FlorianDeconinck
Copy link
Collaborator Author

Known native construct that are folded inside the Tasklet instead of described to DaCe:

  • conditional
  • while loops

GT4Py-DaCe uses an "expansion" mechanism. The stencils are dace.library_nodes while the code is being parsed (orchestration). Then expansion of those nodes is triggered and unrolls the stencil into an SDFG compatible description. It's within this code that the decision to write all code has tasklet is finalized. Refactor might need to impact pre-expansion as well in the DaceIRBuilder

egparedes pushed a commit to GridTools/gt4py that referenced this issue Sep 4, 2024
`MaskInlining` will be a key feature of the improved gt4py / DaCe bridge
as envisioned in GEOS-ESM/NDSL#53, which will
propagate more control flow elements from gt4py to DaCe. We thus want to
make sure if `MaskInlining` itself is posing any problems. We tracked
the git history of this explicit exclude back to PR
#300, which not only added the
npir backend, but also refactored how pipeline stages are defined.
Previous to this PR optimization passes to be specified explicitly and
the PR changed this to an exclude list. It looks like the `MaskInlining`
pass was added after the list of optimization passes was formed and thus
excluded (to keep the list constant). From this history we thus don't
expect `MaskInlining` to break anything.

Co-authored-by: Roman Cattaneo <>
egparedes pushed a commit to GridTools/gt4py that referenced this issue Sep 11, 2024
…on and other cleanups (#1630)

Being new to the codebase and in preparation of the [GT4Py-DaCe bridge
refactor](GEOS-ESM/NDSL#53) I was reading a
lot of code. Some things that caught my eye are summarized in this PR.
Bigger changes include

1. Moving `daceir.py` into the `dace/` backend folder. This follows the
convention of all other backends who have their IR inside the backend
folder. (last commit)
2. Readability improvements in `gtir_to_oir.py` (the "Cleanup visit_*"
commits): In my opinion, there's no point in shortening "statement" to
"stmt". I wouldn't go as far as renaming the classes. For the renamed
local variables, I think this adds a lot to readability.
3. `visit_While` in `gtir_to_oir.py` was having an extra mask statement
around the `while` loop (in case a mask was defined. I inlined the
potential `mask` with the condition of `while` loop.
4. In `gtir_to_oir.py` don't translate every body statement into a
single `MaskStmt`. Instead, create one (or two incase of `else`)
`MaskStmt` with multiple statements in the `body` (which is already
typed as a list of statements).

Tested locally by running the test suite. All tests passing.

Co-authored-by: Roman Cattaneo <>
@romanc
Copy link

romanc commented Nov 5, 2024

Latest work is in https://github.com/romanc/gt4py/romanc/bridge-on-top-of-cleanups, which is re-based on top of the cleanups in PR GridTools/gt4py#1724 and should only contain changes necessary for the new bridge and no cleanups anymore.

The new bridge is currently pending on the following two DaCe issues:

Working around those two, we still (only sometimes) have an issue with writes (in k) inside a k-loop (i.e. with computation FORWARD / BACKWARD). To be investigated further at a later point.

FlorianDeconinck pushed a commit to GridTools/gt4py that referenced this issue Nov 15, 2024
## Description

This PR is split off the work for the new GT4Py - DaCe bridge, which
should allow to expose control flow statements (`if` and `while`) to
DaCe to better use DaCe's analytics capabilities. This PR is concerned
with adding type hints and generally improving code readability. Main
parts are

- `daceir_builder.py`: early returns and renamed variable
- `sdfg_builder.py`: type hints and early returns
- `tasklet_codegen.py`: type hints and early returns

`TaskletCodegen` was given `sdfg_ctx`, which wasn't used. That parameter
was thus removed.

Parent issue: GEOS-ESM/NDSL#53

## Requirements

- [x] All fixes and/or new features come with corresponding tests.
  Assumed to be covered by existing tests.
- [ ] Important design decisions have been documented in the approriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
  N/A

---------

Co-authored-by: Roman Cattaneo <>
@romanc
Copy link

romanc commented Dec 4, 2024

Latest work is still in https://github.com/romanc/gt4py/romanc/bridge-on-top-of-cleanups, which can be rebased on top of mainline gt4py once GridTools/gt4py#1752 to get a clearer picture.

Main ideas in that branch are the following

  1. Introduce oir.CodeBlock to (recursively) break down oir.HorizontalExecutions into smaller pieces, which are then evaluated in a dcir.Tasklet.
  2. Replace dcir.MaskStmt / dcir.While with dcir.Condition and dcir.WhileLoop (names aren't perfect and could be renamed again)
  3. Add support for if statements and while loops in the state machine of sdfg_builder, i.e. add_condition and add_while on SDFGContext together with visit_Condition and visit_WhileLoop in the visitor (for the new dcir nodes).
  4. Tasklets might define temporaries (scalars and/or tmp fields) in one tasklet and re-use them in others. We thus opt to "export" all temporaries and then "re-import" them again if needed. A DaCe pass called "DeadDataflowElimination" will take care of removing unnecessary "exports".
  5. Memlets will need to be generated per tasklet at a more fine grained level compared to now where memlets are calculated just once per horizontal execution. Commit GridTools/gt4py@19f59a3 outlines the necessary changes. One thing I forgot in this PR is the fact that both, Conditions and WhileLoops generate tasklets on the fly in sdfg_builder. They will also need to calculate their memlets at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants