feature[next] GPU backend from Python (#1325)

* add support for gtfn cuda backend
* reconcile new code with type hints by relaxing type hints
* add ADR and todos for library x buildsys matrix design
* add cpu/gpu tox variants for next, update ci files

Co-authored-by: Rico Häuselmann <ricoh@cscs.ch>
GridTools · Oct 24, 2023 · af7ff8a
1 parent d11246e
commit af7ff8a

Showing 36 changed files with 507 additions and 206 deletions.
diff --git a/.github/workflows/test-next.yml b/.github/workflows/test-next.yml
@@ -57,13 +57,13 @@ jobs:
       run: |
         pyversion=${{ matrix.python-version }}
         pyversion_no_dot=${pyversion//./}
-        tox run -e next-py${pyversion_no_dot}-${{ matrix.tox-env-factor }}
-    #     mv coverage.json coverage-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}.json
+        tox run -e next-py${pyversion_no_dot}-${{ matrix.tox-env-factor }}-cpu
+    #     mv coverage.json coverage-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}-cpu.json
     # - name: Upload coverage.json artifact
     #   uses: actions/upload-artifact@v3
     #   with:
-    #     name: coverage-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}
-    #     path: coverage-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}.json
+    #     name: coverage-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}-cpu
+    #     path: coverage-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}-cpu.json
     # - name: Gather info
     #   run: |
     #     echo ${{ github.ref_type }} >> info.txt
@@ -76,5 +76,5 @@ jobs:
     # - name: Upload info artifact
     #   uses: actions/upload-artifact@v3
     #   with:
-    #     name: info-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}
+    #     name: info-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}-cpu
     #     path: info.txt
diff --git a/ci/cscs-ci.yml b/ci/cscs-ci.yml
@@ -117,3 +117,4 @@ test py310:
     - SUBPACKAGE: eve
     - SUBPACKAGE: next
       VARIANT: [-nomesh, -atlas]
+      SUBVARIANT: [-cuda11x, -cpu]
diff --git a/docs/development/ADRs/0009-Compiled-Backend-Integration.md b/docs/development/ADRs/0009-Compiled-Backend-Integration.md
@@ -159,7 +159,7 @@ Compiled backends may generate code which depends on libraries and tools written
 
 1. can be installed with `pip` (from `PyPI` or another source) automatically.
 2. can not be installed with `pip` and not commonly found on HPC machines.
-3. libraries and tools which are left to the user to install and make discoverable: `pybind11`, C++ compilers
+3. libraries and tools which are left to the user to install and make discoverable: `boost`, C++ compilers
 
 Category 1 are made dependencies of `GT4Py`. Examples include `pybind11`, `cmake`, `ninja`.
 

diff --git a/docs/development/ADRs/0016-Multiple-Backends-and-Build-Systems.md b/docs/development/ADRs/0016-Multiple-Backends-and-Build-Systems.md
@@ -0,0 +1,118 @@
+---
+tags: [backend, gridtools, bindings, libraries, otf]
+---
+
+# Support for Multiple Backends, Build Systems and Libraries
+
+- **Status**: valid
+- **Authors**: Rico Häuselmann (@DropD)
+- **Created**: 2023-10-11
+- **Updated**: 2023-10-11
+
+In the process of enabling CUDA for the GTFN backend, we encountered a potential support matrix of build systems x target language libraries. The current design requires build systems to know about all the libraries they can be used with. We decided that the matrix is too small for now and to not revisit the existing design yet.
+
+## Context
+
+ADRs [0009](0009-Compiled_Backend_Integration.md), [0011](0011-On_The_Fly_Compilation.md) and [0012](0012-GridTools_Cpp_OTF_Steps.md) detail the design decisions around what is loosely referred to as "gt4py.next backends". In summary the goals are:
+
+- extensibility
+  - adding backends should not require changing existing code
+  - adding / modifying backend modules like build systems / compilers should not be blocked by assumptions in other modules.
+- modularity
+  - increase the chance that two different backends (for example GTFN and another C++ backend) can share code.
+
+Therefore the concerns of generating code in the target language, generating Python bindings in the target language, and building (compiling) the generated code are separated into code generator, bindings generator, and compile step / build system. The compile step is written to be build system agnostic.
+
+There is one category that connects all these concerns: libraries written in the target language and used in generated / bindings code.
+
+Current design:
+
+```mermaid
+graph LR
+
+gtgen("GTFN code generator (C++/Cuda)") --> |GridTools::fn_naive| Compiler
+gtgen("GTFN code generator (C++/Cuda)") --> |GridTools::fn_gpu| Compiler
+nb("nanobind bindings generator") --> |nanobind| Compiler
+Compiler --> CMakeProject --> CMakeListsGenerator
+Compiler --> CompiledbProject --> CMakeListsGenerator
+```
+
+The current design contains two mappings:
+
+- library name -> CMake `find_package()` call
+- library name -> CMake target name
+
+and the gridtools cpu/gpu link targets are differentiated by internally separating between two fictitious "gridtools_cpu" and "gridtools_gpu" libraries.
+
+## Concerns
+
+### Usage
+
+The "gridtools_cpu" and "gridtools_gpu" fake library names add to the learning curve for this part of the code. Reuse of the existing components might require this knowledge.
+
+### Scalability
+
+Adding a new backend that uses the existing build systems but relies on different libraries requires modifying existing build system components (at the very least `CMakeListsGenerator`).
+
+### Separation of concerns
+
+It makes more sense to separate the concerns of how to generate a valid build system configuration and how to use a particular library in a particular build system than to mix the two.
+
+## Decision
+
+Currently the code overhead is in the tens of lines, and there are no concrete plans to add more compiled backends or different build systems. Therefore we decide to keep the current design for now but to redesign as soon as the matrix grows.
+To this end, ToDo comments are added in the relevant places.
+
+## Consequences
+
+Initial GTFN gpu support will not be blocked by design work.
+
+## Alternatives Considered
+
+### Push build system support to the LibraryDependency instance
+
+```
+#src/gt4py/next/otf/binding/interface.py
+
+...
+class LibraryDependency:
+    name: str
+    version: str
+    link_targets: list[str]
+    include_headers: list[str]
+```
+
+- Simple, choice is made at code generator level, where the knowledge should be
+- Interface might not suit every build system
+- Up to the implementer to make the logic for choosing reusable (or not)
+
+### Create additional data structures to properly separate concerns
+
+```
+class BuildSystemConfig:
+    device_type: core_defs.DeviceType
+    ...
+
+
+class LibraryAdaptor:
+    library: LibraryDependency
+    build_system: CMakeProject
+
+    def config_phase(self, config: BuildSystemConfig) -> str:
+        import gridtools_cpp
+        cmake_dir = gridtools_cpp.get_cmake_dir()
+
+        return f"find_package(... {cmake_dir} ... )"
+
+    def build_phase(self, config: BuildSystemConfig) -> str:
+        return "" # header only library
+
+    def link_phase(self, main_target_name: str, config: BuildSystemConfig) -> str:
+        return f"target_link_libraries({main_target_name} ...)"
+```
+
+- More general and fully extensible, adaptors can be added for any required library / build system combination without touching existing code (depending on the registering mechanism).
+- More likely to be reusable as choices are explicit and can be overridden separately by sub classing.
+- More design work required. Open questions:
+  - Design the interface to work with any build system
+  - How to register adaptors? entry points? global dictionary?
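
Note on the open question above ("How to register adaptors?"): the "global dictionary" option could be sketched roughly as follows. This is only an illustration under stated assumptions, not part of this commit or of gt4py. The names `ADAPTORS`, `register_adaptor`, `get_adaptor` and `GridToolsGpuCMakeAdaptor` are hypothetical, `LibraryDependency` is reduced to two fields, and the `BuildSystemConfig` argument from the ADR is left out for brevity.

```python
from __future__ import annotations

import dataclasses
from typing import Callable, Protocol


@dataclasses.dataclass(frozen=True)
class LibraryDependency:
    # Simplified stand-in for the interface class shown in the ADR.
    name: str
    version: str


class LibraryAdaptor(Protocol):
    # Hypothetical adaptor interface: one adaptor per (library, build system) pair.
    def config_phase(self) -> str: ...

    def link_phase(self, main_target_name: str) -> str: ...


# Hypothetical global registry: (library name, build system name) -> adaptor factory.
ADAPTORS: dict[tuple[str, str], Callable[[LibraryDependency], LibraryAdaptor]] = {}


def register_adaptor(library: str, build_system: str):
    """Class decorator that records an adaptor factory in the registry."""

    def decorator(factory):
        ADAPTORS[(library, build_system)] = factory
        return factory

    return decorator


@register_adaptor("gridtools_gpu", "cmake")
@dataclasses.dataclass
class GridToolsGpuCMakeAdaptor:
    library: LibraryDependency

    def config_phase(self) -> str:
        # The real code additionally passes PATHS from gridtools_cpp.get_cmake_dir().
        return "find_package(GridTools REQUIRED)"

    def link_phase(self, main_target_name: str) -> str:
        return f"target_link_libraries({main_target_name} PUBLIC GridTools::fn_gpu)"


def get_adaptor(dep: LibraryDependency, build_system: str) -> LibraryAdaptor:
    """Look up the adaptor for a dependency without touching build system code."""
    try:
        factory = ADAPTORS[(dep.name, build_system)]
    except KeyError:
        raise ValueError(
            f"Library {dep.name} is not supported by {build_system}"
        ) from None
    return factory(dep)
```

With such a registry, a new library / build system combination would be added by defining and registering one more adaptor class, while the build system module itself only calls `get_adaptor`.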
diff --git a/src/gt4py/next/iterator/embedded.py b/src/gt4py/next/iterator/embedded.py
@@ -685,7 +685,7 @@ def _single_vertical_idx(
     indices: NamedFieldIndices, column_axis: Tag, column_index: common.IntIndex
 ) -> NamedFieldIndices:
     transformed = {
-        axis: (index if axis != column_axis else index.start + column_index)  # type: ignore[union-attr] # trust me, `index` is range in case of `column_axis`
+        axis: (index if axis != column_axis else index.start + column_index)  # type: ignore[union-attr] # trust me, `index` is range in case of `column_axis` # fmt: off
         for axis, index in indices.items()
     }
     return transformed
@@ -1050,7 +1050,7 @@ def __gt_origin__(self) -> tuple[int, ...]:
         return (0,)
 
     @classmethod
-    def __gt_builtin_func__(func: Callable, /) -> NoReturn:  # type: ignore[override] # Signature incompatible with supertype
+    def __gt_builtin_func__(func: Callable, /) -> NoReturn:  # type: ignore[override] # Signature incompatible with supertype # fmt: off
         raise NotImplementedError()
 
     @property
@@ -1070,7 +1070,7 @@ def remap(self, index_field: common.Field) -> common.Field:
         raise NotImplementedError()
 
     def restrict(self, item: common.AnyIndexSpec) -> common.Field | core_defs.int32:
-        if common.is_absolute_index_sequence(item) and all(common.is_named_index(e) for e in item):  # type: ignore[arg-type] # we don't want to pollute the typing of `is_absolute_index_sequence` for this temporary code
+        if common.is_absolute_index_sequence(item) and all(common.is_named_index(e) for e in item):  # type: ignore[arg-type] # we don't want to pollute the typing of `is_absolute_index_sequence` for this temporary code # fmt: off
             d, r = item[0]
             assert d == self._dimension
             assert isinstance(r, int)
@@ -1156,7 +1156,7 @@ def __gt_origin__(self) -> tuple[int, ...]:
         return tuple()
 
     @classmethod
-    def __gt_builtin_func__(func: Callable, /) -> NoReturn:  # type: ignore[override] # Signature incompatible with supertype
+    def __gt_builtin_func__(func: Callable, /) -> NoReturn:  # type: ignore[override] # Signature incompatible with supertype # fmt: off
         raise NotImplementedError()
 
     @property

diff --git a/src/gt4py/next/otf/binding/nanobind.py b/src/gt4py/next/otf/binding/nanobind.py
@@ -17,7 +17,7 @@
 
 from __future__ import annotations
 
-from typing import Any, Sequence, Union
+from typing import Any, Sequence, TypeVar, Union
 
 import gt4py.eve as eve
 from gt4py.eve.codegen import JinjaTemplate as as_jinja, TemplatedGenerator
@@ -26,6 +26,9 @@
 from gt4py.next.type_system import type_info as ti, type_specifications as ts
 
 
+SrcL = TypeVar("SrcL", bound=languages.NanobindSrcL, covariant=True)
+
+
 class Expr(eve.Node):
     pass
 
@@ -191,8 +194,8 @@ def make_argument(name: str, type_: ts.TypeSpec) -> str | BufferSID | CompositeS
 
 
 def create_bindings(
-    program_source: stages.ProgramSource[languages.Cpp, languages.LanguageWithHeaderFilesSettings],
-) -> stages.BindingSource[languages.Cpp, languages.Python]:
+    program_source: stages.ProgramSource[SrcL, languages.LanguageWithHeaderFilesSettings],
+) -> stages.BindingSource[SrcL, languages.Python]:
     """
     Generate Python bindings through which a C++ function can be called.
 
@@ -201,7 +204,7 @@ def create_bindings(
     program_source
         The program source for which the bindings are created
     """
-    if program_source.language is not languages.Cpp:
+    if program_source.language not in [languages.Cpp, languages.Cuda]:
         raise ValueError(
             f"Can only create bindings for C++ program sources, received {program_source.language}."
         )
@@ -221,7 +224,6 @@ def create_bindings(
             "gridtools/common/tuple_util.hpp",
             "gridtools/fn/unstructured.hpp",
             "gridtools/fn/cartesian.hpp",
-            "gridtools/fn/backend/naive.hpp",
             "gridtools/storage/adapter/nanobind_adapter.hpp",
         ],
         wrapper=WrapperFunction(
@@ -266,8 +268,6 @@ def create_bindings(
 
 @workflow.make_step
 def bind_source(
-    inp: stages.ProgramSource[languages.Cpp, languages.LanguageWithHeaderFilesSettings],
-) -> stages.CompilableSource[
-    languages.Cpp, languages.LanguageWithHeaderFilesSettings, languages.Python
-]:
+    inp: stages.ProgramSource[SrcL, languages.LanguageWithHeaderFilesSettings],
+) -> stages.CompilableSource[SrcL, languages.LanguageWithHeaderFilesSettings, languages.Python]:
     return stages.CompilableSource(program_source=inp, binding_source=create_bindings(inp))
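
Note on the `SrcL` type variable introduced above: the effect of replacing the hard-coded `languages.Cpp` with a bound `TypeVar` can be illustrated with a reduced, standalone sketch. The classes below are simplified stand-ins, not the gt4py definitions. In particular, modelling `NanobindSrcL` as a common base class of `Cpp` and `Cuda` is an assumption made for this example.

```python
from __future__ import annotations

import dataclasses
from typing import Generic, TypeVar


class NanobindSrcL:
    """Assumed common base for languages that nanobind bindings can wrap."""


class Cpp(NanobindSrcL):
    pass


class Cuda(NanobindSrcL):
    pass


SrcL = TypeVar("SrcL", bound=NanobindSrcL, covariant=True)


@dataclasses.dataclass(frozen=True)
class ProgramSource(Generic[SrcL]):
    language: type[SrcL]
    source_code: str


@dataclasses.dataclass(frozen=True)
class BindingSource(Generic[SrcL]):
    language: type[SrcL]
    source_code: str


def create_bindings(program_source: ProgramSource[SrcL]) -> BindingSource[SrcL]:
    # The source language flows through unchanged: Cuda in, Cuda out.
    if program_source.language not in (Cpp, Cuda):
        raise ValueError(f"Unsupported language {program_source.language}.")
    return BindingSource(
        language=program_source.language,
        source_code=f"// bindings for:\n{program_source.source_code}",
    )
```

A type checker can then infer `BindingSource[Cuda]` for a `ProgramSource[Cuda]` argument, which the previous `Cpp`-only signature could not express.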
diff --git a/src/gt4py/next/otf/compilation/build_systems/cmake.py b/src/gt4py/next/otf/compilation/build_systems/cmake.py
@@ -38,7 +38,7 @@ def _generate_next_value_(name, start, count, last_values):
 @dataclasses.dataclass
 class CMakeFactory(
     compiler.BuildSystemProjectGenerator[
-        languages.Cpp, languages.LanguageWithHeaderFilesSettings, languages.Python
+        languages.Cpp | languages.Cuda, languages.LanguageWithHeaderFilesSettings, languages.Python
     ]
 ):
     """Create a CMakeProject from a ``CompilableSource`` stage object with given CMake settings."""
@@ -50,7 +50,7 @@ class CMakeFactory(
     def __call__(
         self,
         source: stages.CompilableSource[
-            languages.Cpp,
+            languages.Cpp | languages.Cuda,
             languages.LanguageWithHeaderFilesSettings,
             languages.Python,
         ],
@@ -63,16 +63,21 @@ def __call__(
         name = source.program_source.entry_point.name
         header_name = f"{name}.{source.program_source.language_settings.header_extension}"
         bindings_name = f"{name}_bindings.{source.program_source.language_settings.file_extension}"
+        cmake_languages = [cmake_lists.Language(name="CXX")]
+        if source.program_source.language is languages.Cuda:
+            cmake_languages = [*cmake_languages, cmake_lists.Language(name="CUDA")]
+        cmake_lists_src = cmake_lists.generate_cmakelists_source(
+            name,
+            source.library_deps,
+            [header_name, bindings_name],
+            languages=cmake_languages,
+        )
         return CMakeProject(
             root_path=cache.get_cache_folder(source, cache_strategy),
             source_files={
                 header_name: source.program_source.source_code,
                 bindings_name: source.binding_source.source_code,
-                "CMakeLists.txt": cmake_lists.generate_cmakelists_source(
-                    name,
-                    source.library_deps,
-                    [header_name, bindings_name],
-                ),
+                "CMakeLists.txt": cmake_lists_src,
             },
             program_name=name,
             generator_name=self.cmake_generator_name,

diff --git a/src/gt4py/next/otf/compilation/build_systems/cmake_lists.py b/src/gt4py/next/otf/compilation/build_systems/cmake_lists.py
@@ -30,22 +30,31 @@ class LinkDependency(eve.Node):
     target: str
 
 
+class Language(eve.Node):
+    name: str
+
+
 class CMakeListsFile(eve.Node):
     project_name: str
     find_deps: Sequence[FindDependency]
     link_deps: Sequence[LinkDependency]
     source_names: Sequence[str]
     bin_output_suffix: str
+    languages: Sequence[Language]
 
 
 class CMakeListsGenerator(eve.codegen.TemplatedGenerator):
     CMakeListsFile = as_jinja(
         """
-        project({{project_name}})
         cmake_minimum_required(VERSION 3.20.0)
 
+        project({{project_name}})
+
         # Languages
-        enable_language(CXX)
+        if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
+            set(CMAKE_CUDA_ARCHITECTURES 60)
+        endif()
+        {{"\\n".join(languages)}}
 
         # Paths
         list(APPEND CMAKE_MODULE_PATH ${CMAKE_BINARY_DIR})
@@ -77,32 +86,34 @@ class CMakeListsGenerator(eve.codegen.TemplatedGenerator):
     )
 
     def visit_FindDependency(self, dep: FindDependency):
+        # TODO(ricoh): do not add more libraries here
+        #   and do not use this design in a new build system.
+        #   Instead, design this to be extensible (refer to ADR-0016).
         match dep.name:
-            case "pybind11":
-                import pybind11
-
-                return f"find_package(pybind11 CONFIG REQUIRED PATHS {pybind11.get_cmake_dir()} NO_DEFAULT_PATH)"
             case "nanobind":
                 import nanobind
 
                 py = "find_package(Python COMPONENTS Interpreter Development REQUIRED)"
                 nb = f"find_package(nanobind CONFIG REQUIRED PATHS {nanobind.cmake_dir()} NO_DEFAULT_PATHS)"
                 return py + "\n" + nb
-            case "gridtools":
+            case "gridtools_cpu" | "gridtools_gpu":
                 import gridtools_cpp
 
                 return f"find_package(GridTools REQUIRED PATHS {gridtools_cpp.get_cmake_dir()} NO_DEFAULT_PATH)"
             case _:
                 raise ValueError("Library {name} is not supported".format(name=dep.name))
 
     def visit_LinkDependency(self, dep: LinkDependency):
+        # TODO(ricoh): do not add more libraries here
+        #   and do not use this design in a new build system.
+        #   Instead, design this to be extensible (refer to ADR-0016).
         match dep.name:
-            case "pybind11":
-                lib_name = "pybind11::module"
             case "nanobind":
                 lib_name = "nanobind-static"
-            case "gridtools":
+            case "gridtools_cpu":
                 lib_name = "GridTools::fn_naive"
+            case "gridtools_gpu":
+                lib_name = "GridTools::fn_gpu"
             case _:
                 raise ValueError("Library {name} is not supported".format(name=dep.name))
 
@@ -118,11 +129,14 @@ def visit_LinkDependency(self, dep: LinkDependency):
         lnk = f"target_link_libraries({dep.target} PUBLIC {lib_name})"
         return cfg + "\n" + lnk
 
+    Language = as_jinja("enable_language({{name}})")
+
 
 def generate_cmakelists_source(
     project_name: str,
     dependencies: tuple[interface.LibraryDependency, ...],
     source_names: Sequence[str],
+    languages: Sequence[Language] = (Language(name="CXX"),),
 ) -> str:
     """
     Generate CMakeLists file contents.
@@ -135,5 +149,6 @@ def generate_cmakelists_source(
         link_deps=[LinkDependency(name=d.name, target=project_name) for d in dependencies],
         source_names=source_names,
         bin_output_suffix=common.python_module_suffix(),
+        languages=languages,
     )
     return CMakeListsGenerator.apply(cmakelists_file)
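
Note on the new `languages` field: the "# Languages" section of the generated `CMakeLists.txt` is built from one `enable_language(...)` line per entry, preceded by a default for `CMAKE_CUDA_ARCHITECTURES`. The standalone sketch below mimics that rendering with plain string formatting instead of the `eve` / Jinja machinery used in the diff. The function name `render_language_block` is hypothetical, and the exact whitespace of the real template output may differ.

```python
from typing import Sequence


def render_language_block(
    languages: Sequence[str] = ("CXX",),
    default_cuda_arch: int = 60,
) -> str:
    """Approximate the '# Languages' section of the generated CMakeLists.txt."""
    lines = [
        # The default is only applied if the user has not chosen an architecture.
        "if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)",
        f"    set(CMAKE_CUDA_ARCHITECTURES {default_cuda_arch})",
        "endif()",
    ]
    # One enable_language() call per requested language, in the order given.
    lines += [f"enable_language({name})" for name in languages]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_language_block(["CXX", "CUDA"]))
```

For a CUDA program source, `CMakeFactory` passes both `CXX` and `CUDA`, so both languages are enabled. For a C++ source only `CXX` is enabled, which matches the previous hard-coded behaviour.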