feature[next] GPU backend from Python (#1325)
* add support for gtfn cuda backend

* reconcile new code with type hints by relaxing type hints.

* add ADR and todos for library x buildsys matrix design

* add cpu/gpu tox variants for next, update ci files

---------

Co-authored-by: Rico Häuselmann <ricoh@cscs.ch>
petiaccja and Rico Häuselmann authored Oct 24, 2023
1 parent d11246e commit af7ff8a
Showing 36 changed files with 507 additions and 206 deletions.
10 changes: 5 additions & 5 deletions .github/workflows/test-next.yml
@@ -57,13 +57,13 @@ jobs:
run: |
pyversion=${{ matrix.python-version }}
pyversion_no_dot=${pyversion//./}
tox run -e next-py${pyversion_no_dot}-${{ matrix.tox-env-factor }}
# mv coverage.json coverage-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}.json
tox run -e next-py${pyversion_no_dot}-${{ matrix.tox-env-factor }}-cpu
# mv coverage.json coverage-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}-cpu.json
# - name: Upload coverage.json artifact
# uses: actions/upload-artifact@v3
# with:
# name: coverage-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}
# path: coverage-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}.json
# name: coverage-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}-cpu
# path: coverage-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}-cpu.json
# - name: Gather info
# run: |
# echo ${{ github.ref_type }} >> info.txt
@@ -76,5 +76,5 @@ jobs:
# - name: Upload info artifact
# uses: actions/upload-artifact@v3
# with:
# name: info-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}
# name: info-py${{ matrix.python-version }}-${{ matrix.os }}-${{ matrix.tox-env-factor }}-cpu
# path: info.txt
1 change: 1 addition & 0 deletions ci/cscs-ci.yml
@@ -117,3 +117,4 @@ test py310:
- SUBPACKAGE: eve
- SUBPACKAGE: next
VARIANT: [-nomesh, -atlas]
SUBVARIANT: [-cuda11x, -cpu]
2 changes: 1 addition & 1 deletion docs/development/ADRs/0009-Compiled-Backend-Integration.md
@@ -159,7 +159,7 @@ Compiled backends may generate code which depends on libraries and tools written

1. can be installed with `pip` (from `PyPI` or another source) automatically.
2. can not be installed with `pip` and not commonly found on HPC machines.
3. libraries and tools which are left to the user to install and make discoverable: `pybind11`, C++ compilers
3. libraries and tools which are left to the user to install and make discoverable: `boost`, C++ compilers

Category 1 are made dependencies of `GT4Py`. Examples include `pybind11`, `cmake`, `ninja`.

118 changes: 118 additions & 0 deletions docs/development/ADRs/0016-Multiple-Backends-and-Build-Systems.md
@@ -0,0 +1,118 @@
---
tags: [backend, gridtools, bindings, libraries, otf]
---

# Support for Multiple Backends, Build Systems and Libraries

- **Status**: valid
- **Authors**: Rico Häuselmann (@DropD)
- **Created**: 2023-10-11
- **Updated**: 2023-10-11

In the process of enabling CUDA for the GTFN backend, we encountered a potential support matrix of build systems x target language libraries. The current design requires build systems to know about all the libraries they can be used with. We decided that the matrix is too small for now to justify a redesign, so we will not revisit the existing design yet.

## Context

ADRs [0009](0009-Compiled-Backend-Integration.md), [0011](0011-On_The_Fly_Compilation.md) and [0012](0012-GridTools_Cpp_OTF_Steps.md) detail the design decisions around what is loosely referred to as "gt4py.next backends". In summary, the goals are:

- extensibility
  - adding backends should not require changing existing code
  - adding / modifying backend modules like build systems / compilers should not be blocked by assumptions in other modules.
- modularity
  - increase the chance that two different backends (for example GTFN and another C++ backend) can share code.

Therefore, the concerns of generating code in the target language, generating Python bindings in the target language, and building (compiling) the generated code are separated into a code generator, a bindings generator and a compile step / build system. The compile step is written to be build-system agnostic.

There is one category that connects all these concerns: libraries written in the target language and used in generated / bindings code.

Current design:

```mermaid
graph LR
gtgen("GTFN code generator (C++/Cuda)") --> |GridTools::fn_naive| Compiler
gtgen("GTFN code generator (C++/Cuda)") --> |GridTools::fn_gpu| Compiler
nb("nanobind bindings generator") --> |nanobind| Compiler
Compiler --> CMakeProject --> CMakeListsGenerator
Compiler --> CompiledbProject --> CMakeListsGenerator
```

The current design contains two mappings:

- library name -> CMake `find_package()` call
- library name -> CMake target name

and the gridtools cpu/gpu link targets are differentiated by internally separating between two fictitious "gridtools_cpu" and "gridtools_gpu" libraries.
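
The two mappings can be pictured as plain lookup tables. The following is only an illustrative sketch: the dictionary names and the `cmake_lines` helper are hypothetical (the actual code uses `match` statements inside `CMakeListsGenerator`), and the `find_package()` strings are abbreviated, since the real calls also pass installation paths.

```python
# Hypothetical lookup tables mirroring the two mappings described above.
# Library names and CMake target names are taken from this change;
# the find_package() strings are abbreviated for illustration.
FIND_PACKAGE_CALLS = {
    "nanobind": "find_package(nanobind CONFIG REQUIRED)",
    "gridtools_cpu": "find_package(GridTools REQUIRED)",
    "gridtools_gpu": "find_package(GridTools REQUIRED)",
}

LINK_TARGETS = {
    "nanobind": "nanobind-static",
    "gridtools_cpu": "GridTools::fn_naive",
    "gridtools_gpu": "GridTools::fn_gpu",
}


def cmake_lines(library_name: str, target: str) -> list[str]:
    """Return the find and link lines for one library (hypothetical helper)."""
    if library_name not in FIND_PACKAGE_CALLS:
        raise ValueError(f"Library {library_name} is not supported")
    return [
        FIND_PACKAGE_CALLS[library_name],
        f"target_link_libraries({target} PUBLIC {LINK_TARGETS[library_name]})",
    ]
```

Both fictitious names resolve to the same `find_package()` call and differ only in the link target, which is why the build system has to know about them.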

## Concerns

### Usage

The "gridtools_cpu" and "gridtools_gpu" fake library names add to the learning curve for this part of the code. Reuse of the existing components might require this knowledge.

### Scalability

Adding a new backend that uses the existing build systems but relies on different libraries requires modifying existing build system components (at the very least `CMakeListsGenerator`).

### Separation of concerns

It makes more sense to separate the concerns of how to generate a valid build system configuration and how to use a particular library in a particular build system than to mix the two.

## Decision

Currently, the code overhead is in the tens of lines, and there are no concrete plans to add more compiled backends or different build systems. Therefore, we decide to keep the current design for now but to redesign as soon as the matrix grows.
To this end, ToDo comments are added in the relevant places.

## Consequences

Initial GTFN gpu support will not be blocked by design work.

## Alternatives Considered

### Push build system support to the LibraryDependency instance

```
#src/gt4py/next/otf/binding/interface.py
...
class LibraryDependency:
    name: str
    version: str
    link_targets: list[str]
    include_headers: list[str]
```

- Simple, choice is made at code generator level, where the knowledge should be
- Interface might not suit every build system
- Up to the implementer to make the logic for choosing reusable (or not)
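
As a rough sketch of how this alternative might look in use, the code generator would pick the link target itself when constructing the dependency. The helper `gridtools_dependency` and the version string are hypothetical placeholders; only the field names and the two GridTools target names come from this document.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class LibraryDependency:
    # Fields as proposed in the alternative above.
    name: str
    version: str
    link_targets: list[str]
    include_headers: list[str]


def gridtools_dependency(use_gpu: bool) -> LibraryDependency:
    """Hypothetical helper: the code generator chooses the link target."""
    target = "GridTools::fn_gpu" if use_gpu else "GridTools::fn_naive"
    return LibraryDependency(
        name="gridtools",
        version="0.0.0",  # placeholder, not a real version requirement
        link_targets=[target],
        include_headers=["gridtools/fn/unstructured.hpp"],
    )
```

With this shape, the build system would no longer need the "gridtools_cpu" / "gridtools_gpu" distinction, because the dependency already carries its link targets.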

### Create additional data structures to properly separate concerns

```
class BuildSystemConfig:
    device_type: core_defs.DeviceType
    ...


class LibraryAdaptor:
    library: LibraryDependency
    build_system: CMakeProject

    def config_phase(self, config: BuildSystemConfig) -> str:
        import gridtools_cpp
        cmake_dir = gridtools_cpp.get_cmake_dir()
        return f"find_package(... {cmake_dir} ... )"

    def build_phase(self, config: BuildSystemConfig) -> str:
        return ""  # header only library

    def link_phase(self, main_target_name: str, config: BuildSystemConfig) -> str:
        return f"target_link_libraries({main_target_name} ...)"
```

- More general and fully extensible, adaptors can be added for any required library / build system combination without touching existing code (depending on the registering mechanism).
- More likely to be reusable as choices are explicit and can be overridden separately by subclassing.
- More design work required. Open questions:
  - Design the interface to work with any build system
  - How to register adaptors? entry points? global dictionary?
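
To make the "global dictionary" option from the open questions more concrete, here is one possible shape for such a registry. This is a sketch under assumptions, not a proposed design: `register_adaptor`, `get_adaptor`, `GridToolsGpuCMakeAdaptor` and the `"cmake"` key are all hypothetical names, and the adaptor is reduced to a single `link_phase` method without the `BuildSystemConfig` argument.

```python
# Hypothetical registry keyed by (library name, build system name).
_ADAPTORS: dict[tuple[str, str], type] = {}


def register_adaptor(library: str, build_system: str):
    """Class decorator recording an adaptor for one (library, build system) pair."""

    def decorator(cls):
        _ADAPTORS[(library, build_system)] = cls
        return cls

    return decorator


@register_adaptor("gridtools_gpu", "cmake")
class GridToolsGpuCMakeAdaptor:
    def link_phase(self, main_target_name: str) -> str:
        return f"target_link_libraries({main_target_name} PUBLIC GridTools::fn_gpu)"


def get_adaptor(library: str, build_system: str):
    """Look up and instantiate the adaptor, failing loudly if none is registered."""
    key = (library, build_system)
    if key not in _ADAPTORS:
        raise ValueError(f"No adaptor registered for {library!r} with {build_system!r}")
    return _ADAPTORS[key]()
```

New library / build system combinations would then be added by registering another class, without editing the build system code itself.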
8 changes: 4 additions & 4 deletions src/gt4py/next/iterator/embedded.py
@@ -685,7 +685,7 @@ def _single_vertical_idx(
indices: NamedFieldIndices, column_axis: Tag, column_index: common.IntIndex
) -> NamedFieldIndices:
transformed = {
axis: (index if axis != column_axis else index.start + column_index) # type: ignore[union-attr] # trust me, `index` is range in case of `column_axis`
axis: (index if axis != column_axis else index.start + column_index) # type: ignore[union-attr] # trust me, `index` is range in case of `column_axis` # fmt: off
for axis, index in indices.items()
}
return transformed
@@ -1050,7 +1050,7 @@ def __gt_origin__(self) -> tuple[int, ...]:
return (0,)

@classmethod
def __gt_builtin_func__(func: Callable, /) -> NoReturn: # type: ignore[override] # Signature incompatible with supertype
def __gt_builtin_func__(func: Callable, /) -> NoReturn: # type: ignore[override] # Signature incompatible with supertype # fmt: off
raise NotImplementedError()

@property
@@ -1070,7 +1070,7 @@ def remap(self, index_field: common.Field) -> common.Field:
raise NotImplementedError()

def restrict(self, item: common.AnyIndexSpec) -> common.Field | core_defs.int32:
if common.is_absolute_index_sequence(item) and all(common.is_named_index(e) for e in item): # type: ignore[arg-type] # we don't want to pollute the typing of `is_absolute_index_sequence` for this temporary code
if common.is_absolute_index_sequence(item) and all(common.is_named_index(e) for e in item): # type: ignore[arg-type] # we don't want to pollute the typing of `is_absolute_index_sequence` for this temporary code # fmt: off
d, r = item[0]
assert d == self._dimension
assert isinstance(r, int)
@@ -1156,7 +1156,7 @@ def __gt_origin__(self) -> tuple[int, ...]:
return tuple()

@classmethod
def __gt_builtin_func__(func: Callable, /) -> NoReturn: # type: ignore[override] # Signature incompatible with supertype
def __gt_builtin_func__(func: Callable, /) -> NoReturn: # type: ignore[override] # Signature incompatible with supertype # fmt: off
raise NotImplementedError()

@property
18 changes: 9 additions & 9 deletions src/gt4py/next/otf/binding/nanobind.py
@@ -17,7 +17,7 @@

from __future__ import annotations

from typing import Any, Sequence, Union
from typing import Any, Sequence, TypeVar, Union

import gt4py.eve as eve
from gt4py.eve.codegen import JinjaTemplate as as_jinja, TemplatedGenerator
@@ -26,6 +26,9 @@
from gt4py.next.type_system import type_info as ti, type_specifications as ts


SrcL = TypeVar("SrcL", bound=languages.NanobindSrcL, covariant=True)


class Expr(eve.Node):
pass

@@ -191,8 +194,8 @@ def make_argument(name: str, type_: ts.TypeSpec) -> str | BufferSID | CompositeS


def create_bindings(
program_source: stages.ProgramSource[languages.Cpp, languages.LanguageWithHeaderFilesSettings],
) -> stages.BindingSource[languages.Cpp, languages.Python]:
program_source: stages.ProgramSource[SrcL, languages.LanguageWithHeaderFilesSettings],
) -> stages.BindingSource[SrcL, languages.Python]:
"""
Generate Python bindings through which a C++ function can be called.
@@ -201,7 +204,7 @@ def create_bindings(
program_source
The program source for which the bindings are created
"""
if program_source.language is not languages.Cpp:
if program_source.language not in [languages.Cpp, languages.Cuda]:
raise ValueError(
f"Can only create bindings for C++ program sources, received {program_source.language}."
)
@@ -221,7 +224,6 @@ def create_bindings(
"gridtools/common/tuple_util.hpp",
"gridtools/fn/unstructured.hpp",
"gridtools/fn/cartesian.hpp",
"gridtools/fn/backend/naive.hpp",
"gridtools/storage/adapter/nanobind_adapter.hpp",
],
wrapper=WrapperFunction(
@@ -266,8 +268,6 @@ def create_bindings(

@workflow.make_step
def bind_source(
inp: stages.ProgramSource[languages.Cpp, languages.LanguageWithHeaderFilesSettings],
) -> stages.CompilableSource[
languages.Cpp, languages.LanguageWithHeaderFilesSettings, languages.Python
]:
inp: stages.ProgramSource[SrcL, languages.LanguageWithHeaderFilesSettings],
) -> stages.CompilableSource[SrcL, languages.LanguageWithHeaderFilesSettings, languages.Python]:
return stages.CompilableSource(program_source=inp, binding_source=create_bindings(inp))
19 changes: 12 additions & 7 deletions src/gt4py/next/otf/compilation/build_systems/cmake.py
@@ -38,7 +38,7 @@ def _generate_next_value_(name, start, count, last_values):
@dataclasses.dataclass
class CMakeFactory(
compiler.BuildSystemProjectGenerator[
languages.Cpp, languages.LanguageWithHeaderFilesSettings, languages.Python
languages.Cpp | languages.Cuda, languages.LanguageWithHeaderFilesSettings, languages.Python
]
):
"""Create a CMakeProject from a ``CompilableSource`` stage object with given CMake settings."""
@@ -50,7 +50,7 @@ class CMakeFactory(
def __call__(
self,
source: stages.CompilableSource[
languages.Cpp,
languages.Cpp | languages.Cuda,
languages.LanguageWithHeaderFilesSettings,
languages.Python,
],
@@ -63,16 +63,21 @@ def __call__(
name = source.program_source.entry_point.name
header_name = f"{name}.{source.program_source.language_settings.header_extension}"
bindings_name = f"{name}_bindings.{source.program_source.language_settings.file_extension}"
cmake_languages = [cmake_lists.Language(name="CXX")]
if source.program_source.language is languages.Cuda:
cmake_languages = [*cmake_languages, cmake_lists.Language(name="CUDA")]
cmake_lists_src = cmake_lists.generate_cmakelists_source(
name,
source.library_deps,
[header_name, bindings_name],
languages=cmake_languages,
)
return CMakeProject(
root_path=cache.get_cache_folder(source, cache_strategy),
source_files={
header_name: source.program_source.source_code,
bindings_name: source.binding_source.source_code,
"CMakeLists.txt": cmake_lists.generate_cmakelists_source(
name,
source.library_deps,
[header_name, bindings_name],
),
"CMakeLists.txt": cmake_lists_src,
},
program_name=name,
generator_name=self.cmake_generator_name,
35 changes: 25 additions & 10 deletions src/gt4py/next/otf/compilation/build_systems/cmake_lists.py
@@ -30,22 +30,31 @@ class LinkDependency(eve.Node):
target: str


class Language(eve.Node):
name: str


class CMakeListsFile(eve.Node):
project_name: str
find_deps: Sequence[FindDependency]
link_deps: Sequence[LinkDependency]
source_names: Sequence[str]
bin_output_suffix: str
languages: Sequence[Language]


class CMakeListsGenerator(eve.codegen.TemplatedGenerator):
CMakeListsFile = as_jinja(
"""
project({{project_name}})
cmake_minimum_required(VERSION 3.20.0)
project({{project_name}})
# Languages
enable_language(CXX)
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
set(CMAKE_CUDA_ARCHITECTURES 60)
endif()
{{"\\n".join(languages)}}
# Paths
list(APPEND CMAKE_MODULE_PATH ${CMAKE_BINARY_DIR})
@@ -77,32 +86,34 @@ class CMakeListsGenerator(eve.codegen.TemplatedGenerator):
)

def visit_FindDependency(self, dep: FindDependency):
# TODO(ricoh): do not add more libraries here
# and do not use this design in a new build system.
# Instead, design this to be extensible (refer to ADR-0016).
match dep.name:
case "pybind11":
import pybind11

return f"find_package(pybind11 CONFIG REQUIRED PATHS {pybind11.get_cmake_dir()} NO_DEFAULT_PATH)"
case "nanobind":
import nanobind

py = "find_package(Python COMPONENTS Interpreter Development REQUIRED)"
nb = f"find_package(nanobind CONFIG REQUIRED PATHS {nanobind.cmake_dir()} NO_DEFAULT_PATHS)"
return py + "\n" + nb
case "gridtools":
case "gridtools_cpu" | "gridtools_gpu":
import gridtools_cpp

return f"find_package(GridTools REQUIRED PATHS {gridtools_cpp.get_cmake_dir()} NO_DEFAULT_PATH)"
case _:
raise ValueError("Library {name} is not supported".format(name=dep.name))

def visit_LinkDependency(self, dep: LinkDependency):
# TODO(ricoh): do not add more libraries here
# and do not use this design in a new build system.
# Instead, design this to be extensible (refer to ADR-0016).
match dep.name:
case "pybind11":
lib_name = "pybind11::module"
case "nanobind":
lib_name = "nanobind-static"
case "gridtools":
case "gridtools_cpu":
lib_name = "GridTools::fn_naive"
case "gridtools_gpu":
lib_name = "GridTools::fn_gpu"
case _:
raise ValueError("Library {name} is not supported".format(name=dep.name))

@@ -118,11 +129,14 @@ def visit_LinkDependency(self, dep: LinkDependency):
lnk = f"target_link_libraries({dep.target} PUBLIC {lib_name})"
return cfg + "\n" + lnk

Language = as_jinja("enable_language({{name}})")


def generate_cmakelists_source(
project_name: str,
dependencies: tuple[interface.LibraryDependency, ...],
source_names: Sequence[str],
languages: Sequence[Language] = (Language(name="CXX"),),
) -> str:
"""
Generate CMakeLists file contents.
@@ -135,5 +149,6 @@ def generate_cmakelists_source(
link_deps=[LinkDependency(name=d.name, target=project_name) for d in dependencies],
source_names=source_names,
bin_output_suffix=common.python_module_suffix(),
languages=languages,
)
return CMakeListsGenerator.apply(cmakelists_file)
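
Taken together, the `cmake.py` and `cmake_lists.py` changes mean that a CUDA program source gets an extra `enable_language(CUDA)` line in the generated `CMakeLists.txt`. The sketch below mimics that behaviour with plain functions instead of `eve` nodes and Jinja templates; both function names are hypothetical and not part of gt4py.

```python
from typing import Sequence


def cmake_language_names(is_cuda: bool) -> list[str]:
    """CXX is always enabled; CUDA is appended for CUDA program sources."""
    names = ["CXX"]
    if is_cuda:
        names.append("CUDA")
    return names


def enable_language_lines(language_names: Sequence[str] = ("CXX",)) -> str:
    """Render one enable_language() line per language, joined by newlines."""
    return "\n".join(f"enable_language({name})" for name in language_names)


if __name__ == "__main__":
    print(enable_language_lines(cmake_language_names(is_cuda=True)))
```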