Move GPU ukernel selection to KernelConfig #19440

bjacob · 2024-12-10T18:31:41Z

This moves the logic deciding whether an op should be a ukernel out of the GPULowerToUKernels pass, into KernelConfig.

So KernelConfig decides whether the op should be a ukernel, and encodes that into the resulting lowering_config, in a new parameter, that is a new attribute, UKernelSpecAttr. That attribute is directly modeled after the equivalent C++ data structure that we have had in LowerToUKernels passes, FnNameAndDefAttrs, which it replaces. If the attribute is present, it means that the op was selected for ukernel lowering, with the fields telling the ukernel name and some function definition attributes (to import any dependencies, such as the rocm module for runtime support symbols).

All the details about supplying the ukernel bitcode in a hal.executable.object are also moved there, becoming a side effect of KernelConfig.

The GPULowerToUKernels becomes much simpler, since all the decision-making was already done for it. It just looks at the LoweringConfigAttr and if it's there, it performs the requested lowering.

The motivation for this split is that we need to know in KernelConfig whether it's going to be a ukernel, because ops that will get lowered to a ukernel require a different configuration. The important example for us is multi_mma, which in the ukernel case needs to avoid reduction-dimension tiling to 1 so that the ukernel gets to see the reduction loop.

A few simplifications arise already in the current argmax ukernel logic, confirming that this was the right design choice: the old ukernel's matching logic was checking that the distribution tile sizes matched what the ukernel could handle; now that is turned upside down: the ukernel matching happens as a helper within KernelConfig where we know we are setting the appropriate tile sizes on purpose.

Another nice improvement is that this puts just enough distance between ukernel selection (which creates the hal.executable.object) and ukernel lowering, that we are able to insert HoistExecutableObjectsPass in between, simplifying the ukernel lowering as it doesn't need to worry anymore about preserving the hal.executable.object.

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>

kuhar

The serialization/lookup logic looks good to me.

Why do we use iree_codegen.lowering_config instead of iree_gpu.lowering_config?

compiler/plugins/target/ROCM/test/config_ukernel_argmax_gfx942.mlir

compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_lower_to_ukernels.mlir

compiler/src/iree/compiler/Codegen/Dialect/Codegen/IR/IREECodegenAttrs.cpp

compiler/src/iree/compiler/Codegen/Dialect/Codegen/IR/IREECodegenAttrs.h

compiler/src/iree/compiler/Codegen/LLVMGPU/Utils/LLVMGPUSelectUKernels.cpp

compiler/src/iree/compiler/Codegen/LLVMGPU/KernelConfig.cpp

bjacob · 2024-12-14T03:11:14Z

The serialization/lookup logic looks good to me.

Why do we use iree_codegen.lowering_config instead of iree_gpu.lowering_config?

good idea, done.

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>

kuhar

LGTM with a few suggestions

compiler/src/iree/compiler/Codegen/Common/GPU/GPULowerToUKernels.cpp

kuhar · 2024-12-14T05:27:31Z

compiler/src/iree/compiler/Codegen/LLVMGPU/KernelConfig.cpp

+  MLIRContext *context = op->getContext();
+  Builder b(context);
+  SmallVector<NamedAttribute, 2> attrs;
+  attrs.emplace_back(StringAttr::get(context, "workgroup"),


IIRC, we have a helper that map tiling level enums to the exact string. This should be make future refactoring easier.

That could be nice, but I'm tempted to leave this out (I couldn't immediately find the helper, and the VPN is disconnecting every minute...)

kuhar · 2024-12-14T05:29:11Z

compiler/src/iree/compiler/Codegen/LLVMGPU/Utils/LLVMGPUSelectUKernels.cpp

+
+namespace {
+
+static StringLiteral executableObjectsAttrName = "hal.executable.objects";


Suggested change

static StringLiteral executableObjectsAttrName = "hal.executable.objects";

constexpr StringLiteral executableObjectsAttrName = "hal.executable.objects";

constexpr in global scope already implies static

Done.

Note, I realized that it is nontrivial whether StringLiteral is a good thing to use here. As this constructor takes the address of a string literal, it could in principle cause a relocation, or whatever is responsible, at DSO load time, of computing the actual pointer value, as in https://glandium.org/blog/?p=2361 . I suppose that either static or constexpr allows the compiler to reason that the string doesn't escape, making this a non-issue when all the uses are evaluated at compile-time. But here, it's not clear that they are, this depends on the functions that we call, all getting inlined.

It should be possible to design a StringLiteral class that doesn't have this problem. It's just that it needs to store the const char[N] by value, and be templated in N, instead of storing a (inheriting from) StringRef.

I thought that static constexpr always meant no dynamic initialization?

This is very much the intended usecase for StringLiteral -- constant names that eventually get passed as StringRef (without the need of ever calling strlen).

Whenever address-of is "constexpr", there is, one way or another, a hidden caveat. It simply isn't possible to tell the address at compile time. What can be known at compile time is the offset at which something is allocated relatively to the start of a segment, but the segment base address won't be known until the DSO is loaded. So at runtime, once that is known, the base+offset addition can be computed, and its result written as the actual initializer for such pointer constants. IIUC that is what toolchain experts call a "relocation", but I am talking out of my depth here, so please feel free to point out any misunderstanding.

Avoiding strlen only matters if the strlen can't be constant evaluated. When the string is a char char [], it can be const evaluated.

I looked up the Phabricator review that introduced StringLiteral. The stated motivation is to avoid constructors (ie strlen) in arrays of string literals. That almost makes sense: the idea seems to be unless they all have the same length, strings in a plain array of string literals can't be stored by value, so one is led to storing an array of char pointers, and then strlen can't constant evaluate anymore. What I think that is missing, is that one could simply have an array of StringRef! At least in sufficiently recent compilers, the usual StringRef constructor would consteval the strlen. It's possible that 8 years ago when StringLiteral was introduced, it didn't. I think StringLiteral is just a historical workaround for that. Instead of subclassing StringRef with a template constructor, it should just be a template constructor on StringRef. And then on modern compilers we would see that it doesn't matter anymore which constructor is used.

Anyway, the rationale for StingLiteral is specific to arrays of StringLiteral. It isn't applying here, as we have a single string.

Also, an array of StringLiteral only solves the constructors problem but leaves the (less severe) relocation problem. A solution for arrays of string literals that would not have relocations would be a helper to store the string literals as (constexpr evaluated) concatenated string plus a "TOC" array of offsets.

I tried all possible combinations of storage class and type for this executableObjectsAttrName,

{static, constexpr, none} x {StringLiteral, StringRef, const char[], const char*}

and that makes no difference to the number of relocations in libIREECompiler.so. Probably because regardless of storage class, the uses within the same translation unit get rewritten to just the initializer value. It's UB anyway to mutate a static-duration const object, so the rewrite is legal.

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>

bjacob force-pushed the select-ukernels branch from 84a939a to 510fbd5 Compare December 12, 2024 21:27

bjacob changed the title ~~GPUSelectUKernelsPass~~ Move GPU ukernel selection to KernelConfig Dec 12, 2024

bjacob marked this pull request as ready for review December 12, 2024 21:55

bjacob requested review from kuhar, MaheshRavishankar, qedawkins, Groverkss and antiagainst as code owners December 12, 2024 21:55

hanhanW self-requested a review December 13, 2024 03:07

bjacob force-pushed the select-ukernels branch from 510fbd5 to a6e5117 Compare December 13, 2024 03:36

select-ukernels

906aada

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>

bjacob force-pushed the select-ukernels branch from a6e5117 to 906aada Compare December 13, 2024 03:58

kuhar reviewed Dec 13, 2024

View reviewed changes

review-comments

61b42fc

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>

bjacob requested a review from kuhar December 14, 2024 03:22

kuhar approved these changes Dec 14, 2024

View reviewed changes

review-comments

fb16c6e

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>

bjacob merged commit dc29ee7 into iree-org:main Dec 15, 2024
39 checks passed

bjacob mentioned this pull request Dec 19, 2024

Use of llvm::StringLiteral for string literals is a cause of relocations #19529

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move GPU ukernel selection to KernelConfig #19440

Move GPU ukernel selection to KernelConfig #19440

bjacob commented Dec 10, 2024 •

edited

Loading

kuhar left a comment •

edited

Loading

bjacob commented Dec 14, 2024

kuhar left a comment

kuhar Dec 14, 2024

bjacob Dec 14, 2024

kuhar Dec 14, 2024

bjacob Dec 14, 2024

bjacob Dec 14, 2024

kuhar Dec 14, 2024

kuhar Dec 14, 2024

bjacob Dec 14, 2024 •

edited

Loading

bjacob Dec 15, 2024 •

edited

Loading


		namespace {

		static StringLiteral executableObjectsAttrName = "hal.executable.objects";

	static StringLiteral executableObjectsAttrName = "hal.executable.objects";
	constexpr StringLiteral executableObjectsAttrName = "hal.executable.objects";

Move GPU ukernel selection to KernelConfig #19440

Move GPU ukernel selection to KernelConfig #19440

Conversation

bjacob commented Dec 10, 2024 • edited Loading

kuhar left a comment • edited Loading

Choose a reason for hiding this comment

bjacob commented Dec 14, 2024

kuhar left a comment

Choose a reason for hiding this comment

kuhar Dec 14, 2024

Choose a reason for hiding this comment

bjacob Dec 14, 2024

Choose a reason for hiding this comment

kuhar Dec 14, 2024

Choose a reason for hiding this comment

bjacob Dec 14, 2024

Choose a reason for hiding this comment

bjacob Dec 14, 2024

Choose a reason for hiding this comment

kuhar Dec 14, 2024

Choose a reason for hiding this comment

kuhar Dec 14, 2024

Choose a reason for hiding this comment

bjacob Dec 14, 2024 • edited Loading

Choose a reason for hiding this comment

bjacob Dec 15, 2024 • edited Loading

Choose a reason for hiding this comment

bjacob commented Dec 10, 2024 •

edited

Loading

kuhar left a comment •

edited

Loading

bjacob Dec 14, 2024 •

edited

Loading

bjacob Dec 15, 2024 •

edited

Loading