-
Notifications
You must be signed in to change notification settings - Fork 645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move GPU ukernel selection to KernelConfig #19440
Conversation
84a939a
to
510fbd5
Compare
510fbd5
to
a6e5117
Compare
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
a6e5117
to
906aada
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The serialization/lookup logic looks good to me.
Why do we use iree_codegen.lowering_config
instead of iree_gpu.lowering_config
?
compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_lower_to_ukernels.mlir
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_lower_to_ukernels.mlir
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Dialect/Codegen/IR/IREECodegenAttrs.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Dialect/Codegen/IR/IREECodegenAttrs.h
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/LLVMGPU/Utils/LLVMGPUSelectUKernels.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/LLVMGPU/Utils/LLVMGPUSelectUKernels.cpp
Outdated
Show resolved
Hide resolved
good idea, done. |
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a few suggestions
MLIRContext *context = op->getContext(); | ||
Builder b(context); | ||
SmallVector<NamedAttribute, 2> attrs; | ||
attrs.emplace_back(StringAttr::get(context, "workgroup"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, we have a helper that map tiling level enums to the exact string. This should be make future refactoring easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be nice, but I'm tempted to leave this out (I couldn't immediately find the helper, and the VPN is disconnecting every minute...)
|
||
namespace { | ||
|
||
static StringLiteral executableObjectsAttrName = "hal.executable.objects"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static StringLiteral executableObjectsAttrName = "hal.executable.objects"; | |
constexpr StringLiteral executableObjectsAttrName = "hal.executable.objects"; |
constexpr in global scope already implies static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Note, I realized that it is nontrivial whether StringLiteral is a good thing to use here. As this constructor takes the address of a string literal, it could in principle cause a relocation, or whatever is responsible, at DSO load time, of computing the actual pointer value, as in https://glandium.org/blog/?p=2361 . I suppose that either static
or constexpr
allows the compiler to reason that the string doesn't escape, making this a non-issue when all the uses are evaluated at compile-time. But here, it's not clear that they are, this depends on the functions that we call, all getting inlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible to design a StringLiteral
class that doesn't have this problem. It's just that it needs to store the const char[N]
by value, and be templated in N, instead of storing a (inheriting from) StringRef
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that static constexpr
always meant no dynamic initialization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very much the intended usecase for StringLiteral -- constant names that eventually get passed as StringRef (without the need of ever calling strlen).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whenever address-of is "constexpr", there is, one way or another, a hidden caveat. It simply isn't possible to tell the address at compile time. What can be known at compile time is the offset at which something is allocated relatively to the start of a segment, but the segment base address won't be known until the DSO is loaded. So at runtime, once that is known, the base+offset addition can be computed, and its result written as the actual initializer for such pointer constants. IIUC that is what toolchain experts call a "relocation", but I am talking out of my depth here, so please feel free to point out any misunderstanding.
Avoiding strlen only matters if the strlen can't be constant evaluated. When the string is a char char [], it can be const evaluated.
I looked up the Phabricator review that introduced StringLiteral. The stated motivation is to avoid constructors (ie strlen) in arrays of string literals. That almost makes sense: the idea seems to be unless they all have the same length, strings in a plain array of string literals can't be stored by value, so one is led to storing an array of char pointers, and then strlen can't constant evaluate anymore. What I think that is missing, is that one could simply have an array of StringRef! At least in sufficiently recent compilers, the usual StringRef constructor would consteval the strlen. It's possible that 8 years ago when StringLiteral was introduced, it didn't. I think StringLiteral is just a historical workaround for that. Instead of subclassing StringRef with a template constructor, it should just be a template constructor on StringRef. And then on modern compilers we would see that it doesn't matter anymore which constructor is used.
Anyway, the rationale for StingLiteral is specific to arrays of StringLiteral. It isn't applying here, as we have a single string.
Also, an array of StringLiteral only solves the constructors problem but leaves the (less severe) relocation problem. A solution for arrays of string literals that would not have relocations would be a helper to store the string literals as (constexpr evaluated) concatenated string plus a "TOC" array of offsets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried all possible combinations of storage class and type for this executableObjectsAttrName
,
{static, constexpr, none} x {StringLiteral, StringRef, const char[], const char*}
and that makes no difference to the number of relocations in libIREECompiler.so
. Probably because regardless of storage class, the uses within the same translation unit get rewritten to just the initializer value. It's UB anyway to mutate a static-duration const object, so the rewrite is legal.
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
This moves the logic deciding whether an op should be a ukernel out of the GPULowerToUKernels pass, into KernelConfig.
So KernelConfig decides whether the op should be a ukernel, and encodes that into the resulting
lowering_config
, in a new parameter, that is a new attribute, UKernelSpecAttr. That attribute is directly modeled after the equivalent C++ data structure that we have had in LowerToUKernels passes,FnNameAndDefAttrs
, which it replaces. If the attribute is present, it means that the op was selected for ukernel lowering, with the fields telling the ukernel name and some function definition attributes (to import any dependencies, such as therocm
module for runtime support symbols).All the details about supplying the ukernel bitcode in a
hal.executable.object
are also moved there, becoming a side effect ofKernelConfig
.The GPULowerToUKernels becomes much simpler, since all the decision-making was already done for it. It just looks at the
LoweringConfigAttr
and if it's there, it performs the requested lowering.The motivation for this split is that we need to know in KernelConfig whether it's going to be a ukernel, because ops that will get lowered to a ukernel require a different configuration. The important example for us is
multi_mma
, which in the ukernel case needs to avoid reduction-dimension tiling to 1 so that the ukernel gets to see the reduction loop.A few simplifications arise already in the current argmax ukernel logic, confirming that this was the right design choice: the old ukernel's matching logic was checking that the distribution tile sizes matched what the ukernel could handle; now that is turned upside down: the ukernel matching happens as a helper within KernelConfig where we know we are setting the appropriate tile sizes on purpose.
Another nice improvement is that this puts just enough distance between ukernel selection (which creates the
hal.executable.object
) and ukernel lowering, that we are able to insertHoistExecutableObjectsPass
in between, simplifying the ukernel lowering as it doesn't need to worry anymore about preserving thehal.executable.object
.