[MLIR][GPU-LLVM] Convert gpu.func to llvm.func #101664
Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to `llvm.func` operations.

- `spir_kernel`/`spir_func` calling conventions are used for kernels/functions.
- `workgroup` attributions are encoded as additional `llvm.ptr<3>` arguments.
- No attribute is used to annotate kernels.
- The `reqd_work_group_size` attribute is used to encode `gpu.known_block_size`.

**Note**: A notable missing feature, to be addressed in a follow-up PR, is a `-use-bare-ptr-memref-call-conv` option to replace MemRef arguments with bare pointers to the MemRef element types instead of the current MemRef descriptor approach.

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
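To make the four rules in the description concrete, here is a minimal standalone sketch of how they combine into a lowered signature. It does not use MLIR: `GpuFuncModel` and `describeLoweredFunc` are hypothetical names invented for illustration, argument types are assumed to be already converted, and the returned string is an informal rendering rather than real MLIR syntax.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a gpu.func; this is not an MLIR type.
struct GpuFuncModel {
  std::string name;
  bool isKernel = false;
  // Argument types, assumed to be already converted to LLVM types.
  std::vector<std::string> argTypes;
  std::size_t numWorkgroupAttributions = 0;
  // Models gpu.known_block_size when present.
  std::optional<std::array<int, 3>> knownBlockSize;
};

// Informal rendering of the lowered function following the PR description.
std::string describeLoweredFunc(const GpuFuncModel &func) {
  // Each workgroup attribution becomes one trailing pointer argument in
  // address space 3.
  std::vector<std::string> args = func.argTypes;
  args.insert(args.end(), func.numWorkgroupAttributions,
              std::string("llvm.ptr<3>"));
  assert(args.size() == func.argTypes.size() + func.numWorkgroupAttributions);

  // Kernels and plain functions get different calling conventions; no
  // separate kernel marker attribute is emitted.
  std::string text = func.isKernel ? "spir_kernel" : "spir_func";
  text += " @" + func.name + "(";
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (i != 0)
      text += ", ";
    text += args[i];
  }
  text += ")";

  // The known block size is only attached to kernels.
  if (func.isKernel && func.knownBlockSize) {
    const std::array<int, 3> &size = *func.knownBlockSize;
    text += " {reqd_work_group_size = [" + std::to_string(size[0]) + ", " +
            std::to_string(size[1]) + ", " + std::to_string(size[2]) + "]}";
  }
  return text;
}
```

In this model, a kernel with two ordinary arguments and two workgroup attributions ends up with four arguments, the last two being the workgroup pointers.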
@llvm/pr-subscribers-mlir-llvm @llvm/pr-subscribers-mlir-gpu
Author: Victor Perez (victor-eds)
Changes: Add support in
Patch is 53.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/101664.diff
13 Files Affected:
diff --git a/mlir/include/mlir/Conversion/SPIRVCommon/AttrToLLVMConverter.h b/mlir/include/mlir/Conversion/SPIRVCommon/AttrToLLVMConverter.h
new file mode 100644
index 0000000000000..a99dd0fe6f133
--- /dev/null
+++ b/mlir/include/mlir/Conversion/SPIRVCommon/AttrToLLVMConverter.h
@@ -0,0 +1,18 @@
+//===- AttrToLLVMConverter.h - SPIR-V attributes conversion to LLVM - C++ -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+#ifndef MLIR_CONVERSION_SPIRVCOMMON_ATTRTOLLVMCONVERTER_H_
+#define MLIR_CONVERSION_SPIRVCOMMON_ATTRTOLLVMCONVERTER_H_
+
+#include "mlir/Dialect/SPIRV/IR/SPIRVEnums.h"
+
+namespace mlir {
+unsigned storageClassToAddressSpace(spirv::ClientAPI clientAPI,
+ spirv::StorageClass storageClass);
+} // namespace mlir
+
+#endif // MLIR_CONVERSION_SPIRVCOMMON_ATTRTOLLVMCONVERTER_H_
diff --git a/mlir/lib/Conversion/CMakeLists.txt b/mlir/lib/Conversion/CMakeLists.txt
index 80c8b84d9ae89..813f700c5556e 100644
--- a/mlir/lib/Conversion/CMakeLists.txt
+++ b/mlir/lib/Conversion/CMakeLists.txt
@@ -53,6 +53,7 @@ add_subdirectory(SCFToGPU)
add_subdirectory(SCFToOpenMP)
add_subdirectory(SCFToSPIRV)
add_subdirectory(ShapeToStandard)
+add_subdirectory(SPIRVCommon)
add_subdirectory(SPIRVToLLVM)
add_subdirectory(TensorToLinalg)
add_subdirectory(TensorToSPIRV)
diff --git a/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp b/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
index 6053e34f30a41..0007294b3ff27 100644
--- a/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
+++ b/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
@@ -25,29 +25,58 @@ GPUFuncOpLowering::matchAndRewrite(gpu::GPUFuncOp gpuFuncOp, OpAdaptor adaptor,
Location loc = gpuFuncOp.getLoc();
SmallVector<LLVM::GlobalOp, 3> workgroupBuffers;
- workgroupBuffers.reserve(gpuFuncOp.getNumWorkgroupAttributions());
- for (const auto [idx, attribution] :
- llvm::enumerate(gpuFuncOp.getWorkgroupAttributions())) {
- auto type = dyn_cast<MemRefType>(attribution.getType());
- assert(type && type.hasStaticShape() && "unexpected type in attribution");
-
- uint64_t numElements = type.getNumElements();
-
- auto elementType =
- cast<Type>(typeConverter->convertType(type.getElementType()));
- auto arrayType = LLVM::LLVMArrayType::get(elementType, numElements);
- std::string name =
- std::string(llvm::formatv("__wg_{0}_{1}", gpuFuncOp.getName(), idx));
- uint64_t alignment = 0;
- if (auto alignAttr =
- dyn_cast_or_null<IntegerAttr>(gpuFuncOp.getWorkgroupAttributionAttr(
- idx, LLVM::LLVMDialect::getAlignAttrName())))
- alignment = alignAttr.getInt();
- auto globalOp = rewriter.create<LLVM::GlobalOp>(
- gpuFuncOp.getLoc(), arrayType, /*isConstant=*/false,
- LLVM::Linkage::Internal, name, /*value=*/Attribute(), alignment,
- workgroupAddrSpace);
- workgroupBuffers.push_back(globalOp);
+ if (encodeWorkgroupAttributionsAsArguments) {
+ ArrayRef<BlockArgument> workgroupAttributions =
+ gpuFuncOp.getWorkgroupAttributions();
+ std::size_t numAttributions = workgroupAttributions.size();
+
+ // Insert all arguments at the end.
+ unsigned index = gpuFuncOp.getNumArguments();
+ SmallVector<unsigned> argIndices(numAttributions, index);
+
+ // New arguments will simply be `llvm.ptr` with the correct address space
+ Type workgroupPtrType =
+ rewriter.getType<LLVM::LLVMPointerType>(workgroupAddrSpace);
+ SmallVector<Type> argTypes(numAttributions, workgroupPtrType);
+
+ // No argument attributes will be added
+ DictionaryAttr emptyDict = rewriter.getDictionaryAttr({});
+ SmallVector<DictionaryAttr> argAttrs(numAttributions, emptyDict);
+
+ // Location match function location
+ SmallVector<Location> argLocs(numAttributions, gpuFuncOp.getLoc());
+
+ // Perform signature modification
+ rewriter.modifyOpInPlace(
+ gpuFuncOp, [gpuFuncOp, &argIndices, &argTypes, &argAttrs, &argLocs]() {
+ static_cast<FunctionOpInterface>(gpuFuncOp).insertArguments(
+ argIndices, argTypes, argAttrs, argLocs);
+ });
+ } else {
+ workgroupBuffers.reserve(gpuFuncOp.getNumWorkgroupAttributions());
+ for (const auto [idx, attribution] :
+ llvm::enumerate(gpuFuncOp.getWorkgroupAttributions())) {
+ auto type = dyn_cast<MemRefType>(attribution.getType());
+ assert(type && type.hasStaticShape() && "unexpected type in attribution");
+
+ uint64_t numElements = type.getNumElements();
+
+ auto elementType =
+ cast<Type>(typeConverter->convertType(type.getElementType()));
+ auto arrayType = LLVM::LLVMArrayType::get(elementType, numElements);
+ std::string name =
+ std::string(llvm::formatv("__wg_{0}_{1}", gpuFuncOp.getName(), idx));
+ uint64_t alignment = 0;
+ if (auto alignAttr = dyn_cast_or_null<IntegerAttr>(
+ gpuFuncOp.getWorkgroupAttributionAttr(
+ idx, LLVM::LLVMDialect::getAlignAttrName())))
+ alignment = alignAttr.getInt();
+ auto globalOp = rewriter.create<LLVM::GlobalOp>(
+ gpuFuncOp.getLoc(), arrayType, /*isConstant=*/false,
+ LLVM::Linkage::Internal, name, /*value=*/Attribute(), alignment,
+ workgroupAddrSpace);
+ workgroupBuffers.push_back(globalOp);
+ }
}
// Remap proper input types.
@@ -101,16 +130,20 @@ GPUFuncOpLowering::matchAndRewrite(gpu::GPUFuncOp gpuFuncOp, OpAdaptor adaptor,
// attribute. The former is necessary for further translation while the
// latter is expected by gpu.launch_func.
if (gpuFuncOp.isKernel()) {
- attributes.emplace_back(kernelAttributeName, rewriter.getUnitAttr());
+ if (kernelAttributeName)
+ attributes.emplace_back(*kernelAttributeName, rewriter.getUnitAttr());
// Set the dialect-specific block size attribute if there is one.
if (kernelBlockSizeAttributeName.has_value() && knownBlockSize) {
attributes.emplace_back(kernelBlockSizeAttributeName.value(),
knownBlockSize);
}
}
+ LLVM::CConv callingConvention = gpuFuncOp.isKernel()
+ ? kernelCallingConvention
+ : nonKernelCallingConvention;
auto llvmFuncOp = rewriter.create<LLVM::LLVMFuncOp>(
gpuFuncOp.getLoc(), gpuFuncOp.getName(), funcType,
- LLVM::Linkage::External, /*dsoLocal=*/false, /*cconv=*/LLVM::CConv::C,
+ LLVM::Linkage::External, /*dsoLocal=*/false, callingConvention,
/*comdat=*/nullptr, attributes);
{
@@ -125,24 +158,49 @@ GPUFuncOpLowering::matchAndRewrite(gpu::GPUFuncOp gpuFuncOp, OpAdaptor adaptor,
rewriter.setInsertionPointToStart(&gpuFuncOp.front());
unsigned numProperArguments = gpuFuncOp.getNumArguments();
- for (const auto [idx, global] : llvm::enumerate(workgroupBuffers)) {
- auto ptrType = LLVM::LLVMPointerType::get(rewriter.getContext(),
- global.getAddrSpace());
- Value address = rewriter.create<LLVM::AddressOfOp>(
- loc, ptrType, global.getSymNameAttr());
- Value memory =
- rewriter.create<LLVM::GEPOp>(loc, ptrType, global.getType(), address,
- ArrayRef<LLVM::GEPArg>{0, 0});
-
- // Build a memref descriptor pointing to the buffer to plug with the
- // existing memref infrastructure. This may use more registers than
- // otherwise necessary given that memref sizes are fixed, but we can try
- // and canonicalize that away later.
- Value attribution = gpuFuncOp.getWorkgroupAttributions()[idx];
- auto type = cast<MemRefType>(attribution.getType());
- auto descr = MemRefDescriptor::fromStaticShape(
- rewriter, loc, *getTypeConverter(), type, memory);
- signatureConversion.remapInput(numProperArguments + idx, descr);
+ if (encodeWorkgroupAttributionsAsArguments) {
+ unsigned numAttributions = gpuFuncOp.getNumWorkgroupAttributions();
+ assert(numProperArguments >= numAttributions &&
+ "Expecting attributions to be encoded as arguments already");
+
+ // Arguments encoding workgroup attributions will be in positions
+ // [numProperArguments, numProperArguments+numAttributions)
+ ArrayRef<BlockArgument> attributionArguments =
+ gpuFuncOp.getArguments().slice(numProperArguments - numAttributions,
+ numAttributions);
+ for (auto [idx, vals] : llvm::enumerate(llvm::zip_equal(
+ gpuFuncOp.getWorkgroupAttributions(), attributionArguments))) {
+ auto [attribution, arg] = vals;
+ auto type = cast<MemRefType>(attribution.getType());
+
+ // Arguments are of llvm.ptr type and attributions are of memref type:
+ // we need to wrap them in memref descriptors.
+ Value descr = MemRefDescriptor::fromStaticShape(
+ rewriter, loc, *getTypeConverter(), type, arg);
+
+ // And remap the arguments
+ signatureConversion.remapInput(numProperArguments + idx, descr);
+ }
+ } else {
+ for (const auto [idx, global] : llvm::enumerate(workgroupBuffers)) {
+ auto ptrType = LLVM::LLVMPointerType::get(rewriter.getContext(),
+ global.getAddrSpace());
+ Value address = rewriter.create<LLVM::AddressOfOp>(
+ loc, ptrType, global.getSymNameAttr());
+ Value memory =
+ rewriter.create<LLVM::GEPOp>(loc, ptrType, global.getType(),
+ address, ArrayRef<LLVM::GEPArg>{0, 0});
+
+ // Build a memref descriptor pointing to the buffer to plug with the
+ // existing memref infrastructure. This may use more registers than
+ // otherwise necessary given that memref sizes are fixed, but we can try
+ // and canonicalize that away later.
+ Value attribution = gpuFuncOp.getWorkgroupAttributions()[idx];
+ auto type = cast<MemRefType>(attribution.getType());
+ auto descr = MemRefDescriptor::fromStaticShape(
+ rewriter, loc, *getTypeConverter(), type, memory);
+ signatureConversion.remapInput(numProperArguments + idx, descr);
+ }
}
// Rewrite private memory attributions to alloca'ed buffers.
diff --git a/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h b/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
index 92e69badc27dd..781bea6b09406 100644
--- a/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
+++ b/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
@@ -35,16 +35,39 @@ struct GPUDynamicSharedMemoryOpLowering
unsigned alignmentBit;
};
+struct GPUFuncOpLoweringOptions {
+ /// The address space to use for `alloca`s in private memory.
+ unsigned allocaAddrSpace;
+ /// The address space to use declaring workgroup memory.
+ unsigned workgroupAddrSpace;
+
+ /// The attribute name to use instead of `gpu.kernel`.
+ std::optional<StringAttr> kernelAttributeName = std::nullopt;
+ /// The attribute name to to set block size
+ std::optional<StringAttr> kernelBlockSizeAttributeName = std::nullopt;
+
+ /// The calling convention to use for kernel functions
+ LLVM::CConv kernelCallingConvention = LLVM::CConv::C;
+ /// The calling convention to use for non-kernel functions
+ LLVM::CConv nonKernelCallingConvention = LLVM::CConv::C;
+
+ /// Whether to encode workgroup attributions as additional arguments instead
+ /// of a global variable.
+ bool encodeWorkgroupAttributionsAsArguments = false;
+};
+
struct GPUFuncOpLowering : ConvertOpToLLVMPattern<gpu::GPUFuncOp> {
- GPUFuncOpLowering(
- const LLVMTypeConverter &converter, unsigned allocaAddrSpace,
- unsigned workgroupAddrSpace, StringAttr kernelAttributeName,
- std::optional<StringAttr> kernelBlockSizeAttributeName = std::nullopt)
+ GPUFuncOpLowering(const LLVMTypeConverter &converter,
+ const GPUFuncOpLoweringOptions &options)
: ConvertOpToLLVMPattern<gpu::GPUFuncOp>(converter),
- allocaAddrSpace(allocaAddrSpace),
- workgroupAddrSpace(workgroupAddrSpace),
- kernelAttributeName(kernelAttributeName),
- kernelBlockSizeAttributeName(kernelBlockSizeAttributeName) {}
+ allocaAddrSpace(options.allocaAddrSpace),
+ workgroupAddrSpace(options.workgroupAddrSpace),
+ kernelAttributeName(options.kernelAttributeName),
+ kernelBlockSizeAttributeName(options.kernelBlockSizeAttributeName),
+ kernelCallingConvention(options.kernelCallingConvention),
+ nonKernelCallingConvention(options.nonKernelCallingConvention),
+ encodeWorkgroupAttributionsAsArguments(
+ options.encodeWorkgroupAttributionsAsArguments) {}
LogicalResult
matchAndRewrite(gpu::GPUFuncOp gpuFuncOp, OpAdaptor adaptor,
@@ -57,10 +80,18 @@ struct GPUFuncOpLowering : ConvertOpToLLVMPattern<gpu::GPUFuncOp> {
unsigned workgroupAddrSpace;
/// The attribute name to use instead of `gpu.kernel`.
- StringAttr kernelAttributeName;
-
+ std::optional<StringAttr> kernelAttributeName;
/// The attribute name to to set block size
std::optional<StringAttr> kernelBlockSizeAttributeName;
+
+ /// The calling convention to use for kernel functions
+ LLVM::CConv kernelCallingConvention;
+ /// The calling convention to use for non-kernel functions
+ LLVM::CConv nonKernelCallingConvention;
+
+ /// Whether to encode workgroup attributions as additional arguments instead
+ /// of a global variable.
+ bool encodeWorkgroupAttributionsAsArguments;
};
/// The lowering of gpu.printf to a call to HIP hostcalls
diff --git a/mlir/lib/Conversion/GPUToLLVMSPV/CMakeLists.txt b/mlir/lib/Conversion/GPUToLLVMSPV/CMakeLists.txt
index da5650b2b68dd..d47c5e679d86e 100644
--- a/mlir/lib/Conversion/GPUToLLVMSPV/CMakeLists.txt
+++ b/mlir/lib/Conversion/GPUToLLVMSPV/CMakeLists.txt
@@ -6,7 +6,9 @@ add_mlir_conversion_library(MLIRGPUToLLVMSPV
LINK_LIBS PUBLIC
MLIRGPUDialect
+ MLIRGPUToGPURuntimeTransforms
MLIRLLVMCommonConversion
MLIRLLVMDialect
+ MLIRSPIRVAttrToLLVMConversion
MLIRSPIRVDialect
)
diff --git a/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp b/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
index 27d63b5f8948d..74dd5f19c20f5 100644
--- a/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
+++ b/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
@@ -8,15 +8,18 @@
#include "mlir/Conversion/GPUToLLVMSPV/GPUToLLVMSPVPass.h"
+#include "../GPUCommon/GPUOpsLowering.h"
#include "mlir/Conversion/LLVMCommon/ConversionTarget.h"
#include "mlir/Conversion/LLVMCommon/LoweringOptions.h"
#include "mlir/Conversion/LLVMCommon/Pattern.h"
#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
+#include "mlir/Conversion/SPIRVCommon/AttrToLLVMConverter.h"
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/Dialect/LLVMIR/LLVMAttrs.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Dialect/LLVMIR/LLVMTypes.h"
#include "mlir/Dialect/SPIRV/IR/SPIRVDialect.h"
+#include "mlir/Dialect/SPIRV/IR/SPIRVEnums.h"
#include "mlir/Dialect/SPIRV/IR/TargetAndABI.h"
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/Matchers.h"
@@ -321,8 +324,8 @@ struct GPUToLLVMSPVConversionPass final
LLVMConversionTarget target(*context);
target.addIllegalOp<gpu::BarrierOp, gpu::BlockDimOp, gpu::BlockIdOp,
- gpu::GlobalIdOp, gpu::GridDimOp, gpu::ShuffleOp,
- gpu::ThreadIdOp>();
+ gpu::GPUFuncOp, gpu::GlobalIdOp, gpu::GridDimOp,
+ gpu::ReturnOp, gpu::ShuffleOp, gpu::ThreadIdOp>();
populateGpuToLLVMSPVConversionPatterns(converter, patterns);
@@ -340,11 +343,27 @@ struct GPUToLLVMSPVConversionPass final
namespace mlir {
void populateGpuToLLVMSPVConversionPatterns(LLVMTypeConverter &typeConverter,
RewritePatternSet &patterns) {
- patterns.add<GPUBarrierConversion, GPUShuffleConversion,
+ patterns.add<GPUBarrierConversion, GPUReturnOpLowering, GPUShuffleConversion,
LaunchConfigOpConversion<gpu::BlockIdOp>,
LaunchConfigOpConversion<gpu::GridDimOp>,
LaunchConfigOpConversion<gpu::BlockDimOp>,
LaunchConfigOpConversion<gpu::ThreadIdOp>,
LaunchConfigOpConversion<gpu::GlobalIdOp>>(typeConverter);
+ constexpr spirv::ClientAPI clientAPI = spirv::ClientAPI::OpenCL;
+ MLIRContext *context = &typeConverter.getContext();
+ unsigned privateAddressSpace =
+ storageClassToAddressSpace(clientAPI, spirv::StorageClass::Function);
+ unsigned localAddressSpace =
+ storageClassToAddressSpace(clientAPI, spirv::StorageClass::Workgroup);
+ OperationName llvmFuncOpName(LLVM::LLVMFuncOp::getOperationName(), context);
+ StringAttr kernelBlockSizeAttributeName =
+ LLVM::LLVMFuncOp::getReqdWorkGroupSizeAttrName(llvmFuncOpName);
+ patterns.add<GPUFuncOpLowering>(
+ typeConverter,
+ GPUFuncOpLoweringOptions{
+ privateAddressSpace, localAddressSpace,
+ /*kernelAttributeName=*/std::nullopt, kernelBlockSizeAttributeName,
+ LLVM::CConv::SPIR_KERNEL, LLVM::CConv::SPIR_FUNC,
+ /*encodeWorkgroupAttributionsAsArguments=*/true});
}
} // namespace mlir
diff --git a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
index faa97caacb885..060a1e1e82f75 100644
--- a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+++ b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
@@ -365,13 +365,15 @@ void mlir::populateGpuToNVVMConversionPatterns(LLVMTypeConverter &converter,
// attributions since NVVM models it as `alloca`s in the default
// memory space and does not support `alloca`s with addrspace(5).
patterns.add<GPUFuncOpLowering>(
- converter, /*allocaAddrSpace=*/0,
- /*workgroupAddrSpace=*/
- static_cast<unsigned>(NVVM::NVVMMemorySpace::kSharedMemorySpace),
- StringAttr::get(&converter.getContext(),
- NVVM::NVVMDialect::getKernelFuncAttrName()),
- StringAttr::get(&converter.getContext(),
- NVVM::NVVMDialect::getMaxntidAttrName()));
+ converter,
+ GPUFuncOpLoweringOptions{
+ /*allocaAddrSpace=*/0,
+ /*workgroupAddrSpace=*/
+ static_cast<unsigned>(NVVM::NVVMMemorySpace::kSharedMemorySpace),
+ StringAttr::get(&converter.getContext(),
+ NVVM::NVVMDialect::getKernelFuncAttrName()),
+ StringAttr::get(&converter.getContext(),
+ NVVM::NVVMDialect::getMaxntidAttrName())});
populateOpPatterns<arith::RemFOp>(converter, patterns, "__nv_fmodf",
"__nv_fmod");
diff --git a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
index 100181cdc69fe..564bab1ad92b9 100644
--- a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+++ b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
@@ -372,10 +372,11 @@ void mlir::populateGpuToROCDLConversionPatterns(
patterns.add<GPUReturnOpLowering>(converter);
patterns.add<GPUFuncOpLowering>(
converter,
- /*allocaAddrSpace=*/ROCDL::ROCDLDialect::kPrivateMemoryAddressSpace,
- /*workgroupAddrSpace=*/ROCDL::ROCDLDialect::kSharedMemoryAddressSpace,
- rocdlDialect->getKernelAttrHelper().getName(),
- rocdlDialect->getReqdWorkGroupSizeAttrHelper().getName());
+ GPUFuncOpLoweringOptions{
+ /*allocaAddrSpace=*/ROCDL::ROCDLDialect::kPrivateMemoryAddressSpace,
+ /*workgroupAddrSpace=*/ROCDL::ROCDLDialect::kSharedMemoryAddressSpace,
+ rocdlDialect->getKernelAttrHelper().getName(),
+ rocdlDialect->getReqdWorkGroupSizeAttrHelper().getName()});
if (Runtime::HIP == runtime) {
patterns.add<GPUPrintfOpToHIPLowering>(converter);
} else if (Runtime::OpenCL == runtime) {
diff --git a/mlir/lib/Conversion/SPIRVCommon/AttrToLLVMConverter.cpp b/mlir/lib/Conversion/SPIRVCommon/AttrToLLVMConverter.cpp
new file mode 100644
index 0000000000000..924bd1643f...
[truncated]
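The patch replaces the positional constructor parameters of `GPUFuncOpLowering` with a `GPUFuncOpLoweringOptions` aggregate whose trailing fields have defaults. The standalone sketch below illustrates why that keeps the NVVM and ROCDL call sites short while letting the SPIR-V path override everything. It is not MLIR code: `FuncLoweringOptions`, `CConv`, the helper functions, and the `example.*` attribute names are hypothetical, `std::string` stands in for `StringAttr`, and the private address space value `0` is an assumption (only the workgroup value `3` is stated in the PR description).

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical mirror of the calling conventions used in the patch.
enum class CConv { C, SpirFunc, SpirKernel };

// Hypothetical mirror of GPUFuncOpLoweringOptions: two required fields,
// then defaulted ones, so callers can stop after any prefix.
struct FuncLoweringOptions {
  unsigned allocaAddrSpace;
  unsigned workgroupAddrSpace;
  std::optional<std::string> kernelAttributeName = std::nullopt;
  std::optional<std::string> kernelBlockSizeAttributeName = std::nullopt;
  CConv kernelCallingConvention = CConv::C;
  CConv nonKernelCallingConvention = CConv::C;
  bool encodeWorkgroupAttributionsAsArguments = false;
};

// Same selection as the ternary in matchAndRewrite.
CConv selectCallingConvention(const FuncLoweringOptions &options,
                              bool isKernel) {
  return isKernel ? options.kernelCallingConvention
                  : options.nonKernelCallingConvention;
}

// Names of the attributes that would be attached to a lowered function.
std::vector<std::string> attributeNamesFor(const FuncLoweringOptions &options,
                                           bool isKernel,
                                           bool hasKnownBlockSize) {
  std::vector<std::string> names;
  if (!isKernel)
    return names;
  // The kernel marker is now optional and skipped when unset.
  if (options.kernelAttributeName)
    names.push_back(*options.kernelAttributeName);
  if (options.kernelBlockSizeAttributeName && hasKnownBlockSize)
    names.push_back(*options.kernelBlockSizeAttributeName);
  return names;
}

// Pre-existing style of caller: only the first four fields are given, so
// calling conventions and the attribution encoding keep their defaults.
FuncLoweringOptions makeLegacyStyleOptions() {
  return FuncLoweringOptions{0, 3, std::string("example.kernel"),
                             std::string("example.block_size")};
}

// SPIR-V style of caller: no kernel marker, SPIR calling conventions, and
// workgroup attributions passed as arguments.
FuncLoweringOptions makeSpirvStyleOptions() {
  return FuncLoweringOptions{0,
                             3,
                             std::nullopt,
                             std::string("reqd_work_group_size"),
                             CConv::SpirKernel,
                             CConv::SpirFunc,
                             true};
}
```

One trade-off of positional aggregate initialization is that a caller who only wants to change a late field still has to spell out every field before it, which is why the SPIR-V style helper lists all seven values.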
#include "mlir/Dialect/SPIRV/IR/SPIRVDialect.h"
+#include "mlir/Dialect/SPIRV/IR/SPIRVEnums.h"
#include "mlir/Dialect/SPIRV/IR/TargetAndABI.h"
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/Matchers.h"
@@ -321,8 +324,8 @@ struct GPUToLLVMSPVConversionPass final
LLVMConversionTarget target(*context);
target.addIllegalOp<gpu::BarrierOp, gpu::BlockDimOp, gpu::BlockIdOp,
- gpu::GlobalIdOp, gpu::GridDimOp, gpu::ShuffleOp,
- gpu::ThreadIdOp>();
+ gpu::GPUFuncOp, gpu::GlobalIdOp, gpu::GridDimOp,
+ gpu::ReturnOp, gpu::ShuffleOp, gpu::ThreadIdOp>();
populateGpuToLLVMSPVConversionPatterns(converter, patterns);
@@ -340,11 +343,27 @@ struct GPUToLLVMSPVConversionPass final
namespace mlir {
void populateGpuToLLVMSPVConversionPatterns(LLVMTypeConverter &typeConverter,
RewritePatternSet &patterns) {
- patterns.add<GPUBarrierConversion, GPUShuffleConversion,
+ patterns.add<GPUBarrierConversion, GPUReturnOpLowering, GPUShuffleConversion,
LaunchConfigOpConversion<gpu::BlockIdOp>,
LaunchConfigOpConversion<gpu::GridDimOp>,
LaunchConfigOpConversion<gpu::BlockDimOp>,
LaunchConfigOpConversion<gpu::ThreadIdOp>,
LaunchConfigOpConversion<gpu::GlobalIdOp>>(typeConverter);
+ constexpr spirv::ClientAPI clientAPI = spirv::ClientAPI::OpenCL;
+ MLIRContext *context = &typeConverter.getContext();
+ unsigned privateAddressSpace =
+ storageClassToAddressSpace(clientAPI, spirv::StorageClass::Function);
+ unsigned localAddressSpace =
+ storageClassToAddressSpace(clientAPI, spirv::StorageClass::Workgroup);
+ OperationName llvmFuncOpName(LLVM::LLVMFuncOp::getOperationName(), context);
+ StringAttr kernelBlockSizeAttributeName =
+ LLVM::LLVMFuncOp::getReqdWorkGroupSizeAttrName(llvmFuncOpName);
+ patterns.add<GPUFuncOpLowering>(
+ typeConverter,
+ GPUFuncOpLoweringOptions{
+ privateAddressSpace, localAddressSpace,
+ /*kernelAttributeName=*/std::nullopt, kernelBlockSizeAttributeName,
+ LLVM::CConv::SPIR_KERNEL, LLVM::CConv::SPIR_FUNC,
+ /*encodeWorkgroupAttributionsAsArguments=*/true});
}
} // namespace mlir
diff --git a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
index faa97caacb885..060a1e1e82f75 100644
--- a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+++ b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
@@ -365,13 +365,15 @@ void mlir::populateGpuToNVVMConversionPatterns(LLVMTypeConverter &converter,
// attributions since NVVM models it as `alloca`s in the default
// memory space and does not support `alloca`s with addrspace(5).
patterns.add<GPUFuncOpLowering>(
- converter, /*allocaAddrSpace=*/0,
- /*workgroupAddrSpace=*/
- static_cast<unsigned>(NVVM::NVVMMemorySpace::kSharedMemorySpace),
- StringAttr::get(&converter.getContext(),
- NVVM::NVVMDialect::getKernelFuncAttrName()),
- StringAttr::get(&converter.getContext(),
- NVVM::NVVMDialect::getMaxntidAttrName()));
+ converter,
+ GPUFuncOpLoweringOptions{
+ /*allocaAddrSpace=*/0,
+ /*workgroupAddrSpace=*/
+ static_cast<unsigned>(NVVM::NVVMMemorySpace::kSharedMemorySpace),
+ StringAttr::get(&converter.getContext(),
+ NVVM::NVVMDialect::getKernelFuncAttrName()),
+ StringAttr::get(&converter.getContext(),
+ NVVM::NVVMDialect::getMaxntidAttrName())});
populateOpPatterns<arith::RemFOp>(converter, patterns, "__nv_fmodf",
"__nv_fmod");
diff --git a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
index 100181cdc69fe..564bab1ad92b9 100644
--- a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+++ b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
@@ -372,10 +372,11 @@ void mlir::populateGpuToROCDLConversionPatterns(
patterns.add<GPUReturnOpLowering>(converter);
patterns.add<GPUFuncOpLowering>(
converter,
- /*allocaAddrSpace=*/ROCDL::ROCDLDialect::kPrivateMemoryAddressSpace,
- /*workgroupAddrSpace=*/ROCDL::ROCDLDialect::kSharedMemoryAddressSpace,
- rocdlDialect->getKernelAttrHelper().getName(),
- rocdlDialect->getReqdWorkGroupSizeAttrHelper().getName());
+ GPUFuncOpLoweringOptions{
+ /*allocaAddrSpace=*/ROCDL::ROCDLDialect::kPrivateMemoryAddressSpace,
+ /*workgroupAddrSpace=*/ROCDL::ROCDLDialect::kSharedMemoryAddressSpace,
+ rocdlDialect->getKernelAttrHelper().getName(),
+ rocdlDialect->getReqdWorkGroupSizeAttrHelper().getName()});
if (Runtime::HIP == runtime) {
patterns.add<GPUPrintfOpToHIPLowering>(converter);
} else if (Runtime::OpenCL == runtime) {
diff --git a/mlir/lib/Conversion/SPIRVCommon/AttrToLLVMConverter.cpp b/mlir/lib/Conversion/SPIRVCommon/AttrToLLVMConverter.cpp
new file mode 100644
index 0000000000000..924bd1643f...
[truncated]
struct GPUFuncOpLoweringOptions {
  /// The address space to use for `alloca`s in private memory.
  unsigned allocaAddrSpace;
  /// The address space to use declaring workgroup memory.
  unsigned workgroupAddrSpace;

  /// The attribute name to use instead of `gpu.kernel`.
  std::optional<StringAttr> kernelAttributeName = std::nullopt;
  /// The attribute name to to set block size
  std::optional<StringAttr> kernelBlockSizeAttributeName = std::nullopt;

  /// The calling convention to use for kernel functions
  LLVM::CConv kernelCallingConvention = LLVM::CConv::C;
  /// The calling convention to use for non-kernel functions
  LLVM::CConv nonKernelCallingConvention = LLVM::CConv::C;

  /// Whether to encode workgroup attributions as additional arguments instead
  /// of a global variable.
  bool encodeWorkgroupAttributionsAsArguments = false;
};
This was getting out of hand. Cleaner this way.
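For readers skimming the thread, a minimal standalone sketch of why the options struct keeps call sites short: fields with default member initializers can be left off the end of the braced list, so the NVVM and ROCDL call sites only spell out four values. `FuncLoweringOptions`, `CConv`, the attribute-name strings, and the address-space numbers below are illustrative stand-ins chosen for this sketch, not the MLIR definitions.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for LLVM::CConv so the sketch builds without MLIR.
enum class CConv { C, SPIR_FUNC, SPIR_KERNEL };

// Hypothetical mirror of GPUFuncOpLoweringOptions; std::string replaces
// StringAttr.
struct FuncLoweringOptions {
  // No defaults: every call site has to provide these two.
  unsigned allocaAddrSpace;
  unsigned workgroupAddrSpace;
  // Defaulted members; the struct is still an aggregate (C++14 and later).
  std::optional<std::string> kernelAttributeName = std::nullopt;
  std::optional<std::string> kernelBlockSizeAttributeName = std::nullopt;
  CConv kernelCallingConvention = CConv::C;
  CConv nonKernelCallingConvention = CConv::C;
  bool encodeWorkgroupAttributionsAsArguments = false;
};

// Call site in the style of the NVVM/ROCDL ones: only four fields are given,
// the calling conventions and the flag fall back to their defaults.
// The numbers and names are placeholders.
inline FuncLoweringOptions makeFourFieldOptions() {
  return FuncLoweringOptions{/*allocaAddrSpace=*/0, /*workgroupAddrSpace=*/3,
                             "example.kernel", "example.block_size"};
}

// Call site in the style of the SPIR-V one: every field is given explicitly.
inline FuncLoweringOptions makeAllFieldsOptions() {
  return FuncLoweringOptions{/*allocaAddrSpace=*/0, /*workgroupAddrSpace=*/3,
                             /*kernelAttributeName=*/std::nullopt,
                             "example.block_size", CConv::SPIR_KERNEL,
                             CConv::SPIR_FUNC,
                             /*encodeWorkgroupAttributionsAsArguments=*/true};
}
```

The trade-off is that the braced list is positional, which is presumably why the call sites in the patch keep the `/*name=*/` comments.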
workgroupBuffers.reserve(gpuFuncOp.getNumWorkgroupAttributions());
for (const auto [idx, attribution] :
     llvm::enumerate(gpuFuncOp.getWorkgroupAttributions())) {
  auto type = dyn_cast<MemRefType>(attribution.getType());
  assert(type && type.hasStaticShape() && "unexpected type in attribution");

  uint64_t numElements = type.getNumElements();

  auto elementType =
      cast<Type>(typeConverter->convertType(type.getElementType()));
  auto arrayType = LLVM::LLVMArrayType::get(elementType, numElements);
  std::string name =
      std::string(llvm::formatv("__wg_{0}_{1}", gpuFuncOp.getName(), idx));
  uint64_t alignment = 0;
  if (auto alignAttr = dyn_cast_or_null<IntegerAttr>(
          gpuFuncOp.getWorkgroupAttributionAttr(
              idx, LLVM::LLVMDialect::getAlignAttrName())))
    alignment = alignAttr.getInt();
  auto globalOp = rewriter.create<LLVM::GlobalOp>(
      gpuFuncOp.getLoc(), arrayType, /*isConstant=*/false,
      LLVM::Linkage::Internal, name, /*value=*/Attribute(), alignment,
      workgroupAddrSpace);
  workgroupBuffers.push_back(globalOp);
}
Original code
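As a small illustration of the `__wg_{0}_{1}` pattern used in the snippet above, the following standalone sketch builds the same kind of name with the standard library only. `workgroupGlobalName` is a hypothetical helper written for this note; the patch itself uses `llvm::formatv`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: joins the function name and the attribution index,
// so each workgroup buffer global gets a distinct symbol per function.
inline std::string workgroupGlobalName(const std::string &funcName,
                                       std::size_t idx) {
  return "__wg_" + funcName + "_" + std::to_string(idx);
}
```

For a function named `kernel` with two workgroup attributions this yields `__wg_kernel_0` and `__wg_kernel_1`.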
for (const auto [idx, global] : llvm::enumerate(workgroupBuffers)) {
  auto ptrType = LLVM::LLVMPointerType::get(rewriter.getContext(),
                                            global.getAddrSpace());
  Value address = rewriter.create<LLVM::AddressOfOp>(
      loc, ptrType, global.getSymNameAttr());
  Value memory =
      rewriter.create<LLVM::GEPOp>(loc, ptrType, global.getType(),
                                   address, ArrayRef<LLVM::GEPArg>{0, 0});

  // Build a memref descriptor pointing to the buffer to plug with the
  // existing memref infrastructure. This may use more registers than
  // otherwise necessary given that memref sizes are fixed, but we can try
  // and canonicalize that away later.
  Value attribution = gpuFuncOp.getWorkgroupAttributions()[idx];
  auto type = cast<MemRefType>(attribution.getType());
  auto descr = MemRefDescriptor::fromStaticShape(
      rewriter, loc, *getTypeConverter(), type, memory);
  signatureConversion.remapInput(numProperArguments + idx, descr);
}
Original code
OperationName llvmFuncOpName(LLVM::LLVMFuncOp::getOperationName(), context);
StringAttr kernelBlockSizeAttributeName =
    LLVM::LLVMFuncOp::getReqdWorkGroupSizeAttrName(llvmFuncOpName);
I've always thought this should be a static member... Is there a better way to do this? I didn't wanna add the static member function to the LLVM dialect, so I went with this.
This cannot be static, as an attribute requires the context present in the operation.
I know, I was wondering if at least the string name should be
@@ -0,0 +1,61 @@
//===- AttrToLLVMConverter.cpp - SPIR-V attributes conversion to LLVM -C++ ===//
Moved from SPIR-V to LLVM conversion. More similar approach to other dialects.
@@ -273,47 +268,13 @@ static std::optional<Type> convertArrayType(spirv::ArrayType type,
  return LLVM::LLVMArrayType::get(llvmElementType, numElements);
}

static unsigned mapToOpenCLAddressSpace(spirv::StorageClass storageClass) {
Moved
// CHECK-LABEL: llvm.func spir_kernelcc @kernel_with_workgoup_attribs(
// CHECK-SAME: %[[VAL_27:.*]]: f32, %[[VAL_28:.*]]: i16, %[[VAL_29:.*]]: !llvm.ptr<3>, %[[VAL_30:.*]]: !llvm.ptr<3>) attributes {gpu.kernel} {
Additional arguments of `llvm.ptr<3>` type encode workgroup attributions.
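To make the CHECK lines above concrete, here is a rough standalone sketch of how the lowered argument list is shaped when workgroup attributions become arguments: the proper arguments come first, followed by one pointer in the workgroup address space per attribution. `loweredArgumentTypes` is a hypothetical helper that works on printed type names only; it is not how the pattern builds the signature.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the lowered signature: takes the printed types of the
// proper arguments and appends one `!llvm.ptr<N>` per workgroup attribution,
// where N is the workgroup address space.
inline std::vector<std::string>
loweredArgumentTypes(std::vector<std::string> properArgs,
                     std::size_t numWorkgroupAttributions,
                     unsigned workgroupAddrSpace) {
  const std::string ptrType =
      "!llvm.ptr<" + std::to_string(workgroupAddrSpace) + ">";
  // Appends `numWorkgroupAttributions` copies of the pointer type.
  properArgs.insert(properArgs.end(), numWorkgroupAttributions, ptrType);
  return properArgs;
}
```

Calling it with `{"f32", "i16"}`, two attributions, and address space 3 gives the four argument types that the test expects: `f32`, `i16`, `!llvm.ptr<3>`, `!llvm.ptr<3>`.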
Thanks, Victor! This is really helpful, I was able to replace some dumb copy-pasted code locally :) Haven't tried the `llvm.ptr<3>` yet though. Would it require `gpu.launch` lowering changes?
Yeah, definitely. In fact, I've just figured out we're dropping information on the size of these memory allocations.
Attached as
Alternatively to the current
LGTM!
A tuple is probably more convenient, yup.
@antiagainst @kuhar @FMarno can I get a review here? I'd like people with different views and context to take a look, as this touches different pieces.
Is anyone opposed to merging this in the current state? I don't think the changes are controversial, and I have follow-up work I'd like to push too.
I'm fine with merging this from an LLVM dialect perspective. Just added one more nit to the GPU test, which seems to have odd formatting.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/116/builds/2209 Here is the relevant piece of the build log for the reference:
Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to `llvm.func` operations.

- `spir_kernel`/`spir_func` calling conventions used for kernels/functions.
- `workgroup` attributions encoded as additional `llvm.ptr<3>` arguments.
- `reqd_work_group_size` attribute used to encode `gpu.known_block_size`.
- `llvm.mlir.workgroup_attrib_size` used to encode workgroup attribution sizes. This will be attached to the pointer argument that workgroup attributions lower to.

**Note**: A notable missing feature that will be addressed in a follow-up PR is a `-use-bare-ptr-memref-call-conv` option to replace MemRef arguments with bare pointers to the MemRef element types instead of the current MemRef descriptor approach.