[MLIR][GPU-LLVM] Convert `gpu.func` to `llvm.func` #101664

victor-eds · 2024-08-02T12:45:20Z

Add support in -convert-gpu-to-llvm-spv to convert gpu.func to llvm.func operations.

spir_kernel/spir_func calling conventions used for kernels/functions.
workgroup attributions encoded as additional llvm.ptr<3> arguments.
No attribute used to annotate kernels.
reqd_work_group_size attribute used to encode gpu.known_block_size.
llvm.mlir.workgroup_attrib_size used to encode workgroup attribution sizes. This attribute is attached to the pointer argument each workgroup attribution is lowered to.
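For readers unfamiliar with the argument-encoding scheme, here is a minimal MLIR-free sketch of the signature change. It models a signature as a list of type strings, which is purely an illustrative assumption; `Signature` and `appendWorkgroupPointerArgs` are hypothetical names, not MLIR APIs. Like the patch, it appends one workgroup pointer per attribution at the end of the argument list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical MLIR-free model of a function signature: argument types are
// plain strings such as "f32" or "!llvm.ptr<3>".
struct Signature {
  std::vector<std::string> argTypes;
};

// Appends one workgroup pointer argument per workgroup attribution at the end
// of the argument list (the strategy used by the patch) and returns the
// half-open index range [first, last) of the new arguments.
std::pair<std::size_t, std::size_t>
appendWorkgroupPointerArgs(Signature &sig, std::size_t numAttributions,
                           unsigned workgroupAddrSpace) {
  const std::size_t first = sig.argTypes.size();
  const std::string ptrType =
      "!llvm.ptr<" + std::to_string(workgroupAddrSpace) + ">";
  sig.argTypes.insert(sig.argTypes.end(), numAttributions, ptrType);
  return {first, sig.argTypes.size()};
}
```

For a kernel with arguments `(f32, i64)` and two workgroup attributions, the new pointer arguments end up at indices 2 and 3.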

Note: A notable missing feature that will be addressed in a follow-up PR is a -use-bare-ptr-memref-call-conv option to replace MemRef arguments with bare pointers to the MemRef element types instead of the current MemRef descriptor approach.
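Here is what the note means in practice. With the current descriptor approach, each workgroup pointer argument is wrapped by `MemRefDescriptor::fromStaticShape` into a struct that carries an allocated pointer, an aligned pointer, an offset, and per-dimension sizes and strides. A bare-pointer convention would pass only the element pointer. The host-side model below is a sketch that assumes an identity (row-major) layout with zero offset. `MemRefDescriptorModel` and `fromStaticShapeModel` are hypothetical names and only approximate what the MLIR helper builds.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side model of a ranked memref descriptor, following the
// usual {allocated, aligned, offset, sizes, strides} layout.
template <typename T, std::size_t Rank>
struct MemRefDescriptorModel {
  T *allocated;
  T *aligned;
  std::int64_t offset;
  std::array<std::int64_t, Rank> sizes;
  std::array<std::int64_t, Rank> strides;
};

// Wraps a bare pointer for a statically shaped, identity-layout memref:
// both pointers alias the buffer, the offset is zero, and strides are
// computed row-major from the static shape.
template <typename T, std::size_t Rank>
MemRefDescriptorModel<T, Rank>
fromStaticShapeModel(T *ptr, const std::array<std::int64_t, Rank> &shape) {
  MemRefDescriptorModel<T, Rank> d{ptr, ptr, 0, shape, {}};
  std::int64_t stride = 1;
  for (std::size_t i = Rank; i-- > 0;) {
    d.strides[i] = stride;
    stride *= shape[i];
  }
  return d;
}
```

A bare-pointer calling convention would skip this wrapping entirely and hand the kernel just `ptr`.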

Signed-off-by: Victor Perez <victor.perez@codeplay.com>

llvmbot · 2024-08-02T12:45:49Z

@llvm/pr-subscribers-mlir-llvm
@llvm/pr-subscribers-mlir-spirv

@llvm/pr-subscribers-mlir-gpu

Author: Victor Perez (victor-eds)

Changes

Add support in -convert-gpu-to-llvm-spv to convert gpu.func to llvm.func operations.

spir_kernel/spir_func calling conventions used for kernels/functions.
workgroup attributions encoded as additional llvm.ptr<3> arguments.
No attribute used to annotate kernels.
reqd_work_group_size attribute used to encode gpu.known_block_size.

Patch is 53.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/101664.diff

13 Files Affected:

(added) mlir/include/mlir/Conversion/SPIRVCommon/AttrToLLVMConverter.h (+18)
(modified) mlir/lib/Conversion/CMakeLists.txt (+1)
(modified) mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp (+101-43)
(modified) mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h (+41-10)
(modified) mlir/lib/Conversion/GPUToLLVMSPV/CMakeLists.txt (+2)
(modified) mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp (+22-3)
(modified) mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp (+9-7)
(modified) mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp (+5-4)
(added) mlir/lib/Conversion/SPIRVCommon/AttrToLLVMConverter.cpp (+61)
(added) mlir/lib/Conversion/SPIRVCommon/CMakeLists.txt (+6)
(modified) mlir/lib/Conversion/SPIRVToLLVM/CMakeLists.txt (+1)
(modified) mlir/lib/Conversion/SPIRVToLLVM/SPIRVToLLVM.cpp (+4-43)
(modified) mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir (+285)

diff --git a/mlir/include/mlir/Conversion/SPIRVCommon/AttrToLLVMConverter.h b/mlir/include/mlir/Conversion/SPIRVCommon/AttrToLLVMConverter.h
new file mode 100644
index 0000000000000..a99dd0fe6f133
--- /dev/null
+++ b/mlir/include/mlir/Conversion/SPIRVCommon/AttrToLLVMConverter.h
@@ -0,0 +1,18 @@
+//===- AttrToLLVMConverter.h - SPIR-V attributes conversion to LLVM - C++ -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+#ifndef MLIR_CONVERSION_SPIRVCOMMON_ATTRTOLLVMCONVERTER_H_
+#define MLIR_CONVERSION_SPIRVCOMMON_ATTRTOLLVMCONVERTER_H_
+
+#include "mlir/Dialect/SPIRV/IR/SPIRVEnums.h"
+
+namespace mlir {
+unsigned storageClassToAddressSpace(spirv::ClientAPI clientAPI,
+                                    spirv::StorageClass storageClass);
+} // namespace mlir
+
+#endif // MLIR_CONVERSION_SPIRVCOMMON_ATTRTOLLVMCONVERTER_H_
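The header only declares `storageClassToAddressSpace`. Its definition in `AttrToLLVMConverter.cpp` is truncated in this view. The sketch below roughly illustrates the OpenCL side of such a mapping. It assumes the SPIR address-space numbering (0 private, 1 global, 2 constant, 3 local, 4 generic), which is consistent with the `llvm.ptr<3>` workgroup arguments in the description. The local `StorageClass` enum is a hypothetical stand-in for `spirv::StorageClass`, and it covers only five storage classes.

```cpp
#include <cassert>

// Hypothetical stand-in for spirv::StorageClass; only a few classes relevant
// to OpenCL kernels are modelled.
enum class StorageClass {
  Function,
  CrossWorkgroup,
  UniformConstant,
  Workgroup,
  Generic
};

// Assumed SPIR/OpenCL numbering; not a copy of the MLIR implementation.
unsigned openCLAddressSpace(StorageClass storageClass) {
  switch (storageClass) {
  case StorageClass::Function:
    return 0; // private
  case StorageClass::CrossWorkgroup:
    return 1; // global
  case StorageClass::UniformConstant:
    return 2; // constant
  case StorageClass::Workgroup:
    return 3; // local
  case StorageClass::Generic:
    return 4; // generic
  }
  return 0; // not reached for valid enumerators
}
```

The pass queries the `Function` and `Workgroup` entries of the real mapping for its private and local address spaces.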
diff --git a/mlir/lib/Conversion/CMakeLists.txt b/mlir/lib/Conversion/CMakeLists.txt
index 80c8b84d9ae89..813f700c5556e 100644
--- a/mlir/lib/Conversion/CMakeLists.txt
+++ b/mlir/lib/Conversion/CMakeLists.txt
@@ -53,6 +53,7 @@ add_subdirectory(SCFToGPU)
 add_subdirectory(SCFToOpenMP)
 add_subdirectory(SCFToSPIRV)
 add_subdirectory(ShapeToStandard)
+add_subdirectory(SPIRVCommon)
 add_subdirectory(SPIRVToLLVM)
 add_subdirectory(TensorToLinalg)
 add_subdirectory(TensorToSPIRV)
diff --git a/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp b/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
index 6053e34f30a41..0007294b3ff27 100644
--- a/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
+++ b/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
@@ -25,29 +25,58 @@ GPUFuncOpLowering::matchAndRewrite(gpu::GPUFuncOp gpuFuncOp, OpAdaptor adaptor,
   Location loc = gpuFuncOp.getLoc();
 
   SmallVector<LLVM::GlobalOp, 3> workgroupBuffers;
-  workgroupBuffers.reserve(gpuFuncOp.getNumWorkgroupAttributions());
-  for (const auto [idx, attribution] :
-       llvm::enumerate(gpuFuncOp.getWorkgroupAttributions())) {
-    auto type = dyn_cast<MemRefType>(attribution.getType());
-    assert(type && type.hasStaticShape() && "unexpected type in attribution");
-
-    uint64_t numElements = type.getNumElements();
-
-    auto elementType =
-        cast<Type>(typeConverter->convertType(type.getElementType()));
-    auto arrayType = LLVM::LLVMArrayType::get(elementType, numElements);
-    std::string name =
-        std::string(llvm::formatv("__wg_{0}_{1}", gpuFuncOp.getName(), idx));
-    uint64_t alignment = 0;
-    if (auto alignAttr =
-            dyn_cast_or_null<IntegerAttr>(gpuFuncOp.getWorkgroupAttributionAttr(
-                idx, LLVM::LLVMDialect::getAlignAttrName())))
-      alignment = alignAttr.getInt();
-    auto globalOp = rewriter.create<LLVM::GlobalOp>(
-        gpuFuncOp.getLoc(), arrayType, /*isConstant=*/false,
-        LLVM::Linkage::Internal, name, /*value=*/Attribute(), alignment,
-        workgroupAddrSpace);
-    workgroupBuffers.push_back(globalOp);
+  if (encodeWorkgroupAttributionsAsArguments) {
+    ArrayRef<BlockArgument> workgroupAttributions =
+        gpuFuncOp.getWorkgroupAttributions();
+    std::size_t numAttributions = workgroupAttributions.size();
+
+    // Insert all arguments at the end.
+    unsigned index = gpuFuncOp.getNumArguments();
+    SmallVector<unsigned> argIndices(numAttributions, index);
+
+    // New arguments will simply be `llvm.ptr` with the correct address space
+    Type workgroupPtrType =
+        rewriter.getType<LLVM::LLVMPointerType>(workgroupAddrSpace);
+    SmallVector<Type> argTypes(numAttributions, workgroupPtrType);
+
+    // No argument attributes will be added
+    DictionaryAttr emptyDict = rewriter.getDictionaryAttr({});
+    SmallVector<DictionaryAttr> argAttrs(numAttributions, emptyDict);
+
+    // Locations match the function location
+    SmallVector<Location> argLocs(numAttributions, gpuFuncOp.getLoc());
+
+    // Perform signature modification
+    rewriter.modifyOpInPlace(
+        gpuFuncOp, [gpuFuncOp, &argIndices, &argTypes, &argAttrs, &argLocs]() {
+          static_cast<FunctionOpInterface>(gpuFuncOp).insertArguments(
+              argIndices, argTypes, argAttrs, argLocs);
+        });
+  } else {
+    workgroupBuffers.reserve(gpuFuncOp.getNumWorkgroupAttributions());
+    for (const auto [idx, attribution] :
+         llvm::enumerate(gpuFuncOp.getWorkgroupAttributions())) {
+      auto type = dyn_cast<MemRefType>(attribution.getType());
+      assert(type && type.hasStaticShape() && "unexpected type in attribution");
+
+      uint64_t numElements = type.getNumElements();
+
+      auto elementType =
+          cast<Type>(typeConverter->convertType(type.getElementType()));
+      auto arrayType = LLVM::LLVMArrayType::get(elementType, numElements);
+      std::string name =
+          std::string(llvm::formatv("__wg_{0}_{1}", gpuFuncOp.getName(), idx));
+      uint64_t alignment = 0;
+      if (auto alignAttr = dyn_cast_or_null<IntegerAttr>(
+              gpuFuncOp.getWorkgroupAttributionAttr(
+                  idx, LLVM::LLVMDialect::getAlignAttrName())))
+        alignment = alignAttr.getInt();
+      auto globalOp = rewriter.create<LLVM::GlobalOp>(
+          gpuFuncOp.getLoc(), arrayType, /*isConstant=*/false,
+          LLVM::Linkage::Internal, name, /*value=*/Attribute(), alignment,
+          workgroupAddrSpace);
+      workgroupBuffers.push_back(globalOp);
+    }
   }
 
   // Remap proper input types.
@@ -101,16 +130,20 @@ GPUFuncOpLowering::matchAndRewrite(gpu::GPUFuncOp gpuFuncOp, OpAdaptor adaptor,
   // attribute. The former is necessary for further translation while the
   // latter is expected by gpu.launch_func.
   if (gpuFuncOp.isKernel()) {
-    attributes.emplace_back(kernelAttributeName, rewriter.getUnitAttr());
+    if (kernelAttributeName)
+      attributes.emplace_back(*kernelAttributeName, rewriter.getUnitAttr());
     // Set the dialect-specific block size attribute if there is one.
     if (kernelBlockSizeAttributeName.has_value() && knownBlockSize) {
       attributes.emplace_back(kernelBlockSizeAttributeName.value(),
                               knownBlockSize);
     }
   }
+  LLVM::CConv callingConvention = gpuFuncOp.isKernel()
+                                      ? kernelCallingConvention
+                                      : nonKernelCallingConvention;
   auto llvmFuncOp = rewriter.create<LLVM::LLVMFuncOp>(
       gpuFuncOp.getLoc(), gpuFuncOp.getName(), funcType,
-      LLVM::Linkage::External, /*dsoLocal=*/false, /*cconv=*/LLVM::CConv::C,
+      LLVM::Linkage::External, /*dsoLocal=*/false, callingConvention,
       /*comdat=*/nullptr, attributes);
 
   {
@@ -125,24 +158,49 @@ GPUFuncOpLowering::matchAndRewrite(gpu::GPUFuncOp gpuFuncOp, OpAdaptor adaptor,
     rewriter.setInsertionPointToStart(&gpuFuncOp.front());
     unsigned numProperArguments = gpuFuncOp.getNumArguments();
 
-    for (const auto [idx, global] : llvm::enumerate(workgroupBuffers)) {
-      auto ptrType = LLVM::LLVMPointerType::get(rewriter.getContext(),
-                                                global.getAddrSpace());
-      Value address = rewriter.create<LLVM::AddressOfOp>(
-          loc, ptrType, global.getSymNameAttr());
-      Value memory =
-          rewriter.create<LLVM::GEPOp>(loc, ptrType, global.getType(), address,
-                                       ArrayRef<LLVM::GEPArg>{0, 0});
-
-      // Build a memref descriptor pointing to the buffer to plug with the
-      // existing memref infrastructure. This may use more registers than
-      // otherwise necessary given that memref sizes are fixed, but we can try
-      // and canonicalize that away later.
-      Value attribution = gpuFuncOp.getWorkgroupAttributions()[idx];
-      auto type = cast<MemRefType>(attribution.getType());
-      auto descr = MemRefDescriptor::fromStaticShape(
-          rewriter, loc, *getTypeConverter(), type, memory);
-      signatureConversion.remapInput(numProperArguments + idx, descr);
+    if (encodeWorkgroupAttributionsAsArguments) {
+      unsigned numAttributions = gpuFuncOp.getNumWorkgroupAttributions();
+      assert(numProperArguments >= numAttributions &&
+             "Expecting attributions to be encoded as arguments already");
+
+      // Arguments encoding workgroup attributions are the last numAttributions
+      // arguments: [numProperArguments-numAttributions, numProperArguments)
+      ArrayRef<BlockArgument> attributionArguments =
+          gpuFuncOp.getArguments().slice(numProperArguments - numAttributions,
+                                         numAttributions);
+      for (auto [idx, vals] : llvm::enumerate(llvm::zip_equal(
+               gpuFuncOp.getWorkgroupAttributions(), attributionArguments))) {
+        auto [attribution, arg] = vals;
+        auto type = cast<MemRefType>(attribution.getType());
+
+        // Arguments are of llvm.ptr type and attributions are of memref type:
+        // we need to wrap them in memref descriptors.
+        Value descr = MemRefDescriptor::fromStaticShape(
+            rewriter, loc, *getTypeConverter(), type, arg);
+
+        // And remap the arguments
+        signatureConversion.remapInput(numProperArguments + idx, descr);
+      }
+    } else {
+      for (const auto [idx, global] : llvm::enumerate(workgroupBuffers)) {
+        auto ptrType = LLVM::LLVMPointerType::get(rewriter.getContext(),
+                                                  global.getAddrSpace());
+        Value address = rewriter.create<LLVM::AddressOfOp>(
+            loc, ptrType, global.getSymNameAttr());
+        Value memory =
+            rewriter.create<LLVM::GEPOp>(loc, ptrType, global.getType(),
+                                         address, ArrayRef<LLVM::GEPArg>{0, 0});
+
+        // Build a memref descriptor pointing to the buffer to plug with the
+        // existing memref infrastructure. This may use more registers than
+        // otherwise necessary given that memref sizes are fixed, but we can try
+        // and canonicalize that away later.
+        Value attribution = gpuFuncOp.getWorkgroupAttributions()[idx];
+        auto type = cast<MemRefType>(attribution.getType());
+        auto descr = MemRefDescriptor::fromStaticShape(
+            rewriter, loc, *getTypeConverter(), type, memory);
+        signatureConversion.remapInput(numProperArguments + idx, descr);
+      }
     }
 
     // Rewrite private memory attributions to alloca'ed buffers.
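The index bookkeeping in the argument-encoding branch is easy to misread. The pointer arguments are appended before the body is rewritten, so `numProperArguments` already counts them. They therefore occupy [numProperArguments - numAttributions, numProperArguments). The workgroup attribution block arguments follow immediately after, at numProperArguments + idx. The plain C++ sketch below models only that arithmetic, and its helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Half-open range [first, last) of block-argument indices.
using IndexRange = std::pair<std::size_t, std::size_t>;

// After the signature rewrite, numProperArguments includes the appended
// pointer arguments, so they are the last numAttributions proper arguments.
IndexRange attributionArgumentRange(std::size_t numProperArguments,
                                    std::size_t numAttributions) {
  assert(numProperArguments >= numAttributions &&
         "attributions must already be encoded as arguments");
  return {numProperArguments - numAttributions, numProperArguments};
}

// Workgroup attribution block arguments follow all proper arguments;
// attribution `idx` sits at this index, the one passed to remapInput.
std::size_t attributionBlockArgIndex(std::size_t numProperArguments,
                                     std::size_t idx) {
  return numProperArguments + idx;
}
```

For two original arguments and two attributions, `numProperArguments` is 4. The pointer arguments are then at indices 2 and 3, and the attributions being remapped are at 4 and 5.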
diff --git a/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h b/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
index 92e69badc27dd..781bea6b09406 100644
--- a/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
+++ b/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
@@ -35,16 +35,39 @@ struct GPUDynamicSharedMemoryOpLowering
   unsigned alignmentBit;
 };
 
+struct GPUFuncOpLoweringOptions {
+  /// The address space to use for `alloca`s in private memory.
+  unsigned allocaAddrSpace;
+  /// The address space to use for declaring workgroup memory.
+  unsigned workgroupAddrSpace;
+
+  /// The attribute name to use instead of `gpu.kernel`.
+  std::optional<StringAttr> kernelAttributeName = std::nullopt;
+  /// The attribute name to use to set the block size.
+  std::optional<StringAttr> kernelBlockSizeAttributeName = std::nullopt;
+
+  /// The calling convention to use for kernel functions
+  LLVM::CConv kernelCallingConvention = LLVM::CConv::C;
+  /// The calling convention to use for non-kernel functions
+  LLVM::CConv nonKernelCallingConvention = LLVM::CConv::C;
+
+  /// Whether to encode workgroup attributions as additional arguments instead
+  /// of a global variable.
+  bool encodeWorkgroupAttributionsAsArguments = false;
+};
+
 struct GPUFuncOpLowering : ConvertOpToLLVMPattern<gpu::GPUFuncOp> {
-  GPUFuncOpLowering(
-      const LLVMTypeConverter &converter, unsigned allocaAddrSpace,
-      unsigned workgroupAddrSpace, StringAttr kernelAttributeName,
-      std::optional<StringAttr> kernelBlockSizeAttributeName = std::nullopt)
+  GPUFuncOpLowering(const LLVMTypeConverter &converter,
+                    const GPUFuncOpLoweringOptions &options)
       : ConvertOpToLLVMPattern<gpu::GPUFuncOp>(converter),
-        allocaAddrSpace(allocaAddrSpace),
-        workgroupAddrSpace(workgroupAddrSpace),
-        kernelAttributeName(kernelAttributeName),
-        kernelBlockSizeAttributeName(kernelBlockSizeAttributeName) {}
+        allocaAddrSpace(options.allocaAddrSpace),
+        workgroupAddrSpace(options.workgroupAddrSpace),
+        kernelAttributeName(options.kernelAttributeName),
+        kernelBlockSizeAttributeName(options.kernelBlockSizeAttributeName),
+        kernelCallingConvention(options.kernelCallingConvention),
+        nonKernelCallingConvention(options.nonKernelCallingConvention),
+        encodeWorkgroupAttributionsAsArguments(
+            options.encodeWorkgroupAttributionsAsArguments) {}
 
   LogicalResult
   matchAndRewrite(gpu::GPUFuncOp gpuFuncOp, OpAdaptor adaptor,
@@ -57,10 +80,18 @@ struct GPUFuncOpLowering : ConvertOpToLLVMPattern<gpu::GPUFuncOp> {
   unsigned workgroupAddrSpace;
 
   /// The attribute name to use instead of `gpu.kernel`.
-  StringAttr kernelAttributeName;
-
+  std::optional<StringAttr> kernelAttributeName;
   /// The attribute name to to set block size
   std::optional<StringAttr> kernelBlockSizeAttributeName;
+
+  /// The calling convention to use for kernel functions
+  LLVM::CConv kernelCallingConvention;
+  /// The calling convention to use for non-kernel functions
+  LLVM::CConv nonKernelCallingConvention;
+
+  /// Whether to encode workgroup attributions as additional arguments instead
+  /// of a global variable.
+  bool encodeWorkgroupAttributionsAsArguments;
 };
 
 /// The lowering of gpu.printf to a call to HIP hostcalls
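Every new field in `GPUFuncOpLoweringOptions` has a default member initializer, so the struct remains an aggregate (C++14 and later). That is what lets the NVVM and ROCDL call sites brace-initialize only the first four fields. The stand-in below is simplified: `std::string` replaces `StringAttr` and a local `CConv` enum replaces `LLVM::CConv`. Both substitutions are assumptions made for illustration, and the address-space values and attribute strings are illustrative only.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the LLVM::CConv enumerators used by the patch.
enum class CConv { C, SPIR_FUNC, SPIR_KERNEL };

// Simplified mirror of GPUFuncOpLoweringOptions; default member initializers
// keep it an aggregate, so trailing fields may be omitted.
struct LoweringOptions {
  unsigned allocaAddrSpace;
  unsigned workgroupAddrSpace;
  std::optional<std::string> kernelAttributeName = std::nullopt;
  std::optional<std::string> kernelBlockSizeAttributeName = std::nullopt;
  CConv kernelCallingConvention = CConv::C;
  CConv nonKernelCallingConvention = CConv::C;
  bool encodeWorkgroupAttributionsAsArguments = false;
};

// Prefix-only initialization in the style of the NVVM/ROCDL call sites.
inline LoweringOptions nvvmStyleOptions() {
  return LoweringOptions{0, 3, std::string("nvvm.kernel"),
                         std::string("nvvm.maxntid")};
}
```

With only a prefix supplied, the trailing fields keep their defaults. Both kernels and non-kernels use the C calling convention, and workgroup attributions are still lowered to globals. Only the SPIR-V registration fills in every field.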
diff --git a/mlir/lib/Conversion/GPUToLLVMSPV/CMakeLists.txt b/mlir/lib/Conversion/GPUToLLVMSPV/CMakeLists.txt
index da5650b2b68dd..d47c5e679d86e 100644
--- a/mlir/lib/Conversion/GPUToLLVMSPV/CMakeLists.txt
+++ b/mlir/lib/Conversion/GPUToLLVMSPV/CMakeLists.txt
@@ -6,7 +6,9 @@ add_mlir_conversion_library(MLIRGPUToLLVMSPV
 
   LINK_LIBS PUBLIC
   MLIRGPUDialect
+  MLIRGPUToGPURuntimeTransforms
   MLIRLLVMCommonConversion
   MLIRLLVMDialect
+  MLIRSPIRVAttrToLLVMConversion
   MLIRSPIRVDialect
 )
diff --git a/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp b/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
index 27d63b5f8948d..74dd5f19c20f5 100644
--- a/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
+++ b/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
@@ -8,15 +8,18 @@
 
 #include "mlir/Conversion/GPUToLLVMSPV/GPUToLLVMSPVPass.h"
 
+#include "../GPUCommon/GPUOpsLowering.h"
 #include "mlir/Conversion/LLVMCommon/ConversionTarget.h"
 #include "mlir/Conversion/LLVMCommon/LoweringOptions.h"
 #include "mlir/Conversion/LLVMCommon/Pattern.h"
 #include "mlir/Conversion/LLVMCommon/TypeConverter.h"
+#include "mlir/Conversion/SPIRVCommon/AttrToLLVMConverter.h"
 #include "mlir/Dialect/GPU/IR/GPUDialect.h"
 #include "mlir/Dialect/LLVMIR/LLVMAttrs.h"
 #include "mlir/Dialect/LLVMIR/LLVMDialect.h"
 #include "mlir/Dialect/LLVMIR/LLVMTypes.h"
 #include "mlir/Dialect/SPIRV/IR/SPIRVDialect.h"
+#include "mlir/Dialect/SPIRV/IR/SPIRVEnums.h"
 #include "mlir/Dialect/SPIRV/IR/TargetAndABI.h"
 #include "mlir/IR/BuiltinTypes.h"
 #include "mlir/IR/Matchers.h"
@@ -321,8 +324,8 @@ struct GPUToLLVMSPVConversionPass final
     LLVMConversionTarget target(*context);
 
     target.addIllegalOp<gpu::BarrierOp, gpu::BlockDimOp, gpu::BlockIdOp,
-                        gpu::GlobalIdOp, gpu::GridDimOp, gpu::ShuffleOp,
-                        gpu::ThreadIdOp>();
+                        gpu::GPUFuncOp, gpu::GlobalIdOp, gpu::GridDimOp,
+                        gpu::ReturnOp, gpu::ShuffleOp, gpu::ThreadIdOp>();
 
     populateGpuToLLVMSPVConversionPatterns(converter, patterns);
 
@@ -340,11 +343,27 @@ struct GPUToLLVMSPVConversionPass final
 namespace mlir {
 void populateGpuToLLVMSPVConversionPatterns(LLVMTypeConverter &typeConverter,
                                             RewritePatternSet &patterns) {
-  patterns.add<GPUBarrierConversion, GPUShuffleConversion,
+  patterns.add<GPUBarrierConversion, GPUReturnOpLowering, GPUShuffleConversion,
                LaunchConfigOpConversion<gpu::BlockIdOp>,
                LaunchConfigOpConversion<gpu::GridDimOp>,
                LaunchConfigOpConversion<gpu::BlockDimOp>,
                LaunchConfigOpConversion<gpu::ThreadIdOp>,
                LaunchConfigOpConversion<gpu::GlobalIdOp>>(typeConverter);
+  constexpr spirv::ClientAPI clientAPI = spirv::ClientAPI::OpenCL;
+  MLIRContext *context = &typeConverter.getContext();
+  unsigned privateAddressSpace =
+      storageClassToAddressSpace(clientAPI, spirv::StorageClass::Function);
+  unsigned localAddressSpace =
+      storageClassToAddressSpace(clientAPI, spirv::StorageClass::Workgroup);
+  OperationName llvmFuncOpName(LLVM::LLVMFuncOp::getOperationName(), context);
+  StringAttr kernelBlockSizeAttributeName =
+      LLVM::LLVMFuncOp::getReqdWorkGroupSizeAttrName(llvmFuncOpName);
+  patterns.add<GPUFuncOpLowering>(
+      typeConverter,
+      GPUFuncOpLoweringOptions{
+          privateAddressSpace, localAddressSpace,
+          /*kernelAttributeName=*/std::nullopt, kernelBlockSizeAttributeName,
+          LLVM::CConv::SPIR_KERNEL, LLVM::CConv::SPIR_FUNC,
+          /*encodeWorkgroupAttributionsAsArguments=*/true});
 }
 } // namespace mlir
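In the SPIR-V pattern registration, kernels are identified only by the `spir_kernel` calling convention, since `kernelAttributeName` is `std::nullopt`. The `gpu.known_block_size` value is forwarded through the `reqd_work_group_size` attribute name. The following models the corresponding decision logic from `matchAndRewrite` in isolation. `FuncInfo`, `LoweredFunc`, and the string attribute names are hypothetical simplifications, not MLIR types.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for LLVM::CConv.
enum class CConv { C, SPIR_FUNC, SPIR_KERNEL };

// Simplified view of the relevant gpu.func properties.
struct FuncInfo {
  bool isKernel;
  std::optional<std::vector<int>> knownBlockSize;
};

// Simplified result: chosen calling convention and attached attribute names.
struct LoweredFunc {
  CConv cconv;
  std::vector<std::string> attributeNames;
};

LoweredFunc
lowerFuncAttributes(const FuncInfo &func,
                    const std::optional<std::string> &kernelAttributeName,
                    const std::optional<std::string> &blockSizeAttributeName,
                    CConv kernelCC, CConv nonKernelCC) {
  LoweredFunc result{func.isKernel ? kernelCC : nonKernelCC, {}};
  if (func.isKernel) {
    // Targets without a kernel marker (SPIR-V here) rely on the cconv alone.
    if (kernelAttributeName)
      result.attributeNames.push_back(*kernelAttributeName);
    // Attach the block size only if the target names an attribute for it and
    // the function actually carries a known block size.
    if (blockSizeAttributeName && func.knownBlockSize)
      result.attributeNames.push_back(*blockSizeAttributeName);
  }
  return result;
}
```

A kernel with a known block size therefore gets `SPIR_KERNEL` plus the block-size attribute. A non-kernel function gets `SPIR_FUNC` and no attributes.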
diff --git a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
index faa97caacb885..060a1e1e82f75 100644
--- a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+++ b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
@@ -365,13 +365,15 @@ void mlir::populateGpuToNVVMConversionPatterns(LLVMTypeConverter &converter,
   // attributions since NVVM models it as `alloca`s in the default
   // memory space and does not support `alloca`s with addrspace(5).
   patterns.add<GPUFuncOpLowering>(
-      converter, /*allocaAddrSpace=*/0,
-      /*workgroupAddrSpace=*/
-      static_cast<unsigned>(NVVM::NVVMMemorySpace::kSharedMemorySpace),
-      StringAttr::get(&converter.getContext(),
-                      NVVM::NVVMDialect::getKernelFuncAttrName()),
-      StringAttr::get(&converter.getContext(),
-                      NVVM::NVVMDialect::getMaxntidAttrName()));
+      converter,
+      GPUFuncOpLoweringOptions{
+          /*allocaAddrSpace=*/0,
+          /*workgroupAddrSpace=*/
+          static_cast<unsigned>(NVVM::NVVMMemorySpace::kSharedMemorySpace),
+          StringAttr::get(&converter.getContext(),
+                          NVVM::NVVMDialect::getKernelFuncAttrName()),
+          StringAttr::get(&converter.getContext(),
+                          NVVM::NVVMDialect::getMaxntidAttrName())});
 
   populateOpPatterns<arith::RemFOp>(converter, patterns, "__nv_fmodf",
                                     "__nv_fmod");
diff --git a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
index 100181cdc69fe..564bab1ad92b9 100644
--- a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+++ b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
@@ -372,10 +372,11 @@ void mlir::populateGpuToROCDLConversionPatterns(
   patterns.add<GPUReturnOpLowering>(converter);
   patterns.add<GPUFuncOpLowering>(
       converter,
-      /*allocaAddrSpace=*/ROCDL::ROCDLDialect::kPrivateMemoryAddressSpace,
-      /*workgroupAddrSpace=*/ROCDL::ROCDLDialect::kSharedMemoryAddressSpace,
-      rocdlDialect->getKernelAttrHelper().getName(),
-      rocdlDialect->getReqdWorkGroupSizeAttrHelper().getName());
+      GPUFuncOpLoweringOptions{
+          /*allocaAddrSpace=*/ROCDL::ROCDLDialect::kPrivateMemoryAddressSpace,
+          /*workgroupAddrSpace=*/ROCDL::ROCDLDialect::kSharedMemoryAddressSpace,
+          rocdlDialect->getKernelAttrHelper().getName(),
+          rocdlDialect->getReqdWorkGroupSizeAttrHelper().getName()});
   if (Runtime::HIP == runtime) {
     patterns.add<GPUPrintfOpToHIPLowering>(converter);
   } else if (Runtime::OpenCL == runtime) {
diff --git a/mlir/lib/Conversion/SPIRVCommon/AttrToLLVMConverter.cpp b/mlir/lib/Conversion/SPIRVCommon/AttrToLLVMConverter.cpp
new file mode 100644
index 0000000000000..924bd1643f...
[truncated]

llvmbot · 2024-08-02T12:45:50Z

@llvm/pr-subscribers-mlir

+        // we need to wrap them in memref descriptors.
+        Value descr = MemRefDescriptor::fromStaticShape(
+            rewriter, loc, *getTypeConverter(), type, arg);
+
+        // And remap the arguments
+        signatureConversion.remapInput(numProperArguments + idx, descr);
+      }
+    } else {
+      for (const auto [idx, global] : llvm::enumerate(workgroupBuffers)) {
+        auto ptrType = LLVM::LLVMPointerType::get(rewriter.getContext(),
+                                                  global.getAddrSpace());
+        Value address = rewriter.create<LLVM::AddressOfOp>(
+            loc, ptrType, global.getSymNameAttr());
+        Value memory =
+            rewriter.create<LLVM::GEPOp>(loc, ptrType, global.getType(),
+                                         address, ArrayRef<LLVM::GEPArg>{0, 0});
+
+        // Build a memref descriptor pointing to the buffer to plug with the
+        // existing memref infrastructure. This may use more registers than
+        // otherwise necessary given that memref sizes are fixed, but we can try
+        // and canonicalize that away later.
+        Value attribution = gpuFuncOp.getWorkgroupAttributions()[idx];
+        auto type = cast<MemRefType>(attribution.getType());
+        auto descr = MemRefDescriptor::fromStaticShape(
+            rewriter, loc, *getTypeConverter(), type, memory);
+        signatureConversion.remapInput(numProperArguments + idx, descr);
+      }
     }
 
     // Rewrite private memory attributions to alloca'ed buffers.
diff --git a/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h b/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
index 92e69badc27dd..781bea6b09406 100644
--- a/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
+++ b/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
@@ -35,16 +35,39 @@ struct GPUDynamicSharedMemoryOpLowering
   unsigned alignmentBit;
 };
 
+struct GPUFuncOpLoweringOptions {
+  /// The address space to use for `alloca`s in private memory.
+  unsigned allocaAddrSpace;
+  /// The address space to use for declaring workgroup memory.
+  unsigned workgroupAddrSpace;
+
+  /// The attribute name to use instead of `gpu.kernel`.
+  std::optional<StringAttr> kernelAttributeName = std::nullopt;
+  /// The attribute name to use to set the block size.
+  std::optional<StringAttr> kernelBlockSizeAttributeName = std::nullopt;
+
+  /// The calling convention to use for kernel functions
+  LLVM::CConv kernelCallingConvention = LLVM::CConv::C;
+  /// The calling convention to use for non-kernel functions
+  LLVM::CConv nonKernelCallingConvention = LLVM::CConv::C;
+
+  /// Whether to encode workgroup attributions as additional arguments instead
+  /// of a global variable.
+  bool encodeWorkgroupAttributionsAsArguments = false;
+};
+
 struct GPUFuncOpLowering : ConvertOpToLLVMPattern<gpu::GPUFuncOp> {
-  GPUFuncOpLowering(
-      const LLVMTypeConverter &converter, unsigned allocaAddrSpace,
-      unsigned workgroupAddrSpace, StringAttr kernelAttributeName,
-      std::optional<StringAttr> kernelBlockSizeAttributeName = std::nullopt)
+  GPUFuncOpLowering(const LLVMTypeConverter &converter,
+                    const GPUFuncOpLoweringOptions &options)
       : ConvertOpToLLVMPattern<gpu::GPUFuncOp>(converter),
-        allocaAddrSpace(allocaAddrSpace),
-        workgroupAddrSpace(workgroupAddrSpace),
-        kernelAttributeName(kernelAttributeName),
-        kernelBlockSizeAttributeName(kernelBlockSizeAttributeName) {}
+        allocaAddrSpace(options.allocaAddrSpace),
+        workgroupAddrSpace(options.workgroupAddrSpace),
+        kernelAttributeName(options.kernelAttributeName),
+        kernelBlockSizeAttributeName(options.kernelBlockSizeAttributeName),
+        kernelCallingConvention(options.kernelCallingConvention),
+        nonKernelCallingConvention(options.nonKernelCallingConvention),
+        encodeWorkgroupAttributionsAsArguments(
+            options.encodeWorkgroupAttributionsAsArguments) {}
 
   LogicalResult
   matchAndRewrite(gpu::GPUFuncOp gpuFuncOp, OpAdaptor adaptor,
@@ -57,10 +80,18 @@ struct GPUFuncOpLowering : ConvertOpToLLVMPattern<gpu::GPUFuncOp> {
   unsigned workgroupAddrSpace;
 
   /// The attribute name to use instead of `gpu.kernel`.
-  StringAttr kernelAttributeName;
-
+  std::optional<StringAttr> kernelAttributeName;
   /// The attribute name to to set block size
   std::optional<StringAttr> kernelBlockSizeAttributeName;
+
+  /// The calling convention to use for kernel functions
+  LLVM::CConv kernelCallingConvention;
+  /// The calling convention to use for non-kernel functions
+  LLVM::CConv nonKernelCallingConvention;
+
+  /// Whether to encode workgroup attributions as additional arguments instead
+  /// of a global variable.
+  bool encodeWorkgroupAttributionsAsArguments;
 };
 
 /// The lowering of gpu.printf to a call to HIP hostcalls
diff --git a/mlir/lib/Conversion/GPUToLLVMSPV/CMakeLists.txt b/mlir/lib/Conversion/GPUToLLVMSPV/CMakeLists.txt
index da5650b2b68dd..d47c5e679d86e 100644
--- a/mlir/lib/Conversion/GPUToLLVMSPV/CMakeLists.txt
+++ b/mlir/lib/Conversion/GPUToLLVMSPV/CMakeLists.txt
@@ -6,7 +6,9 @@ add_mlir_conversion_library(MLIRGPUToLLVMSPV
 
   LINK_LIBS PUBLIC
   MLIRGPUDialect
+  MLIRGPUToGPURuntimeTransforms
   MLIRLLVMCommonConversion
   MLIRLLVMDialect
+  MLIRSPIRVAttrToLLVMConversion
   MLIRSPIRVDialect
 )
diff --git a/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp b/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
index 27d63b5f8948d..74dd5f19c20f5 100644
--- a/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
+++ b/mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp
@@ -8,15 +8,18 @@
 
 #include "mlir/Conversion/GPUToLLVMSPV/GPUToLLVMSPVPass.h"
 
+#include "../GPUCommon/GPUOpsLowering.h"
 #include "mlir/Conversion/LLVMCommon/ConversionTarget.h"
 #include "mlir/Conversion/LLVMCommon/LoweringOptions.h"
 #include "mlir/Conversion/LLVMCommon/Pattern.h"
 #include "mlir/Conversion/LLVMCommon/TypeConverter.h"
+#include "mlir/Conversion/SPIRVCommon/AttrToLLVMConverter.h"
 #include "mlir/Dialect/GPU/IR/GPUDialect.h"
 #include "mlir/Dialect/LLVMIR/LLVMAttrs.h"
 #include "mlir/Dialect/LLVMIR/LLVMDialect.h"
 #include "mlir/Dialect/LLVMIR/LLVMTypes.h"
 #include "mlir/Dialect/SPIRV/IR/SPIRVDialect.h"
+#include "mlir/Dialect/SPIRV/IR/SPIRVEnums.h"
 #include "mlir/Dialect/SPIRV/IR/TargetAndABI.h"
 #include "mlir/IR/BuiltinTypes.h"
 #include "mlir/IR/Matchers.h"
@@ -321,8 +324,8 @@ struct GPUToLLVMSPVConversionPass final
     LLVMConversionTarget target(*context);
 
     target.addIllegalOp<gpu::BarrierOp, gpu::BlockDimOp, gpu::BlockIdOp,
-                        gpu::GlobalIdOp, gpu::GridDimOp, gpu::ShuffleOp,
-                        gpu::ThreadIdOp>();
+                        gpu::GPUFuncOp, gpu::GlobalIdOp, gpu::GridDimOp,
+                        gpu::ReturnOp, gpu::ShuffleOp, gpu::ThreadIdOp>();
 
     populateGpuToLLVMSPVConversionPatterns(converter, patterns);
 
@@ -340,11 +343,27 @@ struct GPUToLLVMSPVConversionPass final
 namespace mlir {
 void populateGpuToLLVMSPVConversionPatterns(LLVMTypeConverter &typeConverter,
                                             RewritePatternSet &patterns) {
-  patterns.add<GPUBarrierConversion, GPUShuffleConversion,
+  patterns.add<GPUBarrierConversion, GPUReturnOpLowering, GPUShuffleConversion,
                LaunchConfigOpConversion<gpu::BlockIdOp>,
                LaunchConfigOpConversion<gpu::GridDimOp>,
                LaunchConfigOpConversion<gpu::BlockDimOp>,
                LaunchConfigOpConversion<gpu::ThreadIdOp>,
                LaunchConfigOpConversion<gpu::GlobalIdOp>>(typeConverter);
+  constexpr spirv::ClientAPI clientAPI = spirv::ClientAPI::OpenCL;
+  MLIRContext *context = &typeConverter.getContext();
+  unsigned privateAddressSpace =
+      storageClassToAddressSpace(clientAPI, spirv::StorageClass::Function);
+  unsigned localAddressSpace =
+      storageClassToAddressSpace(clientAPI, spirv::StorageClass::Workgroup);
+  OperationName llvmFuncOpName(LLVM::LLVMFuncOp::getOperationName(), context);
+  StringAttr kernelBlockSizeAttributeName =
+      LLVM::LLVMFuncOp::getReqdWorkGroupSizeAttrName(llvmFuncOpName);
+  patterns.add<GPUFuncOpLowering>(
+      typeConverter,
+      GPUFuncOpLoweringOptions{
+          privateAddressSpace, localAddressSpace,
+          /*kernelAttributeName=*/std::nullopt, kernelBlockSizeAttributeName,
+          LLVM::CConv::SPIR_KERNEL, LLVM::CConv::SPIR_FUNC,
+          /*encodeWorkgroupAttributionsAsArguments=*/true});
 }
 } // namespace mlir
diff --git a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
index faa97caacb885..060a1e1e82f75 100644
--- a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+++ b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
@@ -365,13 +365,15 @@ void mlir::populateGpuToNVVMConversionPatterns(LLVMTypeConverter &converter,
   // attributions since NVVM models it as `alloca`s in the default
   // memory space and does not support `alloca`s with addrspace(5).
   patterns.add<GPUFuncOpLowering>(
-      converter, /*allocaAddrSpace=*/0,
-      /*workgroupAddrSpace=*/
-      static_cast<unsigned>(NVVM::NVVMMemorySpace::kSharedMemorySpace),
-      StringAttr::get(&converter.getContext(),
-                      NVVM::NVVMDialect::getKernelFuncAttrName()),
-      StringAttr::get(&converter.getContext(),
-                      NVVM::NVVMDialect::getMaxntidAttrName()));
+      converter,
+      GPUFuncOpLoweringOptions{
+          /*allocaAddrSpace=*/0,
+          /*workgroupAddrSpace=*/
+          static_cast<unsigned>(NVVM::NVVMMemorySpace::kSharedMemorySpace),
+          StringAttr::get(&converter.getContext(),
+                          NVVM::NVVMDialect::getKernelFuncAttrName()),
+          StringAttr::get(&converter.getContext(),
+                          NVVM::NVVMDialect::getMaxntidAttrName())});
 
   populateOpPatterns<arith::RemFOp>(converter, patterns, "__nv_fmodf",
                                     "__nv_fmod");
diff --git a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
index 100181cdc69fe..564bab1ad92b9 100644
--- a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+++ b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
@@ -372,10 +372,11 @@ void mlir::populateGpuToROCDLConversionPatterns(
   patterns.add<GPUReturnOpLowering>(converter);
   patterns.add<GPUFuncOpLowering>(
       converter,
-      /*allocaAddrSpace=*/ROCDL::ROCDLDialect::kPrivateMemoryAddressSpace,
-      /*workgroupAddrSpace=*/ROCDL::ROCDLDialect::kSharedMemoryAddressSpace,
-      rocdlDialect->getKernelAttrHelper().getName(),
-      rocdlDialect->getReqdWorkGroupSizeAttrHelper().getName());
+      GPUFuncOpLoweringOptions{
+          /*allocaAddrSpace=*/ROCDL::ROCDLDialect::kPrivateMemoryAddressSpace,
+          /*workgroupAddrSpace=*/ROCDL::ROCDLDialect::kSharedMemoryAddressSpace,
+          rocdlDialect->getKernelAttrHelper().getName(),
+          rocdlDialect->getReqdWorkGroupSizeAttrHelper().getName()});
   if (Runtime::HIP == runtime) {
     patterns.add<GPUPrintfOpToHIPLowering>(converter);
   } else if (Runtime::OpenCL == runtime) {
diff --git a/mlir/lib/Conversion/SPIRVCommon/AttrToLLVMConverter.cpp b/mlir/lib/Conversion/SPIRVCommon/AttrToLLVMConverter.cpp
new file mode 100644
index 0000000000000..924bd1643f...
[truncated]
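The `encodeWorkgroupAttributionsAsArguments` branch above relies on the attribution pointers being the trailing arguments of the function, so they occupy the last `numAttributions` slots of `getArguments()`. Here is a minimal, MLIR-free sketch of that index arithmetic. `attributionArgumentRange` is a hypothetical helper for illustration, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not in the patch): returns the half-open range
// [first, last) of argument indices holding workgroup attribution pointers,
// assuming those pointers were appended after the proper arguments.
// Mirrors getArguments().slice(numArguments - numAttributions, numAttributions).
std::pair<std::size_t, std::size_t>
attributionArgumentRange(std::size_t numArguments,
                         std::size_t numAttributions) {
  assert(numArguments >= numAttributions &&
         "Expecting attributions to be encoded as arguments already");
  return {numArguments - numAttributions, numArguments};
}
```

For the `kernel_with_workgoup_attribs` test, whose lowered signature is `(f32, i16, !llvm.ptr<3>, !llvm.ptr<3>)`, this gives the range `[2, 4)`.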

victor-eds · 2024-08-02T12:45:52Z

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h

+struct GPUFuncOpLoweringOptions {
+  /// The address space to use for `alloca`s in private memory.
+  unsigned allocaAddrSpace;
+  /// The address space to use declaring workgroup memory.
+  unsigned workgroupAddrSpace;
+
+  /// The attribute name to use instead of `gpu.kernel`.
+  std::optional<StringAttr> kernelAttributeName = std::nullopt;
+  /// The attribute name to to set block size
+  std::optional<StringAttr> kernelBlockSizeAttributeName = std::nullopt;
+
+  /// The calling convention to use for kernel functions
+  LLVM::CConv kernelCallingConvention = LLVM::CConv::C;
+  /// The calling convention to use for non-kernel functions
+  LLVM::CConv nonKernelCallingConvention = LLVM::CConv::C;
+
+  /// Whether to encode workgroup attributions as additional arguments instead
+  /// of a global variable.
+  bool encodeWorkgroupAttributionsAsArguments = false;
+};


This was getting out of hand. Cleaner this way.
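With an options struct, each backend only spells out what differs from the defaults. Below is a simplified, dependency-free sketch of `GPUFuncOpLoweringOptions`. Some assumptions apply: `std::string` stands in for `StringAttr`, and `CConv` is a hypothetical stand-in for `LLVM::CConv`. The sketch also includes the kernel/non-kernel calling-convention selection performed in `matchAndRewrite`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the LLVM::CConv values used in this PR.
enum class CConv { C, SPIR_FUNC, SPIR_KERNEL };

// Simplified mirror of GPUFuncOpLoweringOptions; std::string replaces
// StringAttr so the sketch has no MLIR dependency.
struct LoweringOptions {
  unsigned allocaAddrSpace;
  unsigned workgroupAddrSpace;
  std::optional<std::string> kernelAttributeName = std::nullopt;
  std::optional<std::string> kernelBlockSizeAttributeName = std::nullopt;
  CConv kernelCallingConvention = CConv::C;
  CConv nonKernelCallingConvention = CConv::C;
  bool encodeWorkgroupAttributionsAsArguments = false;
};

// Same choice as in matchAndRewrite: kernels and non-kernel functions may
// use different calling conventions.
CConv selectCallingConvention(const LoweringOptions &options, bool isKernel) {
  return isKernel ? options.kernelCallingConvention
                  : options.nonKernelCallingConvention;
}
```

Aggregate initialization fills the omitted trailing members from their default member initializers. This is why the NVVM and ROCDL call sites can leave out the calling conventions and the attribution flag. The SPV call site sets all of them explicitly.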

victor-eds · 2024-08-02T12:46:11Z

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp

+    workgroupBuffers.reserve(gpuFuncOp.getNumWorkgroupAttributions());
+    for (const auto [idx, attribution] :
+         llvm::enumerate(gpuFuncOp.getWorkgroupAttributions())) {
+      auto type = dyn_cast<MemRefType>(attribution.getType());
+      assert(type && type.hasStaticShape() && "unexpected type in attribution");
+
+      uint64_t numElements = type.getNumElements();
+
+      auto elementType =
+          cast<Type>(typeConverter->convertType(type.getElementType()));
+      auto arrayType = LLVM::LLVMArrayType::get(elementType, numElements);
+      std::string name =
+          std::string(llvm::formatv("__wg_{0}_{1}", gpuFuncOp.getName(), idx));
+      uint64_t alignment = 0;
+      if (auto alignAttr = dyn_cast_or_null<IntegerAttr>(
+              gpuFuncOp.getWorkgroupAttributionAttr(
+                  idx, LLVM::LLVMDialect::getAlignAttrName())))
+        alignment = alignAttr.getInt();
+      auto globalOp = rewriter.create<LLVM::GlobalOp>(
+          gpuFuncOp.getLoc(), arrayType, /*isConstant=*/false,
+          LLVM::Linkage::Internal, name, /*value=*/Attribute(), alignment,
+          workgroupAddrSpace);
+      workgroupBuffers.push_back(globalOp);
+    }


Original code

victor-eds · 2024-08-02T12:46:18Z

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp

+      for (const auto [idx, global] : llvm::enumerate(workgroupBuffers)) {
+        auto ptrType = LLVM::LLVMPointerType::get(rewriter.getContext(),
+                                                  global.getAddrSpace());
+        Value address = rewriter.create<LLVM::AddressOfOp>(
+            loc, ptrType, global.getSymNameAttr());
+        Value memory =
+            rewriter.create<LLVM::GEPOp>(loc, ptrType, global.getType(),
+                                         address, ArrayRef<LLVM::GEPArg>{0, 0});
+
+        // Build a memref descriptor pointing to the buffer to plug with the
+        // existing memref infrastructure. This may use more registers than
+        // otherwise necessary given that memref sizes are fixed, but we can try
+        // and canonicalize that away later.
+        Value attribution = gpuFuncOp.getWorkgroupAttributions()[idx];
+        auto type = cast<MemRefType>(attribution.getType());
+        auto descr = MemRefDescriptor::fromStaticShape(
+            rewriter, loc, *getTypeConverter(), type, memory);
+        signatureConversion.remapInput(numProperArguments + idx, descr);
+      }


Original code

victor-eds · 2024-08-02T12:47:14Z

mlir/lib/Conversion/GPUToLLVMSPV/GPUToLLVMSPV.cpp

+  OperationName llvmFuncOpName(LLVM::LLVMFuncOp::getOperationName(), context);
+  StringAttr kernelBlockSizeAttributeName =
+      LLVM::LLVMFuncOp::getReqdWorkGroupSizeAttrName(llvmFuncOpName);


I've always thought this should be a static member... Is there a better way to do this? I didn't wanna add the static member function to the LLVM dialect, so I went with this.

This cannot be static, as an attribute requires the context present in the operation.

I know, I was wondering if at least the string name should be

victor-eds · 2024-08-02T12:47:36Z

mlir/lib/Conversion/SPIRVCommon/AttrToLLVMConverter.cpp

@@ -0,0 +1,61 @@
+//===- AttrToLLVMConverter.cpp - SPIR-V attributes conversion to LLVM -C++ ===//


Moved from the SPIR-V to LLVM conversion. This is closer to the approach other dialects take.
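The SPV lowering calls `storageClassToAddressSpace` to pick the private (`Function`) and local (`Workgroup`) address spaces. Here is a rough sketch of the mapping, assuming it follows the OpenCL numbering used by clang's SPIR target: private 0, global 1, constant 2, local 3, generic 4. This agrees with the `!llvm.ptr<3>` workgroup arguments in the test. The enum and function below are hypothetical stand-ins covering only a subset of storage classes, not the MLIR API.

```cpp
#include <cassert>

// Hypothetical subset of spirv::StorageClass, for illustration only.
enum class StorageClass {
  Function,
  CrossWorkgroup,
  UniformConstant,
  Workgroup,
  Generic
};

// Sketch of the OpenCL address-space numbering (clang SPIR target).
unsigned toOpenCLAddressSpace(StorageClass storageClass) {
  switch (storageClass) {
  case StorageClass::Function:
    return 0; // private
  case StorageClass::CrossWorkgroup:
    return 1; // global
  case StorageClass::UniformConstant:
    return 2; // constant
  case StorageClass::Workgroup:
    return 3; // local
  case StorageClass::Generic:
    return 4; // generic
  }
  assert(false && "unhandled storage class");
  return 0;
}
```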

victor-eds · 2024-08-02T12:48:01Z

mlir/lib/Conversion/SPIRVToLLVM/SPIRVToLLVM.cpp

@@ -273,47 +268,13 @@ static std::optional<Type> convertArrayType(spirv::ArrayType type,
  return LLVM::LLVMArrayType::get(llvmElementType, numElements);
 }

-static unsigned mapToOpenCLAddressSpace(spirv::StorageClass storageClass) {


victor-eds · 2024-08-02T12:48:45Z

mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir

+// CHECK-LABEL:        llvm.func spir_kernelcc @kernel_with_workgoup_attribs(
+// CHECK-SAME:             %[[VAL_27:.*]]: f32, %[[VAL_28:.*]]: i16, %[[VAL_29:.*]]: !llvm.ptr<3>, %[[VAL_30:.*]]: !llvm.ptr<3>) attributes {gpu.kernel} {


Additional arguments of llvm.ptr<3> type encode workgroup attributions

kurapov-peter

Thanks, Victor! This is really helpful, I was able to replace some dumb copy-pasted code locally :) Haven't tried the llvm.ptr<3> yet though. Would it require gpu.launch lowering changes?

victor-eds · 2024-08-02T13:58:50Z

Yeah, definitely. In fact, I've just realized we're dropping information about the size of these memory allocations: gpu.launch_func won't know the required size of these workgroup memory attributions, so we would need to attach that information as an attribute to the lowered function. The rest of the PR is ready for review.

victor-eds · 2024-08-05T10:19:32Z

Attached as llvm.mlir.workgroup_attrib_size.

victor-eds · 2024-08-05T11:01:11Z

As an alternative to the current numBytes approach for llvm.mlir.workgroup_attrib_size, we could attach a <NumElems, Type> tuple. WDYT?

kurapov-peter

LGTM!

kurapov-peter · 2024-08-05T11:11:05Z

A tuple is probably more convenient, yup.
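The trade-off is one-directional: a byte count can always be derived from an element count plus an element type, but not the other way round. The following sketch makes two assumptions: the memref is statically shaped, and the element type is represented only by its size in bytes. The `AttributionSize` struct and `staticElementCount` helper are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical tuple-style encoding: element count plus element size.
struct AttributionSize {
  std::uint64_t numElements;
  std::uint64_t elementSizeInBytes;
  // The byte-count encoding is derivable from the tuple.
  std::uint64_t numBytes() const { return numElements * elementSizeInBytes; }
};

// Number of elements of a statically shaped memref (product of its dims).
std::uint64_t staticElementCount(const std::vector<std::int64_t> &shape) {
  return std::accumulate(shape.begin(), shape.end(), std::uint64_t{1},
                         [](std::uint64_t acc, std::int64_t dim) {
                           assert(dim >= 0 && "static shape expected");
                           return acc * static_cast<std::uint64_t>(dim);
                         });
}
```

For example, `memref<32x4xf32>` and `memref<256xf16>` both need 512 bytes. A plain byte count cannot tell them apart, while the tuple form keeps that information.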

victor-eds · 2024-08-06T11:58:17Z

@antiagainst @kuhar @FMarno can I get a review here? I'd like people with different views and context to take a look, as this touches several different pieces.

victor-eds · 2024-08-08T12:11:20Z

Is anyone opposed to merging this in its current state? I don't think the changes are controversial, and I have follow-up work I'd like to push too.

Dinistro

I'm fine with merging this from an LLVM dialect perspective. Just added one more nit to the GPU test, which seems to have odd formatting.

mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir

llvm-ci · 2024-08-09T14:17:25Z

LLVM Buildbot has detected a new failure on builder mlir-nvidia-gcc7 running on mlir-nvidia while building mlir at step 5 "build-check-mlir-build-only".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/116/builds/2209

Here is the relevant piece of the build log for reference:

Step 5 (build-check-mlir-build-only) failure: build (failure)
...
285.652 [317/16/4172] Building CXX object tools/mlir/examples/toy/Ch7/CMakeFiles/toyc-ch7.dir/mlir/LowerToAffineLoops.cpp.o
289.839 [316/16/4173] Building CXX object tools/mlir/test/lib/Dialect/Test/CMakeFiles/MLIRTestToLLVMIRTranslation.dir/TestToLLVMIRTranslation.cpp.o
289.869 [315/16/4174] Building CXX object tools/mlir/examples/toy/Ch7/CMakeFiles/toyc-ch7.dir/mlir/Dialect.cpp.o
289.887 [314/16/4175] Building CXX object tools/mlir/examples/toy/Ch7/CMakeFiles/toyc-ch7.dir/parser/AST.cpp.o
289.913 [313/16/4176] Building CXX object tools/mlir/examples/toy/Ch7/CMakeFiles/toyc-ch7.dir/mlir/ToyCombine.cpp.o
289.938 [312/16/4177] Building MyExtension.h.inc...
289.962 [311/16/4178] Building MyExtension.cpp.inc...
304.777 [310/16/4179] Building CXX object tools/mlir/examples/toy/Ch6/CMakeFiles/toyc-ch6.dir/mlir/LowerToLLVM.cpp.o
305.123 [309/16/4180] Building CXX object tools/mlir/examples/transform/Ch2/lib/CMakeFiles/MyExtensionCh2.dir/MyExtension.cpp.o
311.723 [308/16/4181] Building CXX object tools/mlir/lib/Dialect/LLVMIR/CMakeFiles/obj.MLIRLLVMDialect.dir/IR/LLVMDialect.cpp.o
FAILED: tools/mlir/lib/Dialect/LLVMIR/CMakeFiles/obj.MLIRLLVMDialect.dir/IR/LLVMDialect.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /usr/bin/g++-7 -DBUILD_EXAMPLES -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/tools/mlir/lib/Dialect/LLVMIR -I/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/lib/Dialect/LLVMIR -I/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/include -I/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/llvm/include -I/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/include -I/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/tools/mlir/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -fno-lifetime-dse -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Wno-unused-but-set-parameter -O3 -DNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=c++1z -MD -MT tools/mlir/lib/Dialect/LLVMIR/CMakeFiles/obj.MLIRLLVMDialect.dir/IR/LLVMDialect.cpp.o -MF tools/mlir/lib/Dialect/LLVMIR/CMakeFiles/obj.MLIRLLVMDialect.dir/IR/LLVMDialect.cpp.o.d -o tools/mlir/lib/Dialect/LLVMIR/CMakeFiles/obj.MLIRLLVMDialect.dir/IR/LLVMDialect.cpp.o -c /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
g++-7: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
312.728 [308/15/4182] Building CXX object tools/mlir/examples/toy/Ch7/CMakeFiles/toyc-ch7.dir/mlir/LowerToLLVM.cpp.o
313.520 [308/14/4183] Building CXX object tools/mlir/unittests/Target/LLVM/CMakeFiles/MLIRTargetLLVMTests.dir/SerializeROCDLTarget.cpp.o
In file included from /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/include/mlir/Dialect/XeGPU/IR/XeGPU.h:34:0,
                 from /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/include/mlir/InitAllDialects.h:95,
                 from /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp:12:
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/tools/mlir/include/mlir/Dialect/XeGPU/IR/XeGPU.h.inc: In member function ‘llvm::ArrayRef<long int> mlir::xegpu::CreateNdDescOp::getStaticStrides()’:
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/tools/mlir/include/mlir/Dialect/XeGPU/IR/XeGPU.h.inc:1218:26: warning: unused variable ‘offset’ [-Wunused-variable]
     auto [strides, offset] = getStridesAndOffset(memrefType);
                          ^
314.801 [308/13/4184] Building CXX object tools/mlir/unittests/Target/LLVM/CMakeFiles/MLIRTargetLLVMTests.dir/SerializeNVVMTarget.cpp.o
In file included from /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/include/mlir/Dialect/XeGPU/IR/XeGPU.h:34:0,
                 from /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/include/mlir/InitAllDialects.h:95,
                 from /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp:13:
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/tools/mlir/include/mlir/Dialect/XeGPU/IR/XeGPU.h.inc: In member function ‘llvm::ArrayRef<long int> mlir::xegpu::CreateNdDescOp::getStaticStrides()’:
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/tools/mlir/include/mlir/Dialect/XeGPU/IR/XeGPU.h.inc:1218:26: warning: unused variable ‘offset’ [-Wunused-variable]
     auto [strides, offset] = getStridesAndOffset(memrefType);
                          ^
315.915 [308/12/4185] Building CXX object tools/mlir/tools/mlir-reduce/CMakeFiles/mlir-reduce.dir/mlir-reduce.cpp.o
In file included from /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/include/mlir/Dialect/XeGPU/IR/XeGPU.h:34:0,
                 from /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/include/mlir/InitAllDialects.h:95,
                 from /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/tools/mlir-reduce/mlir-reduce.cpp:18:
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/tools/mlir/include/mlir/Dialect/XeGPU/IR/XeGPU.h.inc: In member function ‘llvm::ArrayRef<long int> mlir::xegpu::CreateNdDescOp::getStaticStrides()’:
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/tools/mlir/include/mlir/Dialect/XeGPU/IR/XeGPU.h.inc:1218:26: warning: unused variable ‘offset’ [-Wunused-variable]
     auto [strides, offset] = getStridesAndOffset(memrefType);
                          ^
320.041 [308/11/4186] Building CXX object tools/mlir/lib/CAPI/RegisterEverything/CMakeFiles/obj.MLIRCAPIRegisterEverything.dir/RegisterEverything.cpp.o
In file included from /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/include/mlir/Dialect/XeGPU/IR/XeGPU.h:34:0,
                 from /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/include/mlir/InitAllDialects.h:95,
                 from /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/lib/CAPI/RegisterEverything/RegisterEverything.cpp:13:
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/tools/mlir/include/mlir/Dialect/XeGPU/IR/XeGPU.h.inc: In member function ‘llvm::ArrayRef<long int> mlir::xegpu::CreateNdDescOp::getStaticStrides()’:
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/tools/mlir/include/mlir/Dialect/XeGPU/IR/XeGPU.h.inc:1218:26: warning: unused variable ‘offset’ [-Wunused-variable]
     auto [strides, offset] = getStridesAndOffset(memrefType);
                          ^


victor-eds requested a review from FMarno August 2, 2024 12:45

victor-eds self-assigned this Aug 2, 2024

victor-eds requested review from antiagainst and kuhar as code owners August 2, 2024 12:45

llvmbot added mlir:gpu mlir:spirv mlir labels Aug 2, 2024

victor-eds commented Aug 2, 2024

kurapov-peter reviewed Aug 2, 2024

victor-eds commented Aug 2, 2024

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp (outdated, resolved)

kuhar reviewed Aug 2, 2024

Apply suggestions and implement llvm.mlir.workgroup_attrib_size

098af95

llvmbot added the mlir:llvm label Aug 5, 2024

victor-eds requested review from kuhar and kurapov-peter August 5, 2024 10:18

kurapov-peter approved these changes Aug 5, 2024

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h (resolved)

joker-eph reviewed Aug 5, 2024

mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.td (outdated, resolved)

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp (resolved)

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp (resolved)

victor-eds added 2 commits August 5, 2024 13:52

Use tuple to encode workgroup attribution in LLVM

36d5bf0

Add doc

ed7b600

victor-eds requested a review from joker-eph August 5, 2024 13:33

Use discardableAttrs

81bf21c

victor-eds requested a review from gysit August 6, 2024 11:58

Dinistro reviewed Aug 6, 2024

mlir/include/mlir/Dialect/LLVMIR/LLVMAttrDefs.td (outdated, resolved)

victor-eds requested a review from Dinistro August 6, 2024 13:59

gysit reviewed Aug 6, 2024

mlir/include/mlir/Dialect/LLVMIR/LLVMAttrDefs.td (outdated, resolved)

mlir/include/mlir/Dialect/LLVMIR/LLVMAttrDefs.td (outdated, resolved)

mlir/include/mlir/Dialect/LLVMIR/LLVMAttrDefs.td (outdated, resolved)

victor-eds added 2 commits August 6, 2024 17:56

attrib->attribution

361c336

Change doc

bf25aec

victor-eds requested a review from gysit August 6, 2024 16:59

FMarno reviewed Aug 6, 2024

victor-eds added 2 commits August 7, 2024 08:48

Address comments

3981cc8

Improve doc

af5955a

victor-eds requested a review from FMarno August 7, 2024 07:54

FMarno approved these changes Aug 7, 2024

Dinistro approved these changes Aug 9, 2024

mlir/test/Conversion/GPUToLLVMSPV/gpu-to-llvm-spv.mlir (outdated, resolved)

Format tests

512724f

victor-eds merged commit d45de80 into llvm:main Aug 9, 2024
8 checks passed

victor-eds deleted the gpu-func-llvm-spv-conversion branch August 9, 2024 14:09

kurapov-peter mentioned this pull request Aug 12, 2024

[mlir][gpu] Skip address space checks for memrefs between launchFuncOp and kernel func #102925

Closed

victor-eds commented Aug 2, 2024 (edited)

llvmbot commented Aug 2, 2024 (edited)

llvmbot commented Aug 2, 2024

victor-eds Aug 2, 2024

victor-eds Aug 2, 2024

victor-eds Aug 2, 2024

victor-eds Aug 2, 2024

Dinistro Aug 9, 2024

victor-eds Aug 9, 2024

victor-eds Aug 2, 2024

victor-eds Aug 2, 2024

victor-eds Aug 2, 2024

kurapov-peter left a comment

victor-eds commented Aug 2, 2024

victor-eds commented Aug 5, 2024

victor-eds commented Aug 5, 2024

kurapov-peter left a comment

kurapov-peter commented Aug 5, 2024

victor-eds commented Aug 6, 2024

victor-eds commented Aug 8, 2024

Dinistro left a comment

llvm-ci commented Aug 9, 2024

		@@ -0,0 +1,61 @@
		//===- AttrToLLVMConverter.cpp - SPIR-V attributes conversion to LLVM -C++ ===//

		// CHECK-LABEL: llvm.func spir_kernelcc @kernel_with_workgoup_attribs(
		// CHECK-SAME: %[[VAL_27:.*]]: f32, %[[VAL_28:.*]]: i16, %[[VAL_29:.*]]: !llvm.ptr<3>, %[[VAL_30:.*]]: !llvm.ptr<3>) attributes {gpu.kernel} {
