HIP: force max threads per block to be 1024 #11621

fxzjshm · 2025-02-03T14:34:46Z

Some old compilers still default to a maximum of 256 threads per block. Explicitly set it to 1024 to get correct results from ops like ARGMAX and GROUP_NORM.

IMbackK

Overall I am fine with this, but I will defer to @slaren on whether this kind of vendor behavior is something we want to support (see the discussion in #11619).

IMbackK · 2025-02-03T22:29:54Z

ggml/src/ggml-hip/CMakeLists.txt

@@ -40,6 +40,9 @@ find_package(hip     REQUIRED)
 find_package(hipblas REQUIRED)
 find_package(rocblas REQUIRED)

+# Workaround old compilers


Please move this down a bit, as the find_package calls and the version check below are logically related operations.

slaren · 2025-02-03T22:37:46Z

I saw the discussion, but I don't know enough about HIP/ROCm to have an opinion about this. If you think it is not likely to cause issues for other users, feel free to merge it.

Some old compilers still use 256. Explicitly set it to 1024 to get correct result from ops like ARGMAX and GROUP_NORM. Related: ggml-org#10610, ggml-org#11619 Signed-off-by: fxzjshm <fxzjshm@163.com>

fxzjshm · 2025-02-04T05:24:31Z

@IMbackK Moved. Is this place proper?

@slaren This compiler flag is documented at https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-gpu-max-threads-per-block. I've also compiled with ROCm 6.3.1 without any compile errors, and I am now running test-backend-ops.

Update: test-backend-ops w/ ROCm 6.3.1 on gfx1100 passed.
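
For readers following the thread, the kind of change being discussed can be sketched roughly as below. This is only an illustration, not the exact lines from the PR: it assumes the HIP sources are built through CMake's native HIP language support (so that `CMAKE_HIP_FLAGS` applies), and it uses the `--gpu-max-threads-per-block` option from the clang documentation linked above. The placement after the `find_package` calls follows the review request; the version check mentioned there is not reproduced here.

```cmake
# Sketch only: assumes the HIP language is enabled via CMake's HIP support.
find_package(hip     REQUIRED)
find_package(hipblas REQUIRED)
find_package(rocblas REQUIRED)

# ... the version check that belongs with the find_package calls goes here ...

# Workaround for old or vendor-forked compilers that default to 256:
# pin the maximum threads per block to 1024 for all HIP sources.
set(CMAKE_HIP_FLAGS "${CMAKE_HIP_FLAGS} --gpu-max-threads-per-block=1024")
```

If the sources are instead compiled as C++ by a HIP-aware compiler, the same flag would presumably need to go into the corresponding C++ flags; that variant is an assumption and is not shown in this thread.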

IMbackK

This just resets the value to the default, which is also the maximum for all current AMD GPUs, so it doesn't change code generation at all on sane versions of LLVM at this time. We might run into problems in the future if AMD changes this for a new GPU arch, but I think this is an acceptable risk.

Some old or vendor-forked versions of LLVM still use 256. Explicitly set it to 1024 to align with upstream LLVM. Signed-off-by: fxzjshm <fxzjshm@163.com>

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Feb 3, 2025

IMbackK reviewed Feb 3, 2025

HIP: force max threads per block to be 1024

59ad593

fxzjshm force-pushed the hip-launch_bounds branch from 7e596d4 to 59ad593 on February 4, 2025 05:10

IMbackK approved these changes Feb 4, 2025

IMbackK merged commit 3ec9fd4 into ggml-org:master Feb 4, 2025
1 check passed
