HIP: force max threads per block to be 1024
Some older compilers still use a limit of 256 threads per block. Explicitly
set the limit to 1024 so that ops like ARGMAX and GROUP_NORM produce correct
results.

Related: #10610, #11619
Signed-off-by: fxzjshm <fxzjshm@163.com>
fxzjshm committed Feb 4, 2025
1 parent d92cb67 commit 59ad593
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions ggml/src/ggml-hip/CMakeLists.txt
@@ -46,6 +46,9 @@ endif()

message(STATUS "HIP and hipBLAS found")

+# Workaround old compilers
+set(CMAKE_HIP_FLAGS "${CMAKE_HIP_FLAGS} --gpu-max-threads-per-block=1024")
+
file(GLOB GGML_HEADERS_ROCM "../ggml-cuda/*.cuh")
list(APPEND GGML_HEADERS_ROCM "../../include/ggml-cuda.h")
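
For illustration only: the change above appends the flag unconditionally. A more defensive variant, which is not what this commit does, might skip the append when the caller has already passed a `--gpu-max-threads-per-block` value through `CMAKE_HIP_FLAGS`. The guard below is a hypothetical sketch under that assumption:

```cmake
# Hypothetical variant, not part of this commit: respect a value that the
# user already supplied in CMAKE_HIP_FLAGS instead of appending a second one.
if (NOT CMAKE_HIP_FLAGS MATCHES "--gpu-max-threads-per-block")
    # Same workaround as in the commit: raise the per-block thread limit
    # for compilers that would otherwise assume 256.
    set(CMAKE_HIP_FLAGS "${CMAKE_HIP_FLAGS} --gpu-max-threads-per-block=1024")
endif()
```

The unconditional form in the commit is simpler and guarantees the 1024 limit. The guarded form trades that guarantee for letting a user override the value.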
