Misc. bug: Launch params (1024, 1, 1) are larger than launch bounds (256) for kernel _ZL12rms_norm_f32ILi1024EEvPKfPfif please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program ! #10610
Comments
This issue needs more information to debug. Please take a look at the "Bug (model use)" template and either re-open the issue using that template or provide the corresponding information here. In particular, please reproduce the issue using llama.cpp only.
I don't know how to fix this issue.
It is failing in
I also encountered the same issue. I built the code with HIPBLAS.
Please open a new issue and fill out the "model use" template.
Actually, if you're also using the same special GPU, it will most likely not be possible to make it work unless a developer invests the effort to support it (which is not likely).
This issue was closed because it has been inactive for 14 days since being marked as stale.
For some unknown reason, the DCU SDK people selected 256 as the default launch bound instead of the common value 1024, which breaks assumptions made by ops like ARGMAX and GROUP_NORM. Workaround: as indicated by the error, simply add --gpu-max-threads-per-block=1024 when configuring:

```sh
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" HIPFLAGS=" --gpu-max-threads-per-block=1024 " \
cmake .. -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx906
```

I will report this issue to them; should this be documented in e.g. build.md, in the HIP chapter?
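For context, `__launch_bounds__` caps the block size a HIP/CUDA kernel may be launched with; the mangled name in the error demangles to `rms_norm_f32<1024>(float const*, float*, int, float)`, i.e. a kernel instantiated for up to 1024 threads per block. Below is a minimal standalone sketch (a hypothetical kernel, not the actual ggml code) of how a compiler-imposed bound of 256 makes a 1024-thread launch fail at runtime:

```cpp
// hip_launch_bounds_demo.cpp - build with: hipcc hip_launch_bounds_demo.cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Hypothetical kernel, not ggml's rms_norm_f32. __launch_bounds__(256) promises
// the compiler that a block never exceeds 256 threads, so it may budget registers
// accordingly; launching more threads than the bound is then a runtime error,
// which is exactly what a compiler default of 256 triggers for kernels that
// expect 1024-thread blocks.
__global__ void __launch_bounds__(256) bounded_copy(const float * x, float * dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        dst[i] = x[i];
    }
}

int main() {
    const int n = 1024;
    float *x, *dst;
    hipMalloc(&x, n * sizeof(float));
    hipMalloc(&dst, n * sizeof(float));

    bounded_copy<<<1, 1024>>>(x, dst, n);   // 1024 > declared bound of 256: launch fails
    printf("1024-thread launch: %s\n", hipGetErrorString(hipGetLastError()));

    bounded_copy<<<4, 256>>>(x, dst, n);    // within the declared bound: OK
    printf(" 256-thread launch: %s\n", hipGetErrorString(hipGetLastError()));

    hipFree(x);
    hipFree(dst);
    return 0;
}
```

Raising the compiler default with --gpu-max-threads-per-block=1024, as in the workaround above, makes the implicit bound match what the kernels assume.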
Related: ggml-org#10610
Signed-off-by: fxzjshm <fxzjshm@163.com>
Some old compilers still use 256. Explicitly set it to 1024 to get correct results from ops like ARGMAX and GROUP_NORM.
Related: ggml-org#10610, ggml-org#11619
Signed-off-by: fxzjshm <fxzjshm@163.com>
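If it is unclear whether the 256 limit comes from the hardware or only from the compiler default, querying the device properties separates the two. A small standalone diagnostic sketch (file name hypothetical, not part of llama.cpp):

```cpp
// hip_device_limits.cpp - build with: hipcc hip_device_limits.cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        printf("no HIP devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t props;
        hipGetDeviceProperties(&props, i);
        // The hardware limit is typically 1024; the bug discussed here is the
        // *compiler* capping kernels at 256 via its default launch bound, not
        // the device itself.
        printf("device %d (%s): maxThreadsPerBlock = %d\n",
               i, props.name, props.maxThreadsPerBlock);
    }
    return 0;
}
```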
Name and Version
I use ollama to run this model, but something is wrong and it shows the following:

```
llama_new_context_with_model: graph splits = 2
Launch params (1024, 1, 1) are larger than launch bounds (256) for kernel _ZL12rms_norm_f32ILi1024EEvPKfPfif please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program !
```
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
No response
Problem description & steps to reproduce
I use ollama to run this model, but something is wrong and it shows the following:

```
llama_new_context_with_model: graph splits = 2
Launch params (1024, 1, 1) are larger than launch bounds (256) for kernel _ZL12rms_norm_f32ILi1024EEvPKfPfif please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program !
```
First Bad Commit
No response
Relevant log output
No response