sync : llama.cpp #1020
…ags (llama/10314)
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses the B loads across the rows and also reuses some addressing calculations. This required manually partially unrolling the loop, since the compiler is less willing to unroll outer loops.

Add bounds-checking on the last iteration of the loop. I think this was at least partly broken before.

Optimize the Q4_K shader to vectorize most loads and reduce the number of bit twiddling instructions.
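The "two result elements per workgroup" idea can be illustrated on the CPU. The following is only a sketch in plain C++, not the Vulkan shader from the commit: it uses unquantized `float` data instead of the Q4/Q5 block formats, and the function name `matvec_two_rows` is hypothetical. It shows the outer row loop unrolled by two so that each vector element (the "B" operand) is loaded once and used for both rows, with a bounds check covering the last iteration when the row count is odd.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: y = A * x for a row-major matrix A (rows x cols).
// Two output rows are produced per outer iteration.
std::vector<float> matvec_two_rows(const std::vector<float>& a,
                                   const std::vector<float>& x,
                                   std::size_t rows, std::size_t cols) {
    std::vector<float> y(rows, 0.0f);
    for (std::size_t r = 0; r < rows; r += 2) {
        // Bounds check: on the last iteration of an odd-sized matrix
        // there is no second row to compute.
        const bool has_second = (r + 1 < rows);

        // The row base addresses are computed once per pair of rows.
        const float* row0 = a.data() + r * cols;
        const float* row1 = has_second ? row0 + cols : row0;

        float acc0 = 0.0f;
        float acc1 = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            const float b = x[c];  // loaded once, reused for both rows
            acc0 += row0[c] * b;
            acc1 += row1[c] * b;
        }

        y[r] = acc0;
        if (has_second) {
            y[r + 1] = acc1;  // only written when the row exists
        }
    }
    return y;
}
```

When the second row is missing, the sketch points `row1` at `row0` so the inner loop stays branch-free and simply discards the redundant accumulator. A real shader may handle the tail differently.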
* metal : add kernel arg structs (wip)
* metal : fattn args ggml-ci
* metal : cont + avoid potential int overflow [no ci]
* metal : mul mat struct (wip)
* cont : mul mat vec
* cont : pass by reference
* cont : args is first argument
* cont : use char ptr
* cont : shmem style
* cont : thread counters style
* cont : mul mm id ggml-ci
* cont : int safety + register optimizations ggml-ci
* metal : GGML_OP_CONCAT ggml-ci
* metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV
* metal : GGML_OP_REPEAT
* metal : GGML_OP_CPY
* metal : GGML_OP_RMS_NORM
* metal : GGML_OP_NORM
* metal : add TODOs for rest of ops
* ggml : add ggml-metal-impl.h ggml-ci
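The commit titles above mention bundling kernel arguments into a struct, passing it by reference as the first argument, addressing buffers through `char` pointers, and avoiding integer overflow. A minimal sketch of that general pattern follows, in host-side C++ rather than Metal. The struct `scale_args`, its fields, and the function `scale_rows` are all hypothetical and are not taken from `ggml-metal-impl.h`.

```cpp
#include <cstdint>

// Hypothetical argument struct; the field names are illustrative only.
struct scale_args {
    int32_t  ne0;    // elements per row
    int32_t  ne1;    // number of rows
    uint64_t nb1;    // row stride in bytes, kept 64-bit
    float    scale;  // factor applied to every element
};

// The args struct comes first and is passed by reference; the buffers are
// addressed through char pointers so that byte strides apply directly.
void scale_rows(const scale_args& args, const char* src, char* dst) {
    for (int32_t i1 = 0; i1 < args.ne1; ++i1) {
        // Widen before multiplying so the byte offset is computed in 64 bits.
        const uint64_t offset = static_cast<uint64_t>(i1) * args.nb1;

        const float* s = reinterpret_cast<const float*>(src + offset);
        float*       d = reinterpret_cast<float*>(dst + offset);

        for (int32_t i0 = 0; i0 < args.ne0; ++i0) {
            d[i0] = s[i0] * args.scale;
        }
    }
}
```

Because the stride is expressed in bytes, rows may carry padding: only the first `ne0` elements of each row are read and written.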
ggml-ci
Copilot reviewed 7 out of 27 changed files in this pull request and generated no suggestions.
Files not reviewed (20)
- CMakeLists.txt: Language not supported
- scripts/sync-llama.last: Language not supported
- src/ggml-aarch64.c: Language not supported
- src/ggml-amx/ggml-amx.cpp: Language not supported
- src/ggml-backend.cpp: Language not supported
- src/ggml-cpu/CMakeLists.txt: Language not supported
- src/ggml-cpu/ggml-cpu-aarch64.c: Language not supported
- src/ggml-cpu/ggml-cpu.c: Language not supported
- src/ggml-cpu/llamafile/sgemm.cpp: Language not supported
- src/ggml-cuda/CMakeLists.txt: Language not supported
- src/ggml-cuda/ggml-cuda.cu: Language not supported
- src/ggml-cuda/mmv.cu: Language not supported
- src/ggml-cuda/mmv.cuh: Language not supported
- src/ggml-hip/CMakeLists.txt: Language not supported
- src/ggml-metal/CMakeLists.txt: Language not supported
- src/ggml-metal/ggml-metal-impl.h: Language not supported
- src/ggml-musa/CMakeLists.txt: Language not supported
- src/ggml-opt.cpp: Language not supported
- src/ggml-vulkan/ggml-vulkan.cpp: Language not supported
- src/ggml-vulkan/vulkan-shaders/mul_mat_vec.comp: Language not supported
@JohannesGaessler The I can reproduce this also on while ./bin/test-opt ; do date ; done The CPU backend would also occasionally fail in the
I managed to get a stacktrace for one of the seg faults:
When running cc: @slaren
It is indirectly tested in any test that runs llama.cpp. I agree it would be good to have tests for it, but it's not an easy component to write unit tests for. At some point I will probably rewrite it in C++ with testing in mind. I don't see how it could cause
I think I worded my post poorly. I agree that in this particular instance the bug is overwhelmingly likely in
No description provided.