sync : ggml #2737

ggerganov · 2025-01-14T07:54:19Z

No description provided.

* Added init tensor calling code * Added get_alloc_size forwarding * Cleaned up and improved type/error handling. * fix: remove trailing whitespaces. * Cleanup and use GGML error logging functions. * Handle potentially dangerous edge cases. * Apply suggestions from code review Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>

…tary driver (llama/11074) * Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver * Add (TM) to AMD name check

* CUDA: add BF16 support

…ama/11087) * SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 * Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6" This reverts commit f62dc45f318e48d375e7734b34cbddee81deed52. * Reland: Use get_multi_ptr instead of deprecated get_pointer in wkv6

Remove duplicated macros, use GGML_LOG_ERROR for errors

* GGUF: C++ refactor, backend support, misc fixes remove ggml_tensor.backend update CODEOWNERS [no ci] remove gguf_get_data from API revise GGUF API data types

* fix: Vulkan shader gen binary path when cross compiling

…(llama/11117) * Disable GL_KHR_cooperative_matrix Vulkan extension if not available. * Perform Vulkan extensions checks in a more sensible order * Remove unnecessary #ifdef directive

This change upstreams llamafile's cpu matrix multiplication kernels for ppc64le using MMA builtins for quantised int8 datatype. This change results in 10% - 70% improvement in total speed(ie all tokens/total time), across various batch sizes. The patch is tested with Meta-Lllama-3-8B, Mistral-7B, Llama-2-7B-chat-hf models on a IBM POWER10 machine. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>

Signed-off-by: hydai <z54981220@gmail.com>

* SYCL: refactor ggml_sycl_compute_forward * SYCL: add back GGML_USED(dst) to ggml_sycl_cpy * SYCL: add function name to noop debug * SYCL: Some device info print refactoring and add details of XMX availability

@compilade

llama: add support for QRWKV6 model architecture (llama/11001) * WIP: Add support for RWKV6Qwen2 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV: Some graph simplification Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Add support for RWKV6Qwen2 with cpu and cuda GLA Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix some typos Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * code format changes Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix wkv test & add gla test Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix cuda warning Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update README.md Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update ggml/src/ggml-cuda/gla.cu Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix fused lerp weights loading with RWKV6 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * better sanity check skipping for QRWKV6 in llama-quant thanks @compilade Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: compilade <git@compilade.net> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: compilade <git@compilade.net>

…roup_size_control validation error (llama/11161) * Vulkan: Remove float16 use in shaders * Fix validation error about subgroup_size_control extension

… (llama/11211) Build fails when using HIP and GGML_BACKEND_DL: ``` /usr/bin/ld: ../ggml/src/libggml.so: undefined reference to `ggml_backend_cuda_reg' collect2: error: ld returned 1 exit status ``` This patch fixes this.

…e improvements) (llama/11042) * Refactor: Moves cuda graph executable update step to separate function. * Refactor: Moves cuda graph update check to separate function. * Refactor: Moves cuda graph maintenance (update or adjusting copy parameters) to separate function for improved readability. * Fix: Adds missing reference to maintain_cuda_graph() definition. * Refactor: Improves structure and abstractions by moving CUDA graph evaluation and capture to its own function. * Refactor: Moves node graph checks and copy ops into individual function for improved readability. * Refactor: Removes code permanently excluded from compilation to increase readability. * Style: Adds missing newline * Style: Consolidates several neighboring '#ifdef USE_CUDA_GRAPH' into a single one * Refactor: Makes 'cuda_graph_update_required' a local variable * remove double lines between functions --------- Co-authored-by: slaren <slarengh@gmail.com>

--------- Co-authored-by: Skyler Szot <quic_sszot@quicinc.com> Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com> Co-authored-by: Alexander Angus <quic_aangus@quicinc.com> Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com> Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>

ggml-ci

rgerganov and others added 23 commits January 14, 2025 09:49

ggml : allow loading backend with env variable (ggml/1059)

19c147b

ref: #1058

fix: Vulkan shader gen binary path (llama/11037)

95cd1d3

Vulkan: Add device-specific blacklist for coopmat for the AMD proprie…

fbe4db4

…tary driver (llama/11074) * Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver * Add (TM) to AMD name check

CUDA: add BF16 support (llama/11093)

9325f4a

* CUDA: add BF16 support

rpc : code cleanup (llama/11107)

f078351

Remove duplicated macros, use GGML_LOG_ERROR for errors

ggml-backend : only offload from host buffers (llama/11120)

c324f37

ggml-backend : only offload from host buffers (fix) (llama/11124)

6679500

GGUF: C++ refactor, backend support, misc fixes (llama/11030)

a48be79

* GGUF: C++ refactor, backend support, misc fixes remove ggml_tensor.backend update CODEOWNERS [no ci] remove gguf_get_data from API revise GGUF API data types

fix: Vulkan shader gen binary path when Cross-compiling (llama/11096)

f2031c5

* fix: Vulkan shader gen binary path when cross compiling

Disable GL_KHR_cooperative_matrix Vulkan extension if not available. …

e322e91

…(llama/11117) * Disable GL_KHR_cooperative_matrix Vulkan extension if not available. * Perform Vulkan extensions checks in a more sensible order * Remove unnecessary #ifdef directive

fix: add missing msg in static_assert (llama/11143)

40aa3fa

Signed-off-by: hydai <z54981220@gmail.com>

SYCL: Refactor ggml_sycl_compute_forward (llama/11121)

fe7bb88

* SYCL: refactor ggml_sycl_compute_forward * SYCL: add back GGML_USED(dst) to ggml_sycl_cpy * SYCL: add function name to noop debug * SYCL: Some device info print refactoring and add details of XMX availability

Vulkan: Fix float16 use on devices without float16 support + fix subg…

618d94a

…roup_size_control validation error (llama/11161) * Vulkan: Remove float16 use in shaders * Fix validation error about subgroup_size_control extension

GGUF: C++ refactor, backend support, misc fixes (skip) (llama/11030)

6a0e8b5

ggml-ci

sync : ggml

41f3a68

talk-llama : sync llama.cpp

ef62a22

ggerganov merged commit 99b011a into master Jan 14, 2025
45 checks passed

ggerganov deleted the sync-ggml-25-01-14 branch January 14, 2025 08:38

ggerganov mentioned this pull request Jan 16, 2025

Fix whisper.objc Xcode project #2736

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sync : ggml #2737

sync : ggml #2737

ggerganov commented Jan 14, 2025

sync : ggml #2737

sync : ggml #2737

Conversation

ggerganov commented Jan 14, 2025