Commit
[auto] Sync version 2403021812.0.0+llamacpp-release.b2316
== Relevant log messages from source repo:

commit bbde6eb2561153aabbdfac5001c690fe00cad639
Author: Kawrakow <48489457+ikawrakow@users.noreply.github.com>
Date:   Sat Mar 2 17:00:51 2024 +0200

    ggml : IQ3_S improvements (#5829)

    * iq3_s: somewhat faster AVX2 dot product

    On Ryzen a 7950X TG-128 increases to 16 t/s from 15.5 t/s using
    16 threads. For 8 threads it is 13.85 t/s vs 11.75 t/s.
    PP-512 increases to 28.5 t/s from 23.8 t/s.

    * iq3_s: somewhat faster ARM_NEON dot product

    Still dog slow - 10.7 t/s up from 9.9 t/s.

    * iq3_s: another small ARM_NEON improvement

    10.7 -> 11.0 t/s. Using vmulq_s8 is faster than the xor - sub trick
    that works best on AVX2.

    * iq3_s: minor improvement on Metal

    49.4 t/s -> 50.3 t/s

    * iq3_s: PPL improvement

    E.g., for a context of 4096 LLaMA-v2-7B goes to 5.1340 from 5.1653.

    * iq3_s: use new grid everywhere

    * Fix ARM_NEON

    ---------

    Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

commit 6c32d8c7ad8ba7b6ad2a162e929a21dd04fcdca0
Author: Xuan Son Nguyen <thichthat@gmail.com>
Date:   Sat Mar 2 15:19:09 2024 +0100

    llama : refactor internal quantization functions (#5830)

commit 802da0091ba646ecf02e1a8fae2da0b8e76409bd
Author: compilade <113953597+compilade@users.noreply.github.com>
Date:   Sat Mar 2 08:42:56 2024 -0500

    llama : fix segfault from unknown model arch name (#5820)

    * llama : fix segfault from unknown model arch name

    * llama : make all LLM maps const

    This also requires using `std::map::at` instead of its `operator[]`
    which does not exist for const maps.

    * llama : name LLM_ARCH_UNKNOWN to "(unknown)"

    This avoids errors from `std::map::at` when
    getting the general name of the model architecture.
    Using "(unknown)" instead of an empty string as per suggestion
    ggerganov/llama.cpp#5820 (comment)

    * llama : remove redundant inner const for LLM_TENSOR_NAMES

    The extra const won't do anything here as const maps
    return const references to values.

    Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

    * llama : remove redundant nullptr check in llm_arch_from_string

    Since LLM_ARCH_NAMES is a const map, no spurious elements
    with a NULL name are inserted anymore, so this check is dead code.

    ---------

    Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

commit 715641391dda1ff9762dc5d99d9a30acce99f2c6
Author: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Date:   Sat Mar 2 19:49:30 2024 +0800

    Support multiple GPUs (split mode) on SYCL backend (#5806)

    * suport multiple cards: split-mode - layer|row

    * rm warning

    * rebase with master, support tow new OPs, close feature for -sm=row, fix for unit test

    * update news

    * fix merge error

    * update according to review comments
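
The const-map change described in #5820 can be illustrated with a small standalone sketch. This is not the llama.cpp source: the enum `Arch`, the table `ARCH_NAMES`, and the helper functions are hypothetical stand-ins, and only the `std::map` behavior shown is standard C++. On a mutable map, `operator[]` with a missing key silently inserts a value-initialized element (a null pointer here), whereas a const map only offers `at`, which throws `std::out_of_range` instead of inserting.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration; not llama.cpp's actual enum or table.
enum class Arch { Llama, Falcon, Unknown };

// Const table: lookups cannot insert elements, so no null names can appear.
static const std::map<Arch, const char *> ARCH_NAMES = {
    { Arch::Llama,   "llama"     },
    { Arch::Falcon,  "falcon"    },
    { Arch::Unknown, "(unknown)" },  // explicit entry so at() never throws for Unknown
};

// Forward lookup: at() is the only element access available on a const map.
const char * arch_name(Arch arch) {
    return ARCH_NAMES.at(arch);
}

// Reverse lookup: an unrecognized name maps to Arch::Unknown.
// No nullptr check is needed because the const table holds no null names.
Arch arch_from_string(const std::string & name) {
    for (const auto & kv : ARCH_NAMES) {
        if (kv.second == name) {
            return kv.first;
        }
    }
    return Arch::Unknown;
}

// Shows the hazard on a mutable map: operator[] inserts {Unknown, nullptr}.
bool mutable_lookup_inserts_null() {
    std::map<Arch, const char *> names = { { Arch::Llama, "llama" } };
    const char * p = names[Arch::Unknown];
    return p == nullptr && names.size() == 2;
}

// Shows that at() on a const map reports a missing key by throwing.
bool const_lookup_throws_on_missing() {
    const std::map<Arch, const char *> names = { { Arch::Llama, "llama" } };
    try {
        (void) names.at(Arch::Unknown);
    } catch (const std::out_of_range &) {
        return true;
    }
    return false;
}
```

With this layout, looking up the name of an unrecognized architecture yields the string "(unknown)" rather than a null pointer that a later string operation could dereference.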
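
The IQ3_S log entry above contrasts a multiply by a sign vector (`vmulq_s8` on ARM NEON) with an "xor - sub trick" on AVX2. The log does not spell the trick out; the scalar sketch below assumes it refers to the usual two's-complement identity `(x ^ m) - m`, which leaves `x` unchanged when the mask `m` is 0 and negates it when `m` is -1 (all bits set). The function names are hypothetical, and the actual kernels operate on SIMD vectors rather than single bytes.

```cpp
#include <cstdint>

// Conditional negation via xor and subtract (assumed meaning of the "xor - sub trick").
// mask must be 0 (keep x) or -1, i.e. 0xFF (negate x).
inline int8_t apply_sign_xor_sub(int8_t x, int8_t mask) {
    // x ^ -1 equals ~x, which is -x - 1; subtracting -1 then gives -x.
    return static_cast<int8_t>((x ^ mask) - mask);
}

// The same effect expressed as a multiply by +1 or -1,
// which is what a per-lane vmulq_s8 by a sign vector computes.
inline int8_t apply_sign_mul(int8_t x, int8_t sign) {
    return static_cast<int8_t>(x * sign);
}

// Checks that both forms agree for every value in [-127, 127].
// -128 is skipped because its negation does not fit in int8_t.
inline bool sign_forms_agree() {
    for (int v = -127; v <= 127; ++v) {
        const int8_t x = static_cast<int8_t>(v);
        if (apply_sign_xor_sub(x, 0)  != apply_sign_mul(x, 1))  return false;
        if (apply_sign_xor_sub(x, -1) != apply_sign_mul(x, -1)) return false;
    }
    return true;
}
```

Which form is faster depends on the instruction set, which is consistent with the log reporting different choices for AVX2 and ARM NEON.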