Releases: teleprint-me/llama.cpp
b3448
server : fix URL.parse in the UI (#8646)
b3441
llama : fix codeshell support (#8599)
* llama : fix codeshell support
* llama : move codeshell after smollm to respect the enum order
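As context for the second bullet, a minimal C++ sketch of why enum order matters here; `llm_arch_example` and its enumerators are illustrative stand-ins, not the real llama.cpp enum:

```cpp
// Illustrative only: a stand-in for llama.cpp's architecture enum. Because
// enumerators take consecutive integer values, inserting a new architecture
// mid-list would silently renumber every entry after it, so the new entry
// goes after the last existing one.
enum llm_arch_example {
    LLM_ARCH_LLAMA,
    LLM_ARCH_SMOLLM,
    LLM_ARCH_CODESHELL, // appended after SMOLLM, keeping prior values stable
};
```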
b3428
CUDA: MMQ code deduplication + iquant support (#8495)
* CUDA: MMQ code deduplication + iquant support
* 1 less parallel job for CI build
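As a rough host-side illustration of what "code deduplication" means in this context: one shared routine templated on a per-type traits struct, instead of a hand-written copy per quantization format. This is a C++ sketch under assumed names (`q4_traits`, `mmq_row_dot`, `vec_dot`), not the actual CUDA MMQ kernels:

```cpp
#include <cstdint>

// Illustrative traits for a hypothetical 4-bit quant type.
struct q4_traits {
    struct block { float d; uint8_t qs[16]; }; // 32 4-bit values + scale
    static float vec_dot(const block & x, const block & y) {
        int32_t sumi = 0;
        for (int j = 0; j < 16; ++j) {
            sumi += (x.qs[j] & 0x0F) * (y.qs[j] & 0x0F)  // low nibbles
                  + (x.qs[j] >>   4) * (y.qs[j] >>   4); // high nibbles
        }
        return x.d * y.d * sumi;
    }
};

// One generic row dot product shared by every quant type; only the
// per-type traits differ, so adding a type (e.g. an iquant format)
// means adding a traits struct, not another kernel copy.
template <typename traits>
float mmq_row_dot(const typename traits::block * x,
                  const typename traits::block * y, int nblocks) {
    float sum = 0.0f;
    for (int i = 0; i < nblocks; ++i) {
        sum += traits::vec_dot(x[i], y[i]);
    }
    return sum;
}
```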
b3423
ggml : fix quant dot product with odd number of blocks (#8549)
* ggml : fix iq4_nl dot product with odd number of blocks
* ggml : fix odd blocks for ARM_NEON (#8556)
* ggml : fix q4_1
* ggml : fix q5_0
* ggml : fix q5_1
* ggml : fix iq4_nl metal
* ggml : fix q4_0
* ggml : fix q8_0
* ggml : remove special Q4_0 code for first 2 blocks
* ggml : fix sumf redefinition

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
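The common failure mode behind these fixes is worth a sketch: unrolled or SIMD dot products consume quant blocks in pairs, so an odd block count needs an explicit scalar tail. The block layout and names below (`block_q8_example`, `dot_q8`) are simplified illustrations, not the real ggml types:

```cpp
#include <cstdint>

struct block_q8_example {
    float  d;       // per-block scale
    int8_t qs[32];  // quantized values
};

float dot_q8(const block_q8_example * x, const block_q8_example * y, int nb) {
    float sumf = 0.0f;
    int i = 0;
    // Main loop: two blocks per iteration, standing in for a SIMD path.
    for (; i + 1 < nb; i += 2) {
        for (int b = 0; b < 2; ++b) {
            int32_t sumi = 0;
            for (int j = 0; j < 32; ++j) {
                sumi += x[i + b].qs[j] * (int32_t) y[i + b].qs[j];
            }
            sumf += x[i + b].d * y[i + b].d * sumi;
        }
    }
    // Scalar tail: without this, the last block of an odd-length row
    // is silently dropped, which is the class of bug fixed above.
    for (; i < nb; ++i) {
        int32_t sumi = 0;
        for (int j = 0; j < 32; ++j) {
            sumi += x[i].qs[j] * (int32_t) y[i].qs[j];
        }
        sumf += x[i].d * y[i].d * sumi;
    }
    return sumf;
}
```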
b3422
convert-*.py: remove add_name from ChatGLMModel class (#8590)
b3416
CUDA: fix partial offloading for ne0 % 256 != 0 (#8572)
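A small worked example of the alignment this fix deals with, assuming the fixed tile width implied by the commit title; `pad_to` is a hypothetical helper, not a ggml API:

```cpp
#include <cassert>
#include <cstdint>

// Round n up to the next multiple of `multiple`, as a padded buffer
// size would be when a kernel works on fixed-width tiles.
constexpr int64_t pad_to(int64_t n, int64_t multiple) {
    return ((n + multiple - 1) / multiple) * multiple;
}

int main() {
    assert(pad_to(4096, 256) == 4096); // ne0 % 256 == 0: unchanged
    assert(pad_to(4100, 256) == 4352); // ne0 % 256 != 0: rounded up
}
```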
b3405
make/cmake: add missing force MMQ/cuBLAS for HIP (#8515)
b3399
fix ci (#8494)
b3387
llama : fix pre-tokenization of non-special added tokens (#8228)
* llama : fix mpt and olmo pre-tokenizer
* llama : pre-tokenize non-special user-defined tokens first
* llama : fix detection of control-like user-defined tokens
* convert_hf : identify which user-defined tokens are control tokens (only used in _set_vocab_gpt2() for now)
* convert_hf : identify more added control tokens for SPM tokenizers. This makes Gemma and Gemma-2 tokenize pretty much EVERYTHING correctly, including HTML tags and consecutive spaces, but it unfortunately requires model re-conversion. There seems to be a weird behavior of the HF tokenizer for Gemma, which prefers the 16-space token over lengthier space tokens, while the SentencePiece tokenizer does not do this. (The implementation in llama.cpp has the same behavior as SentencePiece.)
* llama : fix wrong pre-tokenization of byte tokens
* llama : fix Viking pre-tokenizer regex. The order was previously wrong, which caused errors in some tests.
* llama : fix command-r detokenization
* convert_hf : reduce usages of the UNKNOWN token type
* llama : add UNKNOWN tokens in the special tokens cache
* convert_hf : reduce usages of UNKNOWN for InternLM2. This makes the changes from #8321 more consistent with the other changes made here.
* test-tokenizer-random : reduce potential conflicts with #8379
* test-tokenizer-random : add a failing edge case for falcon
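To make "pre-tokenize non-special user-defined tokens first" concrete, here is a hedged toy version of the idea in C++: split the raw text on the exact strings of user-defined added tokens before any regex pre-tokenizer runs, so those tokens can never be broken apart. The `fragment` type and `split_on_added_tokens` are illustrative, not llama.cpp's actual implementation:

```cpp
#include <string>
#include <utility>
#include <vector>

struct fragment {
    std::string text;
    bool is_added_token; // true if this fragment is a user-defined token
};

std::vector<fragment> split_on_added_tokens(
        const std::string & text,
        const std::vector<std::string> & added) {
    std::vector<fragment> frags = {{text, false}};
    for (const std::string & tok : added) {
        if (tok.empty()) continue; // guard against degenerate tokens
        std::vector<fragment> next;
        for (const fragment & f : frags) {
            // Already-matched tokens are opaque; never re-split them.
            if (f.is_added_token) { next.push_back(f); continue; }
            size_t pos = 0, hit;
            while ((hit = f.text.find(tok, pos)) != std::string::npos) {
                if (hit > pos) next.push_back({f.text.substr(pos, hit - pos), false});
                next.push_back({tok, true});
                pos = hit + tok.size();
            }
            if (pos < f.text.size()) next.push_back({f.text.substr(pos), false});
        }
        frags = std::move(next);
    }
    // Only the non-token fragments would then go through the normal
    // regex-based pre-tokenizer.
    return frags;
}
```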
b3386
vulkan : cmake integration (#8119)
* Add Vulkan to CMake pkg
* Add Sycl to CMake pkg
* Add OpenMP to CMake pkg
* Split generated shader file into separate translation unit
* Add CMake target for Vulkan shaders
* Update README.md
* Add make target for Vulkan shaders
* Use pkg-config to locate vulkan library
* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow
* Clean up tabs
* Move sudo to apt-key invocation
* Forward GGML_EXTRA_LIBS to CMake config pkg
* Update vulkan obj file paths
* Add shaderc to nix pkg
* Add python3 to Vulkan nix build
* Link against ggml in cmake pkg
* Remove Python dependency from Vulkan build
* Code review changes
* Remove trailing newline
* Add cflags from pkg-config to fix w64devkit build
* Remove trailing whitespace
* Fix doc heading
* Make glslc required Vulkan component
* Remove clblast from nix pkg