merge from upstream #56

l3utterfly · 2025-03-01T07:54:07Z

No description provided.

* Add super wip scripts for multimodal granite gguf Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add example for converting mmgranite to gguf Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * remove hardcoded path Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add vision feature layer to gguf params Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Clean up llava surgery and remove name substitution hacks Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add transformers llava next tensor name mapping Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Make siglip / openclip mutuall exclusive Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix projector linear substitution Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix linear 2 substitution index Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Increase max flattened gridpoints to 64 Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix hardcoded concat for multiple feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Pull vision feature layers out of gguf keys Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * fix num gridpoints and use all layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Avoid dropping last image encoder layer in llava models Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use 10 for max number of patches Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Standardize vision feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Cleanup logs Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Update comment for vision feature layer init Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Update notes for alternative to legacy llm conversion script Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix notes rendering Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add v prefix to vision feature layer log Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use current defaults for feature layer Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use constant for max gridpoints / feat layers, style fixes Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * clarify non-negative feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Remove CLIP_API from func signature Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * USE MAX_IMAGE_FEATURE_LAYERS const in layer calc Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Clarify feature layers are non negative ints and not uint Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix condition for reading feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * pop last llava layer when feature layers are unset Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix unset vision layer 0 Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Update examples/llava/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Reenable assertion for out of bounds get_rows Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use std vector for gridpoints and feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Caculate max feature layer at load time Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Include base patch for granite vision allocation Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix trailing whitespace Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add max num patches = 10 back for minicpmv Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use unordered set to store feature layers Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use max feature layer for postnorm Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Apply suggestions from code review --------- Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* opencl: fix small shape gemv, remove unused extensions * opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size * opencl: fix for token length < 4 * opencl: use wave size of 64 for all Adreno GPUs --------- Co-authored-by: Shawn Gu <quic_shawngu@quicinc.com> Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>

metal: use dequantize_q templates --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

It's useful to be able to have this from the library layer as it's a key parameter of the model (e.g. to figure out how much KV cache memory is needed).

* Add example docs for granite vision Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* vulkan: implement GGML_OP_ROPE_BACK * vulkan: implement GGML_OP_RMS_NORM_BACK * vulkan: implement GGML_OP_SILU_BACK * vulkan: implement GGML_OP_SOFTMAX_BACK

* ggml-cpu: Fix build with sve Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * ggml-cpu: Remove unused variable in sve q3_k vec dot Signed-off-by: Molly Sophia <mollysophia379@gmail.com> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

Co-authored-by: Judd <foldl@boxvest.com>

Looks like a copy/paste bug from qx_needs_dequant.

…ht (ggml-org#12069)

Signed-off-by: kerthcet <kerthcet@gmail.com>

Currently self.byte_order is never used. Actually use it to byteswap read data to allow reading big endian files on little endian systems and vice versa. Now it's possible to convert little-endian model into a big-endian model and back on a little-endian system.

* Refactor gguf scripts to improve metadata handling Added contents method to ReaderField class Added endianess property to GGUFReader class * update scripts * fix import * remove unused import * attempt to work around flake and pyright errors * second attempt * give up, ignore type * bump version * apply newbyteorder fixes

* add struct for FFI bindgen * Apply suggestions from code review --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Fix dependencies between ggml and backends ggml backends link only to ggml-base and ggml links to all backends. * Fix installation of ggml backends Set up GNUInstallDirs before setting the installation directory of ggml backends

* vulkan: improve im2col performance

* faster dequant for old quants * dont use unpack for iq4_nl * vec2 unpack for q8

Remove unused header file that causes compilation failure on ARM platform with GCC 13.

…rg#12064) * Added SVE Support for Q2_K Quantized Models * Use 4-space indentation in the switch cases * removed comments lines * Remove the loop Retain the curly bracess for better understanding of code * Remove the comment like added for q3_k_q8_k kernel --------- Co-authored-by: vithulep <p.m.vithule1517@gmail.com>

…ns (ggml-org#11595) * vulkan: implement specialized MMV kernels for IQ2 quantizations * vulkan: add MMV kernels for IQ3 quants * vulkan: Increase MMV batch size and unroll IQ LUT setup * vulkan: fix init_iq_shmem for WG sizes larger than tables * vulkan: common batch size for all I-quants

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

…2108) * Added Phi-4-mini-instruct support * Update regex per ngxson * Change the vocab base to Xenova/gpt-4o * fix conversion update script * no need to check longrope * minor style fix * fix python style --------- Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>

* Upgrade init_tensor API to return a ggml_status To prepare for an 'abort-free' ggml (ggml not to abort on OOMs but return a OOM status), as agreeed with Diego in the ggml repo, upgrade the init_tensor() and view_init() APIs to return a ggml_status. * misc fixes --------- Co-authored-by: slaren <slarengh@gmail.com>

* convert : fix Norway problem when parsing YAML * Update gguf-py/gguf/metadata.py * add newline at correct place

alex-jw-brooks and others added 28 commits February 24, 2025 17:09

metal : copy kernels for quant to F32/F16 conversions (ggml-org#12017)

58d07a8

metal: use dequantize_q templates --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

llama : expose llama_model_n_head_kv in the API (ggml-org#11997)

3e9a286

It's useful to be able to have this from the library layer as it's a key parameter of the model (e.g. to figure out how much KV cache memory is needed).

Add Doc for Converting Granite Vision -> GGUF (ggml-org#12006)

4d1051a

* Add example docs for granite vision Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

server: support add_generation_prompt query param (ggml-org#12062)

0b52745

vulkan: implement more backpropagation operators (ggml-org#11914)

61d4f39

* vulkan: implement GGML_OP_ROPE_BACK * vulkan: implement GGML_OP_RMS_NORM_BACK * vulkan: implement GGML_OP_SILU_BACK * vulkan: implement GGML_OP_SOFTMAX_BACK

add OP sigmoid (ggml-org#12056)

c132239

Co-authored-by: Judd <foldl@boxvest.com>

server: handle echo=false on /v1/completions (ggml-org#12060)

401af80

vulkan: fix assertion when qy_needs_dequant (ggml-org#12068)

a82c9e7

Looks like a copy/paste bug from qx_needs_dequant.

docs: add docs/function-calling.md to lighten server/README.md's plig…

d7cfe1f

…ht (ggml-org#12069)

readme : update infra list (ggml-org#9096)

53e4db1

Signed-off-by: kerthcet <kerthcet@gmail.com>

llava : add struct for FFI bindgen (ggml-org#12079)

a800ae4

* add struct for FFI bindgen * Apply suggestions from code review --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

vulkan: improve im2col (ggml-org#11826)

581650b

* vulkan: improve im2col performance

vulkan: matmul dequantization improvements (ggml-org#12015)

fbeda90

* faster dequant for old quants * dont use unpack for iq4_nl * vec2 unpack for q8

CANN: Fix build error with GCC 13 (ggml-org#11990)

673cfef

Remove unused header file that causes compilation failure on ARM platform with GCC 13.

CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (ggml-org#12098)

9c42b17

Update granite vision docs for 3.2 model (ggml-org#12105)

84d5f4b

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

convert : fix Norway problem when parsing YAML (ggml-org#12114)

06c2b15

* convert : fix Norway problem when parsing YAML * Update gguf-py/gguf/metadata.py * add newline at correct place

Merge branch 'layla-build' into merge

648f244

l3utterfly merged commit 6718011 into layla-build Mar 1, 2025
5 checks passed

github-actions bot added the documentation Improvements or additions to documentation label Mar 1, 2025

github-actions bot added SYCL Nvidia GPU Vulkan testing examples python server ggml Apple Metal labels Mar 1, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

merge from upstream #56

merge from upstream #56

l3utterfly commented Mar 1, 2025

merge from upstream #56

merge from upstream #56

Conversation

l3utterfly commented Mar 1, 2025