[pull] master from ggerganov:master #146
Fix CANN backend compilation error after merging llama.cpp support for dynamically loadable backends.
This commit removes the buffer_id field from the leaf_alloc struct. The motivation is that, as far as I can tell, this field is only written to and never read/used. Each tensor_alloc already has a buffer_id field, which is what prompted a closer look into what the buffer_id in leaf_alloc was actually used for.
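For context, a minimal sketch of the two structs as described above (field layout is illustrative and simplified; see ggml-alloc.c for the actual definitions):

```cpp
// Illustrative sketch of the allocator structs in ggml-alloc.c (simplified).
struct tensor_alloc {
    int    buffer_id;  // which backend buffer the tensor lives in (still used)
    size_t offset;
    size_t size_max;
};

struct leaf_alloc {
    // int buffer_id;          // removed: written during allocation, never read
    struct tensor_alloc leaf;  // per-leaf allocation info, carries its own buffer_id
};
```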
* server : fix the disappearance of the end of the text when streaming with stop strings
* server : simplify the "send text" checks
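To illustrate the underlying issue (a hedged sketch, not the server's actual code): while streaming, the server holds back any suffix of the generated text that could be the beginning of a stop string; the bug was that this held-back tail was never flushed when generation ended without the stop string completing.

```cpp
// Hedged sketch of partial-stop handling while streaming. Names are
// illustrative; the real logic lives in the server example.
#include <algorithm>
#include <string>

// Length of the longest suffix of `text` that is a proper prefix of `stop`.
static size_t partial_stop_len(const std::string & text, const std::string & stop) {
    if (text.empty() || stop.empty()) {
        return 0;
    }
    const size_t max_len = std::min(text.size(), stop.size() - 1);
    for (size_t len = max_len; len > 0; --len) {
        if (text.compare(text.size() - len, len, stop, 0, len) == 0) {
            return len;
        }
    }
    return 0;
}

// e.g. text = "Hello <|e", stop = "<|end|>"  ->  partial_stop_len == 3,
// so "Hello " is streamed and "<|e" is held back until resolved.
// On the final token, the held-back tail must be flushed as well;
// dropping it is exactly the "disappearing end of text" bug above.
```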
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Prior to this commit, using a JSON Schema containing a string with a `pattern` regular expression that uses top-level alternation (e.g. `"pattern": "^A|B|C|D$"`) would result in invalid JSON output from the constrained sampling grammar, because it ended up creating a grammar rule like this for the string:

```
thing ::= "\"" "A" | "B" | "C" | "D" "\"" space
```

Note that this rule will only match a starting quote for the "A" case, and will only match an ending quote for the "D" case, so it will always produce invalid JSON when used for sampling (that is, the JSON will always be lacking the starting quote, the ending quote, or both).

This was fixed in a simple way by adding parentheses to the generated rule (for all string pattern rules, to keep it simple), such that the new generated rule looks like this (correct):

```
thing ::= "\"" ("A" | "B" | "C" | "D") "\"" space
```
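The shape of the fix, as a minimal sketch (the helper name is hypothetical; the real conversion lives in the JSON-schema-to-grammar code):

```cpp
// Hypothetical sketch: wrapping a pattern's translated body in parentheses
// so the surrounding quote literals bind to every alternative, not just the
// first and last. Not the actual llama.cpp conversion code.
#include <string>

std::string string_pattern_rule(const std::string & pattern_body) {
    // Broken form:  "\"" A | B | C | D "\"" space
    // Fixed form:   "\"" (A | B | C | D) "\"" space
    return "\"\\\"\" (" + pattern_body + ") \"\\\"\" space";
}
```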
* llama : suppress conversion from 'size_t' to 'int'

This commit updates llm_tokenizer_spm.tokenize to suppress/remove the following warnings that are generated on Windows when using MSVC:

```console
src\llama-vocab.cpp(211,1): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data
src\llama-vocab.cpp(517,1): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data
```

This is done by adding a cast for the size_t returned from symbols.size(). I believe this is safe, as it seems unlikely that symbols, which stores an entry for each UTF-8 character, would grow larger than INT_MAX. The motivation for this change is to reduce the number of warnings currently generated when building on Windows.

* squash! llama : suppress conversion from 'size_t' to 'int'

Move the cast into the for loop.
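The pattern described, as a self-contained sketch (illustrative names; the actual code is in llm_tokenizer_spm in src/llama-vocab.cpp):

```cpp
// Sketch of the cast-in-the-loop pattern described above, not the exact
// llama-vocab.cpp code. Casting symbols.size() keeps the loop index an int,
// silencing MSVC's C4267 warning at the int-taking call site.
#include <vector>

struct llm_symbol { /* prev/next indices, text pointer, length, ... */ };

void try_add_bigram(int left, int right);  // hypothetical int-taking callee

void tokenize_pairs(const std::vector<llm_symbol> & symbols) {
    // safe as long as symbols.size() <= INT_MAX, which holds in practice
    for (int i = 1; i < (int) symbols.size(); ++i) {
        try_add_bigram(i - 1, i);
    }
}
```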
* fix: use `vm_allocate` to allocate CPU backend buffer on macOS
* fix: switch to `posix_memalign` to keep existing `free()` usages working
* feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS
* style: formatting
* fix: move const outside of `#ifndef`
* style: formatting
* fix: unused var
* fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h`
* fix: unused var
* fix: page align to `GGUF_DEFAULT_ALIGNMENT`
* fix: page align to `TENSOR_ALIGNMENT`
* fix: convert `TENSOR_ALIGNMENT` to a macro
* fix: increase page size to `32` on iOS
* fix: iOS page size
* fix: `hbw_posix_memalign` alignment
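Taken together, these commits converge on aligned-allocation helpers roughly like the following. This is a hedged sketch of the end state, assuming the function names from the commit messages; the real code handles more platforms (e.g. MSVC's `_aligned_malloc`, `hbw_posix_memalign` for HBM) and lives in ggml-impl.h / ggml:

```cpp
// Hedged sketch of ggml_aligned_malloc / ggml_aligned_free after this series.
// Platform branches and exact signatures may differ from the real code.
#include <cstddef>
#include <cstdlib>
#ifdef __APPLE__
#include <mach/mach.h>
#endif

#define TENSOR_ALIGNMENT 32  // per the commits, a macro; 32 covers iOS too

void * ggml_aligned_malloc(size_t size) {
#ifdef __APPLE__
    // vm_allocate returns zeroed, page-aligned memory (pages are at least
    // 4 KiB, so TENSOR_ALIGNMENT is satisfied). It cannot be free()'d,
    // which is why GGML_ALIGNED_FREE had to become a function as well.
    vm_address_t addr = 0;
    if (vm_allocate(mach_task_self(), &addr, size, VM_FLAGS_ANYWHERE) != KERN_SUCCESS) {
        return nullptr;
    }
    return (void *) addr;
#else
    void * ptr = nullptr;
    if (posix_memalign(&ptr, TENSOR_ALIGNMENT, size) != 0) {
        return nullptr;
    }
    return ptr;  // posix_memalign memory stays compatible with free()
#endif
}

void ggml_aligned_free(void * ptr, size_t size) {
#ifdef __APPLE__
    vm_deallocate(mach_task_self(), (vm_address_t) ptr, size);
#else
    (void) size;
    free(ptr);
#endif
}
```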
* vulkan : add backend registry / device interfaces
* llama : print devices used on model load
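With the Vulkan backend hooked into the registry, its devices become enumerable through the common ggml-backend device API. A minimal usage sketch, assuming the `ggml_backend_dev_*` functions from ggml-backend.h introduced with the registry work:

```cpp
// Minimal sketch: list every device exposed by the backend registry,
// similar in spirit to what llama.cpp now prints on model load.
#include "ggml-backend.h"
#include <cstdio>

int main() {
    for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: %s (%s)\n", i,
               ggml_backend_dev_name(dev),
               ggml_backend_dev_description(dev));
    }
    return 0;
}
```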
Co-authored-by: Tim Wang <tim.wang@ing.com>
See Commits and Changes for more details.
Created by pull[bot]