
merge from upstream #52

Merged: 168 commits merged into layla-build on Feb 11, 2025

Conversation

l3utterfly (Owner)

No description provided.

ggerganov and others added 30 commits January 21, 2025 08:48
More RAII mainly

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
There is no need to use a map; just store the base pointer in the buffer
context.
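A minimal before/after sketch of that idea (the struct and field names here are illustrative, not the actual ggml buffer code):

```cpp
#include <map>

// before: a per-buffer map consulted on every access (illustrative)
struct buffer_ctx_before {
    std::map<const void *, void *> base_by_buffer;
};

// after: the base pointer is stored once, directly in the buffer context
struct buffer_ctx_after {
    void * base = nullptr;
};
```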
* Copy minja from google/minja@58f0ca6

* Add --jinja and --chat-template-file flags

* Add missing <optional> include

* Avoid print in get_hf_chat_template.py

* No designated initializers yet

* Try and work around msvc++ non-macro max resolution quirk

* Update test_chat_completion.py

* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template

* Refactor test-chat-template

* Test templates w/ minja

* Fix deprecation

* Add --jinja to llama-run

* Update common_chat_format_example to use minja template wrapper

* Test chat_template in e2e test

* Update utils.py

* Update test_chat_completion.py

* Update run.cpp

* Update arg.cpp

* Refactor common_chat_* functions to accept minja template + use_jinja option

* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE

* Revert LLAMA_CHATML_TEMPLATE refactor

* Normalize newlines in test-chat-templates for windows tests

* Forward decl minja::chat_template to avoid eager json dep (a generic sketch of this technique follows the commit message)

* Flush stdout in chat template before potential crash

* Fix copy elision warning

* Rm unused optional include

* Add missing optional include to server.cpp

* Disable jinja test that has a cryptic windows failure

* minja: fix vigogne (google/minja#22)

* Apply suggestions from code review

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Finish suggested renamings

* Move chat_templates inside server_context + remove mutex

* Update --chat-template-file w/ recent change to --chat-template

* Refactor chat template validation

* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)

* Warn against missing eos / bos tokens when jinja template references them

* rename: common_chat_template[s]

* reinstate assert on chat_templates.template_default

* Update minja to google/minja@b8437df

* Update minja to google/minja#25

* Update minja from google/minja#27

* rm unused optional header

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
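The forward-declaration bullet above uses a standard C++ technique worth illustrating; a generic sketch under assumed names, not the real header:

```cpp
namespace minja {
    class chat_template;   // forward declaration: no json header needed here
}

// hypothetical holder type: a pointer to an incomplete type is enough,
// so including this header no longer drags in the json dependency
struct chat_templates_holder {
    minja::chat_template * default_template = nullptr;
};
```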
* init

* add readme

* update readme

* don't use make

* update readme

* update and fix code

* fix editorconfig-checker

* no changes to the convert py script

* use clip_image_u8_free
ggml-org#11342)

* Factor string_join, string_split, string_repeat into common (a hedged sketch of these helpers follows this commit message)

* json: refactor to surface a versatile builder

* Update common.cpp
Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
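A hedged sketch of the three helpers named above; the real common.cpp signatures may differ:

```cpp
#include <sstream>
#include <string>
#include <vector>

static std::string string_join(const std::vector<std::string> & values, const std::string & sep) {
    std::ostringstream out;
    for (size_t i = 0; i < values.size(); i++) {
        if (i > 0) out << sep;
        out << values[i];
    }
    return out.str();
}

static std::vector<std::string> string_split(const std::string & s, const std::string & sep) {
    std::vector<std::string> parts;
    if (sep.empty()) {           // guard: an empty separator would loop forever
        parts.push_back(s);
        return parts;
    }
    size_t start = 0, end;
    while ((end = s.find(sep, start)) != std::string::npos) {
        parts.push_back(s.substr(start, end - start));
        start = end + sep.size();
    }
    parts.push_back(s.substr(start));
    return parts;
}

static std::string string_repeat(const std::string & s, size_t n) {
    std::string out;
    out.reserve(s.size() * n);
    for (size_t i = 0; i < n; i++) {
        out += s;
    }
    return out;
}
```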
* main : update README documentation for batch size

* fix formatting

* minor
With robustBufferAccess disabled, this shader was showing OOB stores. There
is a bounds check in the code, but the workgroup dimensions were reversed vs
CUDA and it was running the wrong number of threads. So fix the workgroup
dimensions and disable robustness for this pipeline.
There appears to be a copy-and-paste error here.

*mmq_wg_denoms should be used together with *warptile_mmq, instead of
wg_denoms.
Ollama uses the hf.co/ prefix to specify Hugging Face models, much like
RamaLama uses hf://.

Treat them similarly.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
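A hedged sketch of treating the two prefixes the same way (illustrative only, not the actual llama-run parsing code):

```cpp
#include <string>

// accept both spellings and reduce them to the bare model reference
static std::string strip_hf_prefix(const std::string & model) {
    for (const char * prefix : { "hf://", "hf.co/" }) {
        const std::string p(prefix);
        if (model.rfind(p, 0) == 0) {   // starts_with, pre-C++20
            return model.substr(p.size());
        }
    }
    return model;
}
```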
* server : add more clean up when cancel_tasks is called

* fix recv_with_timeout

* std::remove_if

* fix std::remove_if
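A generic illustration of the erase/remove_if idiom the last two bullets point at (the task representation here is made up): std::remove_if only shuffles the kept elements to the front, so the container must still be erased.

```cpp
#include <algorithm>
#include <vector>

// remove every queued task id that was cancelled; without the surrounding
// erase(), remove_if would leave stale elements at the tail
void drop_cancelled(std::vector<int> & queued, const std::vector<int> & cancelled) {
    queued.erase(
        std::remove_if(queued.begin(), queued.end(), [&](int id) {
            return std::find(cancelled.begin(), cancelled.end(), id) != cancelled.end();
        }),
        queued.end());
}
```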
Most other llama.cpp CLI tools accept -ngl with a single dash.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
To show that -n, -ngl, and --ngl are all acceptable.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
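A sketch of what accepting all three spellings could look like (illustrative, not the actual arg-matching code):

```cpp
#include <cstring>

// treat the short, single-dash, and double-dash forms interchangeably
static bool is_ngl_flag(const char * arg) {
    return std::strcmp(arg, "-n")    == 0 ||
           std::strcmp(arg, "-ngl")  == 0 ||
           std::strcmp(arg, "--ngl") == 0;
}
```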
Now that we have batched mat-vec mul Vulkan shaders for up to n==8,
these tests weren't actually exercising the mat-mat mul path. Test
n==9 as well. Also, change to use all_types.
…nt (ggml-org#11364)

* webui : put DeepSeek R1 CoT in a collapsible <details> element

* webui: refactor split

* webui: don't use regex to split cot and response

* webui: format+qol

* webui: no loading icon if the model isn't generating

* ui fix, add configs

* add jsdoc types

* only filter </think> for assistant msg

* build

* update build

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
For consistency

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
…rg#11366)

See https://reproducible-builds.org/ for why this is good
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of this variable.

Without this patch, compiling on different machines produced different binaries, which made verification of results difficult.

Fixes: ggml-org#11317

This patch was done while working on reproducible builds for openSUSE.
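The actual change lives in the build scripts, but the SOURCE_DATE_EPOCH convention is easy to sketch in C++ (a minimal example, not code from this patch):

```cpp
#include <cstdlib>
#include <ctime>
#include <string>

// honor SOURCE_DATE_EPOCH so that two builds of the same source embed
// the same timestamp; fall back to the current time otherwise
std::time_t build_timestamp() {
    if (const char * epoch = std::getenv("SOURCE_DATE_EPOCH")) {
        return static_cast<std::time_t>(std::stoll(epoch));
    }
    return std::time(nullptr);
}
```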
* release : pack /lib and /include in the packages

* cmake : put libs in /bin

* TMP : push artifacts

* Revert "TMP : push artifacts"

This reverts commit 4decf2c.

* ci : fix HIP cmake compiler options to be on first line

* ci : restore the original HIP commands

* ci : change ubuntu build from latest to 20.04

* ci : try to fix macos build rpaths

* ci : remove obsolete MacOS build

* TMP : push artifacts

* ci : change back to ubuntu latest

* ci : macos set build rpath to "@loader_path"

* ci : fix typo

* ci : change ubuntu package to 22.04

* Revert "TMP : push artifacts"

This reverts commit 537b09e.
* Add hipGraph support

* Enable VMM on rocm
* CANN: Add Ascend CANN build ci

* Update build.yml

* Modify cann image version

* Update build.yml

* Change to run on x86 system

* Update build.yml

* Update build.yml

* Fix formatting error

* Update build.yml

* Add 'Ascend NPU' label restrictions

* Exclude non PR event

Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>

* Update build.yml

---------

Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
ggerganov and others added 16 commits February 8, 2025 16:49
…l-org#11759)

* redo Settings modal UI

* add python code interpreter

* fix auto scroll

* build

* fix overflow for long output lines

* bring back sticky copy button

* adapt layout on mobile view

* fix multiple lines output and color scheme

* handle python exception

* better state management

* add webworker

* add headers

* format code

* speed up by loading pyodide on page load

* (small tweak) add small animation to make it feel like Claude
Use the ANSI escape code for clearing a line.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
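For reference, a minimal sketch of that escape sequence (not the actual llama-run code): CSI "2K" erases the whole line, and "\r" returns the cursor to column 0.

```cpp
#include <cstdio>

static void clear_current_line() {
    fputs("\x1b[2K\r", stdout);
    fflush(stdout);
}
```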
typo: `\` -> `/`
Change the Windows path separator `\` to the UNIX path separator `/`.
Technically, the fixed-width types come only from the iostream and
cstdint/stdint.h headers; the memory and vector headers should not provide
them. In GCC 15 the headers were cleaned up, so the proper header, cstdint,
must be included.

src/llama-mmap.h:26:5: error: ‘uint32_t’ does not name a type
   26 |     uint32_t read_u32() const;
      |     ^~~~~~~~
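The fix, sketched (the struct name below is an illustrative stand-in for the real llama-mmap.h code): include <cstdint> explicitly rather than relying on <memory> or <vector> to provide it.

```cpp
#include <cstdint>   // provides uint32_t explicitly; required under GCC 15

struct file_reader {          // illustrative stand-in for the real struct
    uint32_t read_u32() const;
};
```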
…-org#11792)

* server : (webui) introduce conversation branching + idb storage

* mark old conversations as "migrated" instead of deleting them

* improve migration

* add more comments

* more clarification
* Update ggml.c

* Update arg.cpp

* Update speculative.h
* CUDA: use arch list for feature availability check

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
l3utterfly merged commit 08aa6de into layla-build on Feb 11, 2025
50 checks passed