title #29

apicalshark · 2024-12-05T10:48:01Z

No description provided.

…10271)

* readme : document --no-display-prompt
* readme : update default prompt context size
* readme : remove unnecessary indentation

  Indenting a line with four spaces makes Markdown treat that section as plain text.
* readme : indent commands under bullets
* readme : indent commands in lettered list

* implemented argmax kernel
* tpig -> tgpig
* change to strides
* contiguous assertions
* kernel working and tested
* argmax simd parallel implementation
* added 2 new tests for argmax in test-backend-ops
* cosmit
* added 3 test cases for perf eval
* add test_argmax in make_test_cases_perf
* Update test-backend-ops.cpp

  Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>

…ganov#10599)

* hide buttons in dropdown menu
* use npm as deps manager and vite as bundler
* fix build
* fix build (2)
* fix responsive on mobile
* fix more problems on mobile
* sync build
* (test) add CI step for verifying build
* fix ci
* force rebuild .hpp files
* cmake: clean up generated files pre build

…IDIA backend (ggerganov#10584)

* [SYCL] Move to Compile Time backend selection on oneMKL Interface for NVIDIA backend

  Move to compile-time backend selection to avoid latency at run time. Add it to all mkl gemm calls, and only for the NVIDIA backend.

  Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
* Formatting
* Address PR comments to increase readability

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>

This commit updates the copy-paste instruction in convert_hf_to_gguf_update.py to reflect that convert_hf_to_gguf.py will have already been updated with the new get_vocab_base_pre() function when this script completes.

ggerganov and others added 26 commits December 3, 2024 11:20

server : fix default draft model parameters (ggerganov#10586)

70b98fa

* server : force F16 KV cache for the draft model ggml-ci
* server : fix draft params ggml-ci
* server : various params fixes ggml-ci

github : minify link [no ci]

844e2e1

github : minify link [no ci] (revert)

515d4e5

this doesn't work as expected

metal : small-batch mat-mul kernels (ggerganov#10581)

0115df2

* metal : small-batch mat-mul kernels ggml-ci
* metal : add rest of types ggml-ci
* metal : final adjustments ggml-ci
* metal : add comments ggml-ci

llama : add missing LLAMA_API for llama_chat_builtin_templates (ggerg…

3b4f2e3

…anov#10636)

metal : add GGML_OP_CONV_TRANSPOSE_1D kernels (ggml/1026)

667d70d

* wip
* wip implementation f32
* kernel conv transpose 1d f32 working
* initial commit

CUDA: remove unnecessary warp reduce in FA (ggml/1032)

e9e661b

* kqmax_new_j is the same in every thread within the warp after the operation at line 199, so this reduce can be omitted
* same problem in vec32

---------

Co-authored-by: ZhaoXiaoYu <zhao.xiaoyu@zte.com.cn>

sync : ggml

c505471

scripts : remove amx sync

1cd3df4

ggml-ci

vulkan: optimize and reenable split_k (ggerganov#10637)

cc98896

Use vector loads when possible in mul_mat_split_k_reduce. Use split_k when there aren't enough workgroups to fill the shaders.
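For readers unfamiliar with split_k, here is a minimal sketch, assuming the usual meaning of the term: the shared K dimension of a matrix product is cut into slices, each slice yields a partial result, and a separate reduce pass sums the partials. The name `matmul_split_k` and the pure-Python lists are hypothetical and not taken from the Vulkan shaders; in a GPU backend the slices would run as separate workgroups, which is why the technique helps when the output alone offers too little parallelism.

```python
def matmul_split_k(a, b, split_k):
    """Multiply a (M x K) by b (K x N), splitting K into `split_k` slices.

    Hypothetical illustration only; not the Vulkan implementation.
    """
    m, k, n = len(a), len(b), len(b[0])
    chunk = (k + split_k - 1) // split_k  # ceil(k / split_k)

    # Stage 1: every slice of K produces its own partial M x N result.
    partials = []
    for s in range(split_k):
        lo, hi = s * chunk, min((s + 1) * chunk, k)
        partials.append(
            [[sum(a[i][p] * b[p][j] for p in range(lo, hi)) for j in range(n)]
             for i in range(m)]
        )

    # Stage 2: the reduce step sums the partial results element-wise.
    return [[sum(part[i][j] for part in partials) for j in range(n)]
            for i in range(m)]


a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 0], [0, 1]]
result = matmul_split_k(a, b, split_k=2)
```

The extra reduce pass is the cost of the approach, so it is worth paying only when a single pass would leave the device underused.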

clip : add sycl support (ggerganov#10574)

01e6d9b

Co-authored-by: piDack <pcdack@hotmail.co>

Add docs for creating a static build (ggerganov#10268) (ggerganov#10630)

da6aac9

* Add notes for a static build
* Update docs/build.md

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>

Avoid using __fp16 on ARM with old nvcc (ggerganov#10616)

cd2f37b

fix typo of README.md (ggerganov#10605)

98036d5

vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (g…

2759916

…gerganov#10642)
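As background on the "fast divide" (mul+shift) idea named in the commit title: a division by a divisor that stays fixed for a whole dispatch can be replaced by a multiply-high, an add, and a shift, using constants precomputed once. The sketch below uses the Granlund-Montgomery formulation for 32-bit unsigned values. That choice, and the names `fastdiv_constants` and `fastdiv`, are assumptions for illustration and not a copy of the shader code.

```python
def fastdiv_constants(d):
    """Precompute (mp, shift) for dividing 32-bit unsigned values by d.

    Hypothetical helper: shift = ceil(log2(d)) and mp is the low 32 bits
    of the magic multiplier floor(2**(32 + shift) / d) + 1.
    """
    if not 1 <= d < 2**32:
        raise ValueError("d must be a nonzero 32-bit unsigned value")
    shift = 0
    while (1 << shift) < d:
        shift += 1
    mp = ((1 << 32) * ((1 << shift) - d)) // d + 1
    return mp, shift


def fastdiv(n, mp, shift):
    """Return n // d for 0 <= n < 2**32 without a division instruction."""
    mulhi = (n * mp) >> 32  # upper 32 bits of the 64-bit product
    # Python integers do not overflow; fixed-width code must make sure
    # that mulhi + n cannot wrap around.
    return (mulhi + n) >> shift


mp, shift = fastdiv_constants(7)
quotient = fastdiv(100, mp, shift)
```

The constants depend only on the divisor, so they can be computed once on the host and passed to the kernel, which then avoids an integer division per element.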

llama: Support MiniCPM-1B (with & w/o longrope) (ggerganov#10559)

8d0cfd5

Fix HF repo commit to clone lora test models (ggerganov#10649)

253b7fd

ggml-cpu : fix HWCAP2_I8MM value (ggerganov#10646)

2803540

ggml : add predefined list of CPU backend variants to build (ggergano…

59f4db1

…v#10626)

* ggml : add predefined list of CPU backend variants to build
* update CPU dockerfiles

server : fix speculative decoding with context shift (ggerganov#10641)

1da7b76

* server : fix speculative decoding with context shift ggml-ci
* server : take into account speculative limits ggml-ci
* server : add tests

Update deprecation-warning.cpp (ggerganov#10619)

f112d19

Fixed Path Separator Handling for Cross-Platform Support (Windows File Systems)

github-actions bot added the documentation, examples, server, and devops labels Dec 5, 2024

github-actions bot added the testing, python, script, ggml, SYCL, Nvidia GPU, Vulkan, and Apple Metal labels Dec 5, 2024

Merge branch 'master' into 1

78156d7

apicalshark merged commit bae649a into master Dec 5, 2024
5 of 8 checks passed

apicalshark deleted the 1 branch December 5, 2024 10:48
