[pull] master from ggerganov:master #34

pull · 2024-02-09T08:43:48Z

See Commits and Changes for more details.

Can you help keep this open source service alive? 💖 Please sponsor : )

* llava: add requirements.txt and update README.md

  This commit adds a `requirements.txt` file to the `examples/llava` directory. The file lists the Python packages required to run the scripts in that directory.

  The motivation for this is to make it easier for users to run the scripts in `examples/llava`, so that they do not run into missing-package errors when the packages are not already installed on their system.

  Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* llava: fix typo in llava-surgery.py output

  Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

A common default for the maximum number of open files is 256, which can lead to `asyncio.gather(*tasks)` failing with "Too many open files".

    $ python ggml_vk_generate_shaders.py --glslc=$ANDROID_NDK_PATH/shader-tools/darwin-x86_64/glslc
    ggml_vulkan: Generating and compiling shaders to SPIR-V
    Traceback (most recent call last):
      File "/Users/neuman/Code.noindex/github/llama.cpp/ggml_vk_generate_shaders.py", line 2326, in <module>
        asyncio.run(main())
      File "/Users/neuman/Code.noindex/miniforge3/lib/python3.10/asyncio/runners.py", line 44, in run
        return loop.run_until_complete(main)
      File "/Users/neuman/Code.noindex/miniforge3/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
        return future.result()
      File "/Users/neuman/Code.noindex/github/llama.cpp/ggml_vk_generate_shaders.py", line 2294, in main
        await asyncio.gather(*tasks)
    [...snip...]
    OSError: [Errno 24] Too many open files

This change sets a reasonable concurrency limit for tasks (and therefore open files), without significant impact on run time.
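One common way to bound the number of tasks in flight under `asyncio.gather` is to wrap each coroutine in an `asyncio.Semaphore`. The sketch below only illustrates that general technique and is not necessarily how the commit implements its limit. `MAX_CONCURRENCY`, `bounded`, `compile_shader`, and `run_all` are hypothetical names, and the value 64 is an assumption chosen to sit well below the 256 open-file default.

```python
import asyncio

# Hypothetical limit (assumption): the value used by the actual change
# is not stated in the commit message.
MAX_CONCURRENCY = 64


async def bounded(sem: asyncio.Semaphore, coro):
    """Await `coro` only after acquiring a slot from the semaphore."""
    async with sem:
        return await coro


async def compile_shader(name: str, stats: dict) -> str:
    """Stand-in for a task that would open files or spawn a subprocess."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    stats["active"] -= 1
    return name + ".spv"


async def run_all(names, limit=MAX_CONCURRENCY):
    """Run every task via gather, with at most `limit` running at once."""
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    # The inner coroutines are created here but do not start running
    # until `bounded` awaits them while holding a semaphore slot.
    tasks = [bounded(sem, compile_shader(n, stats)) for n in names]
    results = await asyncio.gather(*tasks)
    return results, stats["peak"]


if __name__ == "__main__":
    names = [f"shader_{i}" for i in range(1000)]
    results, peak = asyncio.run(run_all(names))
    print(len(results), "tasks finished; peak concurrency:", peak)
```

Because `gather` still receives every task up front, results come back in submission order; only the number of tasks holding resources at the same time is capped.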

slaren and others added 3 commits February 8, 2024 21:33

llama : do not print "offloading layers" message in CPU-only builds (#5416)

41f308f

CUDA: more warps for mmvq on NVIDIA (#5394)

8e6a9d2

Fix Vulkan crash on APUs with very little device memory (#5424)

44fbe34

* Fix Vulkan crash on APUs with very little device memory
* Fix debug output function names

pull bot added the ⤵️ pull label Feb 9, 2024

Xarbirus and others added 6 commits February 9, 2024 11:56

ggml : fix error C2078: too many initializers for MSVC ARM64 (#5404)

b2f87cb

readme : add JavaScript/Wasm repo (#5415)

e4124c2

llama : do not cap thread count when MoE on CPU (#5419)

e5ca393

* Not capping thread count when MoE inference is running on CPU
* Whitespace

server : fix prompt caching for repeated prompts (#5420)

7c777fc

teleprint-me closed this Feb 9, 2024
