merge from upstream #25

l3utterfly · 2024-06-25T05:47:50Z

No description provided.

* Adding simple bare-bones test for end-to-end integration test for json validation against auto-generated JSON-schema grammars.
* Adding additional examples as documented in ggerganov#7789. Also adding the ability to automatically output improperly failing grammars to debug output files so they can more easily be examined in the gbnf-validator program.
* Uncommenting formerly commented tests so that they fail for others who are attempting to reproduce the bugs.
* Merging improved schema test methods added by @ochafik in ggerganov#7797
* Adding #define to temporarily remove failing tests so that this PR can pass CI, but still be useful for other PRs that want to leverage the framework.
* Fixing nits from ochafik. Removing escape slashes, adding additional failing cases, fixing some other strings.
* Fixing grammar indentation to be consistent throughout file.

* hf bitnet v1
* hf bitnet e2e v2
* finish bitnet e2e
* finish f16 hf bitnet e2e
* remove unsed
* finish bitnet i2 e2e
* move i2s to quantize v1
* move i2 to quantize
* clean code
* clean code 2
* fix codestyle
* fix code
* fix
* fix code
* fix merge
* remove unused
* change table name
* fix whitespace
* delete redundant
* i2_s to absmax
* finish i2_s/i8_s vec_dot x86 simd
* i2s->q22
* fix code
* remove block scale
* add dequantize
* fix seq
* update avx2
* remove q2_2
* remove q22_grid
* fix whitespace
* reuse llm_build_kv
* fix bo

---------

Co-authored-by: root <root@wangjinheng>

…el variants (ggerganov#5763)

* gguf-py : add T5 model architecture
* gguf-py : add separate tensors for encoder and decoder
* gguf-py : add new model header parameters: decoder_start_token_id, attention.relative_buckets_count, tokenizer.ggml.remove_extra_whitespaces, tokenizer.ggml.precompiled_charsmap
* convert-hf : add model conversion support for T5ForConditionalGeneration and T5WithLMHeadModel

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

* add parameters for embeddings --embd-normalize --embd-output-format --embd-separator description in the README.md
* Update README.md fix tipo
* Trailing whitespace
* fix json generation, use " not '
* fix merge master
* fix code formating group of parameters // embedding print usage for embedding parameters

---------

Co-authored-by: Brian <mofosyne@gmail.com>

* support splits in convert.py
* Support split by size and dry run to write estimated shards/filesizes
* Move split functionality to new GGUFManager class
* fix improper function signature
* tentative push of convert-hf-to-gguf support
* resolve merge + SplitArguments for easier parsing
* Fix eager tensor memory leak and remove convert.py changes
  Removed a memory leak caused by unexpected reference retention to eager tensors. Also removed GGUFManager functionality in convert.py in favor of specializing for convert-hf-to-gguf.py.
* refactor SplitStrategy to be a deque
  Instead of having SplitStrategy have a `data` field that is a deque, just have SplitStrategy be a subclass of deque itself.
* fix Q8 quantization
* remove unnecessary imports in gguf_manager
* fix final? merge issue
* fix gguf_writer placement and remove comments
* oops, actually fix gguf_writer placement
* reduce duplicated code from gguf_writer
* further simplify GGUFManager
* simplify even further and standardize with GGUFWriter
* reduce diffs with master
* form shards while adding tensors, SHA256 sums agree with master
* re-add type hint
  Co-authored-by: compilade <git@compilade.net>
* GGUFWriter compatibility fix
  Co-authored-by: compilade <git@compilade.net>
* Shard dataclass and un-negative dont_add_architecture
* type consistency in format_n_bytes_to_str
* move kv keys to constants.py
* make pathlib explicit
* base-1024 bytes to base-1000
* rename GGUFManager to GGUFWriterSplit
* Update gguf-py/gguf/constants.py
  Co-authored-by: compilade <git@compilade.net>
* fix convert-hf-to-gguf.py permissions
* fix line endings
* Update gguf-py/gguf/gguf_writer_split.py
  Co-authored-by: compilade <git@compilade.net>
* convert-hf : restore executable file permission
* examples/convert-legacy-llama.py: restore executable file permission
* reinstate original gguf package import and fix type annotation
* attempt to appease the linter
* attempt 2 to appease the linter
* attempt 3 to appease the linter
* comma consistency
* Update convert-hf-to-gguf.py
  Co-authored-by: compilade <git@compilade.net>
* edit cmd line args
* use simplification from ggerganov#7827
* kv/ti data are still wrong
* try to refactor kv data (still fails)
* fix ti data messiness
* tidy up
* fix linting
* actually make the linter happy
* cleanup round 1
* remove SplitStrategy, SplitArguments
* appease linter
* fix typing and clean up
* fix linting
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <git@compilade.net>
* progress bar, fix split logic
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <git@compilade.net>
* catch oversights
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <git@compilade.net>
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <git@compilade.net>
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <git@compilade.net>
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <git@compilade.net>
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <git@compilade.net>
* swap bar orders
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <git@compilade.net>
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <git@compilade.net>
* compatibility fix
* Update gguf-py/gguf/gguf_writer.py
  Co-authored-by: compilade <git@compilade.net>
* Update convert-hf-to-gguf.py
  Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>

Adriankhl and others added 23 commits June 21, 2024 10:28

vulkan: detect multiple devices by deviceUUID instead of deviceID (ggerganov#8022)

557b653

* vulkan: detect multiple devices by deviceUUID instead of deviceID
* vulkan: remove unneeded variables
* vulkan: fix id query

Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 values (ggerganov#8058)

5b48cd5

Uses the values computed by @JohannesGaessler in PR ggerganov#7413

convert-hf : change assert to exception (ggerganov#8015)

3aa184a

cvector-generator: Moe Moe Fixie-Fixie for Lots of Formats~! ♡(ᐢ ᴥ ᐢ)♡ (ggerganov#8052)

adf480c

* Update negative.txt
* Update positive.txt
* Update cvector-generator.cpp
* Update cvector-generator.cpp

cvector: fix CI + correct help message (ggerganov#8064)

3e58b0e

* cvector: fix CI + correct help message
* also correct --pca-iter

Removing extra blank lines that were breaking Lint. (ggerganov#8067)

b5a5f34

Refactor Vulkan backend to allow multiple contexts (ggerganov#7961)

45c0e2e

* Refactor Vulkan backend to allow multiple contexts
* Fix too many shader groups called validation error in llama3 on AMD and Intel GPUs
* Fix Vulkan debug build error

fix CI failures (ggerganov#8066)

b6b9a8e

* test-backend-ops : increase cpy max nmse
* server ci : disable thread sanitizer

Fix typo in llama_set_embeddings comment (ggerganov#8077)

11318d9

server : fix JSON-Scheme typo (ggerganov#7975)

6a2f298

ggml : remove ggml_task_type and GGML_PERF (ggerganov#8017)

95f57bb

* ggml : remove ggml_task_type and GGML_PERF
* check abort_callback on main thread only
* vulkan : remove usage of ggml_compute_params
* remove LLAMA_PERF

disable publishing the full-rocm docker image (ggerganov#8083)

8cb508d

CUDA: optimize MMQ int8 tensor core performance (ggerganov#8062)

9a590c8

* CUDA: optimize MMQ int8 tensor core performance
* only a single get_mma_tile_x_k function
* simplify code, make functions constexpr

gguf-py : fix tensor groups for encoder-decoder models in gguf-dump.py (ggerganov#8090)

d62e4aa

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Brian <mofosyne@gmail.com>

CUDA: use MMQ instead of cuBLAS by default (ggerganov#8075)

a818f30

CUDA: fix MMQ writeback for int8 tensor cores (ggerganov#8100)

3b099bc

CUDA: fix matrix multiplication algorithm choice (ggerganov#8102)

2df373a

[SYCL] Re-enabled mul_mat_batched_sycl (ggerganov#8095)

083bacc

l3utterfly merged commit 37b40c3 into layla-build Jun 25, 2024
60 of 75 checks passed

github-actions bot added SYCL Nvidia GPU Vulkan testing build examples labels Jun 25, 2024

github-actions bot added devops python server ggml labels Jun 25, 2024
