Optimizations and build-warning fixes for LoongArch #11709

Merged: 3 commits into ggml-org:master on Feb 7, 2025

Conversation

MQ-mengqing (Contributor)

  • ggml : optimize convert f32<->f16 for loongarch_asx
  • ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16
  • ggml : Fix warnings when run cpu CI locally on LoongArch
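
For context, the scalar bit manipulation that an F32<->F16 SIMD path vectorizes looks roughly like the sketch below. It is a minimal, self-contained F16 -> F32 conversion following the IEEE-754 binary16 layout (1 sign, 5 exponent, 10 mantissa bits); the helper name `f16_to_f32` is hypothetical, and this is not the actual ggml or LASX code.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical scalar reference for F16 -> F32 conversion; the LASX
// change in this PR targets batched conversions of this kind.
static float f16_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000) << 16; // move sign to bit 31
    uint32_t exp  = (h >> 10) & 0x1F;             // 5-bit exponent field
    uint32_t mant = h & 0x3FF;                    // 10-bit mantissa field
    uint32_t bits;

    if (exp == 0x1F) {                 // Inf / NaN: max exponent, keep payload
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {             // normal: rebias exponent 15 -> 127
        bits = sign | ((exp + 112) << 23) | (mant << 13);
    } else if (mant != 0) {            // subnormal: renormalize the mantissa
        uint32_t e = 113;              // 127 - 15 + 1
        while ((mant & 0x400) == 0) { mant <<= 1; e--; }
        bits = sign | (e << 23) | ((mant & 0x3FF) << 13);
    } else {                           // +/- zero
        bits = sign;
    }

    float f;
    memcpy(&f, &bits, sizeof(f));      // bit-exact reinterpretation
    return f;
}
```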

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Feb 6, 2025
@ggerganov (Member)

cc @junchao-loongson

@junchao-loongson (Collaborator)

perf record -e cache-misses ./build/bin/llama-cli -m ../qwen2-1_5b-instruct-q3_k_m.gguf -p "I believe the meaning of life is" -n 128

Before:
  26.02%  llama-cli  libggml-cpu.so                 [.] ggml_vec_dot_q3_K_q8_K
  21.30%  llama-cli  libggml-cpu.so                 [.] ggml_vec_dot_f16
  20.03%  llama-cli  libggml-cpu.so                 [.] ggml_vec_dot_q4_K_q8_K
  10.55%  llama-cli  libggml-cpu.so                 [.] ggml_vec_dot_q6_K_q8_K  

After:
  33.47%  llama-cli  libggml-cpu.so                 [.] ggml_vec_dot_q3_K_q8_K
  25.05%  llama-cli  libggml-cpu.so                 [.] ggml_vec_dot_q4_K_q8_K
  11.67%  llama-cli  libggml-cpu.so                 [.] ggml_vec_dot_q6_K_q8_K
   5.35%  llama-cli  libc.so.6                      [.] memcpy
   4.63%  llama-cli  libggml-cpu.so                 [.] ggml_compute_forward_mul_mat
   2.94%  llama-cli  llama-cli                      [.] common_sampler_sample(common_sampler*, llama_context*, int, bool)
   2.19%  llama-cli  libc.so.6                      [.] memset
   1.89%  llama-cli  libggml-cpu.so                 [.] ggml_vec_dot_f16

The ggml_vec_dot_f16 hotspot is clearly mitigated: it drops from 21.30% of samples to 1.89%.
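
For readers unfamiliar with the kernel: ggml_vec_dot_f16 computes a dot product over two F16 vectors. A hypothetical scalar reference, reusing the `f16_to_f32` sketch above, is shown below; the real ggml kernel has a different signature and is SIMD-vectorized, which is why F32<->F16 conversion throughput shows up so strongly in this profile.

```c
// Hypothetical scalar reference for ggml_vec_dot_f16: an F16 x F16
// dot product accumulated in F32 (the real kernel uses LASX on LoongArch).
static float vec_dot_f16_ref(int n, const uint16_t * x, const uint16_t * y) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += f16_to_f32(x[i]) * f16_to_f32(y[i]);
    }
    return sum;
}
```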

Before:
> $ ./build/bin/llama-bench -m ../qwen2-1_5b-instruct-q3_k_m.gguf   [±master]
| model                          |       size |     params | backend    | threads |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| qwen2 1.5B Q3_K - Medium       | 780.32 MiB |     1.54 B | CPU        |       8 |         pp512 |         21.40 ± 0.01 |
| qwen2 1.5B Q3_K - Medium       | 780.32 MiB |     1.54 B | CPU        |       8 |         tg128 |         20.91 ± 0.07 |

build: c3db0480 (4645)

After:
> $ ./build/bin/llama-bench -m ../qwen2-1_5b-instruct-q3_k_m.gguf   [±pr11709]
| model                          |       size |     params | backend    | threads |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| qwen2 1.5B Q3_K - Medium       | 780.32 MiB |     1.54 B | CPU        |       8 |         pp512 |         23.09 ± 0.07 |
| qwen2 1.5B Q3_K - Medium       | 780.32 MiB |     1.54 B | CPU        |       8 |         tg128 |         21.44 ± 0.09 |

build: 99bbe263 (4656)

The benchmark shows a small improvement: pp512 goes from 21.40 to 23.09 t/s (~8%) and tg128 from 20.91 to 21.44 t/s (~2.5%).

LGTM!

@ggerganov ggerganov merged commit 225bbbf into ggml-org:master Feb 7, 2025
44 of 46 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
Labels: ggml (changes relating to the ggml tensor library for machine learning)
3 participants