ggml : move CPU backend to a separate file #10144

slaren · 2024-11-02T23:08:01Z

Moves the ggml code specific to the CPU backend to a separate file.

This is an initial step to separate the core ggml library from the CPU backend. In the future, this will allow:

- Building other backends as shared libraries, without having to link them to the CPU backend
- Building the core ggml library with only the base instruction set for the ABI, and loading an optimized version of the CPU backend dynamically

Additionally:

- Removes the optimization interface, since it has dependencies on the CPU backend and would be removed by "ggml: new optimization interface" (ggml#988) regardless
- Removes the baby-llama example, since it depends on the optimization interface

ggml-ci

JohannesGaessler

Are there also plans to split ggml-cpu.c into multiple smaller files like was done for CUDA?

(I did not really look at ggml.c and ggml-cpu.c, since I don't think reviewing them is feasible.)

JohannesGaessler · 2024-11-03T09:35:52Z

common/common.cpp

@@ -1951,6 +1951,8 @@ void yaml_dump_string_multiline(FILE * stream, const char * prop_name, const cha

 void yaml_dump_non_result_info(FILE * stream, const common_params & params, const llama_context * lctx,
                               const std::string & timestamp, const std::vector<int> & prompt_tokens, const char * model_desc) {
+    ggml_cpu_init(); // some ARM features are detected at runtime


I didn't get around to it, but this PR reminds me that at some point I also want to remove the YAML log code again. It has become pretty outdated, and nowadays there are better solutions for the things I originally used it for.

ggml/include/ggml.h

ggml/src/ggml-rpc.cpp

slaren · 2024-11-03T11:04:31Z

> Are there also plans to split ggml-cpu.c into multiple smaller files like was done for CUDA?

Yes, I think that would be great. We should also adapt it to C++ and use templates to avoid duplicating the code of the operations for each type.
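As a rough illustration of the template idea, here is a minimal sketch in which a single templated loop serves several element types and operations. The names `example_binary_op`, `example_add` and `example_mul` are hypothetical and are not part of ggml; real operations would also have to handle strides, threading and quantized types.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical element-wise kernel (not a ggml function): one template
// replaces a separate hand-written loop per element type and per operation.
template <typename T, typename Op>
void example_binary_op(const T * a, const T * b, T * dst, std::size_t n, Op op) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = op(a[i], b[i]);
    }
}

// Hypothetical operation functors; each works for any arithmetic type T.
struct example_add {
    template <typename T>
    T operator()(T x, T y) const { return static_cast<T>(x + y); }
};

struct example_mul {
    template <typename T>
    T operator()(T x, T y) const { return static_cast<T>(x * y); }
};
```

Calling `example_binary_op(a, b, dst, n, example_add{})` with `float` or `std::int32_t` buffers instantiates the same loop for each type, so the per-type copies are generated by the compiler instead of being written by hand.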

ggerganov · 2024-11-03T13:32:02Z

Looking into this now.

ggerganov · 2024-11-03T14:48:46Z

Isn't this going to produce thread sanitizer data race warnings on the is_first_call var?

llama.cpp/ggml/src/ggml.c

Lines 1424 to 1443 in bf95fff

    struct ggml_context * ggml_init(struct ggml_init_params params) {
        static volatile bool is_first_call = false;
        if (!is_first_call) {
            ggml_critical_section_start();
            if (!is_first_call) {
                // initialize time system (required on Windows)
                ggml_time_init();
                for (int i = 0; i < (1 << 16); ++i) {
                    union {
                        uint16_t u16;
                        ggml_fp16_t fp16;
                    } u = {i};
                    ggml_table_f32_f16[i] = GGML_COMPUTE_FP16_TO_FP32(u.fp16);
                }
                is_first_call = true;
            }
            ggml_critical_section_end();
        }
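
For context on the question above: `volatile` does not provide inter-thread synchronization in the C or C++ memory model, so the unlocked read of the flag on the fast path can race with the write made inside the critical section. One possible way to keep the double-checked pattern without that race is an atomic flag with acquire/release ordering. The following is only a sketch under assumptions: every `example_*` name is hypothetical, the table fill is a placeholder for the real FP16 conversion, and a small spin lock stands in for `ggml_critical_section_start()`/`ggml_critical_section_end()`. It is not the code that was merged.

```cpp
#include <atomic>

// Hypothetical stand-ins, not ggml definitions.
static float example_table[1 << 16];
static int   example_fill_count = 0; // how many times the table was filled

static std::atomic<bool> example_ready{false};           // published with release, read with acquire
static std::atomic_flag  example_lock = ATOMIC_FLAG_INIT; // spin lock standing in for the critical section

void example_init_once() {
    // Fast path: this acquire load pairs with the release store below, so a
    // thread that sees `true` also sees the fully initialized table.
    if (example_ready.load(std::memory_order_acquire)) {
        return;
    }

    while (example_lock.test_and_set(std::memory_order_acquire)) {
        // spin until the lock is free
    }

    // Second check under the lock: another thread may have finished first.
    if (!example_ready.load(std::memory_order_relaxed)) {
        for (int i = 0; i < (1 << 16); ++i) {
            example_table[i] = static_cast<float>(i); // placeholder for the FP16 -> FP32 conversion
        }
        ++example_fill_count;
        example_ready.store(true, std::memory_order_release);
    }

    example_lock.clear(std::memory_order_release);
}
```

C11 offers the same primitives through `<stdatomic.h>` (`atomic_bool`, `atomic_load_explicit`, `atomic_store_explicit`), so the approach would also carry over to a C source file.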

This fixes the build break from the recent changes to move the CPU backend to separate files ggerganov#10144

chaxu01 · 2024-11-04T13:46:12Z

@slaren this commit 9f40989 breaks q4_0_4_8 on Arm CPUs, likely related to #10165.

The following command triggers the issue:
./bin/llama-cli -m llama-2-7b-chat.Q4_0_4_8.gguf -p "Write a code in C for bubble sorting" -n 32 -t 4 -ngl 0

The error output is:
Assertion failed: (!isnan(wp[i])Assertion failed: (!isnan(wp[i])Assertion failed: (!isnan(wp[i])Assertion failed: (!isnan(wp[i])), function ggml_compute_forward), function ggml_compute_forward), function ggml_compute_forward_soft_max_f32, file ggml-cpu.c, _soft_max_f32, file ggml-cpu.c, _soft_max_f32, file ggml-cpu.c, ), function ggml_compute_forwardline 8904.

This issue does not occur on commit 08828a6.
