ggml-opencl : store GPU buffer in ggml_tensor::extra #2994

slaren · 2023-09-03T18:27:52Z

ggml-opencl currently stores the GPU buffer in ggml_tensor::data. After the GGUF changes, this results in a memory leak when mmap is not used, because the address of the CPU buffer is lost after the call to ggml_cl_transform_tensor:

llama.cpp/llama.cpp

Lines 1516 to 1523 in 47068e5

    #elif defined(GGML_USE_CLBLAST)
                case GGML_BACKEND_GPU:
                    ggml_cl_transform_tensor(cur->data, cur);
                    if (!use_mmap) {
                        free(cur->data);
                    }
                    break;
    #endif

This change stores the GPU buffer in ggml_tensor::extra instead. That solves the leak, and it also avoids a possible interaction with ggml-alloc when the address of the OpenCL buffer falls within the memory range of the measure buffer.

Fixes #2993
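
For illustration only, here is a minimal sketch of why the host pointer is lost in the old pattern and preserved in the new one. Everything in it is a hypothetical stand-in: `fake_tensor`, `fake_gpu_buffer`, `transform_into_data`, `transform_into_extra`, and `host_pointer_survives` are not part of ggml or OpenCL, and the real `ggml_cl_transform_tensor` does considerably more than this. The sketch only assumes that the old code overwrote `data` with the device handle, as the description above says.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical, simplified stand-in for ggml_tensor (assumption, not the real struct).
struct fake_tensor {
    void * data  = nullptr; // host (CPU) buffer
    void * extra = nullptr; // backend-specific handle
};

// Hypothetical stand-in for an OpenCL buffer object.
struct fake_gpu_buffer {
    size_t size;
};

// Old pattern: the device handle overwrites data, so the host address is lost.
void transform_into_data(fake_tensor * t, size_t size) {
    t->data = new fake_gpu_buffer{size};
}

// New pattern: the device handle goes into extra and data is left untouched.
void transform_into_extra(fake_tensor * t, size_t size) {
    t->extra = new fake_gpu_buffer{size};
}

// Returns true if the tensor still points at the host allocation after the upload,
// i.e. if a later free(t.data) would release the right memory.
bool host_pointer_survives(bool use_extra) {
    fake_tensor t;
    void * host = std::malloc(64);
    t.data = host;

    if (use_extra) {
        transform_into_extra(&t, 64);
    } else {
        transform_into_data(&t, 64);
    }

    const bool survives = (t.data == host);

    // Clean up using the locally saved pointers, which the old pattern would not have.
    delete static_cast<fake_gpu_buffer *>(use_extra ? t.extra : t.data);
    std::free(host);
    return survives;
}
```

With these stand-ins, `host_pointer_survives(false)` models the pre-change behavior, where the tensor no longer refers to the CPU allocation, and `host_pointer_survives(true)` models the behavior after this change.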

ggml-opencl : store GPU buffer in ggml_tensor::extra

2d63144

ggerganov approved these changes Sep 3, 2023


slaren merged commit bd33e5a into master Sep 4, 2023

slaren deleted the opencl-extra branch September 4, 2023 12:59

jhen0409 mentioned this pull request Sep 8, 2023

Excessively high memory consumption on iOS #3069

Closed

3 tasks
