Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only call rocblas_initialize for versions < 4 to eliminate unncessary VRAM allocation on some AMD cards #11080

Merged
merged 9 commits into from
Jan 28, 2025
11 changes: 9 additions & 2 deletions ggml/src/ggml-cuda/ggml-cuda.cu
Original file line number Diff line number Diff line change
Expand Up @@ -123,8 +123,15 @@ static ggml_cuda_device_info ggml_cuda_init() {
#ifdef __HIP_PLATFORM_AMD__
// Workaround for a rocBLAS bug when using multiple graphics cards:
// https://github.com/ROCmSoftwarePlatform/rocBLAS/issues/1346
rocblas_initialize();
CUDA_CHECK(cudaDeviceSynchronize());
{
char version_string[64];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is fine it would be slightly better here to use rocblas_get_version_string_size to let rocblas tell you how big the buffer needs to be.

version_string[0] = '\0';
const rocblas_status status = rocblas_get_version_string(version_string, sizeof(version_string));
if (status != rocblas_status_success || version_string[0] < '4') {
Copy link
Collaborator

@IMbackK IMbackK Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't like this too much as this will ofc fail if rocblas changes its version to 10.0.0 or whatever. I think we should make a bit more effort to parse this properly. Looking at rocblas (https://github.com/ROCm/rocBLAS/blob/59825a7367a24eed4e7e8a483820592089eaf17e/library/src/buildinfo.cpp#L29) it seams we would be on the safe side to use string_split<int> here https://github.com/ggerganov/llama.cpp/blob/df984e014714cba4c99ef894b20b51cbcef31b16/common/common.h#L439

however currently common.h is not used outside of the clients/examples and contains code that makes no sense in the backend.
@ggerganov maybe you can weigh in on if its ok to use this header here or if we should move function somewhere else.

rocblas_initialize();
CUDA_CHECK(cudaDeviceSynchronize());
}
}
#endif

ggml_cuda_device_info info = {};
Expand Down
Loading