cublas Cuda 801 on Maxwell Titan X #1447

jmalfara · 2023-11-07T16:30:12Z

It's an old card, I know, but hopefully there is something that can be done.

https://github.com/ggerganov/whisper.cpp/blob/master/ggml-cuda.cu#L7069-#L7071

There seems to be an issue with Maxwell cards not supporting some function in CUDA. I'm not sure exactly which instruction is unsupported, but maybe someone can provide some insight?

whisper_init_from_file_with_params_no_state: loading model from '/usr/src/app/dist/lib/whisper/ggml-small.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 3
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     =  464.68 MB
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   yes
ggml_init_cublas: CUDA_USE_TENSOR_CORES: no
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX TITAN X, compute capability 5.2
whisper_model_load: model size    =  464.44 MB
whisper_init_state: kv self size  =   15.75 MB
whisper_init_state: kv cross size =   52.73 MB
whisper_init_state: compute buffer (conv)   =   25.82 MB
whisper_init_state: compute buffer (encode) =  122.14 MB
whisper_init_state: compute buffer (cross)  =    5.96 MB
whisper_init_state: compute buffer (decode) =   36.27 MB

system_info: n_threads = 4 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 | 

run: processing './tmp/7b21d44b-278c-48a1-a68c-5e27a49b2c7e.wav' (158800 samples, 9.9 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


CUDA error 801 at /usr/src/app/whisper.cpp/ggml-cuda.cu:7071: operation not supported
current device: 0

In this sample I manually disabled the tensor cores by forcing GGML_CUDA_FORCE_MMQ, but the issue still exists.

One important thing to note: I compiled the library on a machine with a 3070, which could well be the root cause.
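To make that suspicion concrete, here is a minimal sketch in C of the general CUDA compatibility rules as I understand them: native GPU code only loads on a device with the same major compute capability and an equal or higher minor one, while embedded PTX can be JIT-compiled for any device of equal or higher capability. The type `compute_cap` and the helpers `sass_can_run` and `ptx_can_jit` are hypothetical names made up for illustration; they are not part of CUDA or whisper.cpp. The 8.6 value is the compute capability of an RTX 3070, and 5.2 is taken from the log above. Whether this is the actual cause of the error here is not established.

```c
#include <stdbool.h>

/* Compute capability as major.minor, e.g. 5.2 for the GTX TITAN X in the log. */
typedef struct {
    int major;
    int minor;
} compute_cap;

/* Hypothetical helper (not a CUDA API): can native GPU code built for
 * `built` be loaded on a device of capability `dev`?
 * Rule: same major version, device minor >= build minor. */
bool sass_can_run(compute_cap built, compute_cap dev) {
    return built.major == dev.major && dev.minor >= built.minor;
}

/* Hypothetical helper (not a CUDA API): can PTX generated for `built`
 * be JIT-compiled for `dev`?
 * Rule: device capability >= PTX capability. */
bool ptx_can_jit(compute_cap built, compute_cap dev) {
    return dev.major > built.major ||
           (dev.major == built.major && dev.minor >= built.minor);
}
```

If the build embedded only 8.6 code, both checks fail for a 5.2 device, so nothing in the binary could run on the Titan X.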


jmalfara · 2023-11-07T18:33:12Z

Commenting out #L7071 stops this error, but I'm still curious as to which instruction wasn't supported 🤔

bobqianic · 2023-11-07T19:42:18Z

> Commenting out #L7071 stops this error, but I'm still curious as to which instruction wasn't supported 🤔

This actually amounts to burying one's head in the sand, because you've only eliminated the error message; the error itself still exists. You can set CUDA_ARCH_FLAG=all in the Makefile to solve this problem.

whisper.cpp/ggml-cuda.cu

Lines 187 to 198 in 6a5d195

    #define CUDA_CHECK(err) \
        do { \
            cudaError_t err_ = (err); \
            if (err_ != cudaSuccess) { \
                int dev_id; \
                cudaGetDevice(&dev_id); \
                fprintf(stderr, "\nCUDA error %d at %s:%d: %s\n", err_, __FILE__, __LINE__, \
                    cudaGetErrorString(err_)); \
                fprintf(stderr, "current device: %d\n", dev_id); \
                exit(1); \
            } \
        } while (0)

whisper.cpp/Makefile

Lines 200 to 216 in 6a5d195

    ifdef WHISPER_CUBLAS
    	ifeq ($(shell expr $(NVCC_VERSION) \>= 11.6), 1)
    		CUDA_ARCH_FLAG=native
    	else
    		CUDA_ARCH_FLAG=all
    	endif
    	CFLAGS      += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/$(UNAME_M)-linux/include
    	CXXFLAGS    += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/$(UNAME_M)-linux/include
    	LDFLAGS     += -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L$(CUDA_PATH)/targets/$(UNAME_M)-linux/lib
    	WHISPER_OBJ += ggml-cuda.o
    	NVCC        = nvcc
    	NVCCFLAGS   = --forward-unknown-to-host-compiler -arch=$(CUDA_ARCH_FLAG)
    ggml-cuda.o: ggml-cuda.cu ggml-cuda.h
    	$(NVCC) $(NVCCFLAGS) $(CXXFLAGS) -Wno-pedantic -c $< -o $@
    endif
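The point about the error still existing can be illustrated with a small sketch in C. It is only a sketch under stated assumptions: the thread does not show what line 7071 actually calls, so `fake_unsupported_call`, `FAKE_CHECK`, `run_checked`, and `run_with_line_commented_out` are hypothetical stand-ins that build without a GPU. The macro has the same shape as CUDA_CHECK, except that it returns the error code instead of calling exit(1) so the outcome can be inspected.

```c
#include <stdio.h>

/* Stand-ins for the CUDA runtime types so the sketch builds without a GPU.
 * 801 is the code reported in the log ("operation not supported"). */
typedef int fake_error_t;
enum { FAKE_SUCCESS = 0, FAKE_NOT_SUPPORTED = 801 };

/* Hypothetical call that always fails on this device; it stands in for
 * whatever line 7071 wraps, which the thread does not show. */
fake_error_t fake_unsupported_call(int *out) {
    (void) out;                 /* nothing is written on failure */
    return FAKE_NOT_SUPPORTED;
}

/* Same shape as CUDA_CHECK, but returns the code instead of exit(1). */
#define FAKE_CHECK(err)                                             \
    do {                                                            \
        fake_error_t err_ = (err);                                  \
        if (err_ != FAKE_SUCCESS) {                                 \
            fprintf(stderr, "\nerror %d at %s:%d\n",                \
                    err_, __FILE__, __LINE__);                      \
            return err_;                                            \
        }                                                           \
    } while (0)

/* With the check in place: the failure is reported and propagated. */
fake_error_t run_checked(int *result) {
    *result = -1;               /* sentinel meaning "not set up" */
    FAKE_CHECK(fake_unsupported_call(result));
    return FAKE_SUCCESS;
}

/* With the checked line commented out: no message, but nothing was set up. */
fake_error_t run_with_line_commented_out(int *result) {
    *result = -1;
    /* FAKE_CHECK(fake_unsupported_call(result)); */
    return FAKE_SUCCESS;        /* looks fine, yet *result is still -1 */
}
```

Both variants leave `*result` at the sentinel value. The only difference is that the second one no longer says so.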

joshuachris2001 · 2023-11-08T19:38:01Z

I have an NVIDIA GeForce GTX 860M, and I have suddenly started hitting the same issue since the last pull, although it worked fine before the update.
I forced CUDA_ARCH_FLAG to all and the error persists.

EDIT: yep, falling back to commit fa8dbdc [1.4.0] and the GPU works perfectly.

jmalfara · 2023-11-09T14:48:30Z

> Commenting out #L7071 stops this error, but I'm still curious as to which instruction wasn't supported 🤔

> This actually amounts to burying one's head in the sand, because you've only eliminated the error message; the error itself still exists. You can set CUDA_ARCH_FLAG=all in the Makefile to solve this problem.

I didn't notice the exit(1). That makes way more sense compared to a print line causing a crash...

jmalfara · 2023-11-09T15:50:45Z

> I have an NVIDIA GeForce GTX 860M, and I have suddenly started hitting the same issue since the last pull, although it worked fine before the update. I forced CUDA_ARCH_FLAG to all and the error persists.
>
> EDIT: yep, falling back to commit fa8dbdc [1.4.0] and the GPU works perfectly.

Forcing CUDA_ARCH_FLAG still results in the problem for me as well. What's interesting in my case is that fa8dbdc [1.4.0] doesn't work on my machine with Docker, even with CPU only. There are no errors; it just exits. I'll continue to investigate, but at least I'm not the only one who saw this issue.

joshuachris2001 · 2023-11-24T01:14:35Z

I don't mean to poke, but this is still an issue.
For context, I am using an NVIDIA GeForce GTX 860M.

ggerganov · 2023-11-24T07:43:19Z

Does it work if you apply this patch?

diff --git a/ggml-cuda.cu b/ggml-cuda.cu
index b420330..9da239a 100644
--- a/ggml-cuda.cu
+++ b/ggml-cuda.cu
@@ -96,7 +96,7 @@
 // -  7B quantum model: +100-200 MB
 // - 13B quantum model: +200-400 MB
 //
-//#define GGML_CUDA_FORCE_MMQ
+#define GGML_CUDA_FORCE_MMQ
 
 // TODO: improve this to be correct for more hardware
 //       for example, currently fails for GeForce GTX 1660 which is TURING arch (> VOLTA) but does not have tensor cores

cebtenzzre · 2023-11-28T04:47:45Z

> Does it work if you apply this patch?

The first commit with this issue is f96e1c5 (#1422). That patch doesn't help.

bobqianic added the "question" label (Further information is requested) on Nov 7, 2023
