C:\Users\Liam\Downloads>koboldcpp_rocm.exe
***
Welcome to KoboldCpp - Version 1.68.yr0-ROCm
For command line arguments, please refer to --help
***
Attempting to use hipBLAS library for faster prompt ingestion. A compatible AMD GPU will be required.
Initializing dynamic library: koboldcpp_hipblas.dll
==========
Namespace(model=None, model_param='D:/AI/Shared/Bigger models/nous-capybara-34b.Q4_K_M.gguf', port=5001, port_param=5001, host='', launch=False, config=None, threads=15, usecublas=['normal', '0', 'mmq'], usevulkan=None, useclblast=None, noblas=False, contextsize=2048, gpulayers=23, tensor_split=None, checkforupdates=False, ropeconfig=[0.0, 10000.0], blasbatchsize=512, blasthreads=15, lora=None, noshift=False, nommap=False, usemlock=False, noavx2=False, debugmode=1, skiplauncher=False, onready='', benchmark=None, multiuser=1, remotetunnel=False, highpriority=False, foreground=False, preloadstory=None, quiet=False, ssl=None, nocertify=False, mmproj=None, password=None, ignoremissing=False, chatcompletionsadapter=None, flashattention=False, quantkv=0, forceversion=0, smartcontext=False, hordemodelname='', hordeworkername='', hordekey='', hordemaxctx=0, hordegenlen=0, sdmodel='', sdthreads=15, sdclamped=0, sdvae='', sdvaeauto=False, sdquant=False, sdlora='', sdloramult=1.0, whispermodel='', hordeconfig=None, sdconfig=None)
==========
Loading model: D:\AI\Shared\Bigger models\nous-capybara-34b.Q4_K_M.gguf

The reported GGUF Arch is: llama

---
Identified as GGUF model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 0 |
llama_model_loader: loaded meta data with 20 key-value pairs and 543 tensors from D:\AI\Shared\Bigger models\nous-capybara-34b.Q4_K_M.gguf
llm_load_vocab: special tokens cache size = 267
llm_load_vocab: token to piece cache size = 0.3834 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 64000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 200000
llm_load_print_meta: n_embd = 7168
llm_load_print_meta: n_head = 56
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 60
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 7
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 20480
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 5000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 200000
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 30B
llm_load_print_meta: model ftype = unknown, may not work (guessed)
llm_load_print_meta: model params = 34.39 B
llm_load_print_meta: model size = 19.24 GiB (4.81 BPW)
llm_load_print_meta: general.name = nousresearch_nous-capybara-34b
llm_load_print_meta: BOS token = 144 ' '
llm_load_print_meta: EOS token = 2 '<|endoftext|>'
llm_load_print_meta: UNK token = 0 ''
llm_load_print_meta: PAD token = 0 ''
llm_load_print_meta: LF token = 315 '<0x0A>'
llm_load_print_meta: EOT token = 7 '<|im_end|>'
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
llm_load_tensors: ggml ctx size = 0.64 MiB
llm_load_tensors: offloading 23 repeating layers to GPU
llm_load_tensors: offloaded 23/61 layers to GPU
llm_load_tensors: ROCm0 buffer size = 7376.69 MiB
llm_load_tensors: CPU buffer size = 19700.24 MiB
...................................................................................................
Automatic RoPE Scaling: Using model internal value.
llama_new_context_with_model: n_ctx = 2144
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 5000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: ROCm0 KV buffer size = 192.62 MiB
Traceback (most recent call last):
  File "koboldcpp.py", line 4114, in <module>
  File "koboldcpp.py", line 3773, in main
  File "koboldcpp.py", line 469, in load_model
OSError: exception: access violation writing 0x0000000000000010
[37564] Failed to execute script 'koboldcpp' due to unhandled exception!
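
For reference, the Namespace dump above corresponds roughly to the launch command below. This is only a sketch, assuming the standard KoboldCpp flag names (--model, --threads, --usecublas, --gpulayers, --contextsize, --debugmode, --port); the values are copied from the dump and everything else is left at its default:

C:\Users\Liam\Downloads>koboldcpp_rocm.exe --model "D:/AI/Shared/Bigger models/nous-capybara-34b.Q4_K_M.gguf" --threads 15 --usecublas normal 0 mmq --gpulayers 23 --contextsize 2048 --debugmode 1 --port 5001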