Microsoft Windows [Version 10.0.22631.3880]
(c) Microsoft Corporation. All rights reserved.

<>\koboldcpp-rocm\YellowRose_FORK\koboldcpp-rocm>koboldcpp_rocm.exe
***
Welcome to KoboldCpp - Version 1.70.yr0-ROCm (nice)
Deleted orphaned pyinstaller dir: C:\Users\<>\AppData\Local\Temp\_MEI101642
Deleted orphaned pyinstaller dir: C:\Users\<>\AppData\Local\Temp\_MEI201442
For command line arguments, please refer to --help
***
Attempting to use hipBLAS library for faster prompt ingestion. A compatible AMD GPU will be required.
Initializing dynamic library: koboldcpp_hipblas.dll
==========
Namespace(model='', model_param='<>/Gemmasutra-Pro-27B-v1_Q4km.gguf', port=5001, port_param=5001, host='', launch=False, config=None, threads=5, usecublas=['normal', '1', 'mmq'], usevulkan=None, useclblast=None, noblas=False, contextsize=8192, gpulayers=55, tensor_split=None, checkforupdates=False, ropeconfig=[0.0, 10000.0], blasbatchsize=512, blasthreads=5, lora=None, noshift=False, nommap=False, usemlock=False, noavx2=False, debugmode=0, skiplauncher=False, onready='', benchmark=None, multiuser=1, remotetunnel=False, highpriority=False, foreground=False, preloadstory=None, quiet=False, ssl=None, nocertify=False, mmproj=None, password=None, ignoremissing=False, chatcompletionsadapter=None, flashattention=False, quantkv=0, forceversion=0, smartcontext=False, unpack='', hordemodelname='', hordeworkername='', hordekey='', hordemaxctx=0, hordegenlen=0, sdmodel='', sdthreads=5, sdclamped=0, sdvae='', sdvaeauto=False, sdquant=False, sdlora='', sdloramult=1.0, whispermodel='', hordeconfig=None, sdconfig=None)
==========
Loading model: <>\Gemmasutra-Pro-27B-v1_Q4km.gguf
The reported GGUF Arch is: gemma2
Arch Category: 0

---
Identified as GGUF model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling for GGUF. If the model has custom RoPE settings, they'll be used directly instead!
That is, the RoPE values shown above will be replaced by the values reported after loading.
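Since nothing was passed on the command line, the values in the Namespace dump above came from the launcher GUI and the build's defaults. For reference, here is a sketch of an equivalent non-interactive launch; the flag names are inferred from the argparse Namespace, so verify them against --help before relying on this:

koboldcpp_rocm.exe --model <>\Gemmasutra-Pro-27B-v1_Q4km.gguf --port 5001 --threads 5 --usecublas normal 1 mmq --contextsize 8192 --gpulayers 55 --blasbatchsize 512 --blasthreads 5

Note that --gpulayers 55 requests more layers than this model has; as the load log below shows, it is simply capped at 47/47.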
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 0
llama_model_loader: loaded meta data with 38 key-value pairs and 508 tensors from <>\Gemmasutra-Pro-27B-v1_Q4km.gguf
llm_load_vocab: special tokens cache size = 217
llm_load_vocab: token to piece cache size = 1.6014 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = gemma2
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 256000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4608
llm_load_print_meta: n_layer = 46
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 16
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 4096
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 2
llm_load_print_meta: n_embd_k_gqa = 2048
llm_load_print_meta: n_embd_v_gqa = 2048
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 36864
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 27B
llm_load_print_meta: model ftype = unknown, may not work (guessed)
llm_load_print_meta: model params = 27.23 B
llm_load_print_meta: model size = 15.50 GiB (4.89 BPW)
llm_load_print_meta: general.name = Gemmasutra Pro 27B V1F
llm_load_print_meta: BOS token = 2 ''
llm_load_print_meta: EOS token = 1 ''
llm_load_print_meta: UNK token = 3 ''
llm_load_print_meta: PAD token = 0 ''
llm_load_print_meta: LF token = 227 '<0x0A>'
llm_load_print_meta: EOT token = 107 ''
llm_load_print_meta: max token length = 48
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XT, compute capability 11.0, VMM: no
llm_load_tensors: ggml ctx size = 0.53 MiB
llm_load_tensors: offloading 46 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 47/47 layers to GPU
llm_load_tensors: ROCm0 buffer size = 15868.49 MiB
llm_load_tensors: CPU buffer size = 922.85 MiB
...........................................................................................
Automatic RoPE Scaling: Using (scale:1.000, base:10000.0).
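As a sanity check, the reported quantization density is consistent with the printed figures: 15.50 GiB is 16,642,998,272 bytes, and 16,642,998,272 bytes x 8 bits / 27.23e9 params = ~4.89 bits per weight, matching the 4.89 BPW shown for this Q4km file.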
llama_new_context_with_model: n_ctx = 8288
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: ROCm0 KV buffer size = 2978.50 MiB
llama_new_context_with_model: KV self size = 2978.50 MiB, K (f16): 1489.25 MiB, V (f16): 1489.25 MiB
llama_new_context_with_model: ROCm_Host output buffer size = 0.98 MiB
llama_new_context_with_model: ROCm0 compute buffer size = 584.38 MiB
llama_new_context_with_model: ROCm_Host compute buffer size = 41.38 MiB
llama_new_context_with_model: graph nodes = 1850
llama_new_context_with_model: graph splits = 2
Load Text Model OK: True
Embedded KoboldAI Lite loaded.
Embedded API docs loaded.
Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
======
Please connect to custom endpoint at http://localhost:5001
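The KV cache figure can be reproduced from the metadata above: 2 tensors (K and V) x 46 layers x 2048 (n_embd_k_gqa) x 8288 (n_ctx) x 2 bytes (f16) = 3,123,183,616 bytes = 2978.5 MiB, exactly the reported KV self size. Total GPU residency is therefore about 15868.49 MiB (weights) + 2978.50 MiB (KV) + 584.38 MiB (compute) = ~19.0 GiB, which fits within the RX 7900 XT's 20 GB of VRAM.

Once the server is up, a minimal smoke test against the native generate route; the /api/ base is shown above, while the v1/generate path and payload fields follow the standard KoboldAI API, so adjust if this build's embedded API docs differ:

curl http://localhost:5001/api/v1/generate -H "Content-Type: application/json" -d "{\"prompt\": \"Hello\", \"max_length\": 64}"

The same server also answers OpenAI-style clients pointed at http://localhost:5001/v1/.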