Releases: YellowRoseCx/koboldcpp-rocm
KoboldCPP-v1.82.1.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.82.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.81.1.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.80.3.yr0-ROCm
Update cmake-rocm-windows.yml
KoboldCPP-v1.79.1.yr1-ROCm
attempt 6700xt fix for cmake-rocm-windows.yml
KoboldCPP-v1.79.1.yr0-ROCm
attempt 6700xt fix for cmake-rocm-windows.yml
KoboldCPP-v1.78.yr0-ROCm
koboldcpp-rocm-1.78
- NEW: Added support for Flux and Stable Diffusion 3.5 models: Image generation has been updated with new arch support (thanks to stable-diffusion.cpp) with additional enhancements. You can use either fp16 or fp8 safetensor models, or the GGUF models. Supports all-in-one models (bundled T5XXL, Clip-L/G, VAE) or loading them individually.
- Grab an all-in-one flux model here: https://huggingface.co/Comfy-Org/flux1-dev/blob/main/flux1-dev-fp8.safetensors
- Alternatively, we have a ready-to-use .kcppt template that will set up and download everything you need here: https://huggingface.co/koboldcpp/kcppt/resolve/main/Flux1-Dev.kcppt
- Large image handling is also more consistent with VAE tiling; 1024x1024 should work nicely for SDXL and Flux.
- You can specify the new image gen components by loading them with --sdt5xxl, --sdclipl and --sdclipg (for SD3.5); they work with URL resources as well.
- Note: FP16 Flux needs over 20GB of VRAM to work. If you have less VRAM, you should use the quantized GGUFs, or select Compress Weights when loading the Flux model. SD3.5 medium is more forgiving.
- As before, it can be used with the bundled StableUI at http://localhost:5001/sdui/
- Debug mode prints penalties for XTC
- Added a new flag --nofastforward, which forces full prompt reprocessing on every request. It can potentially give more repeatable/reliable/consistent results in some cases.
- CLBlast support is still retained, but has been further downgraded to "compatibility mode" and is no longer recommended (use Vulkan instead). CLBlast GPU offload must now maintain a duplicate copy of the layers in RAM as well, as it now piggybacks off the CPU backend.
- Added common identity provider /.well-known/serviceinfo (Haidra-Org/AI-Horde#466, PygmalionAI/aphrodite-engine#807, theroyallab/tabbyAPI#232)
- Reverted some changes that reduced speed in HIPBLAS.
- Fixed a bug where bad logprobs JSON was output when logits were -Infinity
- Updated Kobold Lite, multiple fixes and improvements
- Added support for custom CSS styles
- Added support for generating larger images (select BigSquare in image gen settings)
- Fixed some streaming issues when connecting to Tabby backend
- Better world info length limiting (capped at 50% of max context before appending to memory)
- Added support for Clip Skip for local image generation.
- Merged fixes and improvements from upstream
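The logprobs fix listed above touches a general JSON pitfall: -Infinity is not a valid JSON number, so strict parsers reject output that contains it. The following is a minimal Python sketch of that pitfall and one possible workaround. It is not the actual fix made in koboldcpp; the helper name sanitize_logprobs and the choice to map non-finite values to null are assumptions made for illustration.

```python
import json
import math

def sanitize_logprobs(logprobs):
    """Replace non-finite values (-inf, inf, nan) with None.

    Hypothetical helper for illustration only; mapping to None
    (JSON null) is an assumption, not koboldcpp's behavior.
    """
    return [lp if math.isfinite(lp) else None for lp in logprobs]

raw = [-0.25, float("-inf"), -3.5]

# Python's default encoder emits the bare token -Infinity,
# which is not part of the JSON grammar.
loose = json.dumps(raw)

# With allow_nan=False the encoder refuses non-finite floats instead.
try:
    json.dumps(raw, allow_nan=False)
    strict_failed = False
except ValueError:
    strict_failed = True

# After sanitizing, strict encoding succeeds.
safe = json.dumps(sanitize_logprobs(raw), allow_nan=False)

print(loose)
print(safe)
```

Whether to emit null, clamp to a large negative number, or drop the entry is a design choice that depends on what the consuming client expects.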
To use, download and run the koboldcpp_rocm.exe, which is a one-file pyinstaller.
If you're using Linux, clone the repo and build in terminal with make LLAMA_HIPBLAS=1 -j
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
For more information, be sure to run the program from the command line with the --help flag.
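Besides opening http://localhost:5001 in a browser, a script can talk to the running server. The sketch below assumes the server exposes a KoboldAI-compatible POST endpoint at /api/v1/generate that accepts prompt and max_length fields and returns the text under results[0].text. None of those details are stated in these notes, so check the server's own API documentation before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # default address given in the notes above

def build_generate_request(prompt, max_length=80, base_url=BASE_URL):
    """Build a POST request for the assumed /api/v1/generate endpoint."""
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/v1/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt, max_length=80, base_url=BASE_URL, timeout=120):
    """Send the prompt and return the generated text (assumed response shape)."""
    request = build_generate_request(prompt, max_length, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["results"][0]["text"]

# Example call (requires a running server with a model loaded):
#   print(generate("Once upon a time,"))
```

Building the request separately from sending it makes it possible to inspect the URL and payload without a server running.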
Release notes from: https://github.com/LostRuins/koboldcpp/releases/tag/v1.78
KoboldCPP-v1.77.yr1-ROCm
- Bring Speed Back
Upstream llama.cpp introduced a change to calculate certain values in full 32-bit precision by default, which caused a major slowdown for some users with AMD GPUs. This release reverts that change until improvements are made.
KoboldCPP-v1.77.yr0-ROCm
Update dependencies in cmake-rocm-windows.yml
KoboldCPP-v1.76.yr1-ROCm
version bump