Releases: Nexesenex/croco.cpp
Croco.Cpp_FrankenFork_v1.83020_b4667
Test release, for Ampere and newer GPUs only for now.
New mod on Croco:
- KV cache quantization can now be set independently for the draft model used in speculative decoding.
-> My favored choice for the draft model is KV q5_0/iq4_nl (5BPW), but even iq4_nl/iq4_nl (4.5BPW) is viable, notably if you use an iq4_xs (or smaller) quant for your draft model.
-> As for the main model, I suggest KV q8_0/q5_0 (7BPW) if you're after quality, q6_0/q5_0 (6BPW) if you want a compromise, and q6_0/iq4_nl (5.5BPW) if you want to save memory without much loss. q5_0/iq4_nl is the minimum that is not too lossy (less than +1% ppl compared to KV f16) in that case.
-> KV q4_0 (4.5BPW, +5% ppl) is kept for legacy purposes and as a fallback in case of bugs, but if KV iq4_nl works for you, go for it: it's the same size and MUCH better (+2.5% ppl).
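The BPW figures quoted above are the average of the K-cache and V-cache bits per weight. Here is a minimal Python sketch of that arithmetic, plus a rough cache-size estimate. The per-type BPW values are assumptions chosen to match the averages given above (the usual 32-value block layouts), and the model shape in the example is purely hypothetical, not taken from any specific model.

```python
# Assumed bits per weight (BPW) for each KV cache type.
# These values are assumptions consistent with the averages quoted in the notes
# (e.g. q8_0/q5_0 -> 7 BPW); they are not read from the Croco.Cpp source.
BPW = {
    "f16": 16.0,
    "q8_0": 8.5,
    "q6_0": 6.5,
    "q5_0": 5.5,
    "q4_0": 4.5,
    "iq4_nl": 4.5,
}


def kv_bpw(k_type: str, v_type: str) -> float:
    """Average bits per weight of a K/V cache type pair."""
    return (BPW[k_type] + BPW[v_type]) / 2


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, k_type: str, v_type: str) -> float:
    """Rough KV cache size in bytes (ignores padding and other overhead)."""
    # Number of cached values in K alone (V holds the same count).
    elems = n_layers * n_kv_heads * head_dim * n_ctx
    return elems * (BPW[k_type] + BPW[v_type]) / 8


if __name__ == "__main__":
    # Hypothetical model shape, for illustration only.
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=8192)
    pairs = [("f16", "f16"), ("q8_0", "q5_0"), ("q6_0", "q5_0"),
             ("q6_0", "iq4_nl"), ("q5_0", "iq4_nl"), ("iq4_nl", "iq4_nl")]
    for k, v in pairs:
        mib = kv_cache_bytes(**shape, k_type=k, v_type=v) / 2**20
        print(f"K={k:7s} V={v:7s} {kv_bpw(k, v):5.2f} BPW  ~{mib:7.1f} MiB")
```

The same helper can be called once for the main model and once for the draft model, since the two caches can now use different K/V types.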
Also, the BLAS batch size (BBS) for the draft model is reduced: a draft model is logically smaller than the main model, so both token generation (TG) and prompt processing (PP) are much faster, and the BBS can therefore be shrunk.
Also, all the supported FA KV quants are back; the previous Croco versions were shrunk a bit by being compiled without some of the usual FA quants.
Full Changelog: v1.83007_b4608...v1.83020_b4467
v1.83007_b4608
Full Changelog: v1.83005_b4569...v1.83007_b4608
Croco.Cpp_FrankenFork_v1.83005_b4569
Full Changelog: v1.83004_b4569...v1.83005_b4569
Croco.Cpp_FrankenFork_v1.83004_b4569
Full Changelog: v1.83002_b4517...v1.83004_b4569
Croco.Cpp_FrankenFork_v1.83002_b4517
Full Changelog: v1.83001_b4517...v1.83002_b4517
Croco.Cpp_FrankenFork_v1.83001_b4517
Full Changelog: v1.83000_b4517...v1.83001_b4517
Croco.Cpp_FrankenFork_v1.83000_b4517
Full Changelog: v1.82131_b4502...v1.83000_b4517
Croco.Cpp_FrankenFork_v1.82132_b4502
Full Changelog: v1.82131_b4502...v1.82132_b4502
Croco.Cpp_FrankenFork_v1.82131_b4502
Full Changelog: v1.82021_b4502...v1.82131_b4502
-> If it doesn't work, first update your drivers to ones compatible with NVIDIA CUDA 12.6.3 (i.e. very recent drivers).
Croco.Cpp_FrankenFork_v1.82021_b4502
Full Changelog: v1.82020_b4491...v1.82021_b4502