Support multiple GGUF files #379
Code Metrics Report

| Language | Files | Lines | Code | Comments | Blanks |
|---|---:|---:|---:|---:|---:|
| Dockerfile | 1 | 34 | 25 | 0 | 9 |
| Happy | 1 | 442 | 369 | 0 | 73 |
| JSON | 9 | 21 | 21 | 0 | 0 |
| Python | 24 | 864 | 731 | 25 | 108 |
| TOML | 15 | 403 | 365 | 1 | 37 |
| Jupyter Notebooks | 1 | 0 | 0 | 0 | 0 |
| ↳ Markdown | 1 | 60 | 30 | 22 | 8 |
| ↳ Python | 1 | 96 | 87 | 1 | 8 |
| (Total) | | 156 | 117 | 23 | 16 |
| Markdown | 16 | 1056 | 0 | 782 | 274 |
| ↳ BASH | 6 | 203 | 190 | 0 | 13 |
| ↳ Python | 6 | 121 | 110 | 0 | 11 |
| ↳ Rust | 3 | 185 | 172 | 9 | 4 |
| (Total) | | 1565 | 472 | 791 | 302 |
| Rust | 93 | 29569 | 26925 | 433 | 2211 |
| ↳ Markdown | 47 | 468 | 0 | 455 | 13 |
| (Total) | | 30037 | 26925 | 888 | 2224 |
| Total | 161 | 32389 | 28436 | 1241 | 2712 |
Oh no, I was a bit too slow getting my PR ready 💀 I'm not used to contributing to a project where the change I'm working on is so prone to conflicts from other activity 😓 Should I have opened a draft PR to iterate on instead, to better communicate this?
Some GGUF models are very large and are sharded into multiple files. Mistral.rs supports this; to use it, delimit the `.gguf` filenames with a space, like so:

```bash
./mistralrs-server --chat-template <chat_template> gguf -m . -f "a.gguf b.gguf"
```

For the Python API, a list of strings is also accepted for this case.
Do you have an example GGUF file where this is actually the case? Or was this a misunderstanding of the motivation referenced in #380?
I don't understand why the sharding is necessary, other than (as this comment suggests) to work around size restrictions on a file host?
Seems like it'd be more appropriate to have a tool that concatenates the files together if there's a UX issue you want to address, but I don't really see the point in adding additional complexity/maintenance to support something that should realistically be addressed outside of the mistral.rs runtime 🤷‍♂️
Similar to the tokenizer patching you added recently, these are workarounds that seem more appropriate as CLI tools to apply "fixes" 🤔
Would that be for developing a new feature? If so, that sounds good. Otherwise, we have a Discord for this purpose :)
No, I was just refactoring the GGUF tokenizer, but this PR touched so much that I know rebasing my work onto it isn't something I'm interested in doing.
I think you saw my GitHub comment on candle about what I was doing before this PR was merged; you had given it a 👍. Good to know though, my mistake.
Ah sorry, I thought that was a different refactor. I will roll back these changes, as it would be great to see yours. Generally, multiple GGUF files are a rare occurrence. Would your PR include support for them?
No, it was just focused on tidying up the GGUF metadata tokenizer file you already put together, and would then have had the …

On Linux, from what I've read, the file is split into parts that you can rejoin by running something like `cat goliath-120b.Q6_K.gguf-split-* > goliath-120b.Q6_K.gguf`. A small CLI tool could do something similar: glob the related files and merge the output. I don't know for sure, but I assume users would be fine with piecing the parts back into a single file.

I haven't spent much time reading over your PR changes for the feature, so I could be mistaken about how you integrated support for multi-part files, but at a glance it seemed to interleave a bit (multiple loops and other additions to carry the support). Since my PR didn't touch the model files, there aren't many conflicting files. I just don't have the energy at the moment to compare the changes I'd need to make.
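A minimal sketch of the small CLI tool suggested above, written as a bash function. It assumes the parts are raw byte chunks named `<output>-split-*` (as in the goliath example) whose names already sort into the right order under glob expansion; the names `merge_gguf_parts` and `has_gguf_magic` and the magic-byte check are illustrative assumptions, not an existing mistral.rs feature. The check relies on GGUF files beginning with the 4-byte ASCII magic `GGUF`: in a raw split only the first part should start with it, whereas shards that are each a complete GGUF file with their own header cannot be rejoined with `cat`.

```bash
# Hypothetical helper: true if the file's first 4 bytes are the ASCII magic
# "GGUF". NUL bytes are dropped so bash does not warn about them; any NUL
# leaves fewer than 4 characters, so the comparison still fails as it should.
has_gguf_magic() {
  [[ "$(head -c 4 "$1" | tr -d '\0')" == "GGUF" ]]
}

# Hypothetical helper (not part of mistral.rs): rejoin a GGUF file that was
# split into raw byte chunks, e.g. goliath-120b.Q6_K.gguf-split-a, -split-b, ...
merge_gguf_parts() {
  local output="$1"                    # e.g. goliath-120b.Q6_K.gguf
  local parts=("${output}"-split-*)    # bash sorts glob matches by name
  local p

  # With no match (and nullglob unset), bash keeps the pattern literally.
  if [[ ! -e "${parts[0]}" ]]; then
    echo "no parts matching ${output}-split-*" >&2
    return 1
  fi

  # Only the first chunk of a raw split should start with the GGUF magic.
  if ! has_gguf_magic "${parts[0]}"; then
    echo "${parts[0]} does not start with the GGUF magic" >&2
    return 1
  fi
  for p in "${parts[@]:1}"; do
    if has_gguf_magic "$p"; then
      echo "$p is itself a GGUF file; refusing to concatenate" >&2
      return 1
    fi
  done

  # Same operation as the cat one-liner, in glob order.
  cat "${parts[@]}" > "$output" || return 1
  echo "merged ${#parts[@]} parts into $output" >&2
}
```

Usage, e.g.: `merge_gguf_parts goliath-120b.Q6_K.gguf`. Relying on glob order keeps the sketch as simple as the `cat` one-liner, but it would misorder suffixes that are not zero-padded (for example, `part10` sorts before `part2`).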
Nice! Perhaps we can add such a tool. This PR was a bit hacky, so I just rolled it back. There is now just one (easy) conflict in #389.
* Initial work on phi3v
* Add the image embedding layer
* Lints
* Implement the loader
* Add infrastructure for phi3 image processor
* Merge
* Merge
* Merge
* Merge
* Partially implement padding
* Implement the hd transform step
* Work on the image processor
* Clippy
* Complete the phi3v inputs processor
* Rename
* Merge
* Merge
* Rename to phi3v and fix deser
* Fix varbuilder
* Fix varbuilder
* Default for do convert rgb
* Some defaults
* Allow no processor config
* Setup debug flag
* Add phi3v
* Implement messages flattening
* Update
* Rewrite the pad, hd transform
* Clippy
* Detect num channels
* Fix reshape
* Fix global image channel dim
* Fix assert
* Fix dtype
* Fix gt
* Fix image id neg
* Fix dim0 of pixel values
* Fix dtype
* Check if model supports gemm
* Fix some shape errors
* Fix some shape errors
* Fix rank of slice_assign
* Fix image toks
* Properly downcase
* Fix response
* Fix response
* Allow no images in prompt
* Output correct hidden state
* Fix nonzero and add test
* Fix n image toks
* Add mistralrs_vision
* Typo
* Fix and add tests
* Fix indexing
* Fix test condition
* Fix unsqueeze
* Fix dtype for norm
* Update clip
* Clippy
* Run clip in f32
* Run in bf16
* Run in bf16 again
* Fix dtype
* Set toks to have correct context lens
* Set toks to have correct context lens
* Support multiple GGUF files (#379)
* Move to gguf module
* Add content abstraction for multiple gguf files
* Fix test
* Allow specifying and loading multiple gguf files
* Update docs and examples
* Print some info
* Merge
* Organize normal loading metadata (#381)
* Organize normal loading metadata
* Fix
* Bump version 0.1.13 -> 0.1.14 (#382)
* Patch incorrect unwrap and bump version (#383)
* Patch incorrect unwrap
* Bump version to 0.1.15
* More verbose logging during loading (#385)
* More verbose logging when loading
* More logging
* Refactor enabling debug logging (#387)
* Refactor enabling debug logging
* Fix reversed order
* Merge
* Merge
* Merge
* Use precise gelu
* Use correct kernel
* Debugging commit
* Add fused bias linear
* Finish merge
* Use fused layer in clip
* Save progress
* Remove debugs
* Update example
* Resize exact
* Update interpolate
* Fix batch dim
* Update test and transform
* It works
* Add some examples
* Allow more than one image
* Add support in python api
* Add to toml selector
* Update python api
* Overhaul readme and docs
* Update
* Export vision arch
* Export vision arch
* Export vision arch
* Fix max img dim
* Fix unwrap
Support multiple GGUF files by refactoring the GGUF `Content` usage.