refactor: GGUF + GGML Loaders with ModelKind
#356
Conversation
These methods are very verbose, but really only differ by two params that distinguish Lora from XLora.
`ModelConfig` groups the common properties used across the `from_gguf()` methods. This better communicates the differences across impls of `from_gguf()`. The quantized X-LoRA models' `from_gguf()` now has a prop-to-param forwarder as a workaround to minimize breakage elsewhere.
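As a rough sketch of the grouping and the prop-to-param forwarder, using stand-in types (all names below are illustrative, not the exact definitions in `model_config.rs`):

```rust
// Hypothetical sketch only; the real structs wrap candle/GGUF values.
struct Content;  // stand-in for `gguf_file::Content`
struct Device;   // stand-in for the candle device
struct Ordering; // stand-in for the LoRA/X-LoRA ordering type

/// Common properties every `from_gguf()` implementation needs.
struct ModelConfig<'a> {
    ct: Content,
    device: &'a Device,
}

/// Extra properties only the adapter (LoRA / X-LoRA) variants need.
struct Adapter<'a> {
    ordering: &'a Ordering,
    is_xlora: bool,
}

// The existing flat signature is left untouched...
fn from_gguf(_ct: Content, _device: &Device, _ordering: &Ordering, _is_xlora: bool) {}

impl<'a> ModelConfig<'a> {
    /// ...and the config forwards its fields into that signature, so callers
    /// build one struct instead of repeating the parameter list everywhere.
    fn load_with_adapter(self, adapter: Adapter<'a>) {
        from_gguf(self.ct, self.device, adapter.ordering, adapter.is_xlora)
    }
}
```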
Very similar to `from_gguf()`, except only the `quantized_llama.rs` X-LoRA model supports this. There are no `Device` params, and the `File` params differ slightly from the GGUF type.
Finally, all this extra boilerplate can be shifted into the model config `Adapter` struct so it is self-contained in a single method. This required adjusting ownership a little to satisfy the compiler. The original `from_gguf()` and `from_ggml()` methods are unaffected; they still receive the expected params by reference.
This introduces a slight breaking change, in that using these `from_gguf()` / `from_ggml()` methods now requires importing the trait into scope. The methods drop the `pub` prefix as they inherit `pub` from the trait requirement itself. The purpose of this trait is to not require each model to duplicate the structs to params mapping helper method. Instead that can be centralized.
These no longer need to be maintained as copies within the supported model modules. They now leverage the common shared traits and take an annotated type parameter to handle. The syntax for usage is presently a little more verbose than desired.
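A simplified sketch of the trait idea; the trait name, signatures, and stand-in types below are illustrative rather than the exact ones in this PR:

```rust
// Illustrative only: a stand-in for the trait this PR introduces.
struct Content;                    // stand-in for `gguf_file::Content`
struct Device;                     // stand-in for the candle device
struct ModelConfig { ct: Content } // stand-in for the grouped config

trait FromGguf: Sized {
    // Each model still provides its flat constructor...
    fn from_gguf(ct: Content, device: &Device) -> Self;

    // ...while the struct-to-params mapping is centralized as a provided method,
    // so no model duplicates the forwarding helper. This is also why callers now
    // need to bring the trait into scope (the slight breaking change noted above).
    fn from_config(config: ModelConfig, device: &Device) -> Self {
        Self::from_gguf(config.ct, device)
    }
}

struct QuantizedLlama;

impl FromGguf for QuantizedLlama {
    fn from_gguf(_ct: Content, _device: &Device) -> Self {
        QuantizedLlama
    }
}
```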
For reference, these alternatives could be considered.
- Impl traits for the non-adapter quant models - Since adapter variants only append parameters and that is now a distinct struct, `model_config` can be defined earlier and a helper `with_adapter()` can convert to the adapter type variant.
With a rebase to adopt the new methods for `ModelKind`, the shared logic can be hoisted out of the match arms. X-LoRA specific variables were moved into `Adapter::try_new()` (`model_config.rs`), as they can share the same `paths` parameter by adding a separate bool to toggle X-LoRA usage. By hoisting the `model_config` variable out of the match arm, its type would change when calling `with_adapter()`; to prevent that, the separate `Adapter*` tuple structs have been dropped in favor of `ModelParams<Q>`, which uses a generic `Q` for the quantization type (trait marker) and a separate optional adapter that can be updated (see the sketch after this list). `MapParamsToModel<T>` is also no longer compatible as an approach, since the trait bound is ambiguous when there is no distinct adapter type (eg: `for AdapterGGUF`) to impl upon, and unique method names would also be required to avoid conflicts on the same type.
- `try_into_model()` + `try_into_model_with_adapter()` for the separate `Q` types (GGUF/GGML), with/without adapter.
- Due to the new struct introduced, there is a slight change to destructuring. The `impl` also bundles both methods now for each `Q` variant. Order was adjusted to basic followed by adapter methods for each `Q` variant, instead of both basic methods first and both adapter variations afterwards.
- Likewise, the `ggml.rs` and `gguf.rs` methods without the `MapParamsToModel<T>` trait now rely on a `TryFrom` trait impl.
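A rough sketch of that `ModelParams<Q>` shape, using stand-in types; the real `Q` markers carry the GGUF/GGML file params and the real `TryFrom` impls construct the actual model types:

```rust
struct ParamsGguf; // stand-in marker for the GGUF quantization params
struct Adapter;    // stand-in for the LoRA / X-LoRA specific params

struct ModelParams<Q> {
    quant: Q,
    adapter: Option<Adapter>,
}

impl<Q> ModelParams<Q> {
    fn new(quant: Q) -> Self {
        Self { quant, adapter: None }
    }

    // The adapter can be attached later without changing the type of a hoisted
    // `model_config` variable, unlike the earlier `Adapter*` tuple structs.
    fn with_adapter(mut self, adapter: Adapter) -> Self {
        self.adapter = Some(adapter);
        self
    }
}

struct QuantizedLlama;      // stand-in model types
struct XLoraQuantizedLlama;

// `gguf.rs` / `ggml.rs` then rely on `TryFrom` instead of a bespoke trait.
impl TryFrom<ModelParams<ParamsGguf>> for QuantizedLlama {
    type Error = anyhow::Error;
    fn try_from(params: ModelParams<ParamsGguf>) -> Result<Self, Self::Error> {
        anyhow::ensure!(params.adapter.is_none(), "did not expect adapter params");
        Ok(QuantizedLlama)
    }
}

impl TryFrom<ModelParams<ParamsGguf>> for XLoraQuantizedLlama {
    type Error = anyhow::Error;
    fn try_from(params: ModelParams<ParamsGguf>) -> Result<Self, Self::Error> {
        // Still an `Option` here: the adapter's presence must be checked at runtime.
        let _adapter = params
            .adapter
            .ok_or_else(|| anyhow::anyhow!("expected adapter params"))?;
        Ok(XLoraQuantizedLlama)
    }
}
```

Under this shape, a loader can hoist `let params = ModelParams::new(quant);` above the match and only call `.with_adapter(...)` in the adapter arms without the variable's type changing.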
This approach introduces another generic parameter, `MaybeAdapter`, to emulate an `Option<Adapter>` that can be used as a type to `impl` upon. To continue the unified type usage with an adapter variant in the `ggml.rs` / `gguf.rs` pipelines, this must now leverage an enum for the two variants (sketched below).
- Slightly more complexity is added as a result.
- Adapter `try_into_model()` methods no longer need to check for `Some(Adapter)` to unwrap, since that is now guaranteed to always be the case.
- However, similar logic has bubbled up to the `TryFrom` impls due to the enum wrapper, so this approach may not be much better beyond broader consistency. Likewise with the `with_adapter()` method. To minimize boilerplate when unwrapping the enum in the `TryFrom` methods, `Variantly` has been introduced for its `expect_variant()` method.
As all four types are distinct, the `_with_adapter()` method can also be named `try_into_model()`, thanks to separate impls for the new generic param `MaybeAdapter`.
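A minimal sketch of that `MaybeAdapter`-style refinement, again with stand-in types; the `Variantly::expect_variant()` helper is omitted and the `TryFrom` impl just returns a stand-in model:

```rust
struct ParamsGguf; // stand-in GGUF quantization params
struct Adapter;    // stand-in LoRA / X-LoRA params

// Type-level stand-ins for "adapter present" / "no adapter", so each combination
// becomes a distinct type that `TryFrom` can be implemented for separately.
struct WithAdapter(Adapter);
struct NoAdapter;

struct ModelParams<Q, A> {
    quant: Q,
    adapter: A,
}

// The `gguf.rs` / `ggml.rs` pipelines still need one runtime type covering both
// variants, hence the enum wrapper that must be unwrapped before conversion.
enum AnyParams<Q> {
    Plain(ModelParams<Q, NoAdapter>),
    Adapted(ModelParams<Q, WithAdapter>),
}

struct XLoraQuantizedLlama; // stand-in model type

impl TryFrom<ModelParams<ParamsGguf, WithAdapter>> for XLoraQuantizedLlama {
    type Error = anyhow::Error;
    fn try_from(params: ModelParams<ParamsGguf, WithAdapter>) -> Result<Self, Self::Error> {
        // No `Option` to unwrap: the type guarantees the adapter is present.
        let WithAdapter(_adapter) = params.adapter;
        Ok(XLoraQuantizedLlama)
    }
}
```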
Since the type constraint for the `try_into_model()` methods is bound as the return type, it can be inferred without any hint in the `TryFrom`; there is no need to annotate with `Self`. `derive_more` is used for terser construction of the `Config` struct for the `ModelParams` variants.
This is an alternative approach to building the config. Construction of the config from the param inputs is now handled at the end, not dependent upon separate `new()` + optional `with_adapter()` calls on a mutable variable. Unfortunately the `buildstructor` and `typed-builder` APIs don't allow for easy flexibility of builder methods in different scopes (_due to moves_). `derive-builder` can do this, but not with the more complex types, due to the lack of a `Copy` / `Clone`. Thus the `None` option is required as input regardless of whether an adapter is needed.
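For illustration, a minimal sketch of that "gather inputs first, build the config once at the end" shape; the `derive_more::Constructor` derive and the stand-in types here are assumptions, not the exact `Config` in `model_config.rs`:

```rust
// Assumed usage of derive_more's `Constructor` derive to generate `Config::new()`.
use derive_more::Constructor;

struct FileGguf; // stand-in quantization params
struct Adapter;  // stand-in LoRA / X-LoRA params

#[derive(Constructor)]
struct Config {
    quant: FileGguf,
    adapter: Option<Adapter>,
}

fn build_config(use_adapter: bool) -> Config {
    let quant = FileGguf;
    // No mutable variable with `new()` + `with_adapter()`: everything is gathered
    // first and the config is constructed once, at the end. The trade-off is that
    // `None` must be passed explicitly when no adapter is needed.
    let adapter = if use_adapter { Some(Adapter) } else { None };
    Config::new(quant, adapter)
}
```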
This better communicates that the block is only relevant to assigning this value, while the two `is_lora` / `is_xlora` variables are hoisted above due to their later usage as metadata inputs.
`pipeline/gguf.rs` + `pipeline/ggml.rs` now ensure that `activate_adapters()` works for X-LoRA too. This is assumed to be a bugfix because of the `XLoraLlama` model the two adapter kinds share, along with code everywhere else checking `is_xlora`; no other usage of `is_lora` seems to exist.
- To avoid further ambiguity, the condition is better communicated as `has_adapter`.
- It is unclear whether all usage of `is_xlora` is specific to X-LoRA or also intended to apply to LoRA, since `XLora*` models do impl `is_xlora() -> true` (except Gemma, which is a potential bug).
`pipeline/normal.rs` handled its own `is_xlora` bool differently from the `gguf.rs` / `ggml.rs` loaders.
- It relied upon `model.is_xlora() && !is_lora`, but we already assume X-LoRA via prior matching on `ModelKind`, which now provides this information via its own `is_x_lora()` method.
- Only `xlora_models/gemma.rs` would behave differently with this change, but Gemma might have been meant to return `true`?
Matches are only for the `Quantized` or `AdapterQuantized` variants, with no difference in handling by the `AdapterKind` variant used. Additionally restores the `GGUF X-LoRA` bail formatted string. For consistency, the non-adapter branch also appends `for GGUF`, and the architecture in the lora branch now comes before the `ModelKind`.
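As a rough illustration of the consolidated matching (a stand-in enum and hypothetical error strings, not the exact `ModelKind` shape or messages in the codebase):

```rust
#[derive(Debug, Clone, Copy)]
enum AdapterKind { Lora, XLora }

#[derive(Debug, Clone, Copy)]
enum ModelKind {
    Quantized,
    AdapterQuantized(AdapterKind),
}

fn unsupported_arch(kind: ModelKind, arch: &str) -> anyhow::Error {
    match kind {
        // One arm covers both adapter kinds, since handling does not differ by
        // the `AdapterKind` variant; the architecture comes before the kind.
        ModelKind::AdapterQuantized(adapter) => {
            anyhow::anyhow!("Unsupported architecture `{arch}` for GGUF {adapter:?}")
        }
        // The non-adapter branch also appends `for GGUF` for consistency.
        ModelKind::Quantized => anyhow::anyhow!("Unsupported architecture `{arch}` for GGUF"),
    }
}
```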
A better approach for the most part at encoding the kind info.
Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                    9           21           21            0            0
 Python                 21          741          622           21           98
 TOML                   15          393          355            1           37
-------------------------------------------------------------------------------
 Jupyter Notebooks       1            0            0            0            0
 |- Markdown             1           60           30           22            8
 |- Python               1           96           87            1            8
 (Total)                            156          117           23           16
-------------------------------------------------------------------------------
 Markdown               16         1054            0          781          273
 |- BASH                 6          203          190            0           13
 |- Python               6          121          110            0           11
 |- Rust                 3          185          172            9            4
 (Total)                           1563          472          790          301
-------------------------------------------------------------------------------
 Rust                   86        28448        26034          379         2035
 |- Markdown            42          439            0          427           12
 (Total)                          28887        26034          806         2047
===============================================================================
 Total                 151        31133        27426         1182         2525
===============================================================================
@polarathene, this looks excellent, and I think it makes it much more idiomatic and type safe which is always better. I wrote some comments, and it looks like there are a few clippy warnings, one of which can be ignored.
// Create VarBuilder:
// TODO: `from_mmaped_safetensors` has `xlora_paths` as the 2nd param, is this a bug?
let vb = from_mmaped_safetensors(
No, perhaps we should rename `xlora_paths` to `xlora_classifier_path` though, as that is really what it carries here.
The 2nd param of `from_mmaped_safetensors` takes the adapters paths, perhaps we should rename that accordingly too?
> The 2nd param of `from_mmaped_safetensors` takes the adapters paths, perhaps we should rename that accordingly too?

Yes. I think clearer names or at least some associated comments can assist with maintenance, especially for those that aren't familiar with all the ML / LLM jargon and only understand how to read code 😓
Can I address that in a smaller follow-up PR, and you can suggest the new names then? As mentioned, I was already working on a `from_mmaped_safetensors()` rewrite. Hopefully no merge conflicts there when I return to it 😝
mistralrs-core/src/pipeline/gguf.rs
Outdated
@@ -657,8 +583,8 @@ impl Pipeline for GGUFPipeline {
        }
    }
    fn activate_adapters(&mut self, adapter_names: Vec<String>) -> anyhow::Result<usize> {
        if !self.metadata.is_lora {
            anyhow::bail!("Cannot activate adapters non-LoRA models.")
        if !self.metadata.has_adapter {
This should remain as `!self.metadata.is_lora`, because for X-LoRA the adapter set must remain the same.
Ok, could we better communicate this?

LoRA models share the same model as X-LoRA when they're created? GGUF/GGML is converted into `XLoraLlama` for example, but this has an `is_xlora() -> true` method.

Something seems off, or for someone like me not too familiar with these terms (other than that you were involved in developing X-LoRA), I don't know how the usage differs in respect to `XLoraLlama` vs `GeneralMetadata`, since `is_xlora` and `is_xlora()` kind of conflict there, but the model and pipeline are presumably related?

Is `activate_adapters()` only meant for LoRA? Or for any potential `AdapterKind` that isn't X-LoRA?

// Pseudo-code of what might be a better way to query?
if self.metadata.adapter.is_some_and(|a| !a.is_xlora()) { activate } else { bail }
mistralrs-core/src/pipeline/ggml.rs
Outdated
@@ -508,8 +478,8 @@ impl Pipeline for GGMLPipeline {
        }
    }
    fn activate_adapters(&mut self, adapter_names: Vec<String>) -> anyhow::Result<usize> {
        if !self.metadata.is_lora {
            anyhow::bail!("Cannot activate adapters non-LoRA models.")
        if !self.metadata.has_adapter {
This should remain as `!self.metadata.is_lora`, because for X-LoRA the adapter set must remain the same.
mistralrs-core/src/pipeline/mod.rs
Outdated
    pub is_lora: bool,
    // Model utilizes an adapter (LoRA or X-LoRA):
    pub has_adapter: bool,
    // TODO: There is a possibility that code mistakenly queries only `is_xlora`,
I don't think so, it is used for the X-LoRA cache management and inputs generation.
    FileGGML { ct, gqa },
) = self.quant;

// Forwards all structured fields above into the required flattened param sequence:
Same comment for the others, but perhaps the models can take the structured input? I would like to start that process for the plain models too, as their ISQ + device mapping args are hard to read.
Yes, I plan to, but as mentioned I wanted to minimize running into conflicts. My first approach was just getting something that was an improvement and moving most changes into this `model_config.rs` module to minimize conflicts, as I spent quite a bit of time trying a few different things 😅

I still have considerations for `ModelKind` to move the quantization and adapter into the `Plain` / `Normal` variant via struct fields.

After this PR I can look into addressing the concern with the prop drilling (the `main.rs` of each CLI seems to have a lengthy chain to go from CLI args to Loader). There's plenty to tackle, I've just got to do it in smaller bursts I think, given the activity going on in the repo.
@@ -736,6 +736,7 @@ impl NormalModel for XLoraModel {
    fn device(&self) -> &Device {
        &self.device
    }
    // BUG?: This is the only x-lora model with `is_xlora() -> false`
Yes! Thanks for catching that. As a long-term and safer solution, maybe we can get this information from the `ModelKind` instead of using hardcoded values? Just an idea.
💯 I was thinking that made sense too.

Should `is_xlora()` be `true` here when the `AdapterKind` is `Lora` instead of `XLora`? That was one of my concerns about how these two are treated, since presently these would return `true` for the `Lora` kind.
This is really great, it makes the code much more type safe and (I think) easier to read! It looks like there are a few clippy warnings here, some of which, like the enum size one, can probably be ignored.
Context for review @EricLBuehler
I'll go take care of appeasing the lint gods now, followed by resolving the conflict.
EDIT: Oh you already beat me to it 😂
I collapsed comments that don't need any attention and responded to your earlier review feedback for any overlapping discussion I brought up.
I'll avoid the longer walkthrough in future refactorings since that seemed unnecessary :)
Todo:
- Replace `has_adapter: bool` with `adapter: AdapterKind`, or store the entire `ModelKind` in `GeneralMetadata`. Update the GGUF + GGML `activate_adapters()` to restrict the method specifically to `AdapterKind::Lora` (sketched below).
- Get clarification on what `ct` and `gqa` are abbreviations of, and add contextual comments to the struct fields.
- Replace the `is_xlora()` model methods to rely on querying `ModelKind`. If usage is only for X-LoRA, fix the bug of LoRA models outputting `true`, or have callers specifically call `is_xlora()` on the `ModelKind` as a field of the model. May defer to a follow-up PR.
- Follow-up PR to replace the `from_gguf()` / `from_ggml()` method signatures to use struct inputs for simplified param inputs.
- Follow-up PR for better param names for `from_mmaped_safetensors()`.
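A sketch of how the first todo item might look (stand-in types, not the actual definitions in `pipeline/mod.rs`):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AdapterKind { Lora, XLora }

struct GeneralMetadata {
    adapter: Option<AdapterKind>, // or store the full `ModelKind` instead
    // ...other fields elided...
}

fn activate_adapters(metadata: &GeneralMetadata, adapter_names: Vec<String>) -> anyhow::Result<usize> {
    // X-LoRA keeps its adapter set fixed, so only plain LoRA may swap adapters.
    if metadata.adapter != Some(AdapterKind::Lora) {
        anyhow::bail!("Adapter activation is only supported for LoRA models.");
    }
    Ok(adapter_names.len()) // stand-in for the real activation count
}
```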
`model_config.rs` GGUF and GGML structs are prefixed with `Params`. Two exceptions added as the concerns don't seem to warrant change: `#[allow(clippy::borrowed_box)]` and `#[allow(clippy::large_enum_variant)]`.
This file has no other change in the commit beyond line ending conversion. It was mistakenly using CRLF since creation.
`GeneralMetadata` now stores the `ModelKind` for this type of information. `activate_adapters()` error message revised. `mod.rs` version includes contextual comment about X-LoRA not being equivalent.
Most likely redundant with `GeneralMetadata` now having `ModelKind` to query, but fixing here until all queries replaced. Additionally updates `model_config.rs` note to clarify not a bug.
I think I've addressed the feedback to get this reviewed again and merged. I have a checklist above of feedback still to tackle, but I don't think it needs to block this PR from merging. EDIT: Oh, there's a markdown example using Python that does use the
While waiting on review here, I'm going over mistral.rs/mistralrs-core/src/utils/varbuilder_utils.rs (Lines 87 to 88 in 527e7f5, and Lines 78 to 93 in 527e7f5).

I'm not sure what
I'll write a review soon, but just responding to your comments:
If you could add a more type-safe option to the Python API, that would be great too.
Looks good, almost ready to merge! Just one question about a TODO.
@polarathene, is it OK to merge? Looks good on my end.
Yes, happy to get this merged! 🚀
Sounds good! I wanted to check and see if you wanted to implement the other points, but I assume you plan on doing that in a future PR?
Thank you!
This is mostly where I wanted to get it functionally. There is still plenty more to do, but there is so much churn going on lately that it's starting to introduce conflicts when rebasing, so I'd rather get this through review.
I'll provide more context in about 12 hours, but the commit history is iterative with detailed commit messages on changes. Some of it became redundant and was replaced, but the diff may be easier to follow if helpful.
- `ModelKind` workarounds dropped, so the newer version is now used as requested.
- `from_gguf()` and `from_ggml()` methods: I can address that in a follow-up PR. These methods now use a trait that needs to be imported to use them.

I'll handle the clippy and rustfmt feedback when I return 😓