Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA #29587
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
1. Update minimum accelerate version to 0.26.0
2. Clean the trainer wrt accelerate version checks
3. FSDP refactor and test for fsdp config
4. Use `itemsize` instead of the `dtype2bytes` dict (see the sketch below)
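A rough illustration of point 4, assuming the only goal is to get the per-element byte size of a torch dtype (the actual helper and call sites in the trainer are not reproduced here):

```python
import torch

# Before (sketch of the removed approach): a hand-maintained dtype -> bytes mapping.
dtype2bytes = {torch.float32: 4, torch.float16: 2, torch.bfloat16: 2, torch.int8: 1}
bytes_per_element = dtype2bytes[torch.bfloat16]

# After: torch dtypes expose their element size directly (PyTorch >= 2.1).
bytes_per_element = torch.bfloat16.itemsize  # 2
```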
Huge work @pacman100! 🚀
Overall it looks great on the quantization end; I left a few nits. I've also left an open question about the accelerate minimum version - perhaps just a warning for users would suffice, as upgrading accelerate to 0.26.0 might be too brutal for users.
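A minimal sketch of the softer alternative being floated here, i.e. warning instead of hard-requiring `accelerate>=0.26.0` (the constant name and wording are illustrative, not the code in this PR):

```python
import warnings
from importlib.metadata import version

from packaging.version import parse

MIN_ACCELERATE_VERSION = "0.26.0"  # illustrative constant


def warn_on_old_accelerate() -> None:
    installed = parse(version("accelerate"))
    if installed < parse(MIN_ACCELERATE_VERSION):
        # Warn instead of raising, so users on older accelerate aren't broken outright.
        warnings.warn(
            f"accelerate>={MIN_ACCELERATE_VERSION} is recommended for the new FSDP/DeepSpeed "
            f"QLoRA paths; found {installed}. Some features may not work as expected."
        )
```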
Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
@younesbelkada, addressed the comments in the latest commit.
Huge work @pacman100! Thanks very much for this! 🚀
Thanks for adding this! Looks great! Added a small suggestion to simplify things a bit
Awesome work - thanks for adding this!
Just two small comments / questions
        model_to_load, key, "cpu", torch.empty(*param.size(), dtype=dtype)
    )
else:
    hf_quantizer.create_quantized_param(model, param, key, "cpu", state_dict)
Just to make sure I've understood - we don't need this anymore because creating the quantized params now happens when loading in the state dict?
`hf_quantizer is None` (i.e., not quantized) will always be False, as the conditional logic `if is_fsdp_enabled() and not is_local_dist_rank_0() and not is_quantized` already includes this check, and as such this inner conditional is no longer required. The quantized parameters are initialized in the else logic on line 3950.
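For readers skimming the thread, here is a heavily condensed sketch of the branch being discussed, reconstructed only from the names visible in the fragments above; the real loading loop in `modeling_utils` has more branches, and the import paths here are indicative rather than exact.

```python
import torch
from accelerate.utils import set_module_tensor_to_device
from transformers.modeling_utils import is_fsdp_enabled, is_local_dist_rank_0


def load_one_param(model, model_to_load, key, param, dtype, state_dict, hf_quantizer):
    """Condensed sketch of the per-parameter loading branch, not the actual code."""
    is_quantized = hf_quantizer is not None
    if is_fsdp_enabled() and not is_local_dist_rank_0() and not is_quantized:
        # Non-quantized FSDP path: non-main local ranks only materialize empty
        # storage; the real weights are loaded on local rank 0 and synced by FSDP.
        set_module_tensor_to_device(
            model_to_load, key, "cpu", torch.empty(*param.size(), dtype=dtype)
        )
    elif is_quantized:
        # Quantized parameters are created here on every rank, which is why the
        # old inner `hf_quantizer is None` check became redundant.
        hf_quantizer.create_quantized_param(model, param, key, "cpu", state_dict)
```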
@@ -1958,7 +1973,8 @@ def _get_resized_lm_head(
    if new_num_tokens is None:
        return old_lm_head

    if is_deepspeed_zero3_enabled():
    is_quantized = hasattr(self, "hf_quantizer") and self.hf_quantizer is not None
Is there any reason we couldn't have an `is_quantized` property of the model, which is `False` by default? Having to constantly define `is_quantized` within the methods isn't ideal, as it requires updating in many different places if the criteria change.
We already have the `hf_quantizer` property for the model; storing `is_quantized` would duplicate the same information. Earlier, I was directly using `self.hf_quantizer is None` in checks, but there were suggestions above to improve readability using `is_quantized`, and as such, I made the changes accordingly.
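A toy contrast of the two styles discussed above (purely illustrative; `ToyModel` is not the PR's code):

```python
class ToyModel:
    """Toy stand-in for a model class, only to contrast the two styles."""

    def __init__(self, hf_quantizer=None):
        self.hf_quantizer = hf_quantizer

    # Reviewer's suggestion: derive the flag once, in a single place.
    @property
    def is_quantized(self) -> bool:
        return getattr(self, "hf_quantizer", None) is not None

    def resize_lm_head_current_style(self):
        # The PR's current approach: re-derive the flag locally where it is needed.
        is_quantized = hasattr(self, "hf_quantizer") and self.hf_quantizer is not None
        return "quantized path" if is_quantized else "regular path"

    def resize_lm_head_property_style(self):
        # With the property, call sites stay in sync if the criteria ever change.
        return "quantized path" if self.is_quantized else "regular path"
```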
Co-Authored-By: Zach Mueller <7831895+muellerzr@users.noreply.github.com>
Really amazing work @pacman100! Thanks so much for this! ❤️
What does this PR do?
1. `bitsandbytes` has now added a `quant_storage` option for 4-bit parameters, which is required for FSDP+QLoRA support. This PR adds that option and fixes the corresponding parameter calculation (see the usage sketch below).
2. Handles `zero.init` when using DeepSpeed with QLoRA.

This PR should be merged after Accelerate PR huggingface/accelerate#2544.
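For context, a usage sketch of the new option; the checkpoint name and dtype choices are illustrative, and `bnb_4bit_quant_storage` is assumed to be the config field that exposes bitsandbytes' `quant_storage`:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Storing the packed 4-bit weights in bfloat16 lets FSDP flat-shard them
# uniformly alongside the (bfloat16) LoRA parameters.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative checkpoint
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```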