Deepspeed param check #1015
Conversation
On line 146, in `set_module_tensor_to_device()`, adding a check for DeepSpeed parameters in the `kwargs` object, and not passing them, solved the error I was receiving about the `ds_*` attributes not being recognized by `torch.nn.Parameter.__new__()`. With my admittedly limited knowledge, it seemed to me that the kwargs are not necessary to pass when using DeepSpeed + Accelerate, and this bears out: the model loaded fine with ZeRO-3 CPU parameter and buffer offload on a single-GPU machine, and produced perfectly comprehensible inference outputs (slowly) using the GPU. The error, in my case, occurred here when called from Accelerate's `dispatch_model()`. Please let me know if my thinking on this is in any way wrong! This fix worked for me.

Environment:
- `transformers` version: 4.26.0
- Platform: Linux-5.15.83.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.10.6
- Huggingface_hub version: 0.11.1
- PyTorch version (GPU?): 1.13.1+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: Yes and no (ZeRO-3 on a single machine)
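To illustrate the failure mode described above, here is a minimal, self-contained sketch. The `ds_attr` name is a hypothetical stand-in for the bookkeeping attributes DeepSpeed ZeRO-3 stores in a parameter's `__dict__` (such as `ds_tensor`); this is not the PR code itself:

```python
import torch

# A plain torch.nn.Parameter standing in for a DeepSpeed-managed parameter.
param = torch.nn.Parameter(torch.zeros(4))
param.ds_attr = torch.zeros(1)  # lands in param.__dict__, like ds_tensor does

param_cls = type(param)  # torch.nn.Parameter
kwargs = param.__dict__  # now contains {"ds_attr": tensor([0.])}

try:
    # torch.nn.Parameter.__new__(cls, data=None, requires_grad=True) accepts
    # no extra keywords, so blindly forwarding the __dict__ raises TypeError.
    param_cls(torch.ones(4), requires_grad=param.requires_grad, **kwargs)
except TypeError as err:
    print(err)  # __new__() got an unexpected keyword argument 'ds_attr'
```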
The documentation is not available anymore as the PR was closed or merged.
Thanks for your PR! @younesbelkada the kwargs added to support bnb and 8-bit load seem to clash with DeepSpeed, so we should probably change the test to only pass them along when the parameter class actually needs them.
Thanks for the heads up! `set_module_tensor_to_device(module, name, self.execution_device)` is called recursively, hence the condition. In any case I agree with @sgugger, and we should add a safety check to be on the safe side.
Noted. @dhar174 can you edit the test a bit, then? It should still fix the issue, since it's a stronger check.
src/accelerate/utils/modeling.py (Outdated)

@@ -143,7 +143,12 @@ def set_module_tensor_to_device(
     elif value is not None or torch.device(device) != module._parameters[tensor_name].device:
         param_cls = type(module._parameters[tensor_name])
         kwargs = module._parameters[tensor_name].__dict__
-        new_value = param_cls(new_value, requires_grad=old_value.requires_grad, **kwargs).to(device)
+        if (kwargs.get("ds_tensor") is not None):
So let's make this check a bit stronger, e.g. checking whether `param_cls.__name__` is the bnb 8-bit parameter class or not, since we only want to pass the kwargs for it.
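As an aside (not from the thread itself): one nicety of a name-based check is that it keeps bitsandbytes an optional dependency, since no import is needed for an `isinstance` test. A sketch with a hypothetical helper:

```python
def is_bnb_int8_param(param) -> bool:
    """Hypothetical helper: detect a bitsandbytes Int8Params instance by its
    class name, so bitsandbytes never has to be imported here."""
    return type(param).__name__ == "Int8Params"
```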
Yeah, no problem.
FYI the bnb parameter class name is `Int8Params` ;)
Lines 146-150 now check for the Int8 parameter class. If found, pass the kwargs along with the value.
Does a simple change like this look good?
src/accelerate/utils/modeling.py (Outdated)

@@ -143,7 +143,12 @@ def set_module_tensor_to_device(
     elif value is not None or torch.device(device) != module._parameters[tensor_name].device:
         param_cls = type(module._parameters[tensor_name])
         kwargs = module._parameters[tensor_name].__dict__
-        new_value = param_cls(new_value, requires_grad=old_value.requires_grad, **kwargs).to(device)
+        if (param_cls.__name__ == "Int8Params"):
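GitHub truncates the hunk at the reviewed line; judging from the discussion, the complete branch presumably reads along these lines. This is a reconstruction for readability, not the verbatim diff; `module`, `tensor_name`, `old_value`, `new_value`, and `device` come from the surrounding function:

```python
param_cls = type(module._parameters[tensor_name])
kwargs = module._parameters[tensor_name].__dict__
if param_cls.__name__ == "Int8Params":
    # bitsandbytes Int8Params needs its quantization state (e.g. CB, SCB)
    # passed back, so keep forwarding the kwargs for it.
    new_value = param_cls(new_value, requires_grad=old_value.requires_grad, **kwargs).to(device)
else:
    # Plain torch.nn.Parameter (including DeepSpeed-managed ones): forwarding
    # the __dict__ would crash __new__, so drop the kwargs here.
    new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
```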
Changed to test `param_cls.__name__` for the Int8 parameter class (`"Int8Params"`) and only pass the kwargs if it matches; otherwise the kwargs are not sent.
Perfect, thanks!
Thanks for iterating! Can you just run `make style` on your branch to fix the formatting?
Apologies for that. I've run `make style`, and it changed 5 files. However, it still seems to fail the quality test. Did I do something wrong when using `make style`?
@dhar174 when you installed black and flake8, did you use the versions pinned by the repo (e.g. with `pip install -e ".[quality]"`)?
Ah yes, that must be the issue.