Remove check of device consistency for balanced_low_0
#2591
It seems `balanced_low_0` can leave GPU 0 empty, which breaks this check. According to the discussion, this check may be outdated. Resolves huggingface#2429.
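To make the failure mode concrete, here is a minimal, hypothetical sketch. It is not accelerate's actual code: `toy_low_0_map` and `strict_check` are illustrative names, and the assumption is that the check compares the device of the model's first module with the training device (GPU 0 in a single-process run). When a `balanced_low_0`-style placement leaves GPU 0 with no layers, such a check rejects a perfectly valid layout.

```python
from typing import Dict


def toy_low_0_map(num_layers: int, num_gpus: int) -> Dict[str, int]:
    """Hypothetical toy device map mimicking the effect reported in this PR:
    layers are spread round-robin over GPUs 1..num_gpus-1, so GPU 0 gets none."""
    return {f"layer.{i}": 1 + i % (num_gpus - 1) for i in range(num_layers)}


def strict_check(device_map: Dict[str, int], training_device: int = 0) -> bool:
    """Hypothetical old-style assumption: the first module must sit on the training device."""
    first_device = next(iter(device_map.values()))
    return first_device == training_device


dm = toy_low_0_map(num_layers=4, num_gpus=3)
print(dm)                 # {'layer.0': 1, 'layer.1': 2, 'layer.2': 1, 'layer.3': 2}
print(strict_check(dm))   # False: GPU 0 holds nothing, so the check fails
```

A conventional placement that starts on GPU 0 (for example `{"layer.0": 0, "layer.1": 1}`) passes the same check, which is why the problem only surfaces with profiles that keep GPU 0 free.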
Hi @xkszltl, thanks for the PR. Could you run the bitsandbytes tests to check that this PR doesn't break any of them? Otherwise, I'll do that after we merge a pretty important PR that fixes the quantization CI!
Feel free to sequence it after your work. By the way, is there no CI coverage for this?
This won't break anything as long as it compiles, because it's just removing an assertion.
Hmmm, I'm not sure we should remove this check: it is important to keep it to educate users about how to train quantized models using multiple GPUs. Is there a way to make sure we are in the `balanced_low_0` regime and relax the constraint only for that case?
I'm not familiar with the details here, but design-wise that's a bad choice because it adds coupling. Memory affinity can be versatile, and low 0 is only one of the profiles that can produce a counterexample. I believe that in NPP (naive pipeline parallelism) the entire GPU array is treated as a group, so it shouldn't matter which GPU is the model device.
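The argument above can be sketched as a check that looks at the device map as a whole instead of special-casing `balanced_low_0`. This is a hypothetical illustration only: `group_aware_check`, `num_processes`, and `process_device` are made-up names, and the rule encoded here (a single process driving several GPUs accepts any placement; multi-process data parallelism requires each replica to sit entirely on its own device) is an assumption drawn from this discussion, not the logic of any released library.

```python
from typing import Dict


def group_aware_check(device_map: Dict[str, int], num_processes: int, process_device: int) -> bool:
    """Hypothetical device check that treats the GPU array as one group."""
    devices = set(device_map.values())
    if num_processes == 1:
        # Naive pipeline parallelism: one process owns every GPU in the map,
        # so it does not matter which GPU holds the first (or any) layer.
        return True
    # Multi-process data parallelism: each replica must fit on its own device.
    return devices == {process_device}


low_0 = {"layer.0": 1, "layer.1": 2}
print(group_aware_check(low_0, num_processes=1, process_device=0))  # True
print(group_aware_check(low_0, num_processes=2, process_device=0))  # False
```

The design point is that the check no longer needs to know which memory profile produced the map, which avoids the coupling objected to above.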
Any other concerns?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Closing this, since this issue should be solved by this PR.