
Remove check of device consistency for balanced_low_0. #2591

Closed · wants to merge 1 commit

Conversation

@xkszltl commented Mar 27, 2024

It seems `balanced_low_0` can leave GPU 0 empty, which breaks this check. According to the discussion, the check may be outdated.

Resolves #2429
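
For context, a minimal repro sketch of the failure mode (the checkpoint and the resulting device map are illustrative assumptions, not taken from this PR):

```python
# With device_map="balanced_low_0", Accelerate keeps GPU 0 nearly empty
# (reserving it for e.g. generation outputs), so on a multi-GPU machine a
# quantized model's weights can end up only on GPUs 1..N-1.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",  # illustrative checkpoint
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="balanced_low_0",
)
# On a 4-GPU box this can print e.g. {1, 2, 3}: GPU 0 holds no weights, so a
# check asserting that the model's main device matches the training device fails.
print(set(model.hf_device_map.values()))
```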
@xkszltl (Author) commented Mar 27, 2024

CC @SunMarc @younesbelkada

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc (Member) commented Apr 2, 2024

Hi @xkszltl, thanks for the PR. Could you run the bitsandbytes tests to check that this PR doesn't break anything? Otherwise, I'll do that after we merge a pretty important PR that fixes the quantization CI!

@xkszltl (Author) commented Apr 2, 2024

Feel free to sequence it after your work. I'm busy with something else as well, and my local resources are all occupied.

BTW, is there no CI coverage?

@xkszltl (Author) commented Apr 2, 2024

This won't break anything as long as it compiles, because it's just removing an assertion.

@younesbelkada (Contributor) left a comment

Hmmm, I'm not sure we should remove this check; it's important to keep it to educate users about how to train quantized models using multiple GPUs. Is there a way to make sure we are under the balanced_low_0 regime and relax the constraint only for that case?
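
For illustration, such a carve-out might look roughly like the sketch below. The helper names are hypothetical, and since Accelerate only sees the resolved `hf_device_map` rather than the preset that produced it, the detection would have to be a heuristic:

```python
import torch

def looks_like_low_0(hf_device_map: dict) -> bool:
    """Hypothetical heuristic: some GPUs hold modules, but GPU 0 holds none."""
    gpu_ids = {d for d in hf_device_map.values() if isinstance(d, int)}
    return bool(gpu_ids) and 0 not in gpu_ids

def check_model_device(model, training_device: torch.device) -> None:
    """Sketch of a device-consistency check with a balanced_low_0 carve-out."""
    gpu_ids = {d for d in model.hf_device_map.values() if isinstance(d, int)}
    if not gpu_ids:
        return  # model entirely on CPU/disk; nothing to check
    main_device = torch.device("cuda", min(gpu_ids))
    if main_device != training_device and not looks_like_low_0(model.hf_device_map):
        raise ValueError(
            "Model is loaded on a different device than the one you're training on."
        )
```

As the next comment argues, though, keying the check to one particular layout adds coupling.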

@xkszltl (Author) commented Apr 5, 2024

I'm not familiar with the details here, but design-wise that's a bad choice because it adds coupling. Memory affinity can be versatile, and low_0 is only one of the profiles that can produce a counter-example. I believe that in NPP (naive pipeline parallelism) the entire GPU array is treated as one group, so it shouldn't matter which device is the model device.

@xkszltl (Author) commented Apr 19, 2024

Any other concerns?

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@SunMarc (Member) commented May 14, 2024

Closing this, since the issue should be solved by this PR.

@SunMarc closed this May 14, 2024

Successfully merging this pull request may close these issues:

Accelerate refuses to work on balanced_low_0 when GPU 0 is not filled.