
Paligemma- fix devices and dtype assignments #31008

Merged: 2 commits into main from paligemma_fix_bf16_multigpu, May 24, 2024
Conversation

@molbap (Contributor) commented on May 24, 2024:

What does this PR do?

Moves tensors to the correct devices for multi-GPU training with accelerate and device_map="auto". Additionally ensures that bf16 training works.

Fixes #30997
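
For context, the failure shows up in a setup along these lines (a minimal sketch; the checkpoint name is illustrative, and device_map="auto" only actually shards the model when more than one GPU is visible):

    import torch
    from transformers import PaliGemmaForConditionalGeneration

    # With device_map="auto", accelerate shards the model across the available GPUs,
    # so tensors produced by different shards (e.g. inputs_embeds vs. token_type_ids)
    # can end up on different devices; torch_dtype=torch.bfloat16 exercises the dtype
    # path this PR also fixes.
    model = PaliGemmaForConditionalGeneration.from_pretrained(
        "google/paligemma-3b-pt-224",  # illustrative checkpoint
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )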

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@pcuenca (Member) left a comment:

LGTM, thanks for the quick fix!

@molbap (Contributor, author) commented on May 24, 2024:

cc @ArthurZucker wdyt?

@ArthurZucker (Collaborator) left a comment:

Thanks! I'll ping our accelerate experts offline; I want to understand a bit better what's going on, and why our tests did not catch this!

Comment on lines 337 to 339:

      causal_mask[:, :, :, :mask_length] = causal_mask[:, :, :, :mask_length].masked_fill(
    -     token_type_ids[:, None, None, :] == 0, 0
    +     token_type_ids[:, None, None, :].to(causal_mask.device) == 0, 0
      )
@ArthurZucker (Collaborator):

this one does not make sense to me

@molbap (Contributor, author):

with the masked_fill, you need both tensors to be on the same device, right?
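
For illustration, a standalone reproduction of that constraint (hypothetical tensors; with fewer than two GPUs both tensors land on the same device and the call simply succeeds):

    import torch

    # Hypothetical two-device layout, mirroring what device_map="auto" can produce:
    # the mask lives on one device, token_type_ids on another.
    dev_a = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    dev_b = torch.device("cuda:1" if torch.cuda.device_count() > 1 else dev_a)

    token_type_ids = torch.tensor([[0, 0, 1, 1]], device=dev_a)
    causal_mask = torch.full((1, 1, 4, 4), float("-inf"), device=dev_b)

    # masked_fill requires the boolean mask and the target tensor to be on the same
    # device; without the explicit .to(), this raises a RuntimeError when dev_a != dev_b.
    causal_mask = causal_mask.masked_fill(
        token_type_ids[:, None, None, :].to(causal_mask.device) == 0, 0
    )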

@ArthurZucker (Collaborator):

Ah, sorry, I read too fast.
So token_type_ids's device is not correctly inferred? Or are we not creating the causal mask on the correct device? It should be created on the input or attention mask's device for consistency; when it's used, accelerate will transfer it accordingly, I think.

@molbap (Contributor, author):

you're right! will update to move to device at creation time

@molbap (Contributor, author):

hmm, I think we are setting the causal_mask to the correct device. It's the token_type_ids device that is indeed not correctly inferred

@ArthurZucker (Collaborator):

But from @SunMarc's comment I would suspect both devices to be the same, no?

@molbap (Contributor, author):

hmm, doesn't seem like it; I tried on a multi-GPU env with device_map set to "auto" and got:
[screenshot: device-mismatch error output]

@molbap (Contributor, author):

@SunMarc if you have an idea here - token_type_ids is created by the processor along with input_ids and passed to the forward normally

@SunMarc (Member) commented on May 24, 2024:

So, from the code and the image you shared, I see that token_type_ids is indeed on the same device as input_ids. However, since you created the causal_mask on inputs_embeds.device, token_type_ids and the causal_mask might not be on the same device.

            causal_mask = torch.full(
                (sequence_length, target_length), fill_value=min_dtype, dtype=dtype, device=device
            )

where

    dtype, device = inputs_embeds.dtype, inputs_embeds.device
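
Spelled out, under a hypothetical two-GPU shard layout with device_map="auto" (device assignments assumed for illustration):

    embed_tokens shard (cuda:0): input_ids and token_type_ids arrive here
    decoder shard      (cuda:1): inputs_embeds is handed off here by accelerate's hooks

    token_type_ids.device == input_ids.device     == cuda:0
    causal_mask.device    == inputs_embeds.device == cuda:1

so the masked_fill above fails without the explicit .to(causal_mask.device).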

@molbap (Contributor, author):

alright, thanks! In that case, can we keep it as it is? The alternative is to create the causal mask on input_ids.device; I'm not sure one is better than the other, since inputs_embeds is much larger in general.
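
For reference, the two options being weighed look roughly like this (a CPU-runnable sketch with illustrative shapes; the real modeling code expands the mask to 4D the same way before the masked_fill):

    import torch

    sequence_length, target_length, mask_length = 4, 4, 4
    inputs_embeds = torch.randn(1, 4, 16)           # stand-in for the embeddings
    token_type_ids = torch.tensor([[0, 0, 1, 1]])   # lives on input_ids' device
    min_dtype = torch.finfo(inputs_embeds.dtype).min

    # Option (a), what the PR does: build the mask where inputs_embeds lives, then
    # move only the small token_type_ids tensor at use time.
    causal_mask = torch.full(
        (sequence_length, target_length), fill_value=min_dtype,
        dtype=inputs_embeds.dtype, device=inputs_embeds.device,
    )
    causal_mask = causal_mask[None, None, :, :].expand(1, 1, -1, -1).clone()
    causal_mask[:, :, :, :mask_length] = causal_mask[:, :, :, :mask_length].masked_fill(
        token_type_ids[:, None, None, :].to(causal_mask.device) == 0, 0
    )
    # Option (b) would pass device=input_ids.device to torch.full instead, skipping
    # the .to() here but transferring the whole mask later, when it meets the much
    # larger inputs_embeds.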

@grahamannett (Contributor) commented:

FWIW, most of the lines here are nearly identical to the changes I have made locally, besides the final_embedding-related one, which I believe can be done with only one cast, but I didn't think too deeply about it.

@molbap (Contributor, author) commented on May 24, 2024:

@grahamannett, good to know. For final_embedding, it's also there to fix the bf16 dtype mismatch.
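
For readers hitting the same mismatch, the dtype side of the fix boils down to this pattern (a minimal sketch with made-up shapes, not the actual merging code):

    import torch

    # Under bf16 training, vision features can come out in float32 while the text
    # embeddings are bfloat16; the fix is to pin the merged buffer to one dtype and
    # cast everything written into it at a single place.
    inputs_embeds = torch.randn(1, 8, 16, dtype=torch.bfloat16)
    image_features = torch.randn(1, 4, 16, dtype=torch.float32)

    final_embedding = torch.zeros(1, 8, 16, dtype=inputs_embeds.dtype)  # one dtype choice
    final_embedding[:, :4] = image_features.to(final_embedding.dtype)   # one explicit cast
    final_embedding[:, 4:] = inputs_embeds[:, 4:]                       # already bf16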

@molbap merged commit bdb9106 into main on May 24, 2024 (22 checks passed).
@molbap deleted the paligemma_fix_bf16_multigpu branch on May 24, 2024 at 17:02.
ArthurZucker pushed a commit that referenced this pull request on May 30, 2024:

* fix devices and dtype assignments
* [run-slow]paligemma

Successfully merging this pull request may close these issues:

* loss calculation for PaliGemmaForConditionalGeneration potentially not cast to correct device (#30997)