Fix flex_attention in training mode #35605
Merged
What does this PR do?
As per the title.
In training mode, the in-place operation would produce `RuntimeError: FakeTensor is wrapped to wrong device, found cpu, expected cuda:1`. It works, however, when gradients are not required (i.e. with `generate`), which is why this wasn't detected before. I am not entirely sure what is happening behind the scenes or what the root cause is, but switching to an explicit (out-of-place) operation fixes the issue.