[`CI` / `core`] Fix CI with GC + pytorch 2.2 #29026

younesbelkada · 2024-02-14T23:11:53Z

What does this PR do?

Fixes the currently failing CI for Mistral, Mixtral and Qwen2 with gradient checkpointing. For some reason, since PyTorch 2.2, gradient checkpointing raises an error when going through in-place operations such as `tensor.mul_(xxx)`, which was not the case in earlier versions.

The fix simply replaces `causal_mask.mul_(~torch.eq(causal_mask, causal_mask.min()).all(dim=-1)[..., None])` with `causal_mask = causal_mask * (~torch.eq(causal_mask, causal_mask.min()).all(dim=-1)[..., None])`.
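For illustration, here is a minimal sketch of the two forms on a toy mask. It only shows that the out-of-place multiplication gives the same values as `mul_` while leaving the input tensor untouched; it does not try to reproduce the gradient checkpointing error itself. The helper name `unmask_fully_masked_rows` and the 3x3 mask are made up for this example and are not part of the modeling code.

```python
import torch


def unmask_fully_masked_rows(causal_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: zero out rows in which every position is masked.

    Out-of-place version: returns a new tensor and does not modify the input.
    """
    # True for rows where every entry equals the mask's minimum value
    fully_masked = torch.eq(causal_mask, causal_mask.min()).all(dim=-1)
    # Multiplying by False (0) clears those rows; other rows are kept as-is
    return causal_mask * (~fully_masked)[..., None]


# Toy mask (assumption for the example): row 0 is fully masked
min_value = torch.finfo(torch.float32).min
mask = torch.tensor(
    [
        [min_value, min_value, min_value],
        [0.0, min_value, min_value],
        [0.0, 0.0, min_value],
    ]
)
original = mask.clone()

# Out-of-place form used in this PR
out = unmask_fully_masked_rows(mask)

# In-place form that was replaced, applied to a copy for comparison
in_place = mask.clone()
in_place.mul_(~torch.eq(in_place, in_place.min()).all(dim=-1)[..., None])
```

Both forms produce the same values; the difference is that `mul_` overwrites the tensor's storage, whereas the out-of-place form allocates a new tensor and rebinds the name.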

This makes me think we should maybe have a job that runs the CI on torch nightly to catch these bugs early. Do we have that already? If not, happy to have a look.

cc @ArthurZucker @amyeroberts

HuggingFaceDocBuilderDev · 2024-02-14T23:35:38Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

younesbelkada · 2024-02-14T23:49:15Z

It appears the root cause was slightly different, see: #29027

fix gc CI for pytroch 2.2

27d9a75

younesbelkada requested review from ArthurZucker and amyeroberts February 14, 2024 23:13

quality

bd3cc51

ArthurZucker mentioned this pull request Feb 14, 2024

[CLeanup] Revert SDPA attention changes that got in the static kv cache PR #29027

Merged

ArthurZucker closed this in #29027 Feb 14, 2024
