
Fix Mega chunking error when using decoder-only model #25765

Merged · 3 commits merged into huggingface:main on Sep 5, 2023

Conversation

@tanaymeh (Contributor) commented on Aug 25, 2023:

What does this PR do?

This PR fixes the error raised by MegaModel when the is_decoder setting is used in conjunction with the use_chunking and chunk_size settings.

The error is described in detail here.

Fixes #23331
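
For context, a minimal sketch of the configuration combination that triggers the error on versions without this fix (the values below are illustrative assumptions, not taken from the issue):

```python
import torch
from transformers import MegaConfig, MegaForCausalLM

# Decoder-only Mega with chunked attention. Before this fix, the causal
# attention mask appears to have been built from the full sequence length
# while attention itself was computed chunk by chunk, so the two tensors
# no longer matched and the forward pass raised a RuntimeError (#23331).
config = MegaConfig(
    is_decoder=True,      # decoder-only usage
    bidirectional=False,  # unidirectional EMA for causal modelling
    use_chunking=True,    # process the sequence in fixed-size chunks
    chunk_size=16,
)
model = MegaForCausalLM(config)

# Sequence length chosen as a multiple of chunk_size for simplicity.
input_ids = torch.randint(0, config.vocab_size, (1, 64))
outputs = model(input_ids)  # errored before this fix; runs cleanly after it
```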

Who can review?

@ArthurZucker

@tanaymeh changed the title from "Fix Mega chunking error when using decoder-only model." to "Fix Mega chunking error when using decoder-only model" on Aug 26, 2023.
@ArthurZucker (Collaborator) commented:

Hey! Feel free to ping me when the PR is ready for review 😉

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@tanaymeh marked this pull request as ready for review on August 28, 2023 at 10:10.
@tanaymeh (Contributor, Author) commented:

@ArthurZucker You may review it now!

@ArthurZucker (Collaborator) left a review comment:


Nice! Can you add a test in the testing suite? This way we can make sure the chunking mode is properly tested!

@tanaymeh (Contributor, Author) commented on Aug 28, 2023:

Will do. Two questions regarding this:

  1. For the chunking test, should I only check the expected output sizes, or something else as well?
  2. Conceptually, do you see any problems with this solution in its current state? I am a little suspicious because of how simple the fix was 😅

@ArthurZucker (Collaborator) commented:

  1. Yes, you could check the expected sizes on a smaller model (using the small configs).
  2. I don't really see a problem, since the attention mask that is created uses the newly defined sequence length, and the code was already very clean, so it was probably just a typo!

@tanaymeh (Contributor, Author) commented on Sep 4, 2023:

Thanks for confirming @ArthurZucker. I have added a test that checks whether the attentions returned in CausalLMOutputWithCrossAttentions have their last dimension (shape[-1]), which would normally be sequence_length, equal to chunk_size.

If the checks I added in modeling_mega.py are correct, it will use chunk_size instead of the actual sequence_length.

Is this correct, or shall I make any changes?
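
Roughly, the check is along these lines (a sketch for illustration, not the exact test code; the config values here are just placeholders):

```python
import torch
from transformers import MegaConfig, MegaForCausalLM

config = MegaConfig(
    is_decoder=True,
    bidirectional=False,
    use_chunking=True,
    chunk_size=8,
)
model = MegaForCausalLM(config).eval()

input_ids = torch.randint(0, config.vocab_size, (1, 32))  # multiple of chunk_size
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    out = model(input_ids, attention_mask=attention_mask, output_attentions=True)

# With chunking enabled, attention is computed per chunk, so the last
# dimension of each returned attention tensor should equal chunk_size
# rather than the full sequence_length.
assert all(attn.shape[-1] == config.chunk_size for attn in out.attentions)
```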

Update: The tests are failing because of an error in the Wav2Vec2 model, here: test_modeling_wav2vec2.py::Wav2Vec2RobustModelTest::test_model_for_pretraining

A GitHub pull request should fix it.

@ArthurZucker (Collaborator) left a review comment:


This is nice! Thanks for the long-overdue fix 😉

(Two review threads on tests/models/mega/test_modeling_mega.py were marked outdated and resolved.)
@tanaymeh (Contributor, Author) commented on Sep 5, 2023:

Added your suggested changes, @ArthurZucker! With input_mask, the Mega tests now pass.

@ArthurZucker (Collaborator) commented:

Perfect! If you can just rebase on main to make sure the CIs are green?

@tanaymeh force-pushed the fix_mega_chunking_errors branch from ae262e5 to aa2ff48 on September 5, 2023 at 18:49.
@tanaymeh (Contributor, Author) commented on Sep 5, 2023:

> Perfect! If you can just rebase on main to make sure the CIs are green?

Done @ArthurZucker!

@ArthurZucker merged commit b8def68 into huggingface:main on Sep 5, 2023.
@ArthurZucker (Collaborator) commented:

Congrats on the PR 🚀 thanks for fixing!

@tanaymeh (Contributor, Author) commented on Sep 5, 2023:

Thanks a lot for helping @ArthurZucker!

parambharat pushed a commit to parambharat/transformers that referenced this pull request on Sep 26, 2023, with the following commit messages:

* add: potential fix to mega chunking in decoder only model bug

* add: decoder with chunking test

* add: input_mask passed with input_ids

Successfully merging this pull request may close these issues.

RuntimeError: The size of tensor a (16) must match the size of tensor b (16000) at non-singleton dimension 2