
Fix Mega chunking error when using decoder-only model #25765

Merged · 3 commits merged into huggingface:main on Sep 5, 2023

Conversation

@tanaymeh (Contributor) commented on Aug 25, 2023:

What does this PR do?

This PR fixes the error raised by MegaModel when the is_decoder setting is used in conjunction with the use_chunking and chunk_size settings.

The error is described in detail here.

Fixes #23331
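
For context, a minimal sketch of the configuration combination that triggers the error on versions without this fix (the values below are illustrative assumptions, not taken from the issue):

```python
import torch
from transformers import MegaConfig, MegaForCausalLM

# Decoder-only Mega with chunked attention. Before this fix, the causal
# attention mask appears to have been built from the full sequence length
# while attention itself was computed chunk by chunk, so the two tensors
# no longer matched and the forward pass raised a RuntimeError (#23331).
config = MegaConfig(
    is_decoder=True,      # decoder-only usage
    bidirectional=False,  # unidirectional EMA for causal modelling
    use_chunking=True,    # process the sequence in fixed-size chunks
    chunk_size=16,
)
model = MegaForCausalLM(config)

# Sequence length chosen as a multiple of chunk_size for simplicity.
input_ids = torch.randint(0, config.vocab_size, (1, 64))
outputs = model(input_ids)  # errored before this fix; runs cleanly after it
```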

Who can review?

@ArthurZucker

@tanaymeh changed the title from "Fix Mega chunking error when using decoder-only model." to "Fix Mega chunking error when using decoder-only model" on Aug 26, 2023.
@ArthurZucker (Collaborator) commented:

Hey! Feel free to ping me when the PR is ready for review 😉

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@tanaymeh marked this pull request as ready for review on August 28, 2023 at 10:10.
@tanaymeh (Contributor, Author) commented:

@ArthurZucker You may review it now!

@ArthurZucker (Collaborator) left a review comment:


Nice! Can you add a test in the testing suite? This way we can make sure the chunking mode is properly tested!

@tanaymeh (Contributor, Author) commented on Aug 28, 2023:

Will do. Two questions regarding this:

  1. For the chunking test, should I only check the expected output sizes, or something else as well?
  2. Conceptually, do you see any problems with this solution in its current state? I am a little suspicious because of how simple the fix was 😅

@ArthurZucker (Collaborator) commented:

  1. Yes, you could check the expected sizes on a smaller model (using the small configs).
  2. I don't really see a problem, since the attention mask that is created uses the newly defined sequence length, and the code was already very clean, so it was probably just a typo!

@tanaymeh (Contributor, Author) commented on Sep 4, 2023:

Thanks for confirming @ArthurZucker. I have added a test that checks whether the attentions returned in CausalLMOutputWithCrossAttentions have their last dimension (shape[-1]), which would normally be sequence_length, equal to chunk_size.

If the checks I added in modeling_mega.py are correct, it will use chunk_size instead of the actual sequence_length.

Is this correct, or shall I make any changes?
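
Roughly, the check is along these lines (a sketch for illustration, not the exact test code; the config values here are just placeholders):

```python
import torch
from transformers import MegaConfig, MegaForCausalLM

config = MegaConfig(
    is_decoder=True,
    bidirectional=False,
    use_chunking=True,
    chunk_size=8,
)
model = MegaForCausalLM(config).eval()

input_ids = torch.randint(0, config.vocab_size, (1, 32))  # multiple of chunk_size
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    out = model(input_ids, attention_mask=attention_mask, output_attentions=True)

# With chunking enabled, attention is computed per chunk, so the last
# dimension of each returned attention tensor should equal chunk_size
# rather than the full sequence_length.
assert all(attn.shape[-1] == config.chunk_size for attn in out.attentions)
```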

Update: The tests are failing because of an error in the Wav2Vec2 model, here: test_modeling_wav2vec2.py::Wav2Vec2RobustModelTest::test_model_for_pretraining

A GitHub pull request should fix it.

@ArthurZucker (Collaborator) left a review comment:


This is nice! Thanks for the long-overdue fix 😉

(Two review threads on tests/models/mega/test_modeling_mega.py were marked outdated and resolved.)
@tanaymeh (Contributor, Author) commented on Sep 5, 2023:

Added your suggested changes, @ArthurZucker! With input_mask, the Mega tests now pass.

@ArthurZucker (Collaborator) commented:

Perfect! If you can just rebase on main to make sure the CIs are green?

@tanaymeh force-pushed the fix_mega_chunking_errors branch from ae262e5 to aa2ff48 on September 5, 2023 at 18:49.
@tanaymeh (Contributor, Author) commented on Sep 5, 2023:

> Perfect! If you can just rebase on main to make sure the CIs are green?

Done @ArthurZucker!

@ArthurZucker merged commit b8def68 into huggingface:main on Sep 5, 2023.
@ArthurZucker (Collaborator) commented:

Congrats on the PR 🚀 thanks for fixing!

@tanaymeh (Contributor, Author) commented on Sep 5, 2023:

Thanks a lot for helping @ArthurZucker!

parambharat pushed a commit to parambharat/transformers that referenced this pull request on Sep 26, 2023, with the following commit messages:

* add: potential fix to mega chunking in decoder only model bug

* add: decoder with chunking test

* add: input_mask passed with input_ids

Successfully merging this pull request may close these issues.

RuntimeError: The size of tensor a (16) must match the size of tensor b (16000) at non-singleton dimension 2