
Update modeling_mamba2.py, fix pad size #32599

Merged (5 commits into huggingface:main, Sep 20, 2024)

Conversation

@klae01 (Contributor) commented Aug 11, 2024

What does this PR do?

This PR changes how pad_size is computed. Previously, pad_size = self.chunk_size - (seq_len % self.chunk_size), which yields chunk_size (a full extra chunk of padding) whenever seq_len is already a multiple of chunk_size. The new formula, pad_size = (self.chunk_size - seq_len % self.chunk_size) % self.chunk_size, takes the result modulo chunk_size so that pad_size is 0 in that case. This removes unnecessary padding and the associated computational overhead.
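A minimal standalone sketch of the two formulas side by side (function names are illustrative; the real code computes this inline in modeling_mamba2.py):

```python
def pad_size_old(seq_len: int, chunk_size: int) -> int:
    # Old formula: returns chunk_size instead of 0 when seq_len is already
    # a multiple of chunk_size, adding a full unnecessary chunk of padding.
    return chunk_size - (seq_len % chunk_size)

def pad_size_new(seq_len: int, chunk_size: int) -> int:
    # New formula: the trailing "% chunk_size" maps that case back to 0.
    return (chunk_size - seq_len % chunk_size) % chunk_size

print(pad_size_old(512, 256))  # 256 (one wasted chunk)
print(pad_size_new(512, 256))  # 0
print(pad_size_new(500, 256))  # 12
```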

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@amyeroberts (Collaborator) left a comment


Hi @klae01 - thanks for opening this PR!

Could you expand a bit more on what this PR is addressing, ideally linking to a related GitHub issue or providing a code snippet that reproduces the problem?

In the previous definition

pad_size = self.chunk_size - (seq_len % self.chunk_size)

pad_size will always satisfy 1 <= pad_size <= chunk_size; in particular, it is chunk_size rather than 0 when seq_len divides evenly.

I'm guessing the change is really so the padding is 0 if the sequence can be divided perfectly by chunk_size?

@vasqu (Contributor) commented Aug 14, 2024

@amyeroberts Original author who created this in mamba2 (for transformers). Yup, it's to reduce the padding size to 0 if we have seq_len % chunk_size == 0. The way it is now, we have an unnecessary padding of size chunk_size so fixing this definitely saves compute 😄.

It was originally meant to pad seq_len to a multiple of chunk_size (due to the algorithm working in a fixed chunk len fashion). So pad_size is the remaining length to get to a multiple of it.
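The padding described here can be sketched in plain Python (lists stand in for tensors; function names are invented for illustration): the sequence is right-padded to the next multiple of chunk_size so it divides cleanly into fixed-size chunks.

```python
# Right-pad a token sequence to the next multiple of chunk_size, then
# split it into fixed-size chunks, mirroring what the chunked scan needs.
def pad_to_chunk_multiple(tokens, chunk_size, pad_value=0):
    pad_size = (chunk_size - len(tokens) % chunk_size) % chunk_size
    return tokens + [pad_value] * pad_size

def split_chunks(tokens, chunk_size):
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

padded = pad_to_chunk_multiple([1, 2, 3, 4, 5], chunk_size=4)
print(padded)                   # [1, 2, 3, 4, 5, 0, 0, 0]
print(split_chunks(padded, 4))  # [[1, 2, 3, 4], [5, 0, 0, 0]]
```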

LGTM cc @molbap for Mamba2

@amyeroberts (Collaborator)

@vasqu OK great, thanks for confirming and clarifying!

@klae01 Could you:

  • Update the PR description to reflect this
  • Rebase on main to include upstream changes? This should resolve the failing tests

@vasqu (Contributor) commented Aug 15, 2024

We should wait for rebase until #32694 is merged or something similar. There's currently a bug in cached generation with input_embeds.

@klae01 (Contributor, Author) commented Aug 15, 2024

Hi @vasqu, @amyeroberts,

Thank you for the feedback and guidance. I noticed that the force push I made may have been a bit premature, especially considering the advice to wait until PR #32694 (or a similar PR) is merged to avoid potential issues with cached generation.

To align with the suggestion, I’ll hold off on any further rebasing or force pushes until PR #32694 is merged. Once that’s done, I’ll rebase my branch on the updated main and ensure everything is in order.

Please let me know if there’s anything else I should do in the meantime. I appreciate your time and help with reviewing this PR!

Thanks!

@vasqu (Contributor) commented Aug 16, 2024

@klae01 Thanks for bearing with me. You can only wait for now I guess but I'll ping you again when the other PR is merged and then this should be a quick rebase and go 👀

@vasqu (Contributor) commented Aug 19, 2024

@klae01 The patch has been merged 😄 thanks for waiting!

@klae01 (Contributor, Author) commented Aug 20, 2024

@amyeroberts I've rebased my branch now that PR #32694 is merged and force-pushed the updates. Everything should be up-to-date. Let me know if anything else is needed!

@vasqu (Contributor) commented Aug 23, 2024

@klae01 Could you push an empty commit with the message [run_slow] mamba2. That should trigger the CI.

cc @amyeroberts for slow run

@vasqu (Contributor) commented Aug 23, 2024

Ah and maybe rebase before, we removed a flag in #32686

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts (Collaborator) left a comment


Change looks good to me - thanks!

Just looking into how we can enable the tests cc @ydshieh

@ydshieh (Collaborator) commented Aug 28, 2024

Change looks good to me - thanks!

Just looking into how we can enable the tests cc @ydshieh

Hi, I'm lacking a bit of context. Are you still waiting on my input? If so, could you elaborate a bit more on the situation to resolve?

@amyeroberts (Collaborator)

@ydshieh Apologies for not providing context! This was one of the related issues with tests failing because of access to a gated model. I'll trigger a re-run of the slow tests, as I think this was resolved (by you - thank you!) yesterday

@ydshieh (Collaborator) commented Aug 28, 2024

OK! I also sent you a Slack DM regarding CI bot :-)

@vasqu (Contributor) commented Sep 16, 2024

Any updates on this PR?

gentle ping @amyeroberts

@amyeroberts (Collaborator)

@vasqu Apologies for the delay. The runner should now have access to run the gated mistralai model tests. I re-triggered the tests, but they still failed, I'm guessing because retriggering reuses the same settings, i.e. tokens. @vasqu Could you push another [run-slow] commit to trigger these? It should all pass, and then we can merge.

@vasqu (Contributor) commented Sep 17, 2024

No worries, I'm not the author so can't commit.

@klae01 Could you push an empty commit again, i.e. [run-slow] mamba2. This time it should clear the CI 🤞

@klae01 (Contributor, Author) commented Sep 17, 2024

Hello, @amyeroberts

I've added the [run-slow] mamba2 commit as requested, and I see that all CI tests have passed. Please let me know if there's anything else I should address.

Thank you!

@klae01 (Contributor, Author) commented Sep 18, 2024

Hello @amyeroberts,

I've added the [run-slow] mamba2 commit to re-trigger the tests, but it seems the CI tests are failing due to permission issues. Checking the logs, the tests are failing because they cannot access the gated repository:

Cannot access gated repo for url https://huggingface.co/mistralai/Mamba-Codestral-7B-v0.1/resolve/refs%2Fpr%2F9/config.json.
Access to model mistralai/Mamba-Codestral-7B-v0.1 is restricted. You must have access to it and be authenticated to access it. Please log in.

Could you please advise on how to resolve this issue? Thank you!

@amyeroberts (Collaborator)

@klae01 Could you add the @require_read_token decorator to the tests which are failing? This should properly propagate the required access token.
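For readers unfamiliar with the decorator: the real one lives in transformers.testing_utils and also propagates the token to Hub downloads. The stripped-down stand-in below (an editor's sketch, not the actual implementation) only demonstrates the gating idea: skip the test unless a token is available in the environment.

```python
import os
import unittest

# Simplified stand-in for transformers' require_read_token: skip the test
# unless a Hugging Face token is present. (The real decorator additionally
# makes the token available to hub calls inside the test.)
def require_read_token_sketch(test_fn):
    has_token = bool(os.environ.get("HF_TOKEN"))
    return unittest.skipUnless(has_token, "requires a HF read token")(test_fn)

class GatedModelTests(unittest.TestCase):
    @require_read_token_sketch
    def test_gated_model_loads(self):
        # A real test would load the gated checkpoint here.
        self.assertTrue(True)
```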

@amyeroberts (Collaborator)

@klae01 It's a bit annoying, but anything after [run-slow] will be interpreted as a model name (so in this case it's looking for the models "add" and "@require_read_token").

Could you push an empty commit with [run_slow] mamba2?
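An editor's sketch of the pitfall described here (assumed behaviour inferred from the comment, not the actual CI parsing code): everything after the marker is treated as a comma- or space-separated list of model names.

```python
# Assumed sketch of how the CI reads the commit subject: text after the
# [run-slow]/[run_slow] marker is split into model names, which is why
# "[run-slow] Add @require_read_token decorator ..." asked CI for models
# named "add" and "@require_read_token".
def parse_run_slow(subject: str):
    for marker in ("[run-slow]", "[run_slow]"):
        if subject.startswith(marker):
            rest = subject[len(marker):]
            return [m.strip().lower() for m in rest.replace(",", " ").split()]
    return []

print(parse_run_slow("[run_slow] mamba2"))  # ['mamba2']
print(parse_run_slow("[run-slow] Add @require_read_token decorator"))
# ['add', '@require_read_token', 'decorator']
```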

klae01 and others added 2 commits September 20, 2024 10:09
Fix pad_size calculation to ensure it's less than self.chunk_size
@klae01 (Contributor, Author) commented Sep 20, 2024

Hello, @amyeroberts.

I’ve just pushed a commit that rebases my branch with the latest changes from the main branch. I expect that these updates should properly resolve the issue with mamba2.

By the way, could you let me know how long the run-slow tests typically take to execute? Also, where can I find documentation on commit conventions (or guidelines) for transformers?

@ydshieh (Collaborator) commented Sep 20, 2024

Hi @klae01

We don't have a public doc for that yet, but we're planning to add a comment like the one below to PRs:


Before merging this pull request, slow tests CI should be triggered. To enable this:

  • Add the run-slow label to the PR
  • When your PR is ready for merge and all reviewers' comments have been addressed, push an empty commit with the command [run-slow] followed by a comma separated list of all the models to be tested, i.e. [run_slow] model_to_test_1, model_to_test_2
  • A transformers maintainer will then approve the workflow to start the tests
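The empty-commit step can be done as follows (sketch: demonstrated in a throwaway repo so it runs anywhere; on a real PR you would run only the commit, with your own identity already configured, followed by a push on your checked-out branch):

```shell
# Trigger the slow-test CI: an empty commit whose subject carries the
# [run_slow] marker plus the model(s) to test.
demo=$(mktemp -d) && cd "$demo" && git init -q
git -c user.email=you@example.com -c user.name=you \
    commit --allow-empty -m "[run_slow] mamba2"
git log -1 --format=%s   # [run_slow] mamba2
# On the real PR branch, finish with: git push
```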

@ydshieh (Collaborator) commented Sep 20, 2024

Let's ignore the multi-gpu job (it's an infra issue)

@ydshieh (Collaborator) commented Sep 20, 2024

@amyeroberts

It's a bit annoying, but anything after [run-slow] will be interpreted as a model name

I can change to make it less annoying, like [run-slow] .... models=[model_1, model_2] .... if that is more flexible

@amyeroberts (Collaborator)

I can change to make it less annoying, like [run-slow] .... models=[model_1, model_2] .... if that is more flexible

@ydshieh I think it's hard to find a good solution tbh -- most of the time the commit to trigger the tests are empty as they're requested at the end of PR review, so just [run-slow] model_1, model_2, ... etc. is easier. Happy to go for [run-slow] .... models=[model_1, model_2] though if you think it's better as it's more explicit

@klae01 On top of what @ydshieh has said above, regarding your question ("could you let me know how long the run-slow tests typically take to execute?"): it depends on a few things:

  • When they're actually triggered. A HF employee has to approve the workflow, so there's going to be time between your commit and when it "starts". It's done this way for security reasons
  • Availability of compute - we sometimes have to wait for runners to become available
  • The tests themselves. Some models have lots of slow tests, or the model being tested is very large, both of which will increase the completion time.

You can see in the last single-gpu run for mamba2 that running the tests themselves took about 26s, but initializing the container for testing took 4m 27s.

@amyeroberts (Collaborator)

All passing. Thanks for your patience and the fix @klae01!

@amyeroberts amyeroberts merged commit ec1424c into huggingface:main Sep 20, 2024
16 of 17 checks passed
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* Update modeling_mamba2.py

Fix pad_size calculation to ensure it's less than self.chunk_size

* [run_slow] mamba2

* [run-slow] mamba2

* [run-slow] Add @require_read_token decorator to failing tests for token propagation

* [run_slow] mamba2