Update modeling_mamba2.py, fix pad size #32599
Conversation
Hi @klae01 - thanks for opening this PR!
Could you expand on what this PR is addressing a bit more - ideally linking to a related github issue or providing a code snippet which reproduces the issue?
In the previous definition,
pad_size = self.chunk_size - (seq_len % self.chunk_size)
pad_size always satisfies 1 <= pad_size <= chunk_size, i.e. it is never 0.
I'm guessing the change is really so the padding is 0 if the sequence can be divided perfectly by chunk_size?
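For illustration, here is a minimal standalone sketch (plain Python; chunk_size and seq_len are example values here, not the actual module attributes) contrasting the two formulas:

```python
chunk_size = 256

def pad_size_old(seq_len: int) -> int:
    # Original formula: never returns 0; it yields a full chunk_size of padding
    # when seq_len is already an exact multiple of chunk_size.
    return chunk_size - (seq_len % chunk_size)

def pad_size_new(seq_len: int) -> int:
    # Proposed formula: the outer modulo maps the "already aligned" case back to 0.
    return (chunk_size - seq_len % chunk_size) % chunk_size

for seq_len in (100, 256, 300, 512):
    print(seq_len, pad_size_old(seq_len), pad_size_new(seq_len))
# 100 -> 156, 156
# 256 -> 256, 0   (the case this PR addresses)
# 300 -> 212, 212
# 512 -> 256, 0
```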
@amyeroberts Original author who created this in mamba2 (for transformers): yup, it's to reduce the padding size to 0 when seq_len is already an exact multiple of chunk_size. It was originally meant to pad the sequence up to a multiple of chunk_size, but the old formula also pads by a full chunk in that case.

LGTM, cc @molbap for Mamba2
We should wait with the rebase until #32694 or something similar is merged; there's currently a bug in cached generation with input_embeds.
Hi @vasqu, @amyeroberts, thank you for the feedback and guidance. I noticed that the force push I made may have been a bit premature, especially considering the advice to wait until PR #32694 (or a similar PR) is merged to avoid potential issues with cached generation. To align with that suggestion, I'll hold off on any further rebasing or force pushes until PR #32694 is merged. Once that's done, I'll rebase my branch on the updated main branch. Please let me know if there's anything else I should do in the meantime. I appreciate your time and help with reviewing this PR! Thanks!
@klae01 Thanks for bearing with me. You can only wait for now, I guess, but I'll ping you again when the other PR is merged; then this should be a quick rebase and go 👀

@klae01 The patch has been merged 😄 thanks for waiting!

@amyeroberts I've rebased my branch now that PR #32694 is merged and force-pushed the updates. Everything should be up to date. Let me know if anything else is needed!
@klae01 Could you push an empty commit with the message [run-slow] mamba2 to trigger the slow run? cc @amyeroberts for the slow run
Ah, and maybe rebase before that; we removed a flag in #32686.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Change looks good to me - thanks!
Just looking into how we can enable the tests cc @ydshieh
Hi, I'm lacking a bit of context. Are you still waiting on my input? If so, could you elaborate a bit more on the situation to resolve?
@ydshieh Apologies for not providing context! This was one of the related issues with tests failing because of access to a gated model. I'll trigger a re-run of the slow tests, as I think this was resolved (by you - thank you!) yesterday.

OK! I also sent you a Slack DM regarding the CI bot :-)

Any updates on this PR? Gentle ping @amyeroberts
@vasqu Apologies for the delay. The runner should now have access to run the mistral tests. I re-triggered the tests, but it seems they still failed; I'm guessing because re-triggering uses the same settings, i.e. tokens. @vasqu Could you push another empty commit to trigger a fresh run?
No worries, I'm not the author so I can't commit. @klae01 Could you push an empty commit again, i.e. one with [run-slow] mamba2 as the message?
Hello, @amyeroberts, I've added the empty commit. Thank you!
Hello @amyeroberts, I've added the empty commit, but the slow tests still don't seem to pass. Could you please advise on how to resolve this issue? Thank you!
@klae01 Could you add the @require_read_token decorator to the failing tests so the token is propagated?
@klae01 It's a bit annoying, but anything after the [run-slow] prefix in the commit message is read as the list of models to test, so the last commit didn't trigger the mamba2 tests. Could you push an empty commit with just [run-slow] mamba2 as the message?
Fix pad_size calculation to ensure it's less than self.chunk_size
Hello, @amyeroberts. I've just pushed a commit that rebases my branch with the latest changes from the main branch. I expect that these updates should properly resolve the issue with mamba2. By the way, could you let me know how long the run-slow tests typically take to execute? Also, where can I find documentation on commit conventions (or guidelines) for transformers?
Hi @klae01, we don't have a public doc for that yet. We are trying to add a comment like the one below: "Before merging this pull request, slow tests CI should be triggered. To enable this:"
Let's ignore that for now.
I can change it to make it less annoying, if that would help.
@ydshieh I think it's hard to find a good solution, tbh; most of the time the commits that trigger the tests are empty, since they're requested at the end of PR review, so just the trigger message on its own is usually enough.

@klae01 On top of what @ydshieh has said above, regarding your question: you can see in the last single-gpu run for mamba2 that running the tests themselves took about 26s, but initializing the container for testing took 4m 27s.
All passing. Thanks for your patience and the fix, @klae01!
* Update modeling_mamba2.py: Fix pad_size calculation to ensure it's less than self.chunk_size
* [run_slow] mamba2
* [run-slow] mamba2
* [run-slow] Add @require_read_token decorator to failing tests for token propagation
* [run_slow] mamba2
What does this PR do?
The update changes how pad_size is calculated. Originally, pad_size was computed as self.chunk_size - (seq_len % self.chunk_size). The new formula, (self.chunk_size - seq_len % self.chunk_size) % self.chunk_size, takes the same subtraction and then applies a modulus with self.chunk_size. This fixes an issue where unnecessary padding was added when seq_len % self.chunk_size == 0, which optimizes the padding process and reduces computational overhead.
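As a rough sketch of the downstream effect (the padding helper below is illustrative, not the exact code in modeling_mamba2.py): with the corrected formula, a sequence whose length is already a multiple of chunk_size is left untouched instead of gaining a whole extra chunk of zero padding.

```python
import torch
import torch.nn.functional as F

chunk_size = 256

def pad_to_chunk_multiple(hidden_states: torch.Tensor) -> torch.Tensor:
    """Pad the sequence dimension (dim=1) so its length is a multiple of chunk_size."""
    seq_len = hidden_states.shape[1]
    pad_size = (chunk_size - seq_len % chunk_size) % chunk_size
    # F.pad pads from the last dimension backwards: (0, 0) leaves the hidden dim
    # alone, (0, pad_size) appends zeros along the sequence dimension.
    return F.pad(hidden_states, (0, 0, 0, pad_size))

x = torch.randn(1, 256, 64)  # seq_len is already a multiple of chunk_size
print(pad_to_chunk_multiple(x).shape)
# torch.Size([1, 256, 64]) with the new formula; the old formula would have
# produced torch.Size([1, 512, 64]), i.e. a whole wasted chunk.
```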
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.