Changing a single example for BLOOM 176-B affects forward pass for other examples in a batch #18809
Hey! It's a bit hard to run a testing env with BLOOM, can you share a reproducible script with a smaller model? This looks like some instability from torch.bfloat16, and I'm willing to bet that those values come from there (both 3.28 occurrences are exactly the same, so it seems like a rounding error to me; we can perhaps check that those values are consecutive values in bfloat16, i.e. there is no representable value between 3.28 and 3.29). What I think might be happening is that you're adding … Also, if you can, try running on the main branch.
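(For illustration, a small sketch of how one could check that rounding hypothesis; this snippet is not from the thread.)

```python
import torch

a = torch.tensor([3.28], dtype=torch.bfloat16)
b = torch.tensor([3.29], dtype=torch.bfloat16)
print(a.item(), b.item())  # 3.28125 and 3.296875: the nearest representable values

# Reinterpreting the bits as int16 shows the two results are adjacent
# representable bfloat16 numbers, i.e. there is no value between them.
print((b.view(torch.int16) - a.view(torch.int16)).item())  # 1
```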
Thanks @thomasw21 for taking a look at this. I will try to reproduce this with a smaller model (say GPT-2) and get back on this. I will also try the main branch.
Also, since there are no batch-norm ops in BLOOM, I don't really understand why this should happen. And since the pads have been given an attention mask of 0, shouldn't the output be the same?
Hi @mayank31398!
… and getting …
I suspect that the logits may be flaky when using half-precision models, therefore I second what @thomasw21 said.
Hey, first of all: sorry for the late reply.
Okay, I think the GPT-2 test isn't instability. Essentially it's the absolute positional embeddings that are screwing with you: as you increase the label size, you shift things to the right and add padding to the left, which is why you see big shifts in the loss. I do think that the BLOOM test is instability. Typically …
So, as you said, you can try computing the logits in fp32, which will increase precision (but will be slower). It takes a bit of a workaround, as you need to cast the embedding layers to fp32 and such.
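(A minimal sketch of that upcast, assuming `model`, `input_ids`, and `attention_mask` come from the reproduction script; it does not cover casting the embedding layers.)

```python
import torch

# Run the forward pass in bfloat16, but upcast the logits so the
# cross-entropy is computed in fp32.
with torch.no_grad():
    logits = model(input_ids, attention_mask=attention_mask).logits.float()

# Standard causal-LM shift: predict token t from tokens < t.
per_token_nll = torch.nn.functional.cross_entropy(
    logits[:, :-1].transpose(1, 2),  # (batch, vocab, seq-1)
    input_ids[:, 1:],                # (batch, seq-1)
    reduction="none",
)
# Padding and context positions should still be masked out before summing.
```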
Everything in your explanation makes sense, @thomasw21! I missed the absolute positional embedding part. Thanks for explaining it 💪
I guess this is not a fixable problem then, right? I think we can close this?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers version: 4.21.2
Who can help?
@thomasw21, @younesbelkada. This issue is for unexpected BLOOM outputs.
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
I wrote this script to get the conditional NLL for the labels given the context.
I tried different batches with only the first example changing and the rest of the examples fixed in the batch. However, after a certain point, changing the first example affects the NLL for the other examples.
This is not supposed to happen.
The value drops from 3.29 to 3.28 in column 2 when only the example in column 0 is changed. Even column 3 changes in the last case.
Only column 0 is supposed to change here.
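(For context, a minimal sketch of this kind of conditional-NLL measurement; this is not the exact script from the report, and the checkpoint name, left-padding setup, and helper function are assumptions.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint: any causal LM can reproduce the batching pattern described above.
model_name = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).eval()

def batch_nll(contexts, labels):
    # Score context + label together; padding goes to the left of each example.
    texts = [c + l for c, l in zip(contexts, labels)]
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**batch).logits
    # Causal shift: token t is predicted from tokens < t.
    nll = torch.nn.functional.cross_entropy(
        logits[:, :-1].transpose(1, 2),
        batch["input_ids"][:, 1:],
        reduction="none",
    )
    # Mask out padding and context positions (omitted here for brevity) and
    # sum the remaining per-token NLLs to get one value per example.
    return nll

# Changing only contexts[0]/labels[0] between calls should leave the other
# rows' NLLs untouched; the report shows they sometimes shift in bfloat16.
```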
Expected behavior