bf16 is more unstable than fp16, when looking at the difference of generation logprobs and forward logprobs #31267
Comments
I also encounter more instability with bf16 in RoPE-based models like Pythia (i.e. the GPTNeoX architecture). I noticed this while working on SDPA support for that architecture in #31031. My guess is that it comes from RoPE, which forces an upcast to fp32; further down the line the values are downcast again, which hurts bf16 more (since it has less mantissa precision) and results in this phenomenon. For example, the flash attention 2 variant also works around this upcast by downcasting: transformers/src/transformers/models/gpt_neox/modeling_gpt_neox.py Lines 364 to 369 in 940fde8
Edit: tl;dr: more precision is needed, which bf16 can't offer.
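To make the precision argument concrete, here is a minimal sketch (not from the thread) of how much is lost when fp32 values, such as those produced inside RoPE, are rounded back down to half precision; bf16 keeps only 8 mantissa bits versus fp16's 11, so the downcast is coarser.

```python
import torch

# Minimal sketch: simulate "compute in fp32, then downcast", as happens when
# RoPE upcasts to fp32 and the result is later cast back to half precision.
torch.manual_seed(0)
x = torch.randn(1_000_000, dtype=torch.float32)

err_fp16 = (x - x.to(torch.float16).float()).abs().mean()
err_bf16 = (x - x.to(torch.bfloat16).float()).abs().mean()

print(f"mean abs rounding error, fp16: {err_fp16:.3e}")
print(f"mean abs rounding error, bf16: {err_bf16:.3e}")  # roughly 8x larger
```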
Yep, I think you should check whether #29285, applied to the model you are using, solves the issue 🤗
I tried naively applying the same fix as #29285, but it doesn't seem to make a difference. Did some further digging: the logprobs of the forward pass seem length related. For example, if I run the model through the first 20 generated tokens, I get
If I do the forward pass on the first 40 generated tokens, I get
Notice how the logprob changed from
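For reference, a hedged sketch of the length check described above (the model id, prompt, and positions are placeholders, not taken from the issue): score the same token once from a forward pass over a 20-token prefix and once over a 40-token prefix, then compare the two logprobs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-160m"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval()

# Placeholder prompt; assumes it tokenizes to at least 40 tokens.
text = "a long placeholder prompt " * 20
input_ids = tok(text, return_tensors="pt").input_ids

def logprob_of_token(input_ids, prefix_len, pos):
    """Logprob of the token at `pos`, computed from a forward pass over the
    first `prefix_len` tokens only (requires pos < prefix_len)."""
    with torch.no_grad():
        logits = model(input_ids[:, :prefix_len]).logits
    logprobs = torch.log_softmax(logits[:, pos - 1].float(), dim=-1)
    return logprobs[0, input_ids[0, pos]].item()

# Causally, the value at a fixed position should not depend on how much trailing
# context the forward pass sees; in bf16 it can drift with sequence length.
print(logprob_of_token(input_ids, 20, 10))
print(logprob_of_token(input_ids, 40, 10))
```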
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, are there any updates? Thanks!
This is basically fixed in most models (the RoPE part, for example); the rest can be traced back to #25420 (comment)
System Info

transformers version: 4.40.1

Who can help?

@ArthurZucker

Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
python bf16_logprobs.py --fp32
python bf16_logprobs.py --fp16
python bf16_logprobs.py --bf16
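The script bf16_logprobs.py itself is not reproduced in the thread; below is a minimal sketch of what such a script presumably does (model id, prompt, and generation length are assumptions): compare the step-by-step logprobs reported during generation against logprobs recomputed from a single forward pass over the generated sequence.

```python
import argparse
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("--fp32", action="store_true")
parser.add_argument("--fp16", action="store_true")
parser.add_argument("--bf16", action="store_true")
args = parser.parse_args()

dtype = torch.float32
if args.fp16:
    dtype = torch.float16
elif args.bf16:
    dtype = torch.bfloat16

model_id = "EleutherAI/pythia-1b"  # placeholder; run on GPU for fp16/bf16
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype).eval()

inputs = tok("The quick brown fox", return_tensors="pt")
out = model.generate(
    **inputs, max_new_tokens=20, do_sample=False,
    output_scores=True, return_dict_in_generate=True,
)
prompt_len = inputs.input_ids.shape[1]
gen_ids = out.sequences[0, prompt_len:]

# Logprobs as reported step by step during generation.
gen_logprobs = torch.stack(
    [torch.log_softmax(s[0].float(), dim=-1)[t] for s, t in zip(out.scores, gen_ids)]
)

# Logprobs of the same tokens recomputed with a single forward pass.
with torch.no_grad():
    logits = model(out.sequences).logits[0]
fwd_logprobs = (
    torch.log_softmax(logits[prompt_len - 1 : -1].float(), dim=-1)
    .gather(-1, gen_ids.unsqueeze(-1))
    .squeeze(-1)
)

# In exact arithmetic the ratio would be 1 everywhere.
print(gen_logprobs / fwd_logprobs)
```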
Expected behavior
Basically, the ratios should all be 1.