Batch decoding of LMs produces different outputs with different batch sizes #25921
I can confirm that it's not due to left padding, since even with same-length inputs in the batch, the same issue persists:
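The original snippet wasn't captured in this excerpt; below is a minimal sketch of the kind of check described, comparing batch size 1 against a batch of identical prompts so that no padding is applied. The model id, prompt, and generation settings are illustrative assumptions, not the poster's.

```python
# Illustrative sketch (not the original snippet): compare greedy decoding for
# batch size 1 against a batch of identical prompts. All inputs share the same
# length, so no padding is applied and left padding cannot explain any mismatch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any RoPE-based causal LM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda").eval()

prompt = "The quick brown fox jumps over"
single = tokenizer([prompt], return_tensors="pt").to(model.device)
batched = tokenizer([prompt] * 4, return_tensors="pt").to(model.device)  # same length, no padding

out_single = model.generate(**single, max_new_tokens=64, do_sample=False)
out_batched = model.generate(**batched, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(out_single[0], skip_special_tokens=True))
print(tokenizer.decode(out_batched[0], skip_special_tokens=True))  # can differ from the bs=1 output
```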
The output I got is:
In my environment, even identical examples within a single batch sometimes give different outputs for bfloat16 models. I'm not totally sure yet, but I suspect the precision conversion is non-deterministic; see RMSNorm. When a bfloat16 number is converted to fp32, the fraction part of the resulting fp32 number might not always be the same. The same applies to the softmax operation, and there may be other places where the precision conversion happens. FYI, this might also be related to #25420
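For context, the upcast/downcast pattern being referred to looks roughly like the sketch below: the hidden states are promoted to fp32 for the variance computation and then cast back to the input dtype. This is a simplified paraphrase, not the exact library code.

```python
import torch
import torch.nn as nn

class RMSNormSketch(nn.Module):
    """Simplified sketch of an RMSNorm layer with the bf16 -> fp32 -> bf16
    round trip described above (not the exact library implementation)."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        input_dtype = hidden_states.dtype                 # e.g. torch.bfloat16
        hidden_states = hidden_states.to(torch.float32)   # upcast for a stable reduction
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.eps)
        return self.weight * hidden_states.to(input_dtype)  # downcast back to the input dtype

# In a bf16 model the layer (and its weight) lives in bf16, so every forward
# pass goes bf16 -> fp32 -> bf16:
norm = RMSNormSketch(16).to(torch.bfloat16)
x = torch.randn(2, 4, 16, dtype=torch.bfloat16)
print(norm(x).dtype)  # torch.bfloat16
```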
Hi @wenhuchen @da03 @csarron 👋 Thank you for raising this issue. We are aware of this phenomenon on all (or nearly all) models that contain rotary position embeddings (Llama, Llama2, Falcon, GPTNeoX, ...). Running things in fp32 mitigates the issue. We have to dive deep to find the root cause, but our bandwidth is limited and we can't provide a time estimate. I'll keep this issue open -- however, if there are volunteers to explore the issue, let me know!
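As a concrete illustration of the fp32 workaround mentioned above, loading the model in full precision instead of half precision looks like the sketch below (the model id is a placeholder, not one from the thread):

```python
# Sketch of the fp32 workaround: load the model in full precision instead of bf16.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder; substitute your RoPE-based model
    torch_dtype=torch.float32,    # fp32 reduces the batch-size-dependent drift
).to("cuda").eval()
```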
@xiangyue9607, please take a look at this.
@gante, thanks for letting us know. We are using fp32 at this point, but we notice that fp32 normally leads to compromised results compared to bf16. Anyway, looking forward to your PR to fix this issue.
Any update on this issue? My T5 model produces different outputs (with greedy decoding) for the same prompt depending on the batch size, even if I create a batch by copying the same prompt. It occurs even on CPU with float32, but is more common on CUDA with bfloat16. A self-contained example is below. Seeding and making torch use deterministic algorithms does not help, but I'm including it for completeness.

```python
# make torch deterministic
import os
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"
import torch
torch.use_deterministic_algorithms(True)

import random
import transformers
import numpy as np

# seed everything
torch.manual_seed(0)
np.random.seed(0)
random.seed(0)
transformers.set_seed(0)

model_id = "MU-NLPC/calcformer-instruct-flan-xl_step-128k"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(model_id).to("cuda").to(torch.bfloat16).eval()

question = 'In order to help the victims of the earthquake in Sichuan, the factory rushed to make a batch of disaster relief tents. The first workshop completed (1/5) of this batch of tents, the second workshop completed (1/4) of this batch of tents, and the remaining batch of tents What percentage of it is not completed?'

# batch size 1
inputs = tokenizer([question], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
decoded_bs1 = tokenizer.decode(outputs[0], skip_special_tokens=True, spaces_between_special_tokens=False)

# batch size 4 (the same prompt repeated, so no padding is involved)
inputs = tokenizer([question] * 4, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
decoded_bs4 = tokenizer.decode(outputs[0], skip_special_tokens=True, spaces_between_special_tokens=False)

decoded_bs1 == decoded_bs4  # False in my runs: the two decodings differ
```

My environment:
Hi @prompteus 👋 Have a look at this comment -- #25420 (comment) |
System Info
Transformers=4.31
Torch=2.0.1
CUDA=11.8
Python=3.10
A100 GPU 80GB
Who can help?
@ArthurZucker , @younesbelkada , @gante
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Running the following example produces different outputs for the first input depending on the batch size.
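The reproduction script itself is not included in this excerpt. As a stand-in, the sketch below shows the kind of comparison the report describes: greedy generation for the first prompt alone versus inside a left-padded batch. The model id, prompts, and generation settings are placeholders rather than the author's.

```python
# Hypothetical reproduction sketch: the first prompt's output can change when it
# is generated inside a padded batch vs. on its own. Model id and prompts are
# placeholders, not the original reproduction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder RoPE-based model
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda").eval()

prompts = ["Tell me a story about a dragon.", "What is the capital of France?"]

# Batch size 1: only the first prompt
single = tokenizer(prompts[:1], return_tensors="pt").to(model.device)
out_bs1 = model.generate(**single, max_new_tokens=64, do_sample=False)

# Batch size 2: the same first prompt, now left-padded alongside a second prompt
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
out_bs2 = model.generate(**batch, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(out_bs1[0], skip_special_tokens=True))
print(tokenizer.decode(out_bs2[0], skip_special_tokens=True))  # may not match the bs=1 output
```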
Expected behavior
The produced outputs are supposed to be the same and should not be affected by the batch size.