Numerical inaccuracy in unpad_image (LlavaOnevision) #33531
Hey @dom-dziela! Thanks for reporting the issue. Yes, we already had to introduce some changes to account for that, because padding/patching in the preprocessing stage uses integers, and that caused similar errors. For example, here we enforce list format before getting the best resolution. I think we should do the same conversion in transformers/src/transformers/models/llava_onevision/modeling_llava_onevision.py (lines 63 to 69 at ce62a41). I will later have to figure out a better way, especially to satisfy …
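For reference, a minimal sketch of what that conversion might look like inside `unpad_image` (the body mirrors the current modeling code; the early `tolist()` coercion is the suggested change, and its exact placement is an assumption, not a verbatim patch):

```python
import torch

def unpad_image(tensor: torch.Tensor, original_size) -> torch.Tensor:
    """Unpads a padded-and-resized image tensor of shape (C, H, W).

    Sketch: coerce original_size to plain Python ints first, so the
    aspect-ratio math runs in exact integer / double arithmetic instead
    of (possibly bfloat16) tensor ops.
    """
    if not isinstance(original_size, (list, tuple)):
        original_size = original_size.tolist()  # detach from torch dtypes
    original_height, original_width = map(int, original_size)
    current_height, current_width = tensor.shape[1:]

    original_aspect_ratio = original_width / original_height
    current_aspect_ratio = current_width / current_height

    if original_aspect_ratio > current_aspect_ratio:
        # image is wider than the padded canvas: crop height
        scale_factor = current_width / original_width
        new_height = int(original_height * scale_factor)
        padding = (current_height - new_height) // 2
        tensor = tensor[:, padding : current_height - padding]
    else:
        # image is taller (or equal): crop width
        scale_factor = current_height / original_height
        new_width = int(original_width * scale_factor)
        padding = (current_width - new_width) // 2
        tensor = tensor[:, :, padding : current_width - padding]

    return tensor
```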
Are you explicitly casting `image_sizes` to bfloat16? With the default processor output the sizes stay int64:

```python
inputs = processor(text=prompt, images=raw_image, return_tensors='pt').to(0, torch.bfloat16)
inputs['image_sizes'], inputs['image_sizes'].dtype
>>> (tensor([[2136, 3212]], device='cuda:0'), torch.int64)
```

Casting the sizes to bfloat16 reproduces the inaccurate ratio:

```python
original_size = torch.tensor([2136, 3212], device="cuda:0", dtype=torch.bfloat16)
original_height, original_width = original_size
current_height, current_width = 108, 162
original_width / original_height
>>> tensor(1.5000, device='cuda:0', dtype=torch.bfloat16)
```

whereas with the default int64 tensor the ratio comes out as expected:

```python
original_size = torch.tensor([2136, 3212], device="cuda:0")
original_height, original_width = original_size
current_height, current_width = 108, 162
original_width / original_height
>>> tensor(1.5037, device='cuda:0')
```
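For what it's worth, the 1.5000 in the bfloat16 case is inherent to the format rather than a bug in the division itself: bfloat16 keeps only 8 significand bits, so representable values in [1, 2) are spaced 2⁻⁷ = 0.0078125 apart, and 3212/2136 ≈ 1.5037 rounds to the nearest representable value, 1.5. A quick standalone check (my addition, not from the thread):

```python
import torch

# 3212 / 2136 = 1.503745...; the nearest bfloat16 values are
# 1.5 and 1.5078125, so the ratio rounds down to 1.5 exactly.
print(torch.tensor(3212 / 2136, dtype=torch.bfloat16).item())  # 1.5
```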
Thanks for the replies. On my setup the sizes are int64 as well, yet the ratio still comes out as 1.5:

```python
(original_height, original_height.dtype), (original_width, original_width.dtype)
>>> ((tensor(2136, device='cuda:0'), torch.int64), (tensor(3212, device='cuda:0'), torch.int64))
original_aspect_ratio = original_width / original_height
original_aspect_ratio
>>> tensor(1.5000, device='cuda:0')
```

I would guess this is then more likely a version thing? However, the solution …
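One thing worth ruling out when comparing these prints (my suggestion, not from the thread): tensor reprs round to four decimals by default, which can mask small differences. Raising the print precision or reading the scalar out makes the comparison unambiguous:

```python
import torch

torch.set_printoptions(precision=10)   # show more digits in tensor reprs
ratio = torch.tensor(3212) / torch.tensor(2136)  # int64 / int64 -> float32
print(ratio)
print(ratio.item())  # exact Python float of the stored value
```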
@hlky Thanks for pointing to the issue! @dom-dziela Am I correct that the suggested solution fixes the bug, but only on certain hardware/torch versions? I am okay with adding the …
I would assume that your suggested solution would fix that particular bug in any hardware/torch-version combination, since it detaches the calculation from torch and only uses base Python.
System Info

`transformers` version: 4.45.0.dev0

Who can help?

@amyeroberts, @qubvel
Reproduction
In `unpad_image` we found a numerical inaccuracy when `original_aspect_ratio == current_aspect_ratio`, which occurs in DocVQA on training sample 32673. See for example the snippet below. Testing showed that this inaccuracy does not occur if `original_height` and `original_width` are integers.
In the docstring, `unpad_image` asks for `original_size` to be a tuple (no type annotation, though); however, it will always get a `torch.Tensor`.
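For convenience, a minimal sketch of the failure mode, based on the dtype experiments quoted in the comments above (the bfloat16 cast and the final `int()` truncation are the assumed ingredients; the exact outcome depends on dtype/torch version, as discussed there):

```python
import torch

# Sizes from DocVQA training sample 32673:
# original image (H, W) = (2136, 3212), current feature map (H, W) = (108, 162)
original_size = torch.tensor([2136, 3212], dtype=torch.bfloat16)
original_height, original_width = original_size
current_height, current_width = 108, 162

original_aspect_ratio = original_width / original_height  # 1.5 in bfloat16
current_aspect_ratio = current_width / current_height     # 1.5 exactly

# With the ratios comparing equal, unpad_image computes
#   new_width = int(original_width * scale_factor),
# which in exact arithmetic equals current_width (162), but in
# low-precision float arithmetic can land just below it and truncate.
scale_factor = current_height / original_height
new_width = int(original_width * scale_factor)
print(new_width)
```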
Expected behavior

The `new_width` value should be 162. You can see that if you write down the formulas for the two aspect ratios, set them equal, and multiply both sides by `current_height`: this gives `original_width * scaling_factor = current_width (= new_width)`, with `scaling_factor = current_height / original_height`.
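A worked check of that algebra with the sizes from the thread; in plain Python double precision the truncation still lands on 162, while the low-precision tensor path reported above apparently falls just short:

```python
original_height, original_width = 2136, 3212
current_height, current_width = 108, 162

# If original_width / original_height == current_width / current_height,
# multiplying both sides by current_height gives
#   original_width * (current_height / original_height) == current_width.
scaling_factor = current_height / original_height
new_width = int(original_width * scaling_factor)
print(new_width)  # int(162.404...) = 162 in double precision; the bug
                  # report implies the tensor path truncates to less
```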
PS: My first issue ever, so please have patience.