
LlavaForConditionalGeneration logit values are batch size dependent. #29327

Closed
ShahRutav opened this issue Feb 27, 2024 · 8 comments

Comments

@ShahRutav

Hi @ArthurZucker,
Thanks for the response. From the comment, the possible reasons in llama models are:

  1. Left-side padding: there is no padding in my script (a quick check is sketched after the lists below).
  2. KV caching with lower-precision operations.

I tested with float32, and my observations are as follows:

  1. No issue (mismatch < 1e-5) if I don't use images in the input and use float32 precision. This is equivalent to using only the llama model.
  2. The mismatch is very large, ~11, if I include images in the input (and it is random across batch sizes), even in float32.
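
A quick way to confirm point 1 above (an illustrative check, not part of the original script; it reuses the model id, prompt, and image URL from the script below) is to inspect the attention mask returned by the processor; with a single, un-padded prompt it should be all ones:

import requests
import torch
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf", padding_side="left")
prompt = "<image>\nUSER: What's the content of the image?\nASSISTANT:"
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=[prompt], images=[image], return_tensors="pt")
# With a single prompt and no padding, every position is a real token.
print(torch.all(inputs["attention_mask"] == 1))  # tensor(True) => no padding in the batch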

I am attaching the updated script to reproduce the mismatch:

import torch
import requests
from PIL import Image
import peft
from transformers import BitsAndBytesConfig
from transformers import LlavaForConditionalGeneration, AutoProcessor

test_image = True
dtype = torch.float32
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(
    model_id,
    padding_side="left",
    dtype=dtype,
)
if test_image:
    prompt = "<image>\nUSER: What's the content of the image?\nASSISTANT:"
    url = "https://www.ilankelman.org/stopsigns/australia.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    inputs = processor(text=[prompt], images=[image], return_tensors="pt")
else:
    prompt = "USER: What's the content of the image?\nASSISTANT:"
    inputs = processor.tokenizer(prompt, return_tensors="pt")
# build a batch of 2 by duplicating every input tensor along the batch dimension
inputs_2 = {k: torch.cat([v, v], dim=0).clone() for k, v in inputs.items()}

model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=dtype,
).eval()
if dtype != torch.float32: # no need to go through this if dtype is already float32
    peft.prepare_model_for_kbit_training(model)

with torch.no_grad():
    print(inputs['input_ids'])
    model_output = model(**inputs).logits
    model_output_2 = model(**inputs_2).logits

# compare only the first element of the batch-2 output against the batch-1 output
model_output_2 = model_output_2[:model_output.shape[0]]
assert model_output.shape == model_output_2.shape, \
    f"Shapes must be same for comparing. {model_output.shape} vs {model_output_2.shape}"
assert torch.allclose(model_output, model_output_2, atol=1e-5), \
    f"Logits are not the same with maximum difference {torch.max(torch.abs(model_output - model_output_2))}.\n \
    Values:\n{model_output}\nvs\n{model_output_2}"

Originally posted by @ShahRutav in #29282 (comment)
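
For reference, a slightly more informative way to do the final comparison (a sketch, not part of the original post) is torch.testing.assert_close, which reports both absolute and relative mismatches and whose float32 defaults (rtol=1.3e-6, atol=1e-5) are stricter than the manual allclose above; compare_logits is a hypothetical helper name:

import torch

def compare_logits(a: torch.Tensor, b: torch.Tensor, rtol: float = 1e-4, atol: float = 1e-4) -> None:
    # Raises an AssertionError with a detailed mismatch report if the tensors differ too much.
    torch.testing.assert_close(a, b, rtol=rtol, atol=atol)

# Usage with the tensors from the script above:
# compare_logits(model_output, model_output_2[:model_output.shape[0]])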

@ShahRutav changed the title from "Hi @ArthurZucker," to "LlavaForConditionalGeneration logit values are batch size dependent." on Feb 27, 2024
@ArthurZucker (Collaborator)

Sorry but I cannot reproduce your error:
[screenshot of the logits comparison output]

The logits are close enough.
I used this:

import torch
import requests
from PIL import Image
import peft
from transformers import BitsAndBytesConfig
from transformers import LlavaForConditionalGeneration, AutoProcessor

test_image = True
dtype = torch.float32
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(
    model_id,
    padding_side="left",
    dtype=dtype,
)
if test_image:
    prompt = "USER: <image>\nWhat's the content of the image?\nASSISTANT:"
    url = "https://www.ilankelman.org/stopsigns/australia.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    inputs = processor(text=[prompt], images=[image], return_tensors="pt")

inputs_2 = processor(text=[prompt, prompt], images=[image, image], return_tensors="pt")

model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=dtype,).eval()

with torch.no_grad():
    print(inputs['input_ids'])
    model_output = model(**inputs).logits
    model_output_2 = model(**inputs_2).logits

model_output_2 = model_output_2[:model_output.shape[0]]
assert model_output.shape == model_output_2.shape, "Shapes must be same for comparing. {} vs {}".format(model_output.shape, model_output_2.shape)
assert torch.allclose(model_output, model_output_2, atol=1e-5), "Logits are not the same with maximum difference {}. Values:\n{} vs\n{}".format(torch.max(torch.abs(model_output - model_output_2)), model_output, model_output_2)

The issue might be how you create your inputs.
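
For concreteness, the two scripts in this thread build the batch-of-two inputs differently; a minimal, self-contained sketch of both approaches (reusing the prompt and image from the scripts above):

import torch
import requests
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf", padding_side="left")
prompt = "USER: <image>\nWhat's the content of the image?\nASSISTANT:"
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Approach A (first script): process a single example, then duplicate every tensor.
single = processor(text=[prompt], images=[image], return_tensors="pt")
duplicated = {k: torch.cat([v, v], dim=0) for k, v in single.items()}

# Approach B (second script): let the processor build the batch itself.
batched = processor(text=[prompt, prompt], images=[image, image], return_tensors="pt")

# With identical prompts there is no padding, so both approaches should produce the
# same tensors; with prompts of different lengths, approach A would skip padding entirely.
for k in duplicated:
    print(k, torch.equal(duplicated[k], batched[k]))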

@ArthurZucker (Collaborator)

Also, the format you are using is wrong.
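
This presumably refers to the prompt template: the first script places the <image> token before the USER: turn, while the second script uses the llava-1.5 template:

# Prompt from the first script (the <image> token precedes the USER: turn):
prompt_original = "<image>\nUSER: What's the content of the image?\nASSISTANT:"

# Prompt from the second script, matching the llava-1.5 template:
prompt_expected = "USER: <image>\nWhat's the content of the image?\nASSISTANT:"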

@ShahRutav (Author)

ShahRutav commented Feb 28, 2024

#29327 (comment)
Thanks for experimenting with it.

  • I copied exactly the test script you attached, and I got a maximum difference in logits of ~0.00145 (which is considerably larger than yours).
  • This difference does not change when I create the inputs differently.

Unless you are using a different transformers version, could this be due to different system specifications?
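
One way to rule out version and hardware differences (an illustrative sketch, not part of the original comment) is to print the same environment report on both machines:

import platform
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("device:", "cuda" if torch.cuda.is_available() else "cpu")
print("cpu:", platform.processor())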

@ArthurZucker (Collaborator)

I am using the latest release of transformers.

@ArthurZucker (Collaborator)

If there was a bug, it was fixed 😉

@ShahRutav (Author)


I am also using the latest transformers from the main branch. I cannot explain the difference between your execution (~1e-5) and mine (~1e-3). I'm unsure if this is relevant, but I run on the CPU since the model won't fit on my GPU without quantization, and I am avoiding quantizing the base model to isolate the different sources of error.
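
For reference, both scripts import BitsAndBytesConfig but never use it; a quantized load that fits the 7B model on a smaller GPU would look roughly like the sketch below (it requires the bitsandbytes and accelerate packages, and quantization changes the logits, so it is not suitable for this exact comparison):

import torch
from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
)
model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    quantization_config=quant_config,
    device_map="auto",
).eval()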

@ArthurZucker (Collaborator)

1e-3 is already close enough. I don't know which CPU you are using, but I think we can agree that the outputs are not batch-dependent.
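
A minimal illustration (not from the thread, and not the Llava model itself) of why identical float32 inputs can give slightly different results at different batch sizes: batched matmuls may dispatch to different kernels or accumulation orders, and floating-point addition is not associative, so small differences can accumulate across the layers of a 7B model:

import torch

torch.manual_seed(0)
x = torch.randn(1, 4096, dtype=torch.float32)
w = torch.randn(4096, 4096, dtype=torch.float32)

single = x @ w                          # "batch size 1"
batched = torch.cat([x, x], dim=0) @ w  # "batch size 2", identical rows

# Often zero or tiny for a single matmul, but it depends on the BLAS backend,
# threading, and hardware; over many layers such differences can grow.
print(torch.max(torch.abs(single - batched[:1])))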

@ShahRutav (Author)

Sounds good. I am closing the issue. Thanks for the help!
