
FEAT: add llava to autoawq #250

Merged
merged 10 commits on Dec 23, 2023

Conversation

younesbelkada
Collaborator

LLaVA is a new and exciting multi-modal architecture that has recently been integrated into HF transformers.
This PR adds LLaVA support to AutoAWQ.

With huggingface/transformers#27950 you can load the converted LLaVA weights in 4-bit:

from transformers import pipeline
from PIL import Image
import requests

model_id = "ybelkada/llava-1.5-7b-hf"
pipe = pipeline("image-to-text", model=model_id, device=0)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-neg.png"

# Download the example image and build a LLaVA-style prompt
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nCan you please describe this image?\nASSISTANT:"

outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 100})
print(outputs[0]["generated_text"])


USER: \nCan you please describe this image?\nASSISTANT: The image features a brown and white cat sitting on a green surface, possibly a carpet or a grassy area. The cat is holding a red ball in its paws, seemingly playing with it. The cat appears to be focused on the ball, possibly preparing to play or just enjoying the toy.

cc @casper-hansen

@younesbelkada
Collaborator Author

This PR is in a draft state, will ping you once ready

younesbelkada marked this pull request as ready for review on December 11, 2023, 18:39
@younesbelkada
Collaborator Author

The PR is now ready for review!

@casper-hansen
Owner

Looking forward to LLaVa being added. Can we also run inference in AutoAWQ after this PR?

@younesbelkada
Collaborator Author

@casper-hansen I will try it out and let you know

@younesbelkada
Collaborator Author

I can confirm this script worked fine for me:

import requests
import torch
from PIL import Image

from awq import AutoAWQForCausalLM
from transformers import AutoProcessor

quant_path = "ybelkada/llava-1.5-7b-hf-awq"

# Load model
model = AutoAWQForCausalLM.from_quantized(quant_path, safetensors=True, device_map={"": 0})
processor = AutoProcessor.from_pretrained(quant_path)

prompt = "USER: <image>\nWhat are these?\nASSISTANT:"
image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"

raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(prompt, raw_image, return_tensors='pt').to(0, torch.float16)
# Generate output
generation_output = model.generate(
    **inputs, 
    max_new_tokens=512
)

print(processor.decode(generation_output[0], skip_special_tokens=True))

Let me know if I should modify anything else

@casper-hansen
Owner

casper-hansen commented Dec 23, 2023

I tried running through the llava quantization and generation examples. They both work, but there is one problem with the quantization example: it seems we are not saving preprocessor_config.json, so if you run the quantization example and then the generation example afterwards, it does not work because the config file is missing.

It seems that AutoProcessor.from_pretrained() has no save_pretrained() equivalent.

EDIT: Seems this could work: processor.image_processor.save_pretrained()

EDIT 2: Fixed this!
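
For reference, a minimal sketch of what the quantization flow with that workaround could look like. It assumes the standard AutoAWQ quantize()/save_quantized() API; the paths and quant_config values are illustrative only:

from awq import AutoAWQForCausalLM
from transformers import AutoProcessor, AutoTokenizer

# Illustrative paths; adjust to your own model / output locations
model_path = "ybelkada/llava-1.5-7b-hf"
quant_path = "llava-1.5-7b-hf-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the fp16 model, tokenizer and processor
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path)

# Quantize the weights and save the model + tokenizer
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Workaround: also persist preprocessor_config.json so the
# generation example can reload the processor from quant_path
processor.image_processor.save_pretrained(quant_path)

With preprocessor_config.json saved next to the quantized weights, the generation example above can load everything from quant_path.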
