FEAT: add llava to autoawq #250
Conversation
This PR is in a draft state; I will ping you once it is ready. |
The PR is now ready for review! |
Looking forward to LLaVa being added. Can we also run inference in AutoAWQ after this PR? |
@casper-hansen I will try it out and let you know |
I can confirm this script worked fine for me:

```python
import requests
import torch
from PIL import Image

from awq import AutoAWQForCausalLM
from transformers import AutoProcessor

quant_path = "ybelkada/llava-1.5-7b-hf-awq"

# Load model
model = AutoAWQForCausalLM.from_quantized(quant_path, safetensors=True, device_map={"": 0})
processor = AutoProcessor.from_pretrained(quant_path)

prompt = "USER: <image>\nWhat are these?\nASSISTANT:"
image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(prompt, raw_image, return_tensors='pt').to(0, torch.float16)

# Generate output
generation_output = model.generate(
    **inputs,
    max_new_tokens=512
)

print(processor.decode(generation_output[0], skip_special_tokens=True))
```

Let me know if I should modify anything else |
I tried running the LLaVa quantization and generation examples. They both work, but there is one problem with the quantization example: it seems we are not saving preprocessor_config.json, so if you run the quant example and then the generation example afterwards, it fails because that config file is missing. It seems that

EDIT: Seems this could work.

EDIT 2: Fixed this! |
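For reference, a minimal sketch of the kind of fix described above, assuming the usual AutoAWQ quantization flow; the checkpoint names and quant_config values are illustrative, not taken from this PR:

```python
# Hypothetical sketch: persist the processor next to the quantized model so
# preprocessor_config.json is available when the generation example loads it.
from awq import AutoAWQForCausalLM
from transformers import AutoProcessor

model_path = "llava-hf/llava-1.5-7b-hf"  # assumed base checkpoint
quant_path = "llava-1.5-7b-hf-awq"       # assumed output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path)

# Quantize with the tokenizer that ships inside the LLaVa processor
model.quantize(processor.tokenizer, quant_config=quant_config)

# Saving both writes the model weights *and* preprocessor_config.json,
# so AutoProcessor.from_pretrained(quant_path) works afterwards.
model.save_quantized(quant_path)
processor.save_pretrained(quant_path)
```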
LLaVa is a new and exciting multi-modal architecture that has recently been integrated into HF transformers.
This PR adds LLaVa support to AutoAWQ.
With huggingface/transformers#27950 you can load the converted LLaVa weights in 4-bit:
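A minimal loading sketch, assuming the transformers AWQ integration; it reuses the quantized checkpoint name from the conversation above:

```python
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

quant_path = "ybelkada/llava-1.5-7b-hf-awq"

# The AWQ checkpoint carries its quantization_config, so a plain
# from_pretrained call is enough to load the 4-bit weights.
model = LlavaForConditionalGeneration.from_pretrained(
    quant_path,
    torch_dtype=torch.float16,
    device_map="cuda:0",
)
processor = AutoProcessor.from_pretrained(quant_path)
```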
cc @casper-hansen