Pali gemma modeling #1895

drbh · 2024-05-14T19:08:49Z

This PR adds PaliGemma modeling code.

Blog post: https://huggingface.co/blog/paligemma
Transformers PR: huggingface/transformers#30814

Install the latest changes and run with:

# get the weights
# text-generation-server download-weights gv-hf/PaliGemma-base-224px-hf

# run TGI
text-generation-launcher --model-id gv-hf/PaliGemma-base-224px-hf

Basic example sending various requests:

from huggingface_hub import InferenceClient

client = InferenceClient("http://127.0.0.1:3000")


images = [
    "https://huggingface.co/datasets/hf-internal-testing/fixtures-captioning/resolve/main/cow_beach_1.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png",
]

prompts = [
    "What animal is in this image?",
    "Name three colors in this image.",
    "What are 10 colors in this image?",
    "Where is the cow standing?",
    "answer en Where is the cow standing?",
    "Is there a bird in the image?",
    "Is there a cow in the image?",
    "Is there a rabbit in the image?",
    "how many birds are in the image?",
    "how many rabbits are in the image?",
]

for img in images:
    print(f"\nImage: {img.split('/')[-1]}")
    for prompt in prompts:
        # The image goes first as a markdown image link, followed by the text prompt.
        inputs = f"![]({img}){prompt}\n"
        # Send `inputs` (image + text), not the bare `prompt`, so the image reaches the model.
        generated_output = client.text_generation(
            inputs,
            max_new_tokens=30,
            do_sample=False,
            stream=False,
        )
        print([f"{prompt}\n{generated_output}"])

Narsil · 2024-05-15T07:31:48Z

server/text_generation_server/models/vlm_causal_lm.py

+                    if config.model_type == "paligemma":
+                        full_text += "<bos>" + chunk["content"] + "\n"
+                    else:
+                        full_text += chunk["content"]


Can you revert this? This is already taken care of by the PaliGemmaBatch.

Also, we should probably raise an error when the query is not {image}, {text} (that is, a single image and a single text, with the image before the text).
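The check suggested above could look roughly like the sketch below. It is only an illustration, not the code that landed in this PR: the function name `validate_paligemma_chunks` is hypothetical, and it assumes each chunk is a dict with a `"type"` key (`"image"` or `"text"`) next to the `"content"` key seen in the diff.

```python
from typing import Dict, List


def validate_paligemma_chunks(chunks: List[Dict[str, str]]) -> None:
    """Raise ValueError unless the query is exactly one image followed by one text.

    Hypothetical helper: the "type" key and its values are assumptions,
    only "content" appears in the diff above.
    """
    kinds = [chunk["type"] for chunk in chunks]
    if kinds != ["image", "text"]:
        raise ValueError(
            "PaliGemma expects a single image followed by a single text chunk, "
            f"got chunk types: {kinds}"
        )
```

Calling it before the chunks are concatenated into `full_text` would reject text-only queries, multiple images, and text placed before the image.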

Narsil

We also need to add the causal flag everywhere flash attention is called.
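To make concrete what a causal flag toggles, here is a small pure-Python sketch of the attention pattern in each mode. It does not use the flash attention kernels or the server code; `attention_mask` is a hypothetical helper written only for illustration.

```python
from typing import List


def attention_mask(seq_len: int, causal: bool) -> List[List[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Hypothetical illustration: with causal=True each position sees only itself
    and earlier positions; with causal=False every position sees every position.
    """
    return [
        [(j <= i) if causal else True for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    for row in attention_mask(3, causal=True):
        print(row)
```

Exposing the flag at every call site lets each model choose the pattern it needs instead of having causal attention hard-coded.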

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

drbh and others added 16 commits May 14, 2024 12:30

feat: load and query model

5fd72ed

feat: improve config and refactor

b07b53e

fix: debugging

2329434

fix: adjust siglip attention

e13c08f

fix: debug avoid scaling embed

d503007

fix: adjust image and text merge logic

36fb4b5

fix: typo and lint

4df1b25

fix: adjust inputs_embeds passed to language model and debug

6e8a211

fix: prefer gemma rotary embed and split attention weight

5b3b8fd

fix: small test tweak

9b9614c

Don't break what's not broken.

ebbe7ed

Back functional gemma.

67e833c

Fixed PaliGemma.

c119ac4

fix: apply paligemma template conditionally

d6e306c

fix: improve pali test and add snapshot

70713fc

fix: default add special tokens to avoid vlm regressions

17ac93e

Narsil reviewed May 15, 2024


Narsil added 12 commits May 15, 2024 10:17

Working integration-tests.

65bc0aa

Fixed.

1bcaf8f

Small updates.

e8d0218

Installing git.

79b15fe

Revert "Installing git."

ec92601

This reverts commit 79b15fe.

Revert "Revert "Installing git.""

81e7aac

This reverts commit ec92601.

Trying to understand the weird failure.

368c057

Change the dockerfile. It builds locally, something might be up in AWS env.

dc0b8d7

DEbugging this nightmare.

f3f7140

Using updated runner.

f8337a9

Another attempt.

fcb62c7

Sshing a cuda 12.4

9005970

Upgrade mamba.

7f97fda

Narsil approved these changes May 16, 2024


Narsil merged commit 40213c9 into main May 16, 2024
8 checks passed

Narsil deleted the pali-gemma-modeling branch May 16, 2024 04:58
