Fix test fetcher (doctest) + Idefics2's doc example #30274

Merged · 1 commit · Apr 16, 2024
13 changes: 5 additions & 8 deletions src/transformers/models/idefics2/modeling_idefics2.py
@@ -1786,17 +1786,13 @@ def forward(
>>> from transformers import AutoProcessor, AutoModelForVision2Seq
>>> from transformers.image_utils import load_image

>>> DEVICE = "cuda:0"

>>> # Note that passing the image urls (instead of the actual pil images) to the processor is also possible
>>> image1 = load_image("https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg")
>>> image2 = load_image("https://cdn.britannica.com/59/94459-050-DBA42467/Skyline-Chicago.jpg")
>>> image3 = load_image("https://cdn.britannica.com/68/170868-050-8DDE8263/Golden-Gate-Bridge-San-Francisco.jpg")

>>> processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b-base")
->>> model = AutoModelForVision2Seq.from_pretrained(
-... "HuggingFaceM4/idefics2-8b-base",
->>> ).to(DEVICE)
+>>> model = AutoModelForVision2Seq.from_pretrained("HuggingFaceM4/idefics2-8b-base", device_map="auto")

>>> BAD_WORDS_IDS = processor.tokenizer(["<image>", "<fake_token_around_image>"], add_special_tokens=False).input_ids
>>> EOS_WORDS_IDS = [processor.tokenizer.eos_token_id]
@@ -1805,15 +1801,16 @@ def forward(
>>> prompts = [
... "<image>In this image, we can see the city of New York, and more specifically the Statue of Liberty.<image>In this image,",
... "In which city is that bridge located?<image>",
->>> ]
+... ]
>>> images = [[image1, image2], [image3]]
>>> inputs = processor(text=prompts, padding=True, return_tensors="pt").to(DEVICE)
>>> inputs = processor(text=prompts, padding=True, return_tensors="pt").to("cuda")

>>> # Generate
->>> generated_ids = model.generate(**inputs, bad_words_ids=BAD_WORDS_IDS, max_new_tokens=500)
+>>> generated_ids = model.generate(**inputs, bad_words_ids=BAD_WORDS_IDS, max_new_tokens=20)
Collaborator Author
it's very slow, so let's just use 20

>>> generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

>>> print(generated_texts)
+['In this image, we can see the city of New York, and more specifically the Statue of Liberty. In this image, we can see the city of New York, and more specifically the Statue of Liberty.\n\n', 'In which city is that bridge located?\n\nThe bridge is located in the city of Pittsburgh, Pennsylvania.\n\n\nThe bridge is']
Collaborator Author
The results are not very good. But we didn't have this expected output previously, so I don't know whether we should consider it bad.

Collaborator
I'm surprised by these generations - they also don't look like the outputs when I was integrating the model 🤔

Collaborator Author
Maybe you were using a different type of GPU. CI is using a T4.

Collaborator
Quite possibly - I was using an NVIDIA A10G. I find it quite concerning that the generations can be so much worse across hardware though, especially given this isn't a tiny model.

Collaborator Author
I can check it tomorrow. But I will merge first 🙏
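(Not part of this PR - just a sketch for one possible explanation.) A difference like this sometimes comes down to the dtype the weights end up in on each machine, since T4s have no bfloat16 support; whether that is actually the culprit here is an assumption, not something established in this thread. A minimal way to make the T4 vs. A10G comparison more controlled would be to pin the dtype explicitly when loading:

```python
# Hedged sketch: load the checkpoint in an explicit dtype so both GPUs run the
# same numerics. float16 is the common denominator, since T4 lacks bfloat16.
import torch
from transformers import AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b-base",
    torch_dtype=torch.float16,  # pin the compute dtype instead of relying on the default
    device_map="auto",
)
```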

```"""

output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
2 changes: 1 addition & 1 deletion utils/tests_fetcher.py
@@ -507,7 +507,7 @@ def get_all_doctest_files() -> List[str]:
# change to use "/" as path separator
test_files_to_run = ["/".join(Path(x).parts) for x in test_files_to_run]
# don't run doctest for files in `src/transformers/models/deprecated`
test_files_to_run = [x for x in test_files_to_run if "models/deprecated" not in test_files_to_run]
test_files_to_run = [x for x in test_files_to_run if "models/deprecated" not in x]
Collaborator Author
my bad


# only include files in `src` or `docs/source/en/`
test_files_to_run = [x for x in test_files_to_run if x.startswith(("src/", "docs/source/en/"))]