
Mask2former & Maskformer Fast Image Processor #35685

Open · wants to merge 23 commits into main

Conversation

SangbumChoi
Contributor

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@SangbumChoi
Contributor Author

@yonigozlan @qubvel Hi, while working on this PR I had a question about pad_size. The original transformers processors normally have both do_pad and pad_size. There can be several reasons for pad_size to exist, but one is to enforce a strict output size for inference with ONNX or torch.jit.trace models (since they can only accept images of a static size).

However, MaskFormer's current processing policy is to always pad to the maximum width/height in the given batch of images. So I was planning to add pad_size to MaskFormerImageProcessor, and then also add pad_size to MaskFormerImageProcessorFast and to the modular file in Mask2Former. Can I have some feedback on this idea?
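
To illustrate the two behaviors being discussed, here is a minimal, hypothetical sketch (not the actual transformers implementation; `pad_images` and its signature are invented for illustration) of batch-max padding versus a fixed pad_size, where the latter yields static output shapes suitable for ONNX export or torch.jit.trace:

```python
import numpy as np

def pad_images(images, pad_size=None):
    """Zero-pad a batch of HxWxC images (illustrative helper only).

    If pad_size is None (current MaskFormer behavior), pad every image
    to the max height/width found in the batch; otherwise pad to the
    fixed (height, width) given by pad_size, so the output shape is
    static regardless of the batch contents.
    """
    if pad_size is None:
        target_h = max(img.shape[0] for img in images)
        target_w = max(img.shape[1] for img in images)
    else:
        target_h, target_w = pad_size
    padded = []
    for img in images:
        pad_h = target_h - img.shape[0]
        pad_w = target_w - img.shape[1]
        # pad bottom and right edges with zeros, leave channels untouched
        padded.append(np.pad(img, ((0, pad_h), (0, pad_w), (0, 0))))
    return np.stack(padded)
```

With pad_size=None the output shape depends on the batch, while a fixed pad_size always produces the same shape, which is what static-shape export backends require.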

@qubvel
Member

qubvel commented Jan 16, 2025

Hi @SangbumChoi! Thanks for working on fast processors 🤗 I think it would be nice to have, as long as the change is backward compatible with current configs, e.g. pad_size=None by default.

@yonigozlan
Member

Hi @SangbumChoi, great to see more work with fast processors :). Agreed with @qubvel, I don't see any issue with adding functionality to an image processor as long as it is fully backward compatible and the default processing stays the same.

@SangbumChoi SangbumChoi marked this pull request as ready for review January 17, 2025 08:57
@SangbumChoi
Contributor Author

@qubvel @yonigozlan Requesting first round review.

Member

@yonigozlan yonigozlan left a comment


Overall looks good to me! It looks like there are a few things we can remove or simplify for MaskFormer and in the modular file of Mask2Former.

@SangbumChoi
Contributor Author

@yonigozlan Hi, I have pushed all the commits except for #35685 (comment).

The test failures seem unrelated!

@SangbumChoi
Contributor Author

@yonigozlan Reminder :)

Member

@yonigozlan yonigozlan left a comment


Looks great now! Thanks a lot for working on this 🤗.
Apart from the unnecessary imports to remove, LGTM!
@ArthurZucker This should be ready for your review

@SangbumChoi
Contributor Author

@ArthurZucker Soft ping

Collaborator

@ArthurZucker ArthurZucker left a comment


Hey! Sorry, I delayed the review a bit waiting for #35069! Could you rebase? The integration should be "simpler".

Otherwise, a great addition! A few benchmarks would be welcome, but either way, thanks! 🤗

@yonigozlan
Member

Thanks again @SangbumChoi for adding this! As Arthur said, there has been a big refactor of fast image processors since my last review. Now, in most cases, you should only have to add custom code in the _preprocess function, and add docstrings to __init__ and preprocess for any added image-processing kwargs. However, maskformer seems a bit more involved for backward compatibility because of deprecated kwargs such as _maxsize, though the modifications needed should be very similar to the ones made for detr in #35069, so hopefully that can help!
If you can make these modifications that would be great, and since this is quite a recent refactor I'd be glad to have your feedback and am happy to answer any questions you may have 🤗

@SangbumChoi
Contributor Author

SangbumChoi commented Feb 21, 2025

@yonigozlan @ArthurZucker Hi team, I would also like to upload a benchmark. I think I have seen some code from @yonigozlan for testing these changes and visualizing them in a graph, but it seems to be accessible only internally. Can you share it? (Otherwise you can send it to me through Slack.)

@yonigozlan
Member

Hi @SangbumChoi! Could you run a simple benchmark like this one?

import time

import requests
from PIL import Image
from transformers import AutoImageProcessor


def benchmark_image_processor(image_processor, images, benchmark_it=10, warmup_it=10):
    # warm up
    for _ in range(warmup_it):
        _ = image_processor(images=images, return_tensors="pt", device=device)
    # benchmark
    start_time = time.time()
    for _ in range(benchmark_it):
        _ = image_processor(images=images, return_tensors="pt", device=device)
    end_time = time.time()

    return (end_time - start_time) / benchmark_it


image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
checkpoint = "maskformer/mask2former checkpoint"
image_processor_fast = AutoImageProcessor.from_pretrained(checkpoint, use_fast=True)
image_processor_slow = AutoImageProcessor.from_pretrained(checkpoint)
device = "cuda"
batch_size = 4

slow_time_one = benchmark_image_processor(image_processor_slow, image, benchmark_it=10)
fast_time_one = benchmark_image_processor(image_processor_fast, image, benchmark_it=10)
slow_time_batch = benchmark_image_processor(image_processor_slow, [image] * batch_size, benchmark_it=10)
fast_time_batch = benchmark_image_processor(image_processor_fast, [image] * batch_size, benchmark_it=10)

print(f"slow_time_one: {slow_time_one}, fast_time_one: {fast_time_one}, speedup: {slow_time_one / fast_time_one}")
print(f"slow_time_batch: {slow_time_batch}, fast_time_batch: {fast_time_batch}, speedup: {slow_time_batch / fast_time_batch}")

Thanks!
