
Fix max size deprecated warning #34998

Open
wants to merge 18 commits into
base: main

Conversation


@HichTala HichTala commented Nov 28, 2024

This pull request focuses on removing the deprecated max_size argument from the preprocess method across multiple image processing modules in the transformers library. This change simplifies the code and aligns with the planned deprecation of the max_size argument.

Key changes include:

  • Removal of the deprecated max_size argument from the preprocess methods.

Fixes #34977
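
For illustration, the caller-facing change looks roughly like this (a hedged sketch assuming a DETR-style image processor; exact default sizes vary per model):

from transformers import DetrImageProcessor

# Before (deprecated): passing `max_size` triggers the warning this PR addresses
processor = DetrImageProcessor(size=800, max_size=1333)

# After: both bounds are expressed through the `size` dict
processor = DetrImageProcessor(size={"shortest_edge": 800, "longest_edge": 1333})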

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@qubvel here is my pull request.

…` and triggered unnecessary deprecated warning
@qubvel qubvel self-requested a review November 28, 2024 14:32
@HichTala
Author

While working on this PR, I've noticed some other possible improvements to advance the expected deprecation of the max_size argument.

Here I'm not sure what type of size we want to handle:

if "max_size" in kwargs:
logger.warning_once(
"The `max_size` parameter is deprecated and will be removed in v4.26. "
"Please specify in `size['longest_edge'] instead`.",
)
max_size = kwargs.pop("max_size")
else:
max_size = None
size = get_size_dict(size, max_size=max_size, default_to_square=False)

If it is only a Dict type, as the typing of the function specifies, then the max_size variable can be removed; we could imagine something like this:

size = size if size is not None else {"shortest_edge": 800, "longest_edge": 1333}
if "max_size" in kwargs:
    logger.warning_once(
        "The `max_size` parameter is deprecated and will be removed in v4.26. "
        "Please specify in `size['longest_edge'] instead`.",
    )
    size['longest_edge'] = kwargs.pop("max_size")

size = get_size_dict(size, default_to_square=False)

I've also noticed a weird use of size in SamImageProcessor:

size = size if size is not None else {"longest_edge": 1024}
size = get_size_dict(max_size=size, default_to_square=False) if not isinstance(size, dict) else size

I think we should normalize the use of Dict across the code, for example here by passing something like {'longest_edge': size} as the argument?
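
As a sketch of what I mean (normalize_size here is a hypothetical helper, not an existing transformers function):

from typing import Dict, Union

def normalize_size(size: Union[int, Dict[str, int]]) -> Dict[str, int]:
    """Hypothetical helper: map a bare int to the dict form used everywhere else."""
    # For SAM, a bare int historically meant the longest edge ({"longest_edge": 1024} by default)
    if isinstance(size, int):
        return {"longest_edge": size}
    return dict(size)

print(normalize_size(1024))                    # {'longest_edge': 1024}
print(normalize_size({"longest_edge": 512}))   # {'longest_edge': 512}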

And here too, the typing of the function specifies the Dict type for size, but we are also handling other types; is this OK?

If I get more details, I'd be happy to contribute and maybe work on removing the max_size parameter across the code.

Member

@qubvel qubvel left a comment


@HichTala Thanks for cleaning this up!

@HichTala
Author

No worries, if you'd like me to clean up a bit more as I explained in my previous comment, don't hesitate to let me know!

@qubvel
Member

qubvel commented Nov 28, 2024

I suppose max_size can be entirely removed here, because we are at version 4.47 right now

if "max_size" in kwargs: 
     logger.warning_once( 
         "The `max_size` parameter is deprecated and will be removed in v4.26. " 
         "Please specify in `size['longest_edge'] instead`.", 
     ) 
     max_size = kwargs.pop("max_size") 
 else: 
     max_size = None 
 size = get_size_dict(size, max_size=max_size, default_to_square=False)
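
A sketch of what that block could reduce to once the max_size branch is dropped entirely (reusing the dict default already shown above):

from transformers.image_processing_utils import get_size_dict

# With the deprecation branch gone, only the dict-based `size` remains
size = {"shortest_edge": 800, "longest_edge": 1333}
size = get_size_dict(size, default_to_square=False)
print(size)  # {'shortest_edge': 800, 'longest_edge': 1333}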

@qubvel
Member

qubvel commented Nov 28, 2024

And here too, the typing of the function specifies the Dict type for size, but we are also handling other types; is this OK?

There might be configs on the Hub that still have an integer as a size parameter. That's why we keep it here for backward compatibility.

@HichTala
Author

OK then, I'll try to clean up the DETR variants a bit more and try not to break everything. 🫡

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Add a test to ensure test can pass successfully and backward compatibility
@HichTala
Author

The test pipelines still use max_size:

image_processor = image_processing_class.from_dict(
    self.image_processor_dict, size=42, max_size=84, pad_and_return_pixel_mask=False
)

I don't know if I'm allowed to modify those?
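
The replacement would presumably look like this (a sketch keeping the same test values, with both bounds folded into `size`):

image_processor = image_processing_class.from_dict(
    self.image_processor_dict,
    size={"shortest_edge": 42, "longest_edge": 84},
    pad_and_return_pixel_mask=False,
)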

Remove `max_size` from test pipelines and replace it with a `size` `Dict` using `'shortest_edge'` and `'longest_edge'` as keys
@HichTala HichTala requested a review from qubvel November 28, 2024 17:47
@qubvel
Member

qubvel commented Nov 28, 2024

Thanks for iterating! It seems it's a breaking change; you can check with the https://huggingface.co/microsoft/table-transformer-detection/ model on your branch and on main:

from transformers import AutoProcessor

image_processor = AutoProcessor.from_pretrained("microsoft/table-transformer-detection")
print(image_processor.size)

Comment on lines 880 to 883
size = (
    {"shortest_edge": size, "longest_edge": 1333} if isinstance(size, int) else size
)  # Backwards compatibility
size = get_size_dict(size, default_to_square=False)
Member


The comment should be placed above to avoid this line break.

@HichTala
Author

HichTala commented Nov 28, 2024

The model you sent uses max_size:
https://huggingface.co/microsoft/table-transformer-detection/blob/main/preprocessor_config.json
(L16)

Do we raise an error saying it's deprecated, or do we put back the deprecation warnings?

@qubvel
Member

qubvel commented Nov 29, 2024

Ideally, it would be great to make PRs with an updated size for all detection models with over 100 downloads on the Hub, merge them, and then eliminate max_size in the codebase. It might be a bit time-consuming, so it's up to you! If you choose this route, I can share a draft script on how to fetch the required configs and make PRs. Otherwise, let's revert to backward compatibility with a warning and clean up just where it's possible.

I'm happy with whichever decision you make!

@HichTala
Author

I'm OK to work on this. I think I'll roll back this PR to the point where backward compatibility is ensured with a warning, and I'll update "version 4.26" to "a future version", which may be more meaningful. That way, it can be merged and my code will finally be able to run without those flooding warning messages. I'll then open some new PRs to start eliminating max_size.

It would be great if you could share the draft script with me.

@qubvel
Member

qubvel commented Nov 29, 2024

Here is a draft I used to fetch object detection models' image processor configs:

from transformers import AutoProcessor
from huggingface_hub import list_models, hf_hub_download
import json

# Filter models with the tag 'pipeline_tag=object-detection'
models = list_models(tags="object-detection", gated=False, sort="downloads", limit=1000)

count = 0
for model in models:
    try:
        # Download preprocessor_config.json
        config_path = hf_hub_download(repo_id=model.id, filename="preprocessor_config.json")
        with open(config_path, "r") as file:
            config = json.load(file)

        # Check if 'size' is a dict
        if not isinstance(config.get("size"), dict):
            print(f"#{count} Model: {model.id}")
            print(f"Created: {model.created_at}")
            print(f"Downloads: {model.downloads}\n")
            count += 1

            # Load the processor (from_pretrained normalizes the legacy size/max_size values)
            image_processor = AutoProcessor.from_pretrained(model.id)
            print(image_processor.size)

    except Exception as e:
        print(f"Error processing model {model.id}: {e}")

The idea is to run this on main to load configs and then push them

You first load it

image_processor = AutoProcessor.from_pretrained(model.id)

And then push back + open a PR

image_processor.save_pretrained(model.id, create_pr=True)   # or something similar

I will also need links to all the PRs, to share them with someone who can merge them.

Please be careful not to spam all models; let's do it batch-wise, and for the first iteration we can do it for the first five models.

@HichTala
Author

Thanks for sharing! I’ll probably work on this at the start of next week. Could you let me know how I can provide links to the PRs once they’re created?

@qubvel
Member

qubvel commented Nov 29, 2024

Probably, save_pretrained returns something, but I'm not sure! It's something that has to be explored. Alternatively, you can use the huggingface_hub library to upload a JSON file and similarly open a PR; this way, it will return a link (or an object containing a link).
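
A sketch of that huggingface_hub route, assuming upload_file with create_pr=True returns a CommitInfo carrying a pr_url (worth double-checking against the installed version):

from huggingface_hub import upload_file

# Push the updated config to the model repo as a PR instead of a direct commit
commit_info = upload_file(
    path_or_fileobj="preprocessor_config.json",  # locally re-saved config with dict-based `size`
    path_in_repo="preprocessor_config.json",
    repo_id=model.id,  # from the loop in the draft script above
    create_pr=True,
    commit_message="Use dict `size` and drop deprecated `max_size`",
)
print(commit_info.pr_url)  # link to collect and share for merging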

@qubvel
Copy link
Member

qubvel commented Nov 29, 2024

I’ll probably work on this at the start of next week.

No worries at all! Manage your time according to your preferences, and we will always be happy with your contributions. 🤗

@qubvel qubvel self-requested a review December 3, 2024 10:46
Member

@qubvel qubvel left a comment


Thanks! We can merge it in this state and continue in a follow-up PR.

@qubvel qubvel requested a review from ArthurZucker December 3, 2024 11:07
@qubvel
Member

qubvel commented Dec 3, 2024

@ArthurZucker, a minor cleanup for max_size variables and warnings in image processors.

Collaborator

@ArthurZucker ArthurZucker left a comment


Hey! I scanned it a bit; I am guessing this was breaking for a few models, so we are not removing support?
IMO we should still break, since we had a deprecation cycle, but try to help the models on the Hub; unless our deprecation cycle did not cover a certain case, in which case we do another deprecation cycle for the missing cases!

@qubvel
Member

qubvel commented Dec 20, 2024

@ArthurZucker this one should not break anything, just a minor clean-up. The second one (linked) is breaking; we are working on updating the Hub configs first.

Collaborator

@ArthurZucker ArthurZucker left a comment


Sure! Let's just remove the "future version"; it's vague and unhelpful. We can leave it as is, people know it's already deprecated!

"The `max_size` parameter is deprecated and will be removed in v4.26. "
"The `max_size` parameter is deprecated and will be removed in a future version. "
Collaborator


to revert!

Author


Done 👍

@@ -1347,7 +1346,7 @@ def preprocess(
     do_resize = self.do_resize if do_resize is None else do_resize
     size = self.size if size is None else size
-    size = get_size_dict(size=size, max_size=max_size, default_to_square=False)
+    size = get_size_dict(size=size, default_to_square=False)
Collaborator


cool! thanks

Development

Successfully merging this pull request may close these issues.

Deprecation Warning for max_size in DetrImageProcessor.preprocess