Add ImageProcessorFast to Qwen2.5-VL processor #36164
Merged
+20
−748
21 commits
ce16648 add qwen2 fast image processor to modular file (Isotr0py)
9d3063a fix modular (Isotr0py)
762be48 fix circle import (Isotr0py)
099cb99 add docs (Isotr0py)
bd06d88 Merge branch 'main' into qwen2_5-fastprocessor (Isotr0py)
1b96d6e fix typo (Isotr0py)
b8262bb Merge branch 'main' into qwen2_5-fastprocessor (Isotr0py)
0b57b91 add modular generated files (Isotr0py)
1b8c746 Merge branch 'main' into qwen2_5-fastprocessor (Isotr0py)
5a933d0 revert qwen2vl fast image processor (Isotr0py)
228f2ee remove qwen2.5-vl image processor from modular (Isotr0py)
e68dd27 re-generate qwen2.5-vl files (Isotr0py)
b21f773 remove unnecessary test (Isotr0py)
0827393 fix auto map (Isotr0py)
fac86d5 cleanup (Isotr0py)
cf1ed99 fix model_input_names (Isotr0py)
b92c728 Merge branch 'main' into qwen2_5-fastprocessor (Isotr0py)
f45bbac remove import (Isotr0py)
97469e7 Merge branch 'main' into qwen2_5-fastprocessor (Isotr0py)
7e35cd2 make fix-copies (Isotr0py)
cc350d0 Merge branch 'main' into qwen2_5-fastprocessor (Isotr0py)
426 changes: 0 additions & 426 deletions
src/transformers/models/qwen2_5_vl/image_processing_qwen2_5_vl.py
This file was deleted.
@@ -29,7 +29,6 @@
 from torch.nn import CrossEntropyLoss

 from transformers.models.qwen2_vl.configuration_qwen2_vl import Qwen2VLConfig
-from transformers.models.qwen2_vl.image_processing_qwen2_vl import Qwen2VLImageProcessor
 from transformers.models.qwen2_vl.modeling_qwen2_vl import (
     PatchEmbed,
     PatchMerger,
@@ -830,48 +829,6 @@ def prepare_inputs_for_generation(
         return model_inputs


-class Qwen2_5_VLImageProcessor(Qwen2VLImageProcessor):
-    r"""
-    Constructs a Qwen2.5-VL image processor that dynamically resizes images based on the original images.
-
-    Args:
-        do_resize (`bool`, *optional*, defaults to `True`):
-            Whether to resize the image's (height, width) dimensions.
-        resample (`PILImageResampling`, *optional*, defaults to `Resampling.BICUBIC`):
-            Resampling filter to use when resizing the image.
-        do_rescale (`bool`, *optional*, defaults to `True`):
-            Whether to rescale the image by the specified scale `rescale_factor`.
-        rescale_factor (`int` or `float`, *optional*, defaults to `1/255`):
-            Scale factor to use if rescaling the image.
-        do_normalize (`bool`, *optional*, defaults to `True`):
-            Whether to normalize the image.
-        image_mean (`float` or `List[float]`, *optional*, defaults to `[0.48145466, 0.4578275, 0.40821073]`):
-            Mean to use if normalizing the image. This is a float or list of floats for each channel in the image.
-        image_std (`float` or `List[float]`, *optional*, defaults to `[0.26862954, 0.26130258, 0.27577711]`):
-            Standard deviation to use if normalizing the image. This is a float or list of floats for each channel in the image.
-        do_convert_rgb (`bool`, *optional*, defaults to `True`):
-            Whether to convert the image to RGB.
-        min_pixels (`int`, *optional*, defaults to `56 * 56`):
-            The min pixels of the image to resize the image.
-        max_pixels (`int`, *optional*, defaults to `28 * 28 * 1280`):
-            The max pixels of the image to resize the image.
-        patch_size (`int`, *optional*, defaults to 14):
-            The spacial patch size of the vision encoder.
-        temporal_patch_size (`int`, *optional*, defaults to 2):
-            The temporal patch size of the vision encoder.
-        merge_size (`int`, *optional*, defaults to 2):
-            The merge size of the vision encoder to llm encoder.
-    """
-
-    model_input_names = [
-        "pixel_values",
-        "image_grid_thw",
-        "pixel_values_videos",
-        "video_grid_thw",
-        "second_per_grid_ts",
-    ]
-
-
 class Qwen2_5_VLVideosProcessorKwargs(VideosKwargs, total=False):
     fps: Union[List[float], float]

@@ -889,18 +846,25 @@ class Qwen2_5_VLProcessorKwargs(ProcessingKwargs, total=False):
 class Qwen2_5_VLProcessor(Qwen2VLProcessor):
     r"""
     Constructs a Qwen2.5-VL processor which wraps a Qwen2.5-VL image processor and a Qwen2 tokenizer into a single processor.
-    [`Qwen2_5_VLProcessor`] offers all the functionalities of [`Qwen2_5_VLImageProcessor`] and [`Qwen2TokenizerFast`]. See the
+    [`Qwen2_5_VLProcessor`] offers all the functionalities of [`Qwen2VLImageProcessor`] and [`Qwen2TokenizerFast`]. See the
     [`~Qwen2_5_VLProcessor.__call__`] and [`~Qwen2_5_VLProcessor.decode`] for more information.
     Args:
-        image_processor ([`Qwen2_5_VLImageProcessor`], *optional*):
+        image_processor ([`Qwen2VLImageProcessor`], *optional*):
             The image processor is a required input.
         tokenizer ([`Qwen2TokenizerFast`], *optional*):
             The tokenizer is a required input.
         chat_template (`str`, *optional*): A Jinja template which will be used to convert lists of messages
             in a chat into a tokenizable string.
     """

-    image_processor_class = "Qwen2_5_VLImageProcessor"
+    image_processor_class = "AutoImageProcessor"
[Review comment on the line above] most needed!

+    @property
+    def model_input_names(self):
+        tokenizer_input_names = self.tokenizer.model_input_names
+        image_processor_input_names = self.image_processor.model_input_names
+        names_from_processor = list(dict.fromkeys(tokenizer_input_names + image_processor_input_names))
+        return names_from_processor + ["second_per_grid_ts"]
+
     def __call__(
         self,
@@ -913,7 +877,7 @@ def __call__(
         Main method to prepare for the model one or several sequences(s) and image(s). This method forwards the `text`
         and `kwargs` arguments to Qwen2TokenizerFast's [`~Qwen2TokenizerFast.__call__`] if `text` is not `None` to encode
         the text. To prepare the vision inputs, this method forwards the `vision_infos` and `kwrags` arguments to
-        Qwen2_5_VLImageProcessor's [`~Qwen2_5_VLImageProcessor.__call__`] if `vision_infos` is not `None`.
+        Qwen2VLImageProcessor's [`~Qwen2VLImageProcessor.__call__`] if `vision_infos` is not `None`.

         Args:
             images (`PIL.Image.Image`, `np.ndarray`, `torch.Tensor`, `List[PIL.Image.Image]`, `List[np.ndarray]`, `List[torch.Tensor]`):
@@ -1016,6 +980,5 @@ def __call__(
     "Qwen2_5_VLForConditionalGeneration",
     "Qwen2_5_VLModel",
     "Qwen2_5_VLPreTrainedModel",
-    "Qwen2_5_VLImageProcessor",
     "Qwen2_5_VLProcessor",
 ]
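The `model_input_names` property added in this diff merges the tokenizer's and the image processor's input names with `dict.fromkeys`, which drops duplicates while keeping first-seen order, and then appends `"second_per_grid_ts"`. The following is a minimal standalone sketch of that merge, not the library code: the function name `merge_input_names` and the example name lists are assumptions made for illustration.

```python
from typing import List


def merge_input_names(tokenizer_names: List[str], image_processor_names: List[str]) -> List[str]:
    """Hypothetical helper mirroring the merge done by the property in the diff."""
    # dict.fromkeys keeps only the first occurrence of each key and preserves
    # insertion order, so it acts as an order-preserving de-duplication.
    merged = list(dict.fromkeys(tokenizer_names + image_processor_names))
    # As in the diff, the extra name is appended unconditionally at the end.
    return merged + ["second_per_grid_ts"]


# Example inputs are assumptions, not values read from a real tokenizer or processor.
names = merge_input_names(
    ["input_ids", "attention_mask"],
    ["pixel_values", "image_grid_thw", "attention_mask"],
)
print(names)
```

Because the last name is appended after de-duplication, an image processor that already listed `"second_per_grid_ts"` would make it appear twice in this sketch. Whether that case can occur in the library is not stated in the PR.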
Not breaking because we added it in this release!