[Proposal, open for discussion] Better way of extracting hidden states #27873

NielsRogge · 2023-12-06T20:22:05Z

What does this PR do?

Currently, our AutoBackbone classes make it possible to get specific feature maps out of a given vision model. For example:

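from transformers import ConvNextBackbone
import torch

# Load the backbone and request the feature maps of all four stages
model = ConvNextBackbone.from_pretrained("facebook/convnext-small-224", out_indices=[0, 1, 2, 3])

pixel_values = torch.randn(1, 3, 224, 224)

# The backbone returns an output object; the selected maps live in `feature_maps`
outputs = model(pixel_values)
for feature_map in outputs.feature_maps:
    print(feature_map.shape)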

However, they currently extract all intermediate hidden states, store them in memory, and return only the ones required by the user. This is inefficient: we should store only the activations the user asked for.
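A minimal sketch of the difference, in plain Python rather than the actual backbone code: the "stages" here are hypothetical stand-ins (simple functions on a number) and both helper names are made up for illustration. The first helper stores every intermediate output and selects afterwards; the second stores only the outputs whose index was requested.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a model stage: any callable mapping a value to a value.
Stage = Callable[[float], float]


def forward_keep_all(
    x: float, stages: Sequence[Stage], out_indices: Sequence[int]
) -> Tuple[List[float], int]:
    """Store every stage output, then pick the requested ones afterwards."""
    hidden_states: List[float] = []
    for stage in stages:
        x = stage(x)
        hidden_states.append(x)  # every intermediate result is kept
    selected = [hidden_states[i] for i in out_indices]
    return selected, len(hidden_states)


def forward_keep_selected(
    x: float, stages: Sequence[Stage], out_indices: Sequence[int]
) -> Tuple[List[float], int]:
    """Store a stage output only if its index was requested."""
    wanted = set(out_indices)
    kept: Dict[int, float] = {}
    for i, stage in enumerate(stages):
        x = stage(x)
        if i in wanted:
            kept[i] = x  # only requested results are kept
    selected = [kept[i] for i in out_indices]
    return selected, len(kept)


if __name__ == "__main__":
    toy_stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]
    print(forward_keep_all(1.0, toy_stages, [1, 3]))
    print(forward_keep_selected(1.0, toy_stages, [1, 3]))
```

Both helpers return the same selected outputs; only the number of stored intermediates differs. With real tensors the saving applies mainly when gradients are disabled, since autograd may retain activations for the backward pass regardless.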

This PR proposes to return only the hidden states specified by config.out_indices when the user sets output_hidden_states=True. However, this is not backwards compatible (by default we return all hidden states), so I'm open to suggestions on how we could improve this. Alternatively, we could make it backwards compatible by setting out_indices to all stages by default.
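The backwards-compatible alternative could look roughly like the sketch below. `resolve_out_indices` is a hypothetical helper, not an existing transformers function, and it assumes stages are indexed 0 to num_stages - 1: when no indices are given it falls back to every stage, so callers who never set out_indices would keep receiving all hidden states.

```python
from typing import List, Optional, Sequence


def resolve_out_indices(
    out_indices: Optional[Sequence[int]], num_stages: int
) -> List[int]:
    """Hypothetical helper: default to all stages when nothing is specified."""
    if out_indices is None:
        # Backwards-compatible default: return every stage.
        return list(range(num_stages))
    resolved = []
    for i in out_indices:
        if not -num_stages <= i < num_stages:
            raise IndexError(f"stage index {i} out of range for {num_stages} stages")
        # Normalise negative indices the way Python sequences do.
        resolved.append(i % num_stages)
    return resolved


if __name__ == "__main__":
    print(resolve_out_indices(None, 4))
    print(resolve_out_indices([-1], 4))
```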

I think this could be an argument that is part of all configs, or at least those of vision encoders, which typically only require certain hidden states to be extracted.

Curious to hear the opinions of @ArthurZucker @amyeroberts

ArthurZucker · 2023-12-07T07:20:50Z

Yep, I like this optimisation; non-breaking overall.

NielsRogge · 2023-12-07T07:44:05Z

@ArthurZucker it is a breaking change in its current state, since out_indices currently defaults to the last stage index if the user doesn't specify them (think @amyeroberts added that here). So if we were to add this with backwards compatibility, we would have to update the default out_indices to all stages in case they are not specified.

ArthurZucker · 2023-12-07T09:39:49Z

We could maybe set it to -1 to return everything, but I mean we can make it backwards compatible (BC)!

NielsRogge · 2023-12-11T10:42:25Z

I'd like to have @amyeroberts's opinion on this one

amyeroberts · 2023-12-11T17:47:47Z

"it is a breaking change in its current state, since out_indices currently defaults to the last stage index if the user doesn't specify them (think @amyeroberts added that here)."

This was just matching the logic that was originally implemented for the out_features (selecting the last layer). As you added this @NielsRogge you'll know the motivation for this better than me :)

As it stands, I'm not in favour of this, as it requires adding backbone API / logic to standard model APIs. This is essentially making things leaky: why do I need to know about out_indices to get my hidden states if I'm not loading a backbone?

Moreover, this is going to break a tonne of stuff, as users who have created checkpoints which are not backbones will still have out_indices set in the model config. This isn't easy to rectify: how would we know if the values in the config are what the user wanted e.g. just the last hidden state, or it just happened to be the default when the config was created?
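The ambiguity can be illustrated with a toy config. `ToyConfig` is a hypothetical class, not the real transformers config; it only assumes, as described in this thread, that the default (last stage) is filled in when the config is created. Once saved, a config that got the default and one where the user explicitly asked for the last stage are indistinguishable.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ToyConfig:
    """Hypothetical config that fills in out_indices at creation time."""

    num_stages: int = 4
    out_indices: Optional[List[int]] = None

    def __post_init__(self) -> None:
        if self.out_indices is None:
            # Assumed default, per the discussion: the last stage only.
            self.out_indices = [self.num_stages - 1]


if __name__ == "__main__":
    default_cfg = ToyConfig()                  # user never chose anything
    explicit_cfg = ToyConfig(out_indices=[3])  # user wants only the last stage
    # The serialized forms are identical, so intent cannot be recovered.
    print(asdict(default_cfg) == asdict(explicit_cfg))
```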

It introduces inconsistencies in our models' forward passes, which makes the code harder to understand, and it ties non-backbone logic to an API which still isn't 100% stable at the moment.

An alternative approach would be to have a different argument in the config which defaults to all the layers but then can be overridden by the config's out_indices when loading a backbone.

amyeroberts · 2023-12-11T17:58:04Z

Actually, not another config parameter - because then the source of truth isn't clear and behaviour for the user can be unexpected.

github-actions · 2024-01-06T08:03:37Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

NielsRogge added 2 commits December 6, 2023 20:57

Add first draft

c57f25b

Fix tests

6376507

NielsRogge mentioned this pull request Dec 7, 2023

[Llava] Add Llava to transformers #27662

Merged

ArthurZucker requested a review from amyeroberts December 11, 2023 14:17

github-actions bot closed this Jan 14, 2024
