
Modify resize_token_embeddings to ensure output type is same as input #31979

Merged: 4 commits merged into huggingface:main on Jul 23, 2024

Conversation

@bayllama (Contributor)

What does this PR do?

Modified resize_token_embeddings to return the same class that is passed as input. Currently, even if a custom embedding class is passed, resize_token_embeddings converts it to an nn.Embedding; this commit ensures that does not happen and that the custom embedding class is returned.

Fixes #31835
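
For illustration, a minimal sketch of the behavior this PR is about, using a hypothetical ScaledEmbedding class as a stand-in for the model-specific embedding wrappers in transformers (the class name and numbers are illustrative, not taken from the library):

```python
import torch.nn as nn

# Hypothetical stand-in for a model-specific embedding wrapper (e.g. an mBART-style
# scaled embedding). The class name and embed_scale value are illustrative.
class ScaledEmbedding(nn.Embedding):
    def __init__(self, num_embeddings, embedding_dim, embed_scale: float = 1.0):
        super().__init__(num_embeddings, embedding_dim)
        self.embed_scale = embed_scale

    def forward(self, input_ids):
        return super().forward(input_ids) * self.embed_scale

emb = ScaledEmbedding(num_embeddings=100, embedding_dim=16, embed_scale=4.0)
# Before this PR, resizing would hand back a plain nn.Embedding and silently drop
# embed_scale; with this PR, the resized embedding keeps its original class, so
# isinstance(model.get_input_embeddings(), ScaledEmbedding) still holds after resizing.
```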

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@zucchini-nlp

@zucchini-nlp (Member) left a comment:

Great work @bayllama !

One thing to note is that the newly created Embedding will not have the same embed_scale as the old one, because we don't pass embed_scale at creation and the default is 1.0.

What if we don't rely on which embedding class is being used, and instead modify the weights of the old_embeddings in place and return it as new_embeddings? Something like this, added at the end before returning:

old_embeddings.weight.data = new_embeddings.weight.data
return old_embeddings

@amyeroberts WDYT of this idea? It doesn't break BC and makes ModelScaledEmbedding happy
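
For illustration, a minimal sketch of this in-place idea as a standalone helper (hypothetical function name; simplified compared to the actual _get_resized_embeddings, e.g. newly added rows keep nn.Embedding's default initialization here):

```python
import torch.nn as nn

def resize_in_place(old_embeddings: nn.Embedding, new_num_tokens: int) -> nn.Embedding:
    """Resize an embedding module without replacing it, so its class (and attributes
    such as embed_scale) are preserved."""
    old_num_tokens, embedding_dim = old_embeddings.weight.shape
    # A plain nn.Embedding is used only as a container for the resized weight matrix.
    new_embeddings = nn.Embedding(
        new_num_tokens,
        embedding_dim,
        device=old_embeddings.weight.device,
        dtype=old_embeddings.weight.dtype,
    )
    # Copy the overlapping rows from the old weights.
    n = min(old_num_tokens, new_num_tokens)
    new_embeddings.weight.data[:n, :] = old_embeddings.weight.data[:n, :]
    # Swap the resized weights into the original module and return it otherwise unchanged.
    old_embeddings.weight.data = new_embeddings.weight.data
    return old_embeddings
```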

@bayllama (Contributor, Author)

@zucchini-nlp
That's a good idea. I have made the change you recommended.

@amyeroberts (Collaborator) left a comment:

Thanks for fixing and nice suggestion @zucchini-nlp!

@bayllama (Contributor, Author)

@zucchini-nlp @amyeroberts Added the comment describing the change

@zucchini-nlp (Member) left a comment:

Perfect, thanks for working on this! I'll make sure that tests are passing and it can be merged later today

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member)

@bayllama the tests are failing because we pass device and dtype to the Embedding class when resizing, but the custom classes don't accept any kwargs. Can you please add **kwargs to __init__ so the embedding is created on the correct device and with the correct dtype?

You should be able to verify that all tests pass with this command :)

pytest -k test_resize_token tests/models/
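
For illustration, a sketch of what forwarding those kwargs could look like in a custom embedding class (hypothetical class name; as the discussion below shows, this particular change turned out not to be needed once the weights are swapped in place):

```python
import torch.nn as nn

# Hypothetical custom embedding whose __init__ forwards extra kwargs (device, dtype, ...)
# to nn.Embedding, so it can be constructed the same way a plain nn.Embedding is.
class CustomScaledEmbedding(nn.Embedding):
    def __init__(self, num_embeddings, embedding_dim, embed_scale: float = 1.0, **kwargs):
        super().__init__(num_embeddings, embedding_dim, **kwargs)
        self.embed_scale = embed_scale

    def forward(self, input_ids):
        return super().forward(input_ids) * self.embed_scale
```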

@bayllama (Contributor, Author)

@zucchini-nlp

When I run the pytest command you suggested, I don't see the error you mentioned above. We don't set the device and dtype for the custom embedding class at all, right?

new_embeddings = nn.Embedding(
    new_num_tokens,
    old_embedding_dim,
    device=old_embeddings.weight.device,
    dtype=old_embeddings.weight.dtype,
)

We only do it for the nn.Embedding and then replace the weight.data in the old embeddings, which is already on the correct device and has the right dtype.

However, I do see another issue: in resize_token_embeddings, the "text_config" attribute is being modified, and I haven't accounted for that in my previous commit. I will work on a fix for this. Could you please give me more details on the dtype and device issue, though, because I don't see it in my set of tests?

@zucchini-nlp (Member)

@bayllama my bad, I was running on a different branch. You're right, some VLMs are failing; as far as I can see, it should be fixable with one line after we swap the weights! Let me know when it's fixed :)

@bayllama (Contributor, Author)

@zucchini-nlp Changing the shape of old_embeddings would fix this; however, the shape attribute is not writable in torch, so we cannot do something like this:

old_embeddings.weight.shape = new_embeddings.weight.shape

So I am thinking about the best way to do this. If nothing works out, I may need to go back to the method I was following in my first PR, where I check the type of the input embedding, create a new object of that type, and return it.

@zucchini-nlp (Member)

@bayllama I see, but we can also change the num_embeddings attribute, which is writable, e.g.:

old_embeddings.num_embeddings = new_embeddings.weight.data.shape[0]

@bayllama force-pushed the progress branch 3 times, most recently from 240bc5e to c023ebb (July 20, 2024)
@bayllama (Contributor, Author)

@zucchini-nlp In addition to the above, I found a couple more things:

  1. The padding_idx must be updated to None if the number of tokens in the new embeddings is smaller than the padding_idx.

  2. The Lxmert model has a bias term that has to be updated, similar to final_logits_bias in mBART. If it is not updated, it causes some test-case failures. This problem existed before and, I believe, was not addressed.

In addition to what we already discussed, I have made these changes in this commit as well. All of the test cases are passing for me now.
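
For illustration, a sketch that combines the bookkeeping discussed in this thread: swapping the weights in place, updating the writable num_embeddings attribute (since .shape is not assignable), and resetting padding_idx when the vocabulary shrinks past it. The function name is hypothetical and simplified relative to the actual change:

```python
import torch.nn as nn

def finalize_in_place_resize(old_embeddings: nn.Embedding, new_embeddings: nn.Embedding) -> nn.Embedding:
    new_num_tokens = new_embeddings.weight.data.shape[0]
    # Swap the resized weights into the original module so its class is preserved.
    old_embeddings.weight.data = new_embeddings.weight.data
    # torch does not allow assigning to .shape, but num_embeddings is a plain attribute.
    old_embeddings.num_embeddings = new_num_tokens
    # If the vocabulary shrank past padding_idx, it no longer points at a valid row.
    if old_embeddings.padding_idx is not None and old_embeddings.padding_idx >= new_num_tokens:
        old_embeddings.padding_idx = None
    return old_embeddings
```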

@bayllama force-pushed the progress branch 2 times, most recently from 799299e to 9ebcd03 (July 21, 2024)
@bayllama (Contributor, Author)

@zucchini-nlp I am not sure what this tests_hub failure is. Could you please help out?

@zucchini-nlp (Member) left a comment:

Great! Some final nits, and we need to merge main as a final step; then we're good to go.

The failing tests are not related to this PR and should be resolved by rerunning them.

Comment on lines 703 to 711
def _resize_bias(self, new_num_tokens: int) -> None:
    old_num_tokens = self.bias.shape[0]
    if new_num_tokens <= old_num_tokens:
        new_bias = self.bias[:new_num_tokens]
    else:
        extra_bias = torch.zeros(new_num_tokens - old_num_tokens, device=self.bias.device)
        new_bias = torch.cat([self.bias, extra_bias])
    self.bias = nn.Parameter(new_bias)

Reply (Member):

Thanks, looks good. IMO this should be moved to LxmertForPreTraining, as this method is not going to be used by the head itself.

@bayllama (Contributor, Author)

@zucchini-nlp Made the changes that you have recommended. Please take a look. Thanks!

@zucchini-nlp (Member)

@bayllama thanks, everything looks good! Rerunning the tests didn't help, so could you merge main in case it was already fixed there?

@bayllama force-pushed the progress branch 2 times, most recently from 9330df2 to 98819cb (July 22, 2024)
@zucchini-nlp (Member)

@amyeroberts can you merge this please? The hub failures are unrelated; from internal Slack, it seems we're not the only ones seeing them.

@amyeroberts self-requested a review on July 22, 2024
@amyeroberts (Collaborator)

@zucchini-nlp There's been a fix pushed to main. @bayllama could you try rebasing?

I've also re-requested review, as there have been several commits since my approval.

@bayllama (Contributor, Author)

@zucchini-nlp @amyeroberts Seems like the tests went through after rebasing. Let me know if anything else is required here.

@amyeroberts (Collaborator) left a comment:

Looks great - thanks @bayllama!

One last thing I forgot to ask for before merging is to add a test. There are already tests for resizing embeddings, so extending those to check the returned type should be enough.

Could you also update the title so it's no longer truncated?
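
For illustration, the kind of assertion such a test could add (a hedged sketch, not the exact test added in this PR):

```python
def check_resize_preserves_embedding_class(model, tokenizer):
    # The resized input embeddings should keep their original class, not become nn.Embedding.
    old_type = type(model.get_input_embeddings())
    model.resize_token_embeddings(len(tokenizer) + 10)
    assert type(model.get_input_embeddings()) is old_type
```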

@bayllama changed the title from "Change resize_token_embeddings to make it return same Class that is p…" to "Modify resize_token_embeddings to ensure output type is same as input" on Jul 23, 2024
@bayllama (Contributor, Author)

@amyeroberts @zucchini-nlp Added the test to make sure the correct type is returned. Also made the other changes suggested.

@amyeroberts merged commit 5a4a76e into huggingface:main on Jul 23, 2024
23 checks passed
@seokhyunan

This PR causes model.resize_token_embeddings to set vocab_size to zero (check this thread). Reverting this PR resolved the issue. Could you help with this?

@bayllama (Contributor, Author)

@zucchini-nlp @amyeroberts I believe zucchini-nlp has already pushed a fix for this. Let me know if I need to do anything.
