Add TensorFlow implementation of ConvNeXTv2 #25558
Conversation
cc @amyeroberts
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Really nice PR - thanks for adding this model!
Just a few small nits and comments. Once those are resolved we're good to merge! 🤗
OK, pushed some updates! I think I've addressed everything. To get the nice docstring fixes and type annotation changes I had to modify the ConvNeXT v1 model files as well (since the v2 TF code is copied from them).

Of note, I'm getting some weird results when I run the `pt-to-tf` conversion: only hidden states are out of spec, not actual logits. The strange thing is, if I run it against ConvNeXT v1, I get a similar result:

```
$ CUDA_VISIBLE_DEVICES="" NVIDIA_TF32_OVERRIDE=0 transformers-cli pt-to-tf --no-pr \
    --model-name facebook/convnext-tiny-224 --local-dir temp/convnext-tiny-224
# [...]
ValueError: The cross-loaded TensorFlow model has different outputs, something went wrong!

List of maximum output differences above the threshold (5e-05):

List of maximum hidden layer differences above the threshold (5e-05):
hidden_states[1]: 1.583e-04
hidden_states[2]: 1.480e-03
hidden_states[3]: 2.380e-03
hidden_states[4]: 1.595e-04
```

The hidden states are actually more out of range for the v1 model. Not sure what's going on there; I tried loading the model with fewer stages (as suggested in the earlier PR), but it runs into layer input/output shape issues and fails to load, and attempting to inspect the layers individually didn't get me anywhere either.
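For reference, here's a minimal sketch of reproducing that per-layer comparison by hand. It's an illustration, not the actual `pt-to-tf` internals: it uses a random input instead of a real image, the v1 checkpoint from the command above, and the 5e-05 threshold from the CLI output, and it assumes both implementations return hidden states in channels-first layout so they can be compared directly.

```python
import numpy as np
import torch
from transformers import ConvNextModel, TFConvNextModel

# Cross-load the TF model from the PyTorch weights, as pt-to-tf does
pt_model = ConvNextModel.from_pretrained("facebook/convnext-tiny-224")
tf_model = TFConvNextModel.from_pretrained("facebook/convnext-tiny-224", from_pt=True)
pt_model.eval()

pixel_values = np.random.rand(1, 3, 224, 224).astype(np.float32)
with torch.no_grad():
    pt_out = pt_model(pixel_values=torch.tensor(pixel_values), output_hidden_states=True)
tf_out = tf_model(pixel_values=pixel_values, output_hidden_states=True)

# Flag any per-layer difference above the 5e-05 threshold used by the CLI
for i, (pt_h, tf_h) in enumerate(zip(pt_out.hidden_states, tf_out.hidden_states)):
    diff = np.max(np.abs(pt_h.numpy() - tf_h.numpy()))
    marker = "!" if diff > 5e-5 else " "
    print(f"{marker} hidden_states[{i}]: {diff:.3e}")
```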
@neggles Awesome work - thanks again for adding this model and for the detailed comments!
Regarding the difference in hidden states, this is indeed Expected™️ 😅. Often we see small differences creep in for certain layers (e.g. batch norm) which we're not able to reduce any further without a lot more work. These then get amplified during the forward pass as the inputs are passed through the layers. Some models even have differences on the order of 1e-2 for the hidden layers! As long as the difference between the PT and TF model output logits is small (~1e-5), it's OK.
@neggles For uploading the checkpoints, as long as the output logits have a small difference, you can override the checks. Let us know when you've pushed the TF checkpoints to the hub, or if you need help/permission to do so, and then we can merge.
@amyeroberts thanks for the help/feedback! TensorFlow may be a little old hat these days but, well, Cloud TPUs 🤷

OK, cool, I figured that was probably the case. I'm seeing about 1.75e-5 difference in output logits, which seems reasonable enough to me. Anyway! I've opened PRs for weight conversion on the smaller models, starting with facebook/convnextv2-atto-1k-224.
Found it 🤦 missed a pair of […].
I see most of the weight PRs have been merged, yay! I'm not sure what happened with the CI pipelines, though; it looks like an unrelated error, but I don't have permissions to hit the rerun button 😢 @amyeroberts should I rebase this and push to make CI re-run?
Sure, rebasing is always a good thing! Amy is out for a while, ping me whenever.
Ok, the TF weights are missing on the Hub. I asked for a merge of your PRs! 😉
TF weights are in!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Oops, forgot about this! Yes, TF weights are all in now - thanks much! - lemme resolve that conflict...
OK! Rebased one more time and did one other thing: I've dropped the […]. Should be good to merge now, I think? Sorry for the delay this end! Been a busy few weeks. (ping @ArthurZucker 😄)
Sure! I'll let @Rocketknight1 handle this!
Hey @neggles, this looks good, but needs the `check_docstrings` failure sorted out!
@Rocketknight1 So this is... weird. When I run the docstring check locally, it fails the same way CircleCI does, complaining about these new models despite the fact that, as far as I can tell, the docstrings do match for both classes.

Maybe something screwy with TensorFlow model init signature parsing? If I edit `utils/check_docstrings.py` so that it prints the corrected docstring alongside each failure:

```diff
--- a/utils/check_docstrings.py
+++ b/utils/check_docstrings.py
@@ -1195,7 +1195,7 @@ def check_docstrings(overwrite: bool = False):
         if overwrite:
             fix_docstring(obj, old_doc, new_doc)
         else:
-            failures.append(name)
+            failures.append(name + "\n Corrected docstring:\n " + new_doc)
     elif not overwrite and new_doc is not None and ("<fill_type>" in new_doc or "<fill_docstring>" in new_doc):
         to_clean.append(name)
```

I get output which is the same as what's currently in there, but without the […]. I also note that both […].
Ah, ugh, this might indeed indicate an issue with the docstring parsing for that file; I didn't realize […].
@Rocketknight1 OK, no problem - have added exclusions and a comment in `utils/check_docstrings.py`. C'mon, tests! Please? 🤞
Yay, tests passed!
Just reviewed - overall this is super-clean and it's basically ready to merge now! I left two nits, but neither is blocking.
TFConvNextV2Model and TFConvNextV2ForImageClassification have docstrings which are equivalent to their PyTorch cousins, but a parsing issue prevents them from passing the test. Adding exclusions for these two classes as discussed in huggingface#25558.
Done and done! I've elected not to rebase again just to reduce the likelihood of tests getting angry 😅
This looks good now and I'm happy to merge! cc @amyeroberts - you don't need to do another review, but let me know if there's anything you think is unresolved before we merge this.
Did a quick scan over the changes. All looks good to me. Thanks again @neggles for adding this model and for such a clean PR!
* Add type annotations to TFConvNextDropPath
* Use tf.debugging.assert_equal for TFConvNextEmbeddings shape check
* Add TensorFlow implementation of ConvNeXTV2
* check_docstrings: add TFConvNextV2Model to exclusions

  TFConvNextV2Model and TFConvNextV2ForImageClassification have docstrings which are equivalent to their PyTorch cousins, but a parsing issue prevents them from passing the test. Adding exclusions for these two classes as discussed in huggingface#25558.
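On the second bullet, here's a minimal sketch of the kind of graph-friendly shape check that commit refers to. The helper name and the usage comment are illustrative assumptions, not the PR's exact code:

```python
import tensorflow as tf

def assert_channel_dim(pixel_values: tf.Tensor, num_channels: int) -> None:
    # tf.debugging.assert_equal works on symbolic tensors inside tf.function,
    # where a plain Python `if ...: raise` on a dynamic shape cannot be traced
    tf.debugging.assert_equal(
        tf.shape(pixel_values)[1],  # channels-first input assumed
        num_channels,
        message="Make sure that the channel dimension of the pixel values match with the one set in the configuration.",
    )

# Hypothetical usage inside an embeddings layer's call():
# assert_channel_dim(pixel_values, self.config.num_channels)
```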
What does this PR do?
This adds TensorFlow support for ConvNeXTV2, following the pattern of the existing PyTorch ConvNeXTV2 implementation and the existing ConvNeXT(V1) TensorFlow implementation.
Before submitting
Tests are in, `make fixup` and `make quality` are happy, and

```
NVIDIA_TF32_OVERRIDE=0 RUN_SLOW=1 RUN_PT_TF_CROSS_TESTS=1 py.test -vv tests/models/convnextv2/test_modeling_tf_convnextv2.py
```

passes everything except for `from_pretrained` (unsurprisingly, `"facebook/convnextv2-tiny-1k-224"` lacks TensorFlow weights, but `from_pt=True` works swimmingly).

Getting this one to pass tests was a little tricky; the outputs from the model were quite variable run-to-run. Still not entirely sure exactly what I did to fix it, but it looks like TensorFlow doesn't like it when you do this:

[…]

and wants you to do this instead:

[…]

🤷 who knows what cursed machinations the XLA compiler gets up to while nobody's looking.
There was a prior (seemingly abandoned) port attempt in #23155 which I referenced a little while building this. Just to address one of the review comments on that PR: `config.layer_norm_eps` only applies to the `TFConvNextV2MainLayer.layernorm` layer, not to the other norm layers or the GRN layer, which are fixed at a `1e-6` epsilon (see the existing PyTorch implementation & original code). Using the (typically `1e-12`) value from `config.layer_norm_eps` in those layers will produce aggressively incorrect outputs 😭

Based on the PR template, it looks like I should tag @amyeroberts and maybe @alaradirik (majority holder of `git blame` for the PyTorch implementation)?
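Since the fixed GRN epsilon is the easy thing to get wrong here, a short sketch of Global Response Normalization as described in the ConvNeXt V2 paper, with the epsilon hardwired to `1e-6`. The class and variable names are illustrative, not the PR's exact code:

```python
import tensorflow as tf

class GlobalResponseNorm(tf.keras.layers.Layer):
    """Global Response Normalization (ConvNeXt V2), channels-last inputs."""

    def __init__(self, dim: int, **kwargs):
        super().__init__(**kwargs)
        self.dim = dim

    def build(self, input_shape):
        # Zero-init so the layer starts out as an identity mapping
        self.gamma = self.add_weight(name="gamma", shape=(1, 1, 1, self.dim), initializer="zeros")
        self.beta = self.add_weight(name="beta", shape=(1, 1, 1, self.dim), initializer="zeros")
        super().build(input_shape)

    def call(self, x: tf.Tensor) -> tf.Tensor:
        # L2 norm over the spatial dimensions, per channel
        gx = tf.norm(x, ord="euclidean", axis=(1, 2), keepdims=True)
        # Divide by the mean response across channels; the epsilon is a fixed
        # 1e-6 here, NOT config.layer_norm_eps (typically 1e-12)
        nx = gx / (tf.reduce_mean(gx, axis=-1, keepdims=True) + 1e-6)
        return self.gamma * (x * nx) + self.beta + x
```

With `gamma` and `beta` initialized to zeros the output reduces to `x`, so the layer is a no-op at initialization and learns to modulate channel responses during training.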