TF implementation of RegNets #17554

ariG23498 · 2022-06-04T04:23:43Z

In this PR, we (w/ @sayakpaul) are porting the RegNet model to TensorFlow.

We copied the PyTorch implementation of RegNet and are porting the code to TF step by step. We also introduced an output layer, which was needed for RegNet.

HuggingFaceDocBuilderDev · 2022-06-04T04:33:12Z

The documentation is not available anymore as the PR was closed or merged.

Did not change the documentation yet; still need to try the playground on the model.

* fix: code structure in few cases.
* fix: code structure to align tf models.
* fix: layer naming, bn layer still remains.
* chore: change default epsilon and momentum in bn.

sayakpaul · 2022-06-06T17:29:48Z

@amyeroberts

If we run the following:

from PIL import Image
import numpy as np

from src.transformers.models.regnet.modeling_tf_regnet import (
    TFRegNetForImageClassification
)
from transformers import AutoFeatureExtractor

def prepare_img():
    image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
    return image

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/regnet-y-040")
model = TFRegNetForImageClassification.from_pretrained("facebook/regnet-y-040", from_pt=True)

image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="tf") 
outputs = model(**inputs, training=False)

print(outputs.logits.shape)

expected_slice = np.array([-0.4180, -1.5051, -3.4836])

np.testing.assert_allclose(outputs.logits[0, :3].numpy(), expected_slice, atol=1e-4)

First, it complains that the moving_mean and moving_variance params are not loaded properly.

We tested your solution in #17571. With that, we're running into mismatches of num_batches_tracked and even moving_mean. It also complains about some of the mismatches stemming from the shortcut layer which wasn't the case for the earlier setup.

Do you have any thoughts?

amyeroberts · 2022-06-07T11:30:55Z

Hi @sayakpaul

Could you give a bit more information about the mismatches i.e. the printouts you're currently getting?

Regarding num_batches_tracked, I don't believe this parameter will ever be cross-loaded into a tf.keras.layers.BatchNormalization layer as there isn't an equivalent parameter. This is only important if the corresponding PyTorch batch norm layer doesn't have its momentum set c.f. param updates, which you'll need to verify for this model. I suggest looking at the implementations of both the TF and PyTorch layer to see when/if these differences are important. If the parameter is necessary, then I think one approach might be subclassing to build a new layer and include the parameter as a registered weight + any necessary logic to use it, but I'm not sure at the moment.

sayakpaul · 2022-06-07T11:36:12Z

I tried debugging this today but no luck yet. Here's some information to help all of us navigate this:

Amending src/transformers/modeling_tf_pytorch_utils.py (following #17571, "Add batchnorm running calc weight to porting script") resulted in this: https://pastebin.com/0CZJmvzh.
num_batches_tracked is likely not needed; I don't suspect it to be a trained parameter anyway. However, happy to stand corrected.
What is surprising is that even after incorporating the changes from #17571, there's still a complaint about moving_mean and moving_variance.
There's also a complaint about convolution params.

All these mismatches seem to be stemming from the layers.0 of RegNet stages. Mismatches stemming from other layers (layers.2 for example) are related to num_batches_tracked.

The test used to gather this information is the same one as mentioned in #17554 (comment).

@amyeroberts

amyeroberts · 2022-06-08T09:19:37Z

@sayakpaul Thanks for your detailed update. Comments below:

OK - thanks for posting that, it really helps!
num_batches_tracked isn't trainable, but it is updated during training. As I mentioned above, if the layer has momentum set (it's not None) then you can ignore it. However, if momentum isn't set, then the layer uses num_batches_tracked to update the running_mean and running_var calculations, which are used during evaluation to normalize the batch. You can quickly check whether momentum is set for the batchnorm layers by running something like all([x.momentum is not None for x in model.modules() if isinstance(x, nn.BatchNorm2d)]).
Looking at the printout you pasted above, it says "All the weights of TFRegNetForImageClassification were initialized from the PyTorch model." If this is the case, and some of the PyTorch weights weren't used, it makes me think some layers might be missing in your implementation. I would look at the two architectures and see if they differ anywhere.
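
To make the role of num_batches_tracked concrete, here is a minimal pure-Python sketch of the two update rules for the running mean. It is not taken from the PR. The function names are hypothetical, only the mean is modeled (not the variance), and the Keras function assumes the usual convention where its momentum is the weight kept on the old value, so a PyTorch momentum of 0.1 corresponds to a Keras momentum of 0.9.

```python
def torch_running_mean(batch_means, momentum=0.1):
    """Replay a PyTorch-style running_mean update over a list of batch means.

    With momentum=None the layer falls back to a cumulative average whose
    weight is 1 / num_batches_tracked, which is the only case where that
    counter influences the result.
    """
    running_mean = 0.0
    num_batches_tracked = 0
    for batch_mean in batch_means:
        num_batches_tracked += 1
        factor = 1.0 / num_batches_tracked if momentum is None else momentum
        running_mean = (1.0 - factor) * running_mean + factor * batch_mean
    return running_mean, num_batches_tracked


def keras_moving_mean(batch_means, momentum=0.99):
    """Replay a Keras-style moving_mean update (momentum weights the old value)."""
    moving_mean = 0.0
    for batch_mean in batch_means:
        moving_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
    return moving_mean


# Illustrative batch means, not real model statistics.
batch_means = [1.0, 2.0, 3.0, 6.0]

with_momentum, _ = torch_running_mean(batch_means, momentum=0.1)
cumulative, tracked = torch_running_mean(batch_means, momentum=None)
keras_equivalent = keras_moving_mean(batch_means, momentum=0.9)

print(f"momentum=0.1: {with_momentum:.4f}")
print(f"momentum=None (cumulative over {tracked} batches): {cumulative:.4f}")
print(f"Keras momentum=0.9: {keras_equivalent:.4f}")
```

With momentum set, the counter never enters the update, which is why it can be skipped when cross-loading; with momentum=None the result is the plain average of the batch means seen so far.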

sayakpaul · 2022-06-09T08:52:22Z

@amyeroberts a quick update:

momentum is actually not set. This is why we also need to retrieve num_batches_tracked. We need to figure out a way to factor it in for use with layers.BatchNormalization in TensorFlow.
The TF model has fewer params than the PT model, so we'll look into why this is the case. One immediate reason would be the absence of num_batches_tracked, but that contributes only a very small difference. We currently have 629440 fewer parameters in the TF model than in the PT one.

amyeroberts · 2022-06-09T10:09:25Z

@sayakpaul Thanks for the update!

OK, this makes things a bit more difficult. Let me know if you want any help with this step. It's something that will likely need to be done in other PT -> TF ports, so it would definitely be valuable to the community if you add this!
It might be easier to print out the weight names instead of comparing number of parameters. The porting code works on the names, and so seeing where the two models differ can really help pinpoint what's happening. What I typically do is use the porting code to convert the tensorflow weight names and compare the two sets. For this model, it would look something like:

from transformers import RegNetForImageClassification
# import directly once __init__ files updated
from transformers.models.regnet.modeling_tf_regnet import TFRegNetForImageClassification 
from transformers.modeling_tf_pytorch_utils import convert_tf_weight_name_to_pt_weight_name

checkpoint = "facebook/regnet-y-040"
tf_model = TFRegNetForImageClassification.from_pretrained(checkpoint, from_pt=True)
pt_model = RegNetForImageClassification.from_pretrained(checkpoint)

tf_model_weights = set([convert_tf_weight_name_to_pt_weight_name(x.name)[0] for x in tf_model.trainable_variables])
pt_model_weights = set(pt_model.state_dict().keys())

print(tf_model_weights - pt_model_weights)
print(pt_model_weights - tf_model_weights)

sayakpaul · 2022-06-09T10:21:28Z

Thanks for the suggestions. Will try them out and update.

sayakpaul · 2022-06-10T02:59:21Z

@amyeroberts

I had to do a few minor modifications to your snippet in #17554 (comment):

tf_model_weights = set(
    [
        convert_tf_weight_name_to_pt_weight_name(x.name)[0]
        for x in tf_model.trainable_variables + tf_model.non_trainable_variables
    ]
)
pt_model_weights = set(pt_model.state_dict().keys())
tf_model_weights_new = set()

for name in tf_model_weights:
    if "moving_mean" in name:
        name = name.replace("moving_mean", "running_mean")
    elif "moving_variance" in name:
        name = name.replace("moving_variance", "running_var")
    tf_model_weights_new.add(name)


print(f"Differences in the TF model and PT model: {tf_model_weights_new - pt_model_weights}")
print(f"Differences in the PT model and TF model: {pt_model_weights - tf_model_weights_new}")
print(f"Total weights differing: {len(pt_model_weights - tf_model_weights_new)}")

convert_tf_weight_name_to_pt_weight_name() doesn't change the moving_mean and moving_variance to running_mean and running_var respectively. Instead, currently, it's handled here so that this query is successful.
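
The renaming and set comparison above can be factored into a small helper. The following is only a sketch operating on plain strings: the helper names, the `ignore_suffixes` parameter, and the sample weight names are hypothetical and are not part of transformers or of the real RegNet checkpoint.

```python
# Batch norm statistic names differ between the two frameworks.
TF_TO_PT_BATCHNORM_SUFFIXES = {
    "moving_mean": "running_mean",
    "moving_variance": "running_var",
}


def rename_batchnorm_stats(name):
    """Map a TF batch norm statistic name to its PyTorch counterpart."""
    for tf_suffix, pt_suffix in TF_TO_PT_BATCHNORM_SUFFIXES.items():
        if name.endswith(tf_suffix):
            return name[: -len(tf_suffix)] + pt_suffix
    return name


def diff_weight_names(tf_names, pt_names, ignore_suffixes=()):
    """Return (only_in_tf, only_in_pt) after normalizing batch norm names.

    ignore_suffixes lets the caller drop PT-only buffers (for example
    num_batches_tracked) that are known to have no TF counterpart.
    """
    tf_names = {rename_batchnorm_stats(n) for n in tf_names}
    pt_names = {n for n in pt_names if not n.endswith(tuple(ignore_suffixes))}
    return tf_names - pt_names, pt_names - tf_names


# Made-up names for illustration only.
tf_names = {
    "stage.layer.normalization.moving_mean",
    "stage.layer.normalization.moving_variance",
    "stage.layer.convolution.weight",
}
pt_names = {
    "stage.layer.normalization.running_mean",
    "stage.layer.normalization.running_var",
    "stage.layer.normalization.num_batches_tracked",
    "stage.layer.convolution.weight",
    "stage.shortcut.convolution.weight",
}

only_tf, only_pt = diff_weight_names(
    tf_names, pt_names, ignore_suffixes=("num_batches_tracked",)
)
print(f"Only in TF: {sorted(only_tf)}")
print(f"Only in PT: {sorted(only_pt)}")
```

In this toy input, the only name left over on the PT side is the shortcut convolution, which is the kind of entry that points at a layer missing from the TF implementation.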

With this change, the result of pt_model_weights - tf_model_weights_new exactly matches the complaint:

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFRegNetForImageClassification ...

(Full output here).

I have gone over the modeling_tf_regnet.py script a couple of times but I don't yet know what I can do here. Let me know what you usually do when you have these differences.

sayakpaul · 2022-06-10T09:59:54Z

Also an oversight on my end in reporting momentum in #17554 (comment).

all([x.momentum is not None for x in model.modules() if isinstance(x, nn.BatchNorm2d)]) actually gives True which means it's okay to ignore num_batches_tracked.

sayakpaul · 2022-06-13T03:35:44Z

@amyeroberts we were able to rectify the model implementation and make it work. The integration test (mentioned in #17554 (comment)) is passing now.

The tests, however, are failing for a weird reason:

Parameter config in `TFRegNetModel(config)` should be an instance of class `PretrainedConfig`. To create a model from a pretrained model use `model = TFRegNetModel.from_pretrained(PRETRAINED_MODEL_NAME)`

Weird because we tested a couple of things in isolation:

from transformers import PretrainedConfig, RegNetConfig

config_class = RegNetConfig()

print(f"RegNet Config class type: {type(config_class)}.")
print(f"RegNet Config is an instance of PretrainedConfig: {isinstance(config_class, PretrainedConfig)}")

The final print statement gives True. But when we do the following:

from src.transformers.models.regnet.modeling_tf_regnet import TFRegNetForImageClassification, TFRegNetModel

class_from_config = TFRegNetModel(config_class)
print("Model class from config was initialized.")

it complains:

Parameter config in `TFRegNetModel(config)` should be an instance of class `PretrainedConfig`. To create a model from a pretrained model use `model = TFRegNetModel.from_pretrained(PRETRAINED_MODEL_NAME)`

Do you have any suggestions for this?
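
One possible cause, not confirmed anywhere in this thread: the config above comes from `transformers` while the model comes from `src.transformers`. If those resolve to two separately loaded copies of the library, each copy defines its own `PretrainedConfig` class, and `isinstance` fails across copies. The sketch below reproduces that effect with a one-class module loaded under two names; `pkg.configuration`, `src.pkg.configuration`, and `Config` are made up for illustration and stand in for the real modules.

```python
import importlib.util
import tempfile
from pathlib import Path

# A stand-in for a module that defines a config base class.
SOURCE = "class Config:\n    pass\n"


def load_as(module_name, path):
    """Execute the file at `path` as a fresh module called `module_name`."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "configuration.py"
    path.write_text(SOURCE)
    # Same file, imported under two different module names.
    installed = load_as("pkg.configuration", path)
    checkout = load_as("src.pkg.configuration", path)

config = installed.Config()
same_copy = isinstance(config, installed.Config)
other_copy = isinstance(config, checkout.Config)

print(f"isinstance against the copy that built it: {same_copy}")
print(f"isinstance against the second copy: {other_copy}")
```

If this is what is happening here, importing both the config and the model through the same package path (all from `transformers`, or all from `src.transformers`) would be the thing to try first.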

Feat/tf regnets

sayakpaul · 2022-06-16T13:48:01Z

@sgugger @Rocketknight1 the PR is now ready for review.

This particular model actually has the largest vision model checkpoint available to date: https://huggingface.co/facebook/regnet-y-10b-seer. It's still PyTorch-only, and the corresponding model makes use of the low_cpu_mem_usage argument.

I had a chat with @Rocketknight1 a few days back on the possibility of supporting this checkpoint in TensorFlow too. This will require tweaks and they will be contributed in a separate PR.

NielsRogge · 2022-06-23T09:33:39Z