
Better handling of hardcoded component in PretrainedModel.from_pretrained. #35617

Closed
princethewinner opened this issue Jan 10, 2025 · 3 comments

@princethewinner

System Info

  • transformers version: 4.42.0
  • Platform: Linux-5.15.0-125-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.2
  • Accelerate version: 0.28.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: yes
  • Using GPU in script?: yes
  • GPU type: Tesla V100-SXM2-32GB

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The hardcoded component that rewrites key names in the loaded state dict needs better handling (see the snippet and the short demonstration below). I had named a few variables beta and gamma in my layers; the from_pretrained function was replacing these names with bias and weight, so the loaded model performed differently.

It would probably help to raise a warning/error if these names are part of the layer names in the architecture, to avoid trouble at a later stage of development.

https://github.com/huggingface/transformers/blob/15bd3e61f8d3680ca472c9314ad07584d20f7b81/src/transformers/modeling_utils.py#L4338C1-L4358C19

    @staticmethod
    def _fix_state_dict_key_on_load(key):
        """Replace legacy parameter names with their modern equivalents. E.g. beta -> bias, gamma -> weight."""

        if "beta" in key:
            return key.replace("beta", "bias")
        if "gamma" in key:
            return key.replace("gamma", "weight")

        # to avoid logging parametrized weight norm renaming
        if hasattr(nn.utils.parametrizations, "weight_norm"):
            if "weight_g" in key:
                return key.replace("weight_g", "parametrizations.weight.original0")
            if "weight_v" in key:
                return key.replace("weight_v", "parametrizations.weight.original1")
        else:
            if "parametrizations.weight.original0" in key:
                return key.replace("parametrizations.weight.original0", "weight_g")
            if "parametrizations.weight.original1" in key:
                return key.replace("parametrizations.weight.original1", "weight_v")
        return key
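
For illustration, here is a minimal standalone sketch (the module and parameter names are hypothetical) showing how this substring replacement treats custom parameters named beta or gamma:

    import torch
    import torch.nn as nn

    class MyLayer(nn.Module):
        def __init__(self):
            super().__init__()
            self.beta = nn.Parameter(torch.zeros(4))   # intended key: "beta"
            self.gamma = nn.Parameter(torch.ones(4))   # intended key: "gamma"

    layer = MyLayer()
    for key in layer.state_dict():
        # Reproduce the substring replacement applied on load.
        renamed = key.replace("beta", "bias").replace("gamma", "weight")
        print(key, "->", renamed)
    # beta -> bias
    # gamma -> weight
    # The checkpoint keys no longer match the module's parameter names,
    # so loading reports them as missing/unexpected.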

Expected behavior

Loading the pre-trained model should not raise missing/unexpected layer warnings.

@Lala2398

Thank you for bringing up this issue. Actually, it's my first issue comment; I hope this helps you.
You could use a mapping-based approach or emit a warning via the warnings module :)
What do you think about these suggestions? Please give your feedback.
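
For what it's worth, a rough sketch of that idea (hypothetical helper, not the change actually made in the library): apply the legacy rename only when the original key is not a genuine parameter of the model, and emit a warning when a rename does happen.

    import warnings

    LEGACY_RENAMES = {"beta": "bias", "gamma": "weight"}

    def fix_key_with_check(key, model_keys):
        """Rename a legacy key only if it is not already a real parameter name."""
        for old, new in LEGACY_RENAMES.items():
            if old in key:
                if key in model_keys:
                    # "beta"/"gamma" is a genuine parameter here; leave it alone.
                    return key
                candidate = key.replace(old, new)
                if candidate in model_keys:
                    warnings.warn(f"Renaming legacy checkpoint key {key!r} -> {candidate!r}")
                    return candidate
        return key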

@Rocketknight1
Member

Hi @princethewinner, we have a PR open to fix this at #35615!


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
