System Info
transformers version: 4.42.0
Using distributed or parallel set-up in script?: yes
Using GPU in script?: yes
GPU type: Tesla V100-SXM2-32GB
Who can help?
No response
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
The hardcoded component that replaces key names in the loaded model needs better handling (see the snippet linked below). I had named a few variables beta and gamma in my layers; the from_pretrained function was replacing these names with bias and weight, so the loaded model performed differently.
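For illustration, a minimal sketch of the failure mode, assuming a custom PreTrainedModel subclass with parameters named beta and gamma (the MyConfig/MyModel names and the tmp-model path are hypothetical, not from the original report):

```python
import torch
from torch import nn
from transformers import PretrainedConfig, PreTrainedModel

class MyConfig(PretrainedConfig):
    model_type = "my-model"

class MyModel(PreTrainedModel):
    config_class = MyConfig

    def __init__(self, config):
        super().__init__(config)
        # Parameter names that collide with the hardcoded renaming rules.
        self.beta = nn.Parameter(torch.zeros(4))
        self.gamma = nn.Parameter(torch.ones(4))

model = MyModel(MyConfig())
model.save_pretrained("tmp-model")

# from_pretrained rewrites the checkpoint keys "beta" -> "bias" and
# "gamma" -> "weight" before matching them against the model, so both
# parameters are reported as missing/unexpected and keep their initial
# values instead of the saved ones.
reloaded, info = MyModel.from_pretrained("tmp-model", output_loading_info=True)
print(info["missing_keys"])     # e.g. ['beta', 'gamma']
print(info["unexpected_keys"])  # e.g. ['bias', 'weight']
```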
It would be better to raise a warning or an error when these names are part of layer names in the architecture, to avoid trouble at a later stage of development.
https://github.com/huggingface/transformers/blob/15bd3e61f8d3680ca472c9314ad07584d20f7b81/src/transformers/modeling_utils.py#L4338C1-L4358C19
Expected behavior
Loading of the pre-trained model should not raise missing/unexpected layer warnings.
Thank you for bringing up this issue. Actually, it's my first issue comment; I hope it will help you.
You could use a mapping-based approach, or emit a warning via the warnings module, along the lines of the sketch below. :)
What do you think about these suggestions? Please give your feedback.
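A minimal sketch of that idea, assuming the renaming lives in a helper like the _fix_key function in the linked snippet (the RENAME_MAP table and the warning text here are illustrative, not actual transformers behavior):

```python
import warnings

# Hypothetical rename table mirroring the hardcoded replacements.
RENAME_MAP = {"beta": "bias", "gamma": "weight"}

def _fix_key(key: str) -> str:
    """Rename legacy key fragments, but warn instead of silently rewriting."""
    for old, new in RENAME_MAP.items():
        if old in key:
            warnings.warn(
                f"Checkpoint key '{key}' contains '{old}' and will be renamed "
                f"to use '{new}'. If '{old}' is a genuine parameter name in "
                "your architecture, this rename will break loading.",
                UserWarning,
            )
            return key.replace(old, new)
    return key
```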
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.