Add tests for qwen + allow uninitialized weights in Llama model #8552
base: jz/export_qwen
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8552
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 9b5516b with merge base 1858086.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Force-pushed from 5ef3fef to c58edc5
Force-pushed from c58edc5 to 955b991
try:
    # assign=True: load params/buffers by assignment instead of performing an in-place copy.
    # Because we are using device="meta", tensors do not have memory associated with them
    # and an in-place copy is a no-op. Use assign=True in load_state_dict for this scenario.
    missing, unexpected = self.model_.load_state_dict(
        checkpoint,
        strict=False,
        assign=True,
    )  # self.model_ = Transformer(gptconf)
except RuntimeError as e:
Why is this needed?
So it doesn't error out when loading examples/models/llama/params/demo_rand_params.pth, or any other checkpoint that is incompatible with the model architecture. We also have no way to not specify a checkpoint; I looked into removing the default value for that argument, but it's going to take some work since it's relied on internally in a lot of places.
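For context, here is a minimal sketch of the pattern being discussed, not the actual ExecuTorch code: build the model on the meta device, try to load a checkpoint with strict=False and assign=True, and fall back to randomly initialized weights when the checkpoint is incompatible. `TinyModel` and `maybe_load_checkpoint` are hypothetical names used only for illustration.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for the Transformer built in llama/model.py."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

    def forward(self, x):
        return self.proj(x)


def maybe_load_checkpoint(model: nn.Module, checkpoint: dict) -> nn.Module:
    try:
        # assign=True replaces the meta tensors with the checkpoint tensors
        # instead of copying into them (an in-place copy onto meta is a no-op).
        missing, unexpected = model.load_state_dict(
            checkpoint, strict=False, assign=True
        )
    except RuntimeError:
        # Incompatible checkpoint (e.g. a shape mismatch): materialize the
        # parameters with real storage and fill them with random values.
        model = model.to_empty(device="cpu")
        for p in model.parameters():
            nn.init.normal_(p, std=0.02)
    return model


with torch.device("meta"):
    m = TinyModel()

# A checkpoint with mismatched shapes triggers the random-init fallback.
bad_ckpt = {"proj.weight": torch.randn(4, 4)}
m = maybe_load_checkpoint(m, bad_ckpt)
print(m(torch.randn(2, 8)).shape)  # torch.Size([2, 8])
```

Note that load_state_dict raises a RuntimeError on tensor shape mismatches even with strict=False, which is why the try/except is needed for incompatible checkpoints like the demo random params.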
EXECUTORCH_DEFINED_MODELS = [
    "stories110m",
    "llama2",
    "llama3",
    "llama3_1",
    "llama3_2",
    "static_llama",
    "qwen2_5",
Sorry, I accidentally deleted the original comment about ordering, but I was going to say that I think it's clearer to list all the llama models first.
Force-pushed from 47bed4c to 9b5516b
I'm OK with the changes, but I'm really concerned about llama/model.py and I think we should clean it up. I'll create a separate issue.
Summary
Add a basic CI test for the Qwen model. Requires some changes to llama/model.py to allow uninitialized (random) weights.

Test plan
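The actual test plan is not included in this excerpt. Purely as an illustration of the idea (a hypothetical smoke test, not the CI script added by this PR), a minimal check that a model can be built with uninitialized random weights and run a forward pass might look like this; the module shapes and helper names are made up for the example.

```python
import torch
import torch.nn as nn


def build_random_model() -> nn.Module:
    # Hypothetical example: construct on the meta device (no checkpoint),
    # then materialize storage and initialize the weights randomly.
    with torch.device("meta"):
        model = nn.Sequential(nn.Embedding(128, 16), nn.Linear(16, 128))
    model = model.to_empty(device="cpu")
    for p in model.parameters():
        nn.init.normal_(p, std=0.02)
    return model


def test_random_weights_forward():
    model = build_random_model()
    tokens = torch.randint(0, 128, (1, 8))
    logits = model(tokens)
    assert logits.shape == (1, 8, 128)


if __name__ == "__main__":
    test_random_weights_forward()
    print("ok")
```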