
Move parameter validation specific to TPU Training plugins #7415

Merged
merged 2 commits into Lightning-AI:master on May 24, 2021

Conversation

kaushikb11
Contributor

@kaushikb11 kaushikb11 commented May 7, 2021

What does this PR do?

Follow-up to #5441.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@codecov

codecov bot commented May 7, 2021

Codecov Report

Merging #7415 (c32227a) into master (2103b5e) will decrease coverage by 5%.
The diff coverage is 57%.

@@           Coverage Diff           @@
##           master   #7415    +/-   ##
=======================================
- Coverage      93%     88%    -5%     
=======================================
  Files         200     200            
  Lines       12962   12966     +4     
=======================================
- Hits        11998   11377   -621     
- Misses        964    1589   +625     

@kaushikb11 kaushikb11 marked this pull request as ready for review May 12, 2021 03:32
@kaushikb11 kaushikb11 self-assigned this May 12, 2021
@kaushikb11 kaushikb11 added the accelerator: tpu Tensor Processing Unit label May 12, 2021
Contributor

@tchaton tchaton left a comment

LGTM!

@kaushikb11 kaushikb11 added the ready PRs ready to be merged label May 14, 2021
@lezwon
Contributor

lezwon commented May 15, 2021

@kaushikb11 the TPU tests are being skipped. Probably the TPU device is not being detected :)
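For context, TPU tests in CI are typically gated on device detection, so a mis-detected device makes the whole suite silently skip rather than fail. A minimal sketch of that gating idea, using stdlib unittest (Lightning's own suite uses pytest markers; `tpu_available` is a hypothetical helper, not Lightning API):

```python
import unittest

def tpu_available() -> bool:
    """Return True if an XLA TPU device can be reached (assumed helper)."""
    try:
        import torch_xla.core.xla_model as xm  # requires the torch_xla package
        return xm.xla_device() is not None
    except Exception:
        # No torch_xla installed, or no TPU runtime visible.
        return False

class TPUTest(unittest.TestCase):
    @unittest.skipUnless(tpu_available(), "test requires a TPU device")
    def test_trainer_one_tpu_core(self):
        ...  # e.g. Trainer(max_epochs=1, tpu_cores=1).fit(model)
```

When `tpu_available()` returns False on the CI machine, every decorated test is reported as skipped, which matches the behaviour observed here.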


# model = Model()
# trainer = Trainer(checkpoint_callback=True, max_epochs=1, tpu_cores=1)
def on_post_move_to_device(self):
Contributor

Could we make this check slightly smarter by checking parameter names?

If I tie weights with self.layer_3.weight = self.layer_1.weight in the init function, but mess up and write self.layer_3.weight = self.layer_2.weight instead, I won't get a warning, yet the tying is different. Ideally we would explicitly report which weights are shared, or handle it automatically for the user.
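A sketch of the name-based check suggested here, in plain PyTorch (`shared_parameter_names` is a hypothetical helper, not Lightning API): it groups parameter names that refer to the same tensor object, so a warning could name exactly which weights are tied.

```python
from collections import defaultdict
from typing import List

from torch import nn

def shared_parameter_names(module: nn.Module) -> List[List[str]]:
    """Group fully-qualified parameter names that share the same tensor."""
    groups = defaultdict(list)
    # Walk submodules manually: the default recursive traversal
    # de-duplicates tied parameters, hiding exactly what we want to see.
    for mod_name, mod in module.named_modules():
        for p_name, param in mod.named_parameters(recurse=False):
            full_name = f"{mod_name}.{p_name}" if mod_name else p_name
            groups[id(param)].append(full_name)
    return [names for names in groups.values() if len(names) > 1]

class TiedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(4, 4)
        self.layer_2 = nn.Linear(4, 4)
        self.layer_3 = nn.Linear(4, 4)
        self.layer_3.weight = self.layer_1.weight  # intentional tying

print(shared_parameter_names(TiedModel()))
# → [['layer_1.weight', 'layer_3.weight']]
```

With this information, the validation could compare the set of tied names before and after the device move, instead of only counting parameters.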

Contributor Author

Interesting, will follow up.

@tchaton tchaton self-requested a review May 17, 2021 08:03
@@ -171,6 +172,7 @@ def new_process(self, process_idx: int, trainer, mp_queue) -> None:
if self.global_rank == 0:
time.sleep(2)

@parameter_validation
Member

how slow is this?

@Borda Borda enabled auto-merge (squash) May 19, 2021 19:00
@@ -71,12 +71,11 @@ def auto_transfer_args(self, *args, **kwargs):

def parameter_validation(fn: Callable) -> Callable:
Contributor

I think that, now that you changed the decorator target to self.model, this decorator no longer fits well into core/decorators: it is now specific to plugins that have a self.model attribute.
What do you think about moving it?

Just for consideration
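Loosely, the pattern under discussion looks like the sketch below (not the actual Lightning implementation): a decorator that compares the model's parameter count before and after the wrapped call. It only works on objects exposing `self.model`, which is exactly why it no longer belongs in core/decorators. The `FakeModel`/`FakePlugin` stubs are purely illustrative.

```python
import warnings
from functools import wraps
from typing import Callable

def parameter_validation(fn: Callable) -> Callable:
    """Warn if the wrapped method changes the model's parameter count."""
    @wraps(fn)
    def inner(self, *args, **kwargs):
        pre = len(list(self.model.parameters()))  # assumes self.model exists
        result = fn(self, *args, **kwargs)
        post = len(list(self.model.parameters()))
        if pre != post:
            warnings.warn(
                "The model's parameter count changed while moving to the "
                "device; tied weights may have been duplicated or dropped."
            )
        return result
    return inner

# Hypothetical stand-ins to show the decorator firing:
class FakeModel:
    def __init__(self, n: int):
        self._params = [object() for _ in range(n)]

    def parameters(self):
        return self._params

class FakePlugin:
    def __init__(self):
        self.model = FakeModel(3)

    @parameter_validation
    def model_to_device(self):
        # Simulate a device move that accidentally materialises a tied
        # weight as a separate parameter (3 params become 4).
        self.model = FakeModel(4)
```

Because the decorator reads `self.model` rather than an argument of the wrapped function, it is a plugin-level concern rather than a general-purpose utility.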

Contributor

+1 to @awaelchli 's suggestion

Contributor Author

Good catch. Will do a follow-up PR for this.

@Borda Borda merged commit 3f460b1 into Lightning-AI:master May 24, 2021
@Borda Borda mentioned this pull request May 25, 2021
@awaelchli awaelchli added this to the v1.3.x milestone May 26, 2021
awaelchli pushed a commit that referenced this pull request May 26, 2021
* Move parameter validation specific to TPU Training plugins

* update docstring
Borda pushed a commit that referenced this pull request May 26, 2021
* Move parameter validation specific to TPU Training plugins

* update docstring
lexierule pushed a commit that referenced this pull request May 26, 2021
* Move parameter validation specific to TPU Training plugins

* update docstring