This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Properly checkpoint models trained with Apex #3875

Closed
JohnGiorgi opened this issue Feb 28, 2020 · 5 comments · Fixed by #3992

@JohnGiorgi
Contributor

JohnGiorgi commented Feb 28, 2020

Is your feature request related to a problem? Please describe.

#3866 enables mixed precision training with Apex, but defers the changes needed to properly save and load a model trained with amp. As discussed, I am opening this issue so I don't forget, and will (hopefully) make a PR in the next couple of days.

Describe the solution you'd like

Follow the instructions in the Apex docs to checkpoint a model trained with amp.
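For reference, the Apex docs recommend saving `amp.state_dict()` (which holds the loss scalers) together with the model and optimizer state, and, when resuming, calling `amp.initialize()` with the same `opt_level` before loading any of the state dicts. Below is a minimal sketch of that recipe. The helper names `build_amp_checkpoint` and `restore_amp_checkpoint` are hypothetical (not AllenNLP or Apex API), and the default `opt_level="O1"` is an assumption. `amp` is passed in as an argument (normally `apex.amp`) so the helpers themselves carry no hard dependency on Apex.

```python
from typing import Any, Dict, Tuple


def build_amp_checkpoint(model: Any, optimizer: Any, amp: Any) -> Dict[str, Any]:
    """Hypothetical helper: collect everything needed to resume amp training.

    `amp` is expected to be the `apex.amp` module. Its state (the loss
    scalers) is not part of the model or optimizer state dicts, so it
    has to be saved explicitly.
    """
    return {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "amp": amp.state_dict(),
    }


def restore_amp_checkpoint(
    checkpoint: Dict[str, Any],
    model: Any,
    optimizer: Any,
    amp: Any,
    opt_level: str = "O1",  # assumption: must match the opt_level used in training
) -> Tuple[Any, Any]:
    """Hypothetical helper: restore a checkpoint made by build_amp_checkpoint.

    Per the Apex docs, amp.initialize is called first, and only then are
    the model, optimizer, and amp state dicts loaded.
    """
    model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    amp.load_state_dict(checkpoint["amp"])
    return model, optimizer
```

Serialization is left to the caller, e.g. `torch.save(build_amp_checkpoint(model, optimizer, amp), path)` when saving and `restore_amp_checkpoint(torch.load(path), model, optimizer, amp)` when resuming.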

Additional context

Apex docs.

@schmmd
Member

schmmd commented Mar 6, 2020

Thanks @JohnGiorgi

@schmmd schmmd added this to the 1.0.0 milestone Mar 6, 2020
@schmmd
Member

schmmd commented Mar 23, 2020

@JohnGiorgi are you still able to create a PR for this? We would love a contribution before our 1.0 release, otherwise this is unlikely to make it in.

@matt-gardner
Contributor

I'm also trying to give time estimates for the remaining issues in the 1.0 milestone. Looking at the docs that @JohnGiorgi linked to, this is probably a very fast change, perhaps as little as an hour. But I'm giving it a "day" label, as there might be some complexity here that I'm missing.

@JohnGiorgi
Contributor Author

JohnGiorgi commented Mar 23, 2020

Thanks for pinging me. I do still plan to work on this; I just got swamped lately. Hopefully I can finish it before the end of the week.

@schmmd
Member

schmmd commented Mar 23, 2020

Cool, that'd be great! We'd love to have this in 1.0.
