Support running validation every fixed number of examples #348

Closed
ehhuang opened this issue Oct 9, 2019 · 3 comments · Fixed by #405
Labels: feature (is an improvement or enhancement), help wanted (open to be worked on)

Comments


ehhuang commented Oct 9, 2019

Is your feature request related to a problem? Please describe.
For an IterableDataset, we may not know the length of the dataset in advance, so validation cannot be scheduled relative to the length of an epoch. Running validation every X examples would be helpful.

ehhuang added the feature and help wanted labels on Oct 9, 2019
fellnerse commented

I really would love this! Any updates on that?

Ir1d (Contributor) commented Oct 18, 2019

👍 on this, I guess it should be easy to implement with a Hook
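
For illustration, the counting logic such a hook would need is small. The following is a minimal, framework-agnostic sketch and not Lightning's actual hook API: `ExampleCountTrigger` and `stream_batches` are hypothetical names made up here, and the batch size of 32 and the threshold of 100 examples are arbitrary.

```python
class ExampleCountTrigger:
    """Signals when at least `every_n_examples` examples were seen since the last trigger."""

    def __init__(self, every_n_examples: int) -> None:
        if every_n_examples <= 0:
            raise ValueError("every_n_examples must be positive")
        self.every_n_examples = every_n_examples
        self.examples_since_validation = 0

    def update(self, batch_size: int) -> bool:
        """Record one processed batch; return True if validation is due."""
        self.examples_since_validation += batch_size
        if self.examples_since_validation >= self.every_n_examples:
            # Keep the remainder so the long-run cadence stays close to every_n_examples.
            self.examples_since_validation %= self.every_n_examples
            return True
        return False


def stream_batches():
    """Stand-in for a loader over an iterable dataset whose length is unknown up front."""
    for _ in range(10):
        yield [0] * 32  # one batch of 32 dummy examples


trigger = ExampleCountTrigger(every_n_examples=100)
validation_steps = [
    step for step, batch in enumerate(stream_batches()) if trigger.update(len(batch))
]
print(validation_steps)  # [3, 6, 9]
```

Because the trigger only counts examples as they arrive, it never needs the dataset length, which is the situation described in the feature request.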


fellnerse commented Oct 21, 2019

I'm able to do something similar by setting up the trainer like this:

trainer = Trainer(
    ...,
    val_percent_check=kwargs["val_check_interval"],
    val_check_interval=kwargs["val_check_interval"],
)

If you set val_check_interval to (batch_size / len(train_dataset)) * nb_batches, validation runs every nb_batches training batches. Just make sure the DataLoaders you use shuffle the data; otherwise the same data will be used over and over again for validation.
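
As a worked example of that arithmetic, here is a small sketch. The helper name `val_check_fraction` and the numbers (batch size 32, 6400 training examples, validation every 50 batches) are assumptions chosen for illustration; the formula is the one given above, with the multiplication done first to avoid rounding noise.

```python
def val_check_fraction(batch_size: int, num_train_examples: int, nb_batches: int) -> float:
    """Fraction of one training epoch covered by `nb_batches` batches.

    Same value as (batch_size / len(train_dataset)) * nb_batches.
    """
    fraction = (nb_batches * batch_size) / num_train_examples
    if not 0.0 < fraction <= 1.0:
        raise ValueError("nb_batches must cover more than zero and at most one epoch")
    return fraction


fraction = val_check_fraction(batch_size=32, num_train_examples=6400, nb_batches=50)
examples_between_validations = 50 * 32

print(fraction)                      # 0.25
print(examples_between_validations)  # 1600
```

With these numbers, validation would run four times per epoch, that is, every 1600 training examples. Note that this still requires len(train_dataset), so it only approximates the original request for datasets whose length is known.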

The ModelCheckpoint callback also has to be adjusted:

checkpoint_callback = ModelCheckpoint(
    ...,
    period=1 / kwargs["val_check_interval"],
)

Early stopping does not seem to be influenced by the validation frequency.

The only caveat is that the validation batches are now sampled randomly from the validation dataset, so it's not guaranteed that all of the data is used after one epoch. Not sure whether that's an issue, though.


3 participants