Perform separate validation and test epochs per dataset when multiple files are specified (Fixes #1634 and #2043) #2038
Conversation
DeepSpeech.py (outdated diff)
set_loss = run_set('dev', epoch, init_op, dataset=csv)
dev_loss += set_loss
log_progress('Finished validating epoch %d on %s - loss: %f' % (epoch, csv, set_loss))
dev_loss = dev_loss / len(dev_csvs)
Does dividing by len(dev_csvs) give the same result as before this PR?
No, if the sizes of the dev sets are unbalanced, then the mean of the per-set means will not match the mean over the whole population. We can get the same result with a weighted average if we know the sample sizes; that value will have to be returned from the run_set function, since the dataset size is only known after a full iteration through it. I don't expect the difference to be large, but I'll fix this.
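A minimal sketch of the weighted-average idea, assuming run_set is changed to return both the mean loss for a dataset and the number of samples it contained (that two-value return is an assumption for illustration, not the actual signature in the PR):

```python
# Sketch: weighted average of per-dataset dev losses.
# Assumes run_set() returns (mean_loss, sample_count) for the given CSV.
total_loss = 0.0
total_samples = 0
for csv in dev_csvs:
    set_loss, set_samples = run_set('dev', epoch, init_op, dataset=csv)
    total_loss += set_loss * set_samples
    total_samples += set_samples
    log_progress('Finished validating epoch %d on %s - loss: %f' % (epoch, csv, set_loss))

# Weighting by sample count makes the result match the mean over the whole
# population, even when the individual dev sets are unbalanced in size.
dev_loss = total_loss / total_samples
```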
Fixed!
Do we know the train-time implications of not having cached features for validation and testing?
Validation epochs are normally so fast that I don't think I've ever actually cached them in runs, so I can't give you numbers, but that goes to show that it probably does not make a huge difference :) As for test epochs, the acoustic model prediction usually takes about as long as on the validation sets; it's the decoding that takes the majority of the time.
Force-pushed from 9076adf to 946828c.
LGTM, but before merging all checks should pass.
All green.
@kdavis-mozilla review ping
@reuben As I mentioned, I don't have time to do the review. If tilman gave it an r+, I'm fine with that.
Force-pushed from 2351bf1 to 904ab1e.
As a consequence, this also removes support for caching the processed validation and test sets to disk. They're usually small enough that caching to disk makes very little difference, and the complexity of keeping track of multiple cache paths for each CSV is not worth it IMO.

Sorry for the continued churn on the progress bar/logging part of the code. This PR also includes some cleanup of that code using progressbar.NullBar, so that there's now a single check for the --show_progressbar flag. With this PR, the log output with and without progress bars should be consistent in the information it provides.

I recommend disabling whitespace changes when reviewing the last commit that touches evaluate.py, otherwise the diff ends up looking like a mess.
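A minimal sketch of the single-check pattern described above, assuming progressbar2's NullBar is used as the no-op replacement; the create_progressbar helper and the explicit boolean argument are illustrative assumptions, not necessarily how the PR wires up the flag:

```python
import progressbar

def create_progressbar(show_progressbar, *args, **kwargs):
    # Single place where the --show_progressbar flag is checked. Callers
    # always get an object with the ProgressBar interface; NullBar renders
    # nothing, so the logged information stays the same either way.
    if show_progressbar:
        return progressbar.ProgressBar(*args, **kwargs)
    return progressbar.NullBar(*args, **kwargs)

# Hypothetical usage: the same calling code works with bars enabled or not.
bar = create_progressbar(show_progressbar=False, max_value=100).start()
for i in range(100):
    bar.update(i)
bar.finish()
```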