Perform separate validation and test epochs per dataset when multiple files are specified (Fixes #1634 and #2043) #2038
Conversation
DeepSpeech.py (outdated diff)
set_loss = run_set('dev', epoch, init_op, dataset=csv)
dev_loss += set_loss
log_progress('Finished validating epoch %d on %s - loss: %f' % (epoch, csv, set_loss))
dev_loss = dev_loss / len(dev_csvs)
Does dividing by len(dev_csvs) give the same result as before this PR?
No, if the sizes of the dev sets are unbalanced, then the mean of the per-set means will not match the mean over the whole population. We can get the same result with a weighted average if we know the sample sizes; that value will have to be returned from the run_set function, since the dataset size is only known after a full iteration through it. I don't expect the difference to be large, but I'll fix this.
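A minimal sketch of the weighted-average idea, assuming run_set is changed to return both the mean loss for a dataset and the number of samples it contained (that two-value return is an assumption for illustration, not the actual signature in the PR):

```python
# Sketch: weighted average of per-dataset dev losses.
# Assumes run_set() returns (mean_loss, sample_count) for the given CSV.
total_loss = 0.0
total_samples = 0
for csv in dev_csvs:
    set_loss, set_samples = run_set('dev', epoch, init_op, dataset=csv)
    total_loss += set_loss * set_samples
    total_samples += set_samples
    log_progress('Finished validating epoch %d on %s - loss: %f' % (epoch, csv, set_loss))

# Weighting by sample count makes the result match the mean over the whole
# population, even when the individual dev sets are unbalanced in size.
dev_loss = total_loss / total_samples
```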
Fixed!
Do we know the train-time implications of not having cached features for validation and testing?
Validation epochs are normally so fast that I don't think I've ever actually cached them in runs, so I can't give you numbers, but that goes to show that it probably does not make a huge difference :) As for test epochs, the acoustic model prediction usually takes about as long as on the validation sets; it's the decoding that takes the majority of the time.
Force-pushed from 9076adf to 946828c.
LGTM, but before merging all checks should pass.
All green.
@kdavis-mozilla review ping
@reuben As I mentioned, I don't have time to do the review. If tilman gave it an r+, I'm fine with that.
Force-pushed from 2351bf1 to 904ab1e.
As a consequence, this also removes support for caching the processed validation and test sets to disk. They're usually small enough that caching to disk makes very little difference, and the complexity of keeping track of multiple cache paths for each CSV is not worth it IMO.

Sorry for the continued churn on the progress bar/logging part of the code. This PR also includes some cleanup of that code using progressbar.NullBar, so that there's now a single check for the --show_progressbar flag. With this PR, the log output with and without progress bars should be consistent in the information it provides.

I recommend disabling whitespace changes when reviewing the last commit that touches evaluate.py, otherwise the diff ends up looking like a mess.
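A minimal sketch of the single-check pattern described above, assuming progressbar2's NullBar is used as the no-op replacement; the create_progressbar helper and the explicit boolean argument are illustrative assumptions, not necessarily how the PR wires up the flag:

```python
import progressbar

def create_progressbar(show_progressbar, *args, **kwargs):
    # Single place where the --show_progressbar flag is checked. Callers
    # always get an object with the ProgressBar interface; NullBar renders
    # nothing, so the logged information stays the same either way.
    if show_progressbar:
        return progressbar.ProgressBar(*args, **kwargs)
    return progressbar.NullBar(*args, **kwargs)

# Hypothetical usage: the same calling code works with bars enabled or not.
bar = create_progressbar(show_progressbar=False, max_value=100).start()
for i in range(100):
    bar.update(i)
bar.finish()
```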