Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored splitter tests #176

Merged
merged 9 commits into from
Nov 7, 2018
Merged

Refactored splitter tests #176

merged 9 commits into from
Nov 7, 2018

Conversation

tovbinm
Copy link
Collaborator

@tovbinm tovbinm commented Nov 7, 2018

Related issues
Test code was too repetitive and was not testing splitter summary correctly.

Describe the proposed solution

  • Added SplitterSummaryAsserts for asserting SplitterSummary classes
  • Refactored tests for DataCutter, DataSplitter, DataBalancer and OpValidator to use SplitterSummaryAsserts
  • Minor idents and cleanups

Describe alternatives you've considered
NA

Additional context
TBD - also needs checks & tests for empty datasets - #161

@tovbinm tovbinm requested a review from marcovivero November 7, 2018 08:27
@tovbinm tovbinm requested a review from leahmcguire as a code owner November 7, 2018 08:27
@tovbinm tovbinm requested a review from kinfaikan November 7, 2018 08:38
@codecov
Copy link

codecov bot commented Nov 7, 2018

Codecov Report

Merging #176 into master will decrease coverage by 3.96%.
The diff coverage is 97.29%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #176      +/-   ##
==========================================
- Coverage   86.31%   82.35%   -3.97%     
==========================================
  Files         305      305              
  Lines        9977     9976       -1     
  Branches      342      547     +205     
==========================================
- Hits         8612     8216     -396     
- Misses       1365     1760     +395
Impacted Files Coverage Δ
...op/stages/impl/tuning/OpTrainValidationSplit.scala 100% <ø> (ø) ⬆️
...orce/op/stages/impl/tuning/OpCrossValidation.scala 97.67% <0%> (ø) ⬆️
...sforce/op/stages/impl/selector/ModelSelector.scala 98.18% <100%> (ø) ⬆️
...alesforce/op/stages/impl/tuning/DataBalancer.scala 84.76% <100%> (-11.43%) ⬇️
...salesforce/op/stages/impl/tuning/OpValidator.scala 94.11% <100%> (-0.09%) ⬇️
...alesforce/op/stages/impl/tuning/DataSplitter.scala 60% <100%> (ø) ⬆️
.../salesforce/op/stages/impl/tuning/DataCutter.scala 95.65% <100%> (ø) ⬆️
...ce/op/stages/base/sequence/SequenceEstimator.scala 0% <0%> (-100%) ⬇️
...sforce/op/stages/base/unary/UnaryTransformer.scala 0% <0%> (-100%) ⬇️
...p/stages/base/quaternary/QuaternaryEstimator.scala 0% <0%> (-100%) ⬇️
... and 34 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b57b8b9...860ebf4. Read the comment docs.

Copy link
Contributor

@marcovivero marcovivero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, o/w looks good to me!


Spec[OpCrossValidation[_, _]] should "stratify binary class data" in {
val splits = cv.createTrainValidationSplits(condition, binaryDS, label.name, balancer)
val balancer = new DataBalancer()
val splits = cv.createTrainValidationSplits(cvStratifyCondition, binaryDS, label.name, Some(balancer))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment: use Option

}

it should "stratify multi class data" in {
val splits = ts.createTrainValidationSplits(condition, multiDS, multiLabel.name, cutter)
val splits = ts.createTrainValidationSplits(tsStratifyCondition, multiDS, multiLabel.name, Some(new DataCutter()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Spec[OpTrainValidationSplit[_, _]] should "stratify binary class data" in {
val splits = ts.createTrainValidationSplits(condition, binaryDS, label.name, balancer)
val balancer = new DataBalancer()
val splits = ts.createTrainValidationSplits(tsStratifyCondition, binaryDS, label.name, Some(balancer))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

}

it should "stratify multi class data" in {
val splits = cv.createTrainValidationSplits(condition, multiDS, multiLabel.name, cutter)
val splits = cv.createTrainValidationSplits(cvStratifyCondition, multiDS, multiLabel.name, Some(new DataCutter()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

stratifyCondition: Boolean,
dataset: Dataset[T],
label: String,
splitter: Option[Splitter]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No default?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, it's a private abstract method,

@tovbinm tovbinm merged commit 76cd1e3 into master Nov 7, 2018
@tovbinm tovbinm deleted the mt/op-validator branch November 7, 2018 23:15
ericwayman pushed a commit that referenced this pull request Feb 8, 2019
@salesforce-cla
Copy link

Thanks for the contribution! Unfortunately we can't verify the commit author(s): Matthew Tovbin <m***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants