Add option to split data into train/test/validate sets #149
Suggested some improvements. Main one is around allowing an arbitrary number of splits rather than just 2 or 3. I think we'd want to let people split things as they see fit, but lmk what you think.
@martham93, can you also update/add the relevant params to the docs? You can just follow the format of the other params. Here's the relevant page. Then you should be able to run |
@martham93, minor changes to make, but looks good overall. Good work
@drewbo, want to have a quick once over whenever you're back?
LGTM
This PR addresses the enhancement outlined in issue 147. It introduces the option to split data into train/test/validate sets of user-specified sizes. The code still keeps the previous label-maker default of a .8/.2 train/test split.
cc @wronk, @drewbo
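For readers following the thread, here is a minimal sketch of the behaviour discussed above: an arbitrary number of named splits with user-specified fractions, falling back to a .8/.2 train/test split. It is not the code from this PR. The function name `split_data` and the parameters `split_names`, `split_vals`, and `seed` are hypothetical and chosen only for illustration.

```python
import random


def split_data(items, split_names=("train", "test"), split_vals=(0.8, 0.2), seed=0):
    """Hypothetical helper: shuffle `items` and cut them into named splits.

    `split_vals` are fractions that must sum to 1; the default mirrors the
    .8/.2 train/test split mentioned in the PR description.
    """
    if len(split_names) != len(split_vals):
        raise ValueError("split_names and split_vals must have the same length")
    if abs(sum(split_vals) - 1.0) > 1e-9:
        raise ValueError("split_vals must sum to 1")

    # Shuffle a copy with a seeded generator so the result is reproducible.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    splits = {}
    start = 0
    cumulative = 0.0
    for i, (name, frac) in enumerate(zip(split_names, split_vals)):
        cumulative += frac
        # The last split takes whatever remains, so no item is dropped to rounding.
        end = n if i == len(split_vals) - 1 else round(cumulative * n)
        splits[name] = shuffled[start:end]
        start = end
    return splits


# Default two-way split, then a three-way split with user-specified sizes.
two_way = split_data(range(100))
three_way = split_data(range(100), ("train", "test", "validate"), (0.7, 0.2, 0.1))
```

Cutting at cumulative boundaries rather than sizing each split independently means every item lands in exactly one split, whatever the number of splits.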