Feature request: random partition #181
Comments
I can't think of any simple way. But if
No. Not without layering your own encoding on top of CSV. If you need to handle arbitrary CSV data, then other command-line tools won't work. If you can guarantee that every CSV record occupies a single line, then other line-oriented tools will work okay.
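As a rough illustration of "layering your own encoding on top of CSV", here is a minimal Python sketch. It assumes a home-made escaping scheme (backslash, `\n` and `\r` escapes inside fields), which is not any standard. It re-encodes records so that each one fits on one physical line, and then decodes them again after line-oriented processing. The function names are hypothetical.

```python
import csv
import io


# Hypothetical helpers: one possible escaping scheme, not a standard format.
def escape_field(value: str) -> str:
    # Escape backslashes first so an escaped newline cannot be confused
    # with a literal backslash followed by the letter n.
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace("\r", "\\r")


def unescape_field(value: str) -> str:
    # Scan left to right so a doubled backslash is consumed as a pair
    # before any following 'n' or 'r' is considered.
    out = []
    i = 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            out.append({"n": "\n", "r": "\r", "\\": "\\"}.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def to_single_line_csv(text: str) -> str:
    """Re-encode CSV so every record occupies exactly one physical line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for record in csv.reader(io.StringIO(text, newline="")):
        writer.writerow([escape_field(field) for field in record])
    return out.getvalue()


def from_single_line_csv(text: str) -> str:
    """Undo to_single_line_csv after line-oriented processing."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for record in csv.reader(io.StringIO(text, newline="")):
        writer.writerow([unescape_field(field) for field in record])
    return out.getvalue()
```

Between the two calls, line-oriented tools such as `shuf` or `split` can treat each line as one record. The header line still has to be set aside before shuffling and put back afterwards.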
@plainas This may or may not help, but a while ago I wrote a separate tool for doing this: https://github.com/sd2k/ttv You can compose it with
@sd2k Neat tool, although it doesn't look like it correctly supports CSV data? I don't see any CSV parsing happening in that tool. (A single CSV record can span an arbitrary number of lines.)
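The following short Python sketch makes the point concrete. The function name is made up for illustration. It counts physical lines and parsed CSV records in the same text, where a quoted field contains embedded newlines. A line-oriented tool sees four units in this text, but a CSV parser sees only two records.

```python
import csv
import io


def count_lines_and_records(text: str) -> tuple[int, int]:
    """Return (physical lines, CSV records) for the same text."""
    physical_lines = len(text.splitlines())
    # newline="" lets the csv module see embedded newlines inside quoted fields.
    records = sum(1 for _ in csv.reader(io.StringIO(text, newline="")))
    return physical_lines, records


sample = '1,"first line\nsecond line\nthird line"\n2,short\n'
print(count_lines_and_records(sample))  # (4, 2)
```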
Ah, I misread the initial description. You're right: that tool is completely naive when it comes to nested newlines. It could potentially be 'upgraded' if there's a need for it!
There definitely is :)
Y'all might consider my suggested implementation strategy. There's really no need for a separate tool for the stated use case. That is, all you need to do is add random sorting to
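The shuffle-then-split idea can be sketched in Python as follows. This is a rough illustration only, not the tool's actual implementation. The function name and parameters are hypothetical. The sketch assumes the first record is a header and that the whole file fits in memory, as it would for a sort. It shuffles parsed records rather than lines, so fields with embedded newlines stay intact.

```python
import csv
import io
import random


def shuffle_and_split(text: str, train_fraction: float, seed: int = 0) -> tuple[str, str]:
    """Shuffle CSV records (not physical lines) and cut them into two CSV strings.

    Assumes the input is non-empty and its first record is a header, which is
    copied to both outputs. All records are held in memory, as with a sort.
    """
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    rows = list(csv.reader(io.StringIO(text, newline="")))
    header, records = rows[0], rows[1:]
    # A seeded generator makes the partition reproducible.
    random.Random(seed).shuffle(records)
    cut = round(len(records) * train_fraction)
    outputs = []
    for part in (records[:cut], records[cut:]):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(part)
        outputs.append(buf.getvalue())
    return outputs[0], outputs[1]
```

Because the cut happens after a full shuffle, the two outputs have exactly the requested sizes (up to rounding). The cost is that every record must be held in memory at once.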
For those of us working in machine learning, a feature to quickly divide a data set into training data and test data would be really nice to have.
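If memory is a concern, a streaming alternative can replace the shuffle. Here is a minimal Python sketch of that approach. It is an assumed design, not an existing feature, and the function name is hypothetical. Each record is sent to the training output with a fixed probability, so memory use stays constant, but the train/test ratio is only approximate.

```python
import csv
import random
from typing import TextIO


def bernoulli_split(src: TextIO, train_out: TextIO, test_out: TextIO,
                    train_probability: float, seed: int = 0) -> tuple[int, int]:
    """Stream CSV records from src into train_out or test_out at random.

    Each record independently goes to train_out with probability
    train_probability, so the final ratio is only approximate, but memory use
    stays constant regardless of input size. Assumes the first record is a
    header and copies it to both outputs.
    """
    rng = random.Random(seed)
    reader = csv.reader(src)
    train_writer = csv.writer(train_out, lineterminator="\n")
    test_writer = csv.writer(test_out, lineterminator="\n")
    header = next(reader)
    train_writer.writerow(header)
    test_writer.writerow(header)
    n_train = n_test = 0
    for record in reader:
        # The csv reader yields whole records, even when a field spans lines.
        if rng.random() < train_probability:
            train_writer.writerow(record)
            n_train += 1
        else:
            test_writer.writerow(record)
            n_test += 1
    return n_train, n_test
```

For real files, open the input with `open("data.csv", newline="")` and both outputs with mode `"w"` and `newline=""`, as the csv module recommends. The file names here are placeholders.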
Is there a way to do this already?
I am tempted to use other command-line tools to achieve this by partitioning lines rather than CSV rows. Is there a way to escape newlines inside values so that each line of output is guaranteed to be exactly one CSV row?