-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast merge #848
Fast merge #848
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wasade, looks great, thank you! One question.
list_list = [[r, c, v] for (r, c), v in collapsed_rcv.items()] | ||
|
||
return self.__class__(list_list, feature_order, sample_order) | ||
|
||
def merge(self, other, sample='union', observation='union', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be worth adding a flag to merge
to ignore metadata; this will be helpful if we wanted to force using the fast merge and our tables have metadata and we want to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, ya, that's a really good idea. Will do
@wasade, thank you! Looks good! |
Added a (much) faster way to merge tables when the axes overlap.
Table.concat
is still (much) faster for the special case where one axis is assured to be disjoint. This revised merge method can operate on many tables at once, unlike the existingmerge
which took a single table.I don't have exact timing information, but the existing merge with four tables, each with roughly 600 samples and 1M features, was around 3 hours. This new method took a few minutes.