Use drop_duplicates() instead of groupby (about 1.5~2x faster) #1617
Hi @rightx2. Thanks for your PR. I'm waiting for a feast-dev member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test
Codecov Report
@@ Coverage Diff @@
## master #1617 +/- ##
==========================================
- Coverage 83.64% 83.64% -0.01%
==========================================
Files 67 67
Lines 5816 5814 -2
==========================================
- Hits 4865 4863 -2
Misses 951 951
/retest
Signed-off-by: rightx2 <rightx2@gmail.com>
ec4c15d to ed34cdf
/lgtm
/assign @woop
/lgtm
Probably gotta approve again after being added to the OWNERS file
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: achals, rightx2 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Use drop_duplicates() instead of groupby (about 1.5~2x faster)
  Signed-off-by: rightx2 <rightx2@gmail.com>
* Lint
  Signed-off-by: rightx2 <rightx2@gmail.com>
What this PR does / why we need it:
`df.drop_duplicates()` is much faster than `groupby()` + `reset_index()` (about 1.5~2x). You can verify this with a quick benchmark, varying the number of unique values in each column.
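The benchmark code from the original description did not survive on this page, so the following is only a minimal sketch of how such a comparison might look. The column names (`a`, `b`, `value`), the row count, the cardinalities, and the exact `groupby(...).size().reset_index()` form are assumptions for illustration, not the code changed in this PR.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: adjust the upper bounds (100, 50) to change
# the number of unique values in each key column.
rng = np.random.default_rng(0)
n_rows = 200_000
df = pd.DataFrame(
    {
        "a": rng.integers(0, 100, n_rows),
        "b": rng.integers(0, 50, n_rows),
        "value": rng.random(n_rows),
    }
)
keys = ["a", "b"]


def unique_with_groupby(frame):
    # Assumed "before" pattern: group on the keys, then turn the
    # group index back into ordinary columns.
    return frame.groupby(keys).size().reset_index()[keys]


def unique_with_drop_duplicates(frame):
    # "After" pattern: keep the first occurrence of each key combination.
    return frame[keys].drop_duplicates()


# groupby sorts the keys by default, while drop_duplicates keeps the
# original row order, so sort before comparing the two results.
via_groupby = unique_with_groupby(df)
via_drop = (
    unique_with_drop_duplicates(df).sort_values(keys).reset_index(drop=True)
)
same_rows = via_groupby.equals(via_drop)

t_groupby = timeit.timeit(lambda: unique_with_groupby(df), number=5)
t_drop = timeit.timeit(lambda: unique_with_drop_duplicates(df), number=5)
print(f"same rows: {same_rows}")
print(f"groupby: {t_groupby:.3f}s  drop_duplicates: {t_drop:.3f}s")
```

Timings depend on the pandas version, the data size, and key cardinality, so treat the numbers as indicative only. Also note that the two approaches differ in row order and in how they handle missing keys (`groupby` drops NaN keys by default), which matters if the calling code relies on either.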
Which issue(s) this PR fixes:
No related issues. This is a small performance improvement.
Does this PR introduce a user-facing change?: