ENH: Add "select all" option for duplicates #10592

Closed
nickeubank opened this issue Jul 15, 2015 · 4 comments

Comments

@nickeubank
Contributor

How would people feel about me adding a "select_all" option to drop_duplicates / duplicated that returns all duplicated rows, not just all but the first or all but the last?

I often check whether I have duplicates by some primary key, and if I do, I then want to look at ALL rows with the same primary key. Right now, drop_duplicates / duplicated can't do that, and I need to use the following hack:

df[df.duplicated('key',take_last=True) | df.duplicated('key')]
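
For readers on newer pandas releases, here is a minimal sketch of the same hack, assuming a version where `duplicated` accepts the `keep` argument in place of `take_last`. The toy frame and its column names are made up for illustration. It also shows that `keep=False` gives the same mask in a single call:

```python
import pandas as pd

# Hypothetical example data: keys 1 and 3 are repeated, key 2 is unique.
df = pd.DataFrame({"key": [1, 1, 2, 3, 3, 3], "value": list("abcdef")})

# The hack from above: "all but last" OR "all but first" covers every
# row whose key occurs more than once.
mask_union = df.duplicated("key", keep="last") | df.duplicated("key", keep="first")

# keep=False flags every member of a duplicated group directly.
mask_all = df.duplicated("key", keep=False)

all_dupes = df[mask_all]
print(all_dupes)
```

Both masks select the rows for keys 1 and 3 and skip the single row for key 2.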

It seems I'm not the only person with this issue:
http://stackoverflow.com/questions/26244309/how-to-analyze-all-duplicate-entries-in-this-pandas-dataframe
http://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python
http://stackoverflow.com/questions/23667369/drop-all-duplicate-rows-in-python-pandas

Happy to write a PR if people support this.

@jreback
Contributor

jreback commented Jul 15, 2015

Generally it makes more sense to use groupby if you are going to work with duplicates beyond the simpler cases that duplicated handles.

@nickeubank
Contributor Author

You mean something like df[df.groupby('key').count()>1]?

This seems like a common use case, and a select_all option would make the code much more readable. Programs like Stata offer this functionality for the same reason (for example, the duplicates tag command).
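
As a rough sketch of what the groupby route could look like: the `count()` result is indexed by key value rather than by row, so it can't be used directly as a row mask. Two ways that should work are `transform("size")`, which broadcasts each group's size back onto the original rows, and `filter`, which keeps whole groups. The frame below is invented for illustration:

```python
import pandas as pd

# Hypothetical example data: keys 1 and 3 are repeated, key 2 is unique.
df = pd.DataFrame({"key": [1, 1, 2, 3, 3, 3], "value": list("abcdef")})

# Size of each row's key group, aligned to the original row index.
group_size = df.groupby("key")["key"].transform("size")
dupes_via_transform = df[group_size > 1]

# Alternative: keep only the groups that have more than one row.
dupes_via_filter = df.groupby("key").filter(lambda g: len(g) > 1)

print(dupes_via_transform)
```

Either form returns every row sharing a repeated key, though both are arguably less readable than a single flag on `duplicated`.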

@sinhrks
Member

sinhrks commented Jul 15, 2015

I think this is covered in #10236.

@nickeubank
Contributor Author

Yup, thanks @sinhrks ! Excited to see it merged!
