ENH: Add "select all" option for duplicates #10592

nickeubank · 2015-07-15T21:29:31Z

How would people feel about me adding a "select_all" option to drop_duplicates / duplicated that returns all duplicated rows, not just all but the first or all but the last?

I often check whether I have duplicates by some primary key, and if I do, I then want to look at ALL rows with the same primary key. Right now, drop_duplicates / duplicated can't do that, so I need to use the following hack:

df[df.duplicated('key',take_last=True) | df.duplicated('key')]
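
For readers trying the hack above on a current pandas release, here is a minimal sketch. It assumes a small made-up DataFrame (the `key` and `value` columns are illustrative only) and uses `keep='last'`, which is how later pandas versions spell `take_last=True`.

```python
import pandas as pd

# Hypothetical sample data; 'key' plays the role of the primary key.
df = pd.DataFrame({"key": [1, 1, 2, 3, 3, 3], "value": list("abcdef")})

# duplicated() flags every repeat except the first occurrence.
not_first = df.duplicated("key")
# keep="last" flags every repeat except the last occurrence.
not_last = df.duplicated("key", keep="last")

# OR-ing the two masks flags every row whose key appears more than once.
all_dupes = df[not_first | not_last]
print(all_dupes)
```

The row with key 2 is the only one left out, since its key occurs once.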

It seems I'm not the only person with this issue:
http://stackoverflow.com/questions/26244309/how-to-analyze-all-duplicate-entries-in-this-pandas-dataframe
http://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python
http://stackoverflow.com/questions/23667369/drop-all-duplicate-rows-in-python-pandas

Happy to write a PR if people support the idea.


jreback · 2015-07-15T21:33:13Z

Generally it makes more sense to use groupby if you are going to work with duplicates beyond the simpler cases that duplicated handles.

nickeubank · 2015-07-15T21:46:32Z

You mean something like df[df.groupby('key').count()>1]?

This particular situation doesn't seem like an uncommon use case, and a select_all option would really help make code readable. That's why programs like Stata offer this functionality (for example, the duplicates tag command).
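
One way the groupby suggestion could be made to work as a row filter is sketched below. This is an assumption about what was meant, not a quote from the thread: per-group counts are indexed by key rather than by row, so the sketch uses `transform('size')` to broadcast each group's size back onto the original rows. The sample DataFrame is made up.

```python
import pandas as pd

# Hypothetical sample data; 'key' plays the role of the primary key.
df = pd.DataFrame({"key": [1, 1, 2, 3, 3, 3], "value": list("abcdef")})

# transform("size") returns one value per original row:
# the number of rows that share that row's key.
group_size = df.groupby("key")["key"].transform("size")

# Keep every row whose key occurs more than once.
all_dupes = df[group_size > 1]
print(all_dupes)
```

Because `group_size` shares the index of `df`, it can be used directly as a boolean mask after the comparison.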

sinhrks · 2015-07-15T22:04:34Z

I think this is covered in #10236.
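
For context, pandas 0.17 and later accept `keep=False` in `duplicated`, which flags every member of a duplicated group. Assuming that is the behavior the referenced PR covers, the request above would reduce to something like this sketch (sample data is made up):

```python
import pandas as pd

# Hypothetical sample data; 'key' plays the role of the primary key.
df = pd.DataFrame({"key": [1, 1, 2, 3, 3, 3], "value": list("abcdef")})

# keep=False marks all rows of a duplicated key, not all-but-first or all-but-last.
mask = df.duplicated("key", keep=False)
all_dupes = df[mask]
print(all_dupes)
```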

nickeubank · 2015-07-15T22:05:32Z

Yup, thanks @sinhrks! Excited to see it merged!

nickeubank closed this as completed Jul 15, 2015
