How would people feel about me adding a "select_all" option to duplicates / duplicated that returns all duplicated rows, not just all but first or all but last?
I often check to see if I have duplicates by some primary key, and if I do, I then want to look at ALL rows with the same primary key. Right now, duplicates / duplicated won't do so and I need to use the following hack:
you mean something like df[df.groupby('key').count()>1] ?
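As written, that expression doesn't quite work: `df.groupby('key').count()` returns one row per key, so the boolean frame doesn't align with `df`'s rows. A working variant of the same idea (a sketch with invented sample data) uses `GroupBy.filter`:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# filter keeps every row of each group for which the predicate is True,
# so groups of size > 1 survive in full.
result = df.groupby("key").filter(lambda g: len(g) > 1)
```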
This doesn't seem like an uncommon use case, and a select_all option would really help make code readable. That's why programs like Stata offer this functionality (e.g. the duplicates tag command).
It seems I'm not the only person with this issue:
http://stackoverflow.com/questions/26244309/how-to-analyze-all-duplicate-entries-in-this-pandas-dataframe
http://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python
http://stackoverflow.com/questions/23667369/drop-all-duplicate-rows-in-python-pandas
Happy to write a PR if people are supportive.
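For reference, current pandas versions expose equivalent behavior through the `keep=False` argument of `duplicated` / `drop_duplicates`, which marks every occurrence of a duplicated key (sample data invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# keep=False marks ALL occurrences of a duplicated key as True.
mask = df.duplicated("key", keep=False)
all_dupes = df[mask]
```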