Pandas (0.18) Rank: unexpected behavior for method = 'dense' and pct = True #15630
Comments
so all
You want something like this, I suppose. Note that the original definitions are from https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.rankdata.html (though scipy doesn't do pct, so I guess this doesn't matter).
code is here: if you'd like to see what (if anything) this change would break. (note you cannot directly use |
@FXLab91 another option is to not allow |
@shoyer any thoughts |
I agree with @FXLab91 that this is very strange behavior, and I can't see why anyone would want it. So I would be inclined to treat it as a bug and fix it for the next release. |
Does this suggest we should rethink the pct behaviour of some of the others as well? Something like [1,2,2] will give the same pct results under both min and dense (1/3, 2/3, 2/3). |
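To make the `[1, 2, 2]` comparison concrete, here is a minimal pure-Python sketch. It assumes `pct=True` simply divides each rank by the total number of observations, as described in this issue; `min_rank` and `dense_rank` are hypothetical helpers written for illustration, not pandas internals.

```python
from fractions import Fraction

def min_rank(values):
    # 'min' method: 1 + the number of strictly smaller observations,
    # so tied values all share the lowest rank of their group.
    return [1 + sum(v < x for v in values) for x in values]

def dense_rank(values):
    # 'dense' method: position of the value among the sorted distinct values.
    position = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [position[x] for x in values]

data = [1, 2, 2]
n = len(data)  # assumed denominator: total number of observations

min_pct = [Fraction(r, n) for r in min_rank(data)]
dense_pct = [Fraction(r, n) for r in dense_rank(data)]

print(min_pct)    # 1/3, 2/3, 2/3
print(dense_pct)  # 1/3, 2/3, 2/3
```

Under that assumption the two methods coincide for this input, because the tie sits at the top of the data and both methods assign it rank 2.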
@dsm054 surely! yep these are prob not tested at all. |
- `DataFrame.rank()` and `Series.rank()` when `method='dense'` and `pct=True` now scales to 100%. See pandas-dev#15630
May be a bit premature but I just worked through a possible solution that only touches |
Restating @dsm054's question (and asking a few of my own), should all other As @dsm054 noted, Now if method='max', #15639 will fix the |
I find the behavior of the rank function with method='dense' and pct=True unexpected: to calculate percentile ranks, it appears to divide by the total number of observations instead of the number of distinct observations.
Code Sample, a copy-pastable example if possible
Problem description
Expected Output
Something similar to:
Also, I would expect the result above to be invariant to n_rep.
i.e. I would expect a "mapping" {value -> pct_rank} that would not depend on how many times the value is repeated, while it is not the case here.
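The invariance being asked for can be sketched in plain Python. This is not the original code sample from the report, which did not survive here: `make_sample`, `dense_pct`, and the data are hypothetical, and the two denominators are assumptions standing in for the reported behavior (total count) and the expected one (distinct count).

```python
from fractions import Fraction

def dense_rank(values):
    # Position of each value among the sorted distinct values (1-based).
    position = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [position[x] for x in values]

def dense_pct(values, denominator):
    # Map each value to its dense rank divided by the chosen denominator.
    return {v: Fraction(r, denominator) for v, r in zip(values, dense_rank(values))}

def make_sample(n_rep):
    # Hypothetical data: the value 1 repeated n_rep times, then 2 and 3.
    return [1] * n_rep + [2, 3]

for n_rep in (1, 3):
    sample = make_sample(n_rep)
    by_count = dense_pct(sample, len(sample))          # divide by all observations
    by_distinct = dense_pct(sample, len(set(sample)))  # divide by distinct values
    print(n_rep, by_count, by_distinct)
```

Dividing by the distinct count gives the same {value -> pct_rank} mapping for every `n_rep` and tops out at 1, whereas dividing by the total count shrinks every percentile as repeats are added.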