-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST : Adding new test case for pivot_table() with Categorical data #21381
Conversation
TST: add additional test cases for pivot_table with categorical data pandas-dev#21370
Hello @uds5501! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on June 10, 2018 at 20:48 Hours UTC |
Fixed PEP-8 issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will fail linting
Fixed linting
@uds5501 generally ensure that your code contributions follow the PEP8 coding standard before pushing to GitHub. You can find more info on how to do that in the contributing guide: https://pandas.pydata.org/pandas-docs/stable/contributing.html#python-pep8 |
fixed linting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you test for the columns
/values
case in addition to the index
/columns
/values
case?
pandas/tests/reshape/test_pivot.py
Outdated
def test_pivot_with_non_observable_dropna_civ(self, dropna): | ||
# gh-21370 | ||
arr = [np.nan, 'low', 'high', 'low', 'high', 'high', np.nan] | ||
df = pd.DataFrame({"In": pd.Categorical(arr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if this technically passes LINTing I find the indentation very difficult to read. Can you try putting each key of the dict at the start of the line to see if that improves readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright, I will look into it
Attempt on making the code readable
@jschendel Actually, I don't know how to recreate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better - just some more minor style nits
pandas/tests/reshape/test_pivot.py
Outdated
# gh-21370 | ||
arr = [np.nan, 'low', 'high', 'low', np.nan] | ||
df = pd.DataFrame( | ||
{"In": pd.Categorical(arr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the angled bracket to the line above. Move the word arr
to the next line and move the subsequent kwargs up to that line accordingly (so long as they fit)
pandas/tests/reshape/test_pivot.py
Outdated
result = df.pivot_table(index="In", columns="Col", values="Val", | ||
dropna=dropna) | ||
expected = pd.DataFrame( | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this up a line
pandas/tests/reshape/test_pivot.py
Outdated
'B': [2.0, np.nan], | ||
'C': [np.nan, 3.0]}, | ||
index=pd.Index( | ||
pd.Categorical.from_codes([0, 1], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try moving [0, 1]
to the next line and moving the subsequent kwargs up
@WillAyd I have tried to fix these according to your suggestions. look any good? |
"In": pd.Categorical(arr, | ||
categories=['low', 'high'], | ||
ordered=True), | ||
"Col": ["A", "B", "C", "A", "B"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also be a Categorical
with NaN
values present in order to recreate the issue.
Here's a slightly modified version of your setup to address my last comment of In [2]: pd.__version__
Out[2]: '0.24.0.dev0+85.g4807905'
In [3]: idx = [np.nan, 'low', 'high', 'low', np.nan]
...: col = [np.nan, 'A', 'B', np.nan, 'A']
...: df = pd.DataFrame({
...: "In": pd.Categorical(idx, categories=['low', 'high'], ordered=True),
...: "Col": pd.Categorical(col, categories=['A', 'B'], ordered=True),
...: "Val": range(1, 6)})
In [4]: df
Out[4]:
In Col Val
0 NaN NaN 1
1 low A 2
2 high B 3
3 low NaN 4
4 NaN A 5 Then doing both variations of In [5]: df.pivot_table(index="In", columns="Col", values="Val")
Out[5]:
Col A B
In
low 2.0 NaN
high NaN 3.0
In [6]: df.pivot_table(columns="Col", values="Val")
Out[6]:
Col A B
Val 3.5 3.0 So you'll need to create Just to verify, the code above does indeed replicate the buggy behavior on 0.23.0: In [2]: pd.__version__
Out[2]: '0.23.0'
In [3]: idx = [np.nan, 'low', 'high', 'low', np.nan]
...: col = [np.nan, 'A', 'B', np.nan, 'A']
...: df = pd.DataFrame({
...: "In": pd.Categorical(idx, categories=['low', 'high'], ordered=True),
...: "Col": pd.Categorical(col, categories=['A', 'B'], ordered=True),
...: "Val": range(1, 6)})
In [4]: df.pivot_table(index="In", columns="Col", values="Val")
Out[4]:
Col NaN A
In
NaN 2.0 NaN
low NaN 3.0
In [5]: df.pivot_table(columns="Col", values="Val")
Out[5]:
Col NaN A
Val 3.5 3.0 |
@jschendel Thank you for the example! I will start working on it right away 😃 |
doesn't this duplicate #21393 ? |
@jreback No sir. the aim of this PR is to add new Test Cases which include |
@uds5501 ok pls rebase and fixup |
closing as stale |
TST: add additional test cases for pivot_table with categorical data #21370
git diff upstream/master -u -- "*.py" | flake8 --diff
Added a new test case for checking
index/column/value
workability as per the patch offered by PR #21252Note : the _civ means Column,Index and Value in the test name (wasn't being very creative here)