MultiIndex Bug Copying Values Incorrectly When Adding Values To Index #22247

Closed
JonahJ opened this issue Aug 8, 2018 · 4 comments · Fixed by #53010
Labels: good first issue, Needs Tests (unit test(s) needed to prevent regressions)

Comments

@JonahJ

JonahJ commented Aug 8, 2018

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['A', np.nan, 1.23, 4.56],
        ['A', 'G', 1.23, 4.56],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)
pivot_0 = 'A'
necessary_pivot_1_values = ['D', 'E', 'F' ]
for necessary_value in necessary_pivot_1_values:
    if necessary_value not in df.index.get_level_values('pivot_1').tolist():
        print("Missing", necessary_value)
        
        df.at[(pivot_0, necessary_value), 'col_2'] = 0.0

assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1'])  # Fails: value of 1.23 from the first row in the df is copied. As of v0.22.0 this was np.nan

Problem description

When a MultiIndex contains np.nan in one of its levels and a new row is added to the DataFrame through df.at, the columns that were not explicitly set in the new row are not np.nan as expected. Instead, their values are copied from the row whose index contains np.nan.

This behavior appears in all v0.23.x releases; v0.22.0 behaves correctly.

Expected Output

assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1'])  # Pass, works in v0.22.0

Note that this unexpected behavior does not occur when np.nan is absent from the index, nor with a single-level Index.

MultiIndex without np.nan
df = pd.DataFrame(
    [
        #['A', np.nan, 1.23, 4.56],  # Comment out the np.nan
        ['A', 'G', 1.23, 4.56],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)

pivot_0 = 'A'
necessary_pivot_1_values = ['D', 'E', 'F' ]
for necessary_value in necessary_pivot_1_values:
    if necessary_value not in df.index.get_level_values('pivot_1').tolist():
        print("Missing", necessary_value)
        
        df.at[(pivot_0, necessary_value), 'col_2'] = 0.0

assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1'])  # Pass

Single Index with np.nan
df = pd.DataFrame(
    [
        [np.nan, 1.23, 4.56],
        ['G', 1.23, 4.56],
        ['D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'col_1', 'col_2'],
)
df.set_index(['pivot_0'], inplace=True)

necessary_pivot_0_values = ['D', 'E', 'F' ]
for necessary_value in necessary_pivot_0_values:
    if necessary_value not in df.index.get_level_values('pivot_0').tolist():
        print("Missing", necessary_value)
        
        df.at[(necessary_value), 'col_2'] = 0.0

assert df.loc[('F')]['col_2'] == 0.0
assert pd.isnull(df.loc[('F')]['col_1'])

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 0, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.26.1
numpy: 1.12.1
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@JonahJ changed the title from "MultiIndex Bug Copying Values Incorrectly When Adding Values" to "MultiIndex Bug Copying Values Incorrectly When Adding Values To Index" on Aug 8, 2018
@gfyoung added the MultiIndex and Regression (functionality that used to work in a prior pandas version) labels on Aug 8, 2018
@gfyoung
Member

gfyoung commented Aug 8, 2018

How odd! git bisect investigation (if needed), patch, and PR are welcome!
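For anyone attempting the bisect, a check along these lines could serve as the good/bad test. It is only a sketch: `bisect_exit_code` is a hypothetical helper name, and it assumes the pandas checkout under test has been rebuilt and is importable at each bisect step. The return values follow the `git bisect run` convention, where 0 means good and 1 means bad.

```python
import numpy as np
import pandas as pd


def bisect_exit_code():
    """Return 0 if the installed pandas behaves correctly, 1 if the bug shows.

    Hypothetical helper: 0 and 1 are the exit codes that `git bisect run`
    interprets as "good" and "bad" respectively.
    """
    # Same frame as in the report: NaN sits in the second index level.
    df = pd.DataFrame(
        [
            ["A", np.nan, 1.23, 4.56],
            ["A", "G", 1.23, 4.56],
            ["A", "D", 9.87, 10.54],
        ],
        columns=["pivot_0", "pivot_1", "col_1", "col_2"],
    ).set_index(["pivot_0", "pivot_1"])

    # ("A", "F") is not in the index yet, so this adds a new row
    # and sets only col_2 on it.
    df.at[("A", "F"), "col_2"] = 0.0

    # col_1 was never assigned for the new row, so it should be missing.
    leaked = not pd.isna(df.loc[("A", "F")]["col_1"])
    return 1 if leaked else 0
```

Saved as a script (say, a hypothetical `check_22247.py`) that ends with `raise SystemExit(bisect_exit_code())`, it could be passed to `git bisect run` after marking v0.22.0 as good and a v0.23.x tag as bad, matching the versions given in the report.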

@gfyoung added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) label on Aug 8, 2018
@hksonngan
Contributor

I caught a KeyError at this line 656; this code was committed in 8cbee356.
That exception causes the set_value function to error at 103.

@mroeschke
Member

The asserts appear to pass on master. The example could use some distillation to find what could serve as a unit test.

In [4]: df = pd.DataFrame(
   ...:     [
   ...:         ['A', np.nan, 1.23, 4.56],
   ...:         ['A', 'G', 1.23, 4.56],
   ...:         ['A', 'D', 9.87, 10.54],
   ...:     ],
   ...:     columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
   ...: )
   ...: df.set_index(['pivot_0', 'pivot_1'], inplace=True)
   ...: pivot_0 = 'A'
   ...: necessary_pivot_1_values = ['D', 'E', 'F' ]
   ...: for necessary_value in necessary_pivot_1_values:
   ...:     if necessary_value not in df.index.get_level_values('pivot_1').tolist():
   ...:         print("Missing", necessary_value)
   ...:
   ...:         df.at[(pivot_0, necessary_value), 'col_2'] = 0.0
   ...:
   ...: assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
   ...: assert pd.isnull(df.loc[('A', 'F')]['col_1'])
Missing E
Missing F
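
One possible distillation is sketched below. It is not the test that was eventually merged; the function names are hypothetical, and it drops the loop from the report on the assumption that a single `.at` assignment for a missing key is enough to exercise the same path.

```python
import numpy as np
import pandas as pd


def build_frame():
    """Rebuild the frame from the report: a two-level index with NaN in level 1."""
    df = pd.DataFrame(
        [
            ["A", np.nan, 1.23, 4.56],
            ["A", "G", 1.23, 4.56],
            ["A", "D", 9.87, 10.54],
        ],
        columns=["pivot_0", "pivot_1", "col_1", "col_2"],
    )
    return df.set_index(["pivot_0", "pivot_1"])


def test_at_enlargement_with_nan_in_multiindex():
    # Hypothetical test name; the scenario is the one reported in this issue.
    df = build_frame()

    # ("A", "F") is not in the index, so this assignment appends a new row.
    df.at[("A", "F"), "col_2"] = 0.0

    new_row = df.loc[("A", "F")]
    assert new_row["col_2"] == 0.0
    # The column that was never assigned must be missing, not 1.23.
    assert pd.isna(new_row["col_1"])
    # The three original rows must be left untouched by the enlargement.
    assert df["col_1"].iloc[:3].tolist() == [1.23, 1.23, 9.87]
```

The last assertion is an addition beyond the original report: it guards against the enlargement altering existing rows while adding the new one.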

@mroeschke added the good first issue and Needs Tests (unit test(s) needed to prevent regressions) labels and removed the Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), MultiIndex, and Regression (functionality that used to work in a prior pandas version) labels on Jun 21, 2021
@shteken
Contributor

shteken commented Apr 28, 2023

take
