
New behaviour regarding inplace values setting with iloc #47381

Closed
glemaitre opened this issue Jun 16, 2022 · 10 comments
Labels
Copy / view semantics Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@glemaitre
Contributor

In scikit-learn, when testing the pandas nightly build, we got a FutureWarning related to the following deprecation:

https://pandas.pydata.org/docs/dev/whatsnew/v1.5.0.html#try-operating-inplace-when-setting-values-with-loc-and-iloc

We have two related questions regarding this deprecation. First, it seems that we cannot reproduce the "Old Behaviour" with the latest available release:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: pd.__version__
Out[3]: '1.4.2'

In [4]: values = np.arange(4).reshape(2, 2)
   ...: 
   ...: df = pd.DataFrame(values)
   ...: 
   ...: ser = df[0]

In [5]: df.iloc[:, 0] = np.array([10, 11])

In [6]: ser
Out[6]: 
0    10
1    11
Name: 0, dtype: int64

Is there a reason why we do not observe the behaviour shown in the documentation?

The second question (actually, more of a comment to open a discussion) concerns the proposed fix.

The proposed fix is to use df[df.columns[i]] = newvals instead of df.iloc[:, i] = newvals.
I personally find this a bit counterintuitive, since the SettingWithCopyWarning suggests switching from df[cols][rows] to df.loc[rows, cols] to get the inplace behaviour.

If both approaches are meant to produce an inplace change, the recommended patterns for setting "by position" (i.e. .iloc) and "by label" (i.e. .loc) are quite different.
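The position-based "regular setitem" proposed in the what's new entry, df[df.columns[i]] = newvals, can be sketched as follows. This is a minimal illustration assuming pandas 1.5 or later; it only checks the setitem spelling, because whether df.iloc[:, i] = newvals also updates a previously extracted ser depends on the pandas version and its copy/view semantics, which is exactly what this deprecation changes.

```python
import numpy as np
import pandas as pd

values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values)
ser = df[0]  # Series extracted before the assignment

i = 0
newvals = np.array([10, 11])

# Set "by position" through the regular setitem: look up the label of
# column i, then assign by label. This replaces the column rather than
# writing into the existing array, so `ser` keeps its original values.
df[df.columns[i]] = newvals

print(df[0].tolist())  # the new column values
print(ser.tolist())    # the values seen before the assignment
```

Looking up df.columns[i] first is what makes this work by position; the catch, as noted above, is that it is a replacement rather than an inplace update.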

@phofl
Member

phofl commented Jun 16, 2022

Hi @glemaitre,

thanks for the report.

import numpy as np
import pandas as pd

values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values)
ser = df[0]
df.iloc[:, 0] = np.array([10, 11])

This should show the warning, correct? I can reproduce it neither on main nor with our nightly build. Could you share a link to one of your builds?

@glemaitre
Contributor Author

Yes, with pandas 1.5.0.dev0 it raises the warning, as in this build (see the last warning in the stack):

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=43315&view=logs&j=dfe99b15-50db-5d7b-b1e9-4105c42527cf&t=ef785ae2-496b-5b02-9f0e-07a6c3ab3081&l=309511

What I don't understand from the example above is that, apart from the additional warning, the behaviour does not seem to change between 1.4.2 and 1.5.0.dev0.

@phofl
Member

phofl commented Jun 16, 2022

I think this warning is shown because of the dtype mismatch in your test (e.g. float16 and float32), but I will have to look more closely later.

@phofl added the Copy / view semantics and Indexing labels on Jun 16, 2022
@glemaitre
Contributor Author

> I think this warning is shown because of the dtype mismatch in your test (e.g. float16 and float32), but I will have to look more closely later.

OK, I will check the other occurrences of the warning as well.

So, to illustrate the change of behaviour, should the documentation example trigger a change of dtype? I mean something like:

import numpy as np
import pandas as pd

values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values)
ser = df[0]
df.iloc[:, 0] = np.array([10, 11]).astype(np.int16)

Then ser indeed still holds its original values [0, 2].
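Following the explanation that the warning comes from a dtype mismatch between the column and the assigned values, one can check for that condition before calling .iloc. The helper iloc_dtype_mismatch below is hypothetical (not a pandas API), and a plain dtype comparison is only a rough proxy: pandas' internal rules for when values can be written into an existing column are more involved.

```python
import numpy as np
import pandas as pd


def iloc_dtype_mismatch(df: pd.DataFrame, i: int, newvals) -> bool:
    """Hypothetical helper: True if `newvals` has a different dtype than column `i`.

    A rough proxy for when `df.iloc[:, i] = newvals` cannot simply write the
    values into the existing column array without any dtype handling.
    """
    return bool(df.dtypes.iloc[i] != np.asarray(newvals).dtype)


values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values)

same = np.array([10, 11], dtype=df.dtypes.iloc[0])  # matches the column dtype
cast = np.array([10, 11]).astype(np.int16)          # int16 vs the default int dtype

print(iloc_dtype_mismatch(df, 0, same))
print(iloc_dtype_mismatch(df, 0, cast))
```

With the int16 array from the example above, the check reports a mismatch, which is the situation in which the 1.5.0.dev0 warning was observed.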

@phofl
Member

phofl commented Jun 16, 2022

The warning is about a future change: we want to operate inplace there, so ser would be updated too. If you want to avoid that, use the regular setitem (df[df.columns[i]] = newvals).

When the dtypes match, the setting already happens inplace, so you do not get the warning.

The deprecation was added in #45333
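The two ways out described in this comment can be sketched side by side. This is a minimal sketch, not an official recipe: option 1 casts the new values to the column's existing dtype so that .iloc only sees matching dtypes (the case said to be handled inplace already), and option 2 uses the regular setitem, which replaces the column, so the column takes the dtype of the new values.

```python
import numpy as np
import pandas as pd

i = 0
newvals = np.array([10, 11]).astype(np.int16)  # deliberately mismatched dtype

# Option 1: cast to the column's existing dtype before setting with .iloc,
# so the column keeps its original dtype.
df = pd.DataFrame(np.arange(4).reshape(2, 2))
orig_dtype = df.dtypes.iloc[i]
df.iloc[:, i] = newvals.astype(orig_dtype)

# Option 2: regular setitem by label; the column is replaced and becomes int16.
df2 = pd.DataFrame(np.arange(4).reshape(2, 2))
df2[df2.columns[i]] = newvals

print(df.dtypes.iloc[i], df.iloc[:, i].tolist())
print(df2.dtypes.iloc[i], df2.iloc[:, i].tolist())
```

Each frame is built from a fresh array so that an inplace write in option 1 cannot leak into the input of option 2.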

@lesteve
Contributor

lesteve commented Jun 20, 2022

It seems that when using .iloc on an empty dataframe, you get a similar warning, which could perhaps be avoided?

import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2).astype(np.float64)
df_orig = pd.DataFrame(arr, columns=['a', 'b'])
df_new = df_orig.iloc[[], :].copy()
df_new.iloc[:, 0] = np.array([1, 2, 4], dtype=np.float64)

I created #47433 to improve the what's new entry.

@jorisvandenbossche
Member

> It seems like when using .iloc on an empty dataframe, you get a similar warning which could maybe be avoided?

Indeed, in this case, there is an "enlargement" of the dataframe when setting the values, and that can never be done inplace, and so we should avoid the warning in that case.
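The enlargement case can be expressed as a simple precondition: the frame has no rows, but the values assigned to a whole column do, so the frame must grow and the assignment can never be inplace. The helper iloc_would_enlarge below is hypothetical and deliberately simplified; it is not how pandas implements the check that removed this warning.

```python
import numpy as np
import pandas as pd


def iloc_would_enlarge(df: pd.DataFrame, newvals) -> bool:
    """Hypothetical helper: True if assigning `newvals` to a whole column of
    `df` would have to grow the frame (empty frame, non-empty values).
    Such an assignment can never happen inplace."""
    return len(df) == 0 and len(newvals) > 0


arr = np.arange(6).reshape(3, 2).astype(np.float64)
df_orig = pd.DataFrame(arr, columns=['a', 'b'])
df_new = df_orig.iloc[[], :].copy()  # same columns, zero rows
newvals = np.array([1, 2, 4], dtype=np.float64)

print(iloc_would_enlarge(df_orig, newvals))  # 3 rows, 3 values
print(iloc_would_enlarge(df_new, newvals))   # 0 rows, 3 values
```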

@lesteve
Contributor

lesteve commented Jun 22, 2022

> Indeed, in this case, there is an "enlargement" of the dataframe when setting the values, and that can never be done inplace, and so we should avoid the warning in that case.

Sounds great!

@lesteve
Contributor

lesteve commented Jul 18, 2022

I think this one can be closed since #47433 has made the what's new entry clearer and #47621 has removed the warnings in the unwanted edge cases.

@glemaitre
Contributor Author

Indeed.

5 participants