From #39146 (comment) (discovered while investigating a benchmark difference). In groupby/ops.py, the fast_apply path (using libreduction) and the generic Python apply path appear to give different results when the applied function returns output with the same index as its input.
Using a small example DataFrame and an applied function that simply copies its input:
import numpy as np
import pandas as pd

N = 10
df = pd.DataFrame(
    {
        "key": np.random.randint(0, 3, size=N),
        "value1": np.random.randn(N),
        "value2": ["foo", "bar"] * (N // 2),
    }
)

def df_copy_function(g):
    # ensure that the group name is available (see GH #15062)
    g.name
    return g.copy()
By default you get this result:
In [3]: df.groupby("key").apply(df_copy_function)
Out[3]:
       key    value1 value2
key
0   8    0 -0.149534    foo
    9    0 -0.391135    bar
1   1    1 -0.581107    bar
    2    1 -0.338278    foo
    3    1  0.768924    bar
    6    1 -0.778718    foo
2   0    2  0.196477    foo
    4    2 -0.364822    foo
    5    2 -0.976079    bar
    7    2 -2.671668    bar
But if I force the code to skip the fast apply path (in this case by converting one column to an extension dtype), I get a different result:
In [4]: df['value2'] = df["value2"].astype("string")

In [5]: df.groupby("key").apply(df_copy_function)
Out[5]:
   key    value1 value2
0    2  0.196477    foo
1    1 -0.581107    bar
2    1 -0.338278    foo
3    1  0.768924    bar
4    2 -0.364822    foo
5    2 -0.976079    bar
6    1 -0.778718    foo
7    2 -2.671668    bar
8    0 -0.149534    foo
9    0 -0.391135    bar
This might be another manifestation of #34998 and the issues linked from that PR.
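
The two results above hold the same rows and differ only in index structure: one has the group key prepended as an extra index level, the other keeps the original index. A minimal sketch of how that could be checked programmatically, rather than by comparing printed frames, is shown below. The helper names `index_summary` and `strip_group_levels` are hypothetical (not part of pandas), the data is seeded for reproducibility instead of using the exact values above, and whether the two summaries actually differ depends on the installed pandas version.

```python
import warnings

import numpy as np
import pandas as pd


def index_summary(result):
    """Return (number of index levels, index names) for a groupby-apply result."""
    return result.index.nlevels, list(result.index.names)


def strip_group_levels(result):
    """Drop leading index levels so only the innermost (original) index remains."""
    while result.index.nlevels > 1:
        result = result.droplevel(0)
    return result.sort_index()


rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
N = 10
df_object = pd.DataFrame(
    {
        "key": rng.integers(0, 3, size=N),
        "value1": rng.standard_normal(N),
        "value2": ["foo", "bar"] * (N // 2),
    }
)
# Same data, but with one column converted to an extension dtype.
df_string = df_object.copy()
df_string["value2"] = df_string["value2"].astype("string")

with warnings.catch_warnings():
    # Some pandas versions warn when apply operates on the grouping column.
    warnings.simplefilter("ignore")
    res_object = df_object.groupby("key").apply(lambda g: g.copy())
    res_string = df_string.groupby("key").apply(lambda g: g.copy())

# If the two code paths disagree, these summaries will differ.
print("object column:", index_summary(res_object))
print("string column:", index_summary(res_string))

# After removing any prepended group level, both results should line up
# with the original frame row for row.
flat_object = strip_group_levels(res_object)
flat_string = strip_group_levels(res_string)
```

Comparing `index_summary(res_object)` with `index_summary(res_string)` gives a compact way to report the inconsistency across pandas versions without relying on random values.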