pd.NA in object dtype #32931
Comments
It looks like pd.NA is dropped entirely when concatenating two dataframes with object dtype, and this is very similar to #33065:
This seems to be due to If the Would this work?
Not quite related -- I am surprised that
Are these working as designed, already being ironed out, or should I open issues?

    import pandas as pd

    df = pd.DataFrame([7,8,9,pd.NA])
    print(df)
    print('auto ', df.dtypes)
    print('auto convert', df.convert_dtypes().dtypes)
    dfi = pd.DataFrame([7,8,9,pd.NA], dtype='Int64')
    print(dfi)
    print('Int64 ', dfi.dtypes)
    print('Int64 convert', dfi.convert_dtypes().dtypes)
    print('pandas', pd.__version__)

gives:

          0
    0     7
    1     8
    2     9
    3  <NA>
    auto  0    object
    dtype: object
    auto convert 0    object
    dtype: object
          0
    0     7
    1     8
    2     9
    3  <NA>
    Int64  0    Int64
    dtype: object
    Int64 convert 0    Int64
    dtype: object
    pandas 1.0.3
Since pd.NA is experimental, changing the constructor to default to the best possible dtypes, using dtypes that support pd.NA, seems reasonable.
from https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.convert_dtypes.html
So again, this seems to be a reasonable expectation. However, from https://pandas.pydata.org/docs/dev/user_guide/missing_data.html?highlight=convert_dtypes#conversion
So it may be that the convert_dtypes docstring should also be more explicit that the conversion applies to np.nan.
These two issues could be discussed and addressed independently, so it would be great if you could report them as two separate issues.
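In the meantime, one possible workaround is to request the nullable dtype explicitly rather than relying on inference or on convert_dtypes. The snippet below is only a sketch under that assumption, reusing the values from the example above; it is not a statement about what the constructor or convert_dtypes should do by default.

```python
import pandas as pd

# Default inference leaves a column mixing ints and pd.NA as object dtype
# (the behaviour reported above for pandas 1.0.3).
df = pd.DataFrame([7, 8, 9, pd.NA])

# Explicitly cast column 0 to the nullable integer extension dtype.
converted = df[0].astype("Int64")

print(converted.dtype)
print(converted.isna().tolist())
```

The explicit cast does not depend on how object columns are inferred, so it may be a safer choice while the default behaviour is still being discussed.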
I just hit this bug:
Expected
but got:
Maybe we need a dtype
Extract from #32075 (comment):
If we want to handle pd.NA in object dtype better, we will need to start using masks as well, and not rely on numpy behaviour.
For example, this is also wrong:
Related issues:
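To illustrate what "using masks" could mean for object dtype, here is a rough sketch of a comparison that computes an explicit missing-value mask first and only hands the valid entries to numpy. The helper name `masked_equals` is hypothetical and is not part of pandas; this is an illustration of the idea, not a proposal for the actual implementation.

```python
import numpy as np
import pandas as pd


def masked_equals(values, other):
    """Hypothetical helper: elementwise ``values == other`` that propagates pd.NA.

    ``values`` is assumed to be a 1-D object-dtype ndarray and ``other`` a scalar.
    """
    # Explicit boolean mask of the missing positions.
    mask = pd.isna(values)

    # Only the valid entries are compared by numpy; masked slots get pd.NA.
    result = np.empty(len(values), dtype=object)
    result[~mask] = values[~mask] == other
    result[mask] = pd.NA
    return result, mask


values = np.array([1, pd.NA, 3], dtype=object)
result, mask = masked_equals(values, 1)
print(result)
print(mask)
```

Keeping the mask separate means numpy never sees pd.NA during the comparison, so the outcome for missing positions is decided explicitly rather than by however numpy happens to treat the object.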