
BUG: Assignment to a .loc view of a naive datetime column changes its dtype to object #49837

Closed
2 of 3 tasks
Tracked by #3
Terseus opened this issue Nov 22, 2022 · 3 comments · Fixed by #50037
Labels: Dtype Conversions (unexpected or buggy dtype conversions), Indexing (related to indexing on series/frames, not to indexes themselves), Needs Tests (unit test(s) needed to prevent regressions)

Comments

Terseus commented Nov 22, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from datetime import datetime

import pandas as pd


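def main():
    df = pd.DataFrame({
        'field': pd.to_datetime([datetime(2022, 1, 20), datetime(2022, 1, 22)]),
        'update': [True, False],
    })
    # df.info() prints its report directly and returns None,
    # which is why the output shows "Before: None" after the report.
    print("Before:", df.info())
    print(df.head(2))
    print()
    df_to_update = df[df['update']]
    # Write the selected rows back through a row mask plus a list of columns.
    df.loc[df['update'], ['field']] = df_to_update['field']
    print("After:", df.info())
    print(df.head(2))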


if __name__ == "__main__":
    main()

Issue Description

When assigning to a subset selected with .loc[predicate, [column]] (a boolean row mask plus a list of columns) on a column of dtype datetime64[ns] (naive datetimes), the column's dtype changes to object and the assigned datetimes show up as integers (nanoseconds since the epoch) instead of timestamps.

With timezone-aware datetimes it works as expected: the dtype stays datetime64[ns, <tz>] (for example datetime64[ns, UTC]).

The reproducible example above prints the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns](1)
memory usage: 146.0 bytes
Before: None
       field  update
0 2022-01-20    True
1 2022-01-22   False

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      object
 1   update  2 non-null      bool
dtypes: bool(1), object(1)
memory usage: 146.0+ bytes
After: None
                 field  update
0  1642636800000000000    True
1  2022-01-22 00:00:00   False
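The stray value 1642636800000000000 in the "After" output is the original 2022-01-20 timestamp exposed as a raw integer: datetime64[ns] values are stored as int64 counts of nanoseconds since the Unix epoch. A quick check using only public pandas Timestamp behavior:

```python
import pandas as pd

# The integer that replaced the first datetime in the "After" output.
stray = 1642636800000000000

# An integer passed to Timestamp is read as nanoseconds since the epoch.
print(pd.Timestamp(stray))                       # 2022-01-20 00:00:00

# Timestamp.value gives the nanosecond integer for a timestamp.
print(pd.Timestamp(2022, 1, 20).value == stray)  # True
```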

Expected Behavior

The assignment should change neither the values nor the dtype of the column.

For comparison, this is what the reproducible example prints when tzinfo=timezone.utc is added to both datetime values (with timezone imported from datetime):

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns, UTC]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns, UTC](1)
memory usage: 146.0 bytes
Before: None
                      field  update
0 2022-01-20 00:00:00+00:00    True
1 2022-01-22 00:00:00+00:00   False

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns, UTC]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns, UTC](1)
memory usage: 146.0 bytes
After: None
                      field  update
0 2022-01-20 00:00:00+00:00    True
1 2022-01-22 00:00:00+00:00   False

With a timezone, the column keeps its dtype and the values remain datetimes instead of being turned into integers.
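To compare the naive and timezone-aware cases side by side, the reproducer's assignment can be wrapped in a small helper that reports the column dtype before and after. This is a sketch: `dtypes_around_assignment` is a hypothetical name, not a pandas API, and the printed dtypes depend on the installed pandas version (per this report, on 1.5.1 the naive case ends up as object, while on versions containing the fix both cases should keep their dtype).

```python
from datetime import datetime, timezone

import pandas as pd


def dtypes_around_assignment(tz=None):
    """Rerun the issue's .loc assignment; return the 'field' dtype before and after.

    tz=None builds naive datetimes; pass e.g. timezone.utc for tz-aware ones.
    """
    df = pd.DataFrame({
        "field": pd.to_datetime([datetime(2022, 1, 20, tzinfo=tz),
                                 datetime(2022, 1, 22, tzinfo=tz)]),
        "update": [True, False],
    })
    before = df["field"].dtype
    df_to_update = df[df["update"]]
    # Same mask-plus-list-of-columns assignment as in the reproducible example.
    df.loc[df["update"], ["field"]] = df_to_update["field"]
    after = df["field"].dtype
    return before, after


if __name__ == "__main__":
    for label, tz in [("naive", None), ("UTC", timezone.utc)]:
        before, after = dtypes_around_assignment(tz)
        print(f"{label}: {before} -> {after}")
```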

Installed Versions

❯ python
Python 3.8.14 (default, Oct 10 2022, 16:44:50)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.show_versions()

INSTALLED VERSIONS

commit : 91111fd
python : 3.8.14.final.0
python-bits : 64
OS : Linux
OS-release : 5.19.13-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Tue, 04 Oct 2022 14:36:58 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : es_ES.UTF-8
LOCALE : es_ES.UTF-8

pandas : 1.5.1
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 56.0.0
pip : 22.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Terseus added the Bug and Needs Triage labels on Nov 22, 2022
phofl (Member) commented Nov 22, 2022

Hi, thanks for your report. This works on main; it could use a test.
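A regression test along the lines requested might look like the sketch below. The test name is hypothetical, it is not taken from the pull request that closed this issue, and it relies only on the public `pd.testing.assert_frame_equal`. It writes the masked rows' own values back through a list-of-columns `.loc` indexer and checks that the frame, including the naive datetime64 dtype, is unchanged.

```python
from datetime import datetime

import pandas as pd


def test_loc_setitem_list_of_columns_keeps_naive_datetime_dtype():
    # Hypothetical test name; mirrors the issue's reproducible example.
    df = pd.DataFrame({
        "field": pd.to_datetime([datetime(2022, 1, 20), datetime(2022, 1, 22)]),
        "update": [True, False],
    })
    expected = df.copy()

    # Assign the selected rows' own values back via mask plus list of columns.
    mask = df["update"]
    df.loc[mask, ["field"]] = df[mask]["field"]

    # Neither the values nor the dtypes should change.
    pd.testing.assert_frame_equal(df, expected)


if __name__ == "__main__":
    test_loc_setitem_list_of_columns_keeps_naive_datetime_dtype()
```

Under pytest the function would be collected by its `test_` prefix; the `__main__` guard only lets the file run on its own.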

phofl added the Indexing, Dtype Conversions, and Needs Tests labels and removed the Bug and Needs Triage labels on Nov 22, 2022
phofl self-assigned this on Nov 22, 2022
MarcoGorelli (Member) commented

Just for reference, looks like this was fixed by #49161 (good one @phofl !)

https://www.kaggle.com/code/marcogorelli/pandas-regression-example?scriptVersionId=112905556

alonme (Contributor) commented Dec 4, 2022

take
