to_datetime parses dates incorrectly when the format includes '%W' and does not include day of week plus calendar year #16774

Closed
wes-turner opened this issue Jun 26, 2017 · 4 comments · Fixed by #17819
Labels
Datetime (Datetime data dtype), Error Reporting (Incorrect or improved errors from pandas), good first issue
Milestone

Comments

@wes-turner

wes-turner commented Jun 26, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

# I'm not sure how a bare week should be interpreted.  But the result should
# probably still be of the same week number
print(pd.to_datetime('20', format='%W').strftime('%W'))  # 01

# Parsing is broken even when the week is fully specified
print(pd.to_datetime('2017-20', format='%Y-%W').strftime('%Y-%W'))  # 2017-00

Problem description

to_datetime parses a week XX as week 0 for any valid XX. Interestingly, an impossible XX (e.g. '70') raises a ValueError as expected.

Expected Output

20
2017-20

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-79-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.24.1
numpy: 1.13.0
scipy: 0.18.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.3.3
bs4: 4.2.1
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

@wes-turner
Author

In the docs for datetime.datetime.strptime:

When used with the strptime() method, %U and %W are only used in calculations when the day of the week and the calendar year (%Y) are specified.

Turns out this is true with to_datetime, too. It'd also be a totally-valid resolution to this issue to include this note in the to_datetime docstring.
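To make the quoted note concrete, here is a minimal sketch using only the standard library. It assumes CPython's `datetime.strptime` behaves as the docs describe; the variable names are just for illustration.

```python
from datetime import datetime

# Year and week only: the week number is parsed but not used in the
# calculation, so the result falls back to January 1st of that year.
without_weekday = datetime.strptime("2017-20", "%Y-%W")

# Year, week, and weekday (%w: 0 = Sunday, 1 = Monday): the week is honoured.
with_weekday = datetime.strptime("2017-20-1", "%Y-%W-%w")

print(without_weekday.date(), without_weekday.strftime("%Y-%W"))  # 2017-01-01 2017-00
print(with_weekday.date(), with_weekday.strftime("%Y-%W"))        # 2017-05-15 2017-20
```

The first result matches the `2017-00` output reported in the code sample at the top of this issue.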

@wes-turner changed the title from "to_datetime parses dates incorrectly when the format includes '%W'" to "to_datetime parses dates incorrectly when the format includes '%W' and does not include day of week" Jun 26, 2017
@wes-turner changed the title from "to_datetime parses dates incorrectly when the format includes '%W' and does not include day of week" to "to_datetime parses dates incorrectly when the format includes '%W' and does not include day of week plus calendar year" Jun 26, 2017
@jorisvandenbossche
Member

jorisvandenbossche commented Jun 26, 2017

When specifying the day of the week, it indeed seems to work (my locale is set to nl_BE here):

In [83]: pd.to_datetime('2017-20 Zondag', format='%Y-%W %A')
Out[83]: Timestamp('2017-05-21 00:00:00')

@wes-turner
Author

I should have phrased the bug more clearly: "to_datetime /only/ parses %W when both day and year are also specified."

Ideal behavior would be for a timestamp to be generated for, say, the beginning of the week (as a timestamp is analogously generated for the beginning of a day when no time is specified). Alternate behavior would be for to_datetime to refuse (like datetime.datetime.strptime) to parse such dates but to document this behavior.
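As a rough illustration of the "beginning of the week" idea, a caller-side workaround could append an explicit weekday before parsing. This is only a sketch with the standard library: `week_start` is a hypothetical helper, not a pandas or stdlib function, and it assumes Monday counts as the start of a `%W` week.

```python
from datetime import datetime


def week_start(text, fmt="%Y-%W"):
    """Hypothetical helper: parse a year + week string and return that week's Monday."""
    # Append an explicit weekday (%w: 1 = Monday) so that %W takes part
    # in the date calculation instead of being ignored.
    return datetime.strptime(text + "-1", fmt + "-%w")


print(week_start("2017-20"))  # 2017-05-15 00:00:00
```

The result round-trips through `strftime('%Y-%W')` back to `2017-20`, which is the expected output given in the original report.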

@jreback
Contributor

jreback commented Jun 28, 2017

I would raise on this if %W is passed but not also day and year.

Note #16661 / #16607 as well, which do this type of checking (for ISO week).
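A check along these lines might look roughly like the sketch below. It is not the pandas implementation: `check_week_format` and the directive sets are hypothetical names, and accepting `%y` and `%u` alongside `%Y` and `%w` is an assumption.

```python
import re

WEEK_DIRECTIVES = {"U", "W"}
WEEKDAY_DIRECTIVES = {"a", "A", "w", "u"}  # assumption: any weekday directive counts
YEAR_DIRECTIVES = {"Y", "y"}               # assumption: two-digit years are accepted too


def check_week_format(fmt):
    """Hypothetical validator: raise if %U/%W appears without year and weekday."""
    # Drop literal percent signs ("%%") first so they are not read as directives.
    directives = set(re.findall(r"%(.)", fmt.replace("%%", "")))
    if directives & WEEK_DIRECTIVES:
        has_year = bool(directives & YEAR_DIRECTIVES)
        has_weekday = bool(directives & WEEKDAY_DIRECTIVES)
        if not (has_year and has_weekday):
            raise ValueError(
                "Cannot use '%W' or '%U' without day of week and year: {!r}".format(fmt)
            )


check_week_format("%Y-%W %A")  # passes: year, week, and weekday are all present

try:
    check_week_format("%Y-%W")
except ValueError as exc:
    print(exc)
```

Running the check before parsing turns the silent week-0 result into an explicit error, which is the behavior suggested above.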

@jreback added the Error Reporting and Datetime labels Jun 28, 2017
@jreback added this to the Next Major Release milestone Jun 28, 2017
reidy-p added a commit to reidy-p/pandas that referenced this issue Oct 8, 2017
@jreback jreback modified the milestones: Next Major Release, 0.21.1, 0.21.0 Oct 13, 2017
4 participants