-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: _convert_and_box_cache raise ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True #26097
BUG: _convert_and_box_cache raise ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True #26097
Conversation
Hello @anmyachev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-07-03 11:44:41 UTC |
always tests first! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs tests and a whatsnew note
Codecov Report
@@ Coverage Diff @@
## master #26097 +/- ##
==========================================
- Coverage 91.96% 91.95% -0.01%
==========================================
Files 175 175
Lines 52412 52414 +2
==========================================
- Hits 48199 48197 -2
- Misses 4213 4217 +4
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #26097 +/- ##
==========================================
- Coverage 91.87% 91.86% -0.02%
==========================================
Files 180 180
Lines 50927 50714 -213
==========================================
- Hits 46790 46589 -201
+ Misses 4137 4125 -12
Continue to review full report at Codecov.
|
@mroeschke we've replaced bandaid fix with the one that applies same boxing logic whenever caching is on or off. @jreback we've added a test which checks this case for both whatsnew entry will be added a bit later. |
Could you describe your fix? Was it not boxing in the correct location or was the error parameter inconsistent? |
d665747
to
7900437
Compare
There was different boxing logic for
So the fix is to align the boxing logic for both |
errors, convertor): | ||
arg = [convertor(date1), convertor(date2)] * 5 + suffix | ||
|
||
def _get_answer(cache): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should not catch this error, rather for errors='raise' you use an assert_raises
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thing is, errors='raise'
does not guarantee it raises - that is, if to_datetime
input is valid it won't raise.
What I wanted to verify with this test is that to_datetime()
behaviour is identical regardless of whether it had cache=True
or cache=False
.
For this particular test scenario I don't even care if, say, errors='ignore'
raises a ValueError
as long as it is raised consistently with and without cache. It is up to other tests to verify that errors
parameter is doing what it should, this test only checks for integrity over cache
.
P.S. Renamed this test to test_to_datetime_cache_invariance
to underline its purpose more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want a try/except in a test; that just obscures what the test is checking; as I said if you expect it to raise then make it explici.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried re-working try..except
away, but I feel that it's somewhat impossible unless I hardcode some of current Pandas behaviour in the test itself.
My current intentions are like this - pd.to_datetime()
must behave the same when cache=True
and cache=False
, where "same" means "both return pd.Index
" or "both raise same exception".
So far I'm struggling with removal of try..except
to cover this intention due to the fact that pd.to_datetime
can raise an exception not only when errors="raise"
(or it can not raise an exception if hypothesis randomly generates equal timezones).
Example:
import pandas as pd
from datetime import datetime as dt
import pytz
pd.to_datetime([dt(2000,1,1),dt(2000,1,1,tzinfo=pytz.utc)], errors='coerce')
results in:
Traceback (most recent call last):
File "somewhere\pandas\core\arrays\datetimes.py", line 1850, in objects_to_datetime64ns
values, tz_parsed = conversion.datetime_to_datetime64(data)
File "pandas\_libs\tslibs\conversion.pyx", line 185, in pandas._libs.tslibs.conversion.datetime_to_datetime64
raise ValueError('Cannot mix tz-aware with '
ValueError: Cannot mix tz-aware with tz-naive values
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "somewhere\pandas\util\_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "somewhere\pandas\core\tools\datetimes.py", line 642, in to_datetime
result = convert_listlike(arg, box, format)
File "somewhere\pandas\core\tools\datetimes.py", line 333, in _convert_listlike_datetimes
allow_object=True)
File "somewhere\pandas\core\arrays\datetimes.py", line 1855, in objects_to_datetime64ns
raise e
File "somewhere\pandas\core\arrays\datetimes.py", line 1846, in objects_to_datetime64ns
require_iso8601=require_iso8601
File "pandas\_libs\tslib.pyx", line 460, in pandas._libs.tslib.array_to_datetime
cpdef array_to_datetime(ndarray[object] values, str errors='raise',
File "pandas\_libs\tslib.pyx", line 537, in pandas._libs.tslib.array_to_datetime
raise ValueError('Tz-aware datetime.datetime '
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I said if you expect it to raise then make it explici.
All that I said above is to illustrate that I actually cannot expect it to raise or not to raise, as right now the behaviour of pd.to_datetime()
regarding of exceptions can be described as "chaotic good" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move the error='raise' case to another test then. we never do this in a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what about exceptions which might be raised when errors != "raise"
? I feel this is overextending what the test is checking - when I describe what it must raise or that it musn't raise I start checking behaviour of pd.to_datetime()
for certain parameters, while the only check I wanted is that it behaves the same way with cache on/off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my point is that this code is very hard to grok when errors=‘raise’
I agree that both cache=True or False must raise in this case, but I want these in a separate test where you assert that they raise
the issue is that catching a ValueError in a test can accidentally catch other things and has multiple times in the past in some test cases which turned out badly
hence I don’t allow this ever
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I get what you want, but I seem to fail to convey my own point.
What you describe is testing that to_datetime()
behaves correctly when given these or that parameter compositions. Those tests are needed, they're worthwhile, etc. But they're not what I am adding here. What I want to test is something like "when given random parameters, to_datetime()
must show identical results for cache=True
and cache=False
", so that turning caching on or off is merely a question of whether we have enough memory to speed up stuff (not a question of having a behaviour change)!
This check also verifies that potential bugs in the logic when it raises even if it shouldn't, or if it doesn't raise when it should, are treated equally by cache
. Finding those bugs is not the purpose of this particular test!
This includes e.g. cases when to_datetime
raises an exception when errors="coerce"
, and I don't think I should be trying to fix all the errors in this function within this PR.
All in all, in this test I'm not interested ever if to_datetime
does what it should. It might be shooting a rocket to the Moon for all I care, but it must be shooting same rocket to the same Moon both with cache on and off!
Or, more precisely here, it might be raising some random ValueError
, but, unless it raises different ValueError
s for on/off cache, I consider it fine. Again, checking to_datetime
correctness is not the point of this particular test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All that said, would it be easier to understand if I extract this _get_answer
function to be a module-level helper instead of a nested function (with a proper docstring to boot)? Probably named like _to_datetime_noexcept
.
cd1701e
to
6b5227e
Compare
Note to reviewers: I've rebased off |
errors, convertor): | ||
arg = [convertor(date1), convertor(date2)] * 5 + suffix | ||
|
||
def _get_answer(cache): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want a try/except in a test; that just obscures what the test is checking; as I said if you expect it to raise then make it explici.
7dfa8d4
to
26a2594
Compare
errors, convertor): | ||
arg = [convertor(date1), convertor(date2)] * 5 + suffix | ||
|
||
def _get_answer(cache): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move the error='raise' case to another test then. we never do this in a test
errors, convertor): | ||
arg = [convertor(date1), convertor(date2)] * 5 + suffix | ||
|
||
def _get_answer(cache): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my point is that this code is very hard to grok when errors=‘raise’
I agree that both cache=True or False must raise in this case, but I want these in a separate test where you assert that they raise
the issue is that catching a ValueError in a test can accidentally catch other things and has multiple times in the past in some test cases which turned out badly
hence I don’t allow this ever
Another example of
I'm starting to feel that this discussion (and the test I've added) is unearthing something that is much bigger than initial error reported. |
can you merge master and update |
d1078e6
to
1cc469e
Compare
@jreback ping on green |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm ex a question
pandas/core/tools/datetimes.py
Outdated
return Index(result, name=name) | ||
else: | ||
return DatetimeIndex(result, name=name) | ||
return _box_as_indexlike(result, tz=None, name=name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anmyachev do we have a test like this?
you are passing tz in the call below
lgtm. ping on green. |
@jreback ping on green |
thanks @anmyachev |
git diff upstream/master -u -- "*.py" | flake8 --diff
Bug was found there: #26043 (comment)
Reproduction: