-
-
Notifications
You must be signed in to change notification settings - Fork 31.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-42663: Fix parsing TZ strings in zoneinfo module #23825
bpo-42663: Fix parsing TZ strings in zoneinfo module #23825
Conversation
Also refactor parsing of numbers and times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no opinion on the overall approach, just some remarks on the implementation.
Modules/_zoneinfo.c
Outdated
.hour = hour, | ||
.minute = minute, | ||
.second = second, | ||
.month = Py_SAFE_DOWNCAST(month, int, uint8_t), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Py_SAFE_DOWNCAST() should be avoided, it is not safe in release mode: use plain cast (uint8_t)month
since you just checked bounds, no?
This PR is stale because it has been open for 30 days with no activity. |
julian, day); | ||
return -1; | ||
} | ||
|
||
if (hour < -167 || hour > 167) { | ||
PyErr_Format(PyExc_ValueError, "Hour must be in [0, 167]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the minimum 0 or -167?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The minimum of the absolute value is 0. But there may be a minus sign before time (HHH:MM:SS).
Modules/_zoneinfo.c
Outdated
@@ -58,16 +58,16 @@ typedef struct { | |||
uint8_t month; | |||
uint8_t week; | |||
uint8_t day; | |||
int8_t hour; | |||
int16_t hour; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind to write a short comment explaining that hour is in range [-167; 167) (RFC ...)? Since it's non obvious why minute and second use int8_t, but hour use int16_t.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
167 is written in many places, it would help to avoid inconsistencies if it's updated tomorrow.
I suggested to add a comment (ex: mentioning this constant) to explain why hour requires int16_t and not int8_t, where the structure is defined.
Modules/_zoneinfo.c
Outdated
@@ -1200,14 +1199,14 @@ calendarrule_year_to_timestamp(TransitionRuleType *base_self, int year) | |||
} | |||
|
|||
int64_t ordinal = ymd_to_ord(year, self->month, month_day) - EPOCHORDINAL; | |||
return ((ordinal * 86400) + (int64_t)(self->hour * 3600) + | |||
return ((ordinal * 86400L) + (int64_t)(self->hour * 3600L) + | |||
(int64_t)(self->minute * 60) + (int64_t)(self->second)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer syntax (int64_t)var * constant, rather than trying to declare the constant as an int64_t or a long.
For example, (int64_t)self->hour * 3600
rather than (int64_t)(self->hour * 3600L)
.
PyErr_Format(PyExc_ValueError, "Day must be in [0, 6]"); | ||
return -1; | ||
} | ||
|
||
if (hour < -167 || hour > 167) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe declare a constant for this limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it would make code better. If it is a constant declared far from that place, somebody can change its vale without changing the type of the hour
field.
Modules/_zoneinfo.c
Outdated
if (self->julian && day >= 59 && is_leap_year(year)) { | ||
day += 1; | ||
} | ||
|
||
return ((days_before_year + day) * 86400) + (self->hour * 3600) + | ||
return ((days_before_year + day) * 86400L) + (self->hour * 3600L) + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to declare constants to enhance the readability (60, 3600, 86400):
#define MIN_TO_SEC 60
#define HOUR_TO_SEC (60 * MIN_TO_SEC)
#define DAY_TO_SEC (24 * HOUR_TO_SEC)
Example of Python/pytime.c:
/* To millisecond (10^-3) */
#define SEC_TO_MS 1000
/* To microseconds (10^-6) */
#define MS_TO_US 1000
#define SEC_TO_US (SEC_TO_MS * MS_TO_US)
/* To nanoseconds (10^-9) */
#define US_TO_NS 1000
#define MS_TO_NS (MS_TO_US * US_TO_NS)
#define SEC_TO_NS (SEC_TO_MS * MS_TO_NS)
/* Conversion from nanoseconds */
#define NS_TO_MS (1000 * 1000)
#define NS_TO_US (1000)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constants 60 and 3600 look more readable to me than some named constant (and I have problem with distinguishing similar looking names). 86400 is enough obvious too, but if you prefer, I'll add a named constant for it.
I want to replace a trio hour/minute/second with a single integer (time offset in seconds) in future, so constants 3600 and 60 may go out here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I haven't had a chance to look at the changes in any detail yet, but I think this needs tests, right? Or is the mobile view obscuring them or something? |
Good point. I started this PR as pure refactoring (it reduces of the C code by 67 lines), but since it also changes the behavior, it needs tests. And it happens that the Python code did not support 24h offset, and did not detect some errors, and could raise wrong type of exception. |
@pganssle, could you please take another look? I added tests and found that the old implementation rejected some extreme values and accepted values outside the valid range. The Python implementation contained more bugs than the C implementation and raised incorrect types of exceptions on invalid data (e.g. IndexError). I consider now this change as a bugfix. |
Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11, 3.12. |
Sorry, @serhiy-storchaka, I could not cleanly backport this to
|
Sorry, @serhiy-storchaka, I could not cleanly backport this to
|
…-23825) zipinfo now supports the full range of values in the TZ string determined by RFC 8536 and detects all invalid formats. Both Python and C implementations now raise exceptions of the same type on invalid data.. (cherry picked from commit ab08ff7) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
GH-110882 is a backport of this pull request to the 3.12 branch. |
…ythonGH-23825) (pythonGH-110882) zipinfo now supports the full range of values in the TZ string determined by RFC 8536 and detects all invalid formats. Both Python and C implementations now raise exceptions of the same type on invalid data. (cherry picked from commit ab08ff7). (cherry picked from commit 72b0f0e) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…H-23825) (GH-110882) (GH-110889) zipinfo now supports the full range of values in the TZ string determined by RFC 8536 and detects all invalid formats. Both Python and C implementations now raise exceptions of the same type on invalid data. (cherry picked from commit ab08ff7) (cherry picked from commit 72b0f0e)
zipinfo now supports the full range of values in the TZ string determined by RFC 8536 and detects all invalid formats. Both Python and C implementations now raise exceptions of the same type on invalid data.
zipinfo now supports the full range of values in the TZ string determined by RFC 8536 and detects all invalid formats. Both Python and C implementations now raise exceptions of the same type on invalid data.
Also refactor parsing numbers and times (bpo-42660).
https://bugs.python.org/issue42663
#86826