bpo-42663: Fix parsing TZ strings in zoneinfo module #23825

Merged
serhiy-storchaka merged 15 commits into python:main from zipinfo-parse on Oct 14, 2023

Member

@serhiy-storchaka serhiy-storchaka commented Dec 17, 2020

  • Support the full range of values determined by RFC 8536.
  • Detect all invalid formats.
  • Both implementations now raise exceptions of the same type on invalid data.

Also refactor parsing numbers and times (bpo-42660).

https://bugs.python.org/issue42663

#86826

Member

@vstinner vstinner left a comment

I have no opinion on the overall approach, just some remarks on the implementation.

.hour = hour,
.minute = minute,
.second = second,
.month = Py_SAFE_DOWNCAST(month, int, uint8_t),
Member

Py_SAFE_DOWNCAST() should be avoided: it is not safe in release mode. Use a plain cast, (uint8_t)month, since you just checked the bounds, no?

https://bugs.python.org/issue19692
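
A minimal sketch of the pattern suggested in this comment: check the bounds explicitly, then narrow with a plain cast. The helper name `store_month` and its return convention are hypothetical and do not come from the PR; the real code would also set a Python exception on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the PR): validate a month parsed as int,
 * then narrow it to uint8_t. Returns 0 on success, -1 if out of range. */
static int
store_month(int month, uint8_t *out)
{
    assert(out != NULL);
    if (month < 1 || month > 12) {
        /* The real code would raise ValueError here. */
        return -1;
    }
    /* The explicit range check above already guarantees that the value
     * fits, so a plain cast is enough and behaves the same in debug and
     * release builds. */
    *out = (uint8_t)month;
    return 0;
}
```

Because the range check is ordinary code rather than an assertion, it is still present when assertions are compiled out.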

@github-actions

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jan 17, 2021
julian, day);
return -1;
}

if (hour < -167 || hour > 167) {
PyErr_Format(PyExc_ValueError, "Hour must be in [0, 167]");
Member

is the minimum 0 or -167?

Member Author

The minimum of the absolute value is 0. But there may be a minus sign before the time (HHH:MM:SS).
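
To illustrate the point about the sign, here is a rough sketch of parsing just the hour field: an optional sign followed by one to three digits whose absolute value must not exceed 167. The function `parse_signed_hour` is hypothetical and far simpler than the parser in the PR (it handles neither `:MM:SS` nor surrounding rule syntax).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not the PR's parser): parse "[+-]H", "[+-]HH" or
 * "[+-]HHH". The absolute value must be in [0, 167], so the signed
 * result is in [-167, 167]. Returns 0 on success, -1 on invalid input. */
static int
parse_signed_hour(const char *s, int *hour)
{
    assert(s != NULL && hour != NULL);

    int sign = 1;
    if (*s == '+' || *s == '-') {
        if (*s == '-') {
            sign = -1;
        }
        s++;
    }

    int value = 0;
    int ndigits = 0;
    while (*s >= '0' && *s <= '9') {
        if (++ndigits > 3) {
            return -1;          /* too many digits */
        }
        value = value * 10 + (*s - '0');
        s++;
    }

    /* Require at least one digit, no trailing characters, and an
     * absolute value within the allowed limit. */
    if (ndigits == 0 || *s != '\0' || value > 167) {
        return -1;
    }
    *hour = sign * value;
    return 0;
}
```

With this shape, the bound is checked on the unsigned magnitude, which is why an error message can talk about [0, 167] while the stored value ranges over [-167, 167].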

@@ -58,16 +58,16 @@ typedef struct {
uint8_t month;
uint8_t week;
uint8_t day;
- int8_t hour;
+ int16_t hour;
Member

Would you mind writing a short comment explaining that hour is in the range [-167, 167] (RFC ...)? It is not obvious why minute and second use int8_t but hour uses int16_t.

Member

167 is written in many places; a constant would help avoid inconsistencies if the value is ever updated.

I suggested adding a comment (e.g. mentioning this constant) where the structure is defined, to explain why hour requires int16_t and not int8_t.
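
One way the suggestion could look, as a sketch only: a commented field plus compile-time checks that tie the limit to the field type. The macro name `MAX_ABS_HOUR` and the struct name `CalendarRuleSketch` are hypothetical; only the field names and types are taken from the diff above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the limit discussed in this thread. */
#define MAX_ABS_HOUR 167

/* If the limit changes, these checks flag a mismatch with the field type
 * at compile time: 167 does not fit in int8_t (max 127) but fits in
 * int16_t. */
static_assert(MAX_ABS_HOUR > INT8_MAX, "int8_t would be enough for hour");
static_assert(MAX_ABS_HOUR <= INT16_MAX, "hour no longer fits in int16_t");

/* Hypothetical stand-in for the rule struct in the diff. */
typedef struct {
    uint8_t month;
    uint8_t week;
    uint8_t day;
    int16_t hour;    /* in [-MAX_ABS_HOUR, MAX_ABS_HOUR]; too wide for int8_t */
    int8_t minute;
    int8_t second;
} CalendarRuleSketch;
```

The static assertions are one possible answer to the concern that a constant declared far from the struct could be changed without updating the type of the hour field.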

@@ -1200,14 +1199,14 @@ calendarrule_year_to_timestamp(TransitionRuleType *base_self, int year)
}

int64_t ordinal = ymd_to_ord(year, self->month, month_day) - EPOCHORDINAL;
- return ((ordinal * 86400) + (int64_t)(self->hour * 3600) +
+ return ((ordinal * 86400L) + (int64_t)(self->hour * 3600L) +
(int64_t)(self->minute * 60) + (int64_t)(self->second));
Member

I prefer the syntax (int64_t)var * constant, rather than trying to declare the constant as an int64_t or a long.

For example, (int64_t)self->hour * 3600 rather than (int64_t)(self->hour * 3600L).
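
A small sketch of the preferred form. Casting the operand first means the multiplication itself is carried out in 64-bit arithmetic, whereas casting the parenthesized product only widens a result that was already computed in the narrower type. The function `to_timestamp` and its parameters are hypothetical stand-ins for the expression in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the return expression in the diff: each
 * narrow operand is cast to int64_t *before* it is multiplied, so every
 * product is computed in 64 bits regardless of the width of int or long
 * on the platform. */
static int64_t
to_timestamp(int64_t ordinal, int16_t hour, int8_t minute, int8_t second)
{
    return ordinal * 86400
           + (int64_t)hour * 3600
           + (int64_t)minute * 60
           + (int64_t)second;
}
```

This also avoids relying on the `L` suffix, since `long` is only 32 bits wide on some platforms.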

PyErr_Format(PyExc_ValueError, "Day must be in [0, 6]");
return -1;
}

if (hour < -167 || hour > 167) {
Member

Maybe declare a constant for this limit?

Member Author

I do not think it would make the code better. If the constant is declared far from that place, somebody can change its value without changing the type of the hour field.

if (self->julian && day >= 59 && is_leap_year(year)) {
day += 1;
}

- return ((days_before_year + day) * 86400) + (self->hour * 3600) +
+ return ((days_before_year + day) * 86400L) + (self->hour * 3600L) +
Member

I suggest declaring constants to improve readability (60, 3600, 86400):

#define MIN_TO_SEC 60
#define HOUR_TO_SEC (60 * MIN_TO_SEC)
#define DAY_TO_SEC (24 * HOUR_TO_SEC)

Example from Python/pytime.c:

/* To millisecond (10^-3) */
#define SEC_TO_MS 1000

/* To microseconds (10^-6) */
#define MS_TO_US 1000
#define SEC_TO_US (SEC_TO_MS * MS_TO_US)

/* To nanoseconds (10^-9) */
#define US_TO_NS 1000
#define MS_TO_NS (MS_TO_US * US_TO_NS)
#define SEC_TO_NS (SEC_TO_MS * MS_TO_NS)

/* Conversion from nanoseconds */
#define NS_TO_MS (1000 * 1000)
#define NS_TO_US (1000)

Member Author

Constants 60 and 3600 look more readable to me than named constants (and I have trouble distinguishing similar-looking names). 86400 is obvious enough too, but if you prefer, I'll add a named constant for it.

In the future I want to replace the hour/minute/second trio with a single integer (time offset in seconds), so the constants 3600 and 60 may go away here.
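
A sketch of what collapsing the trio into one integer might look like. Nothing here is from the PR: `hms_to_offset` is a hypothetical helper, and it assumes the sign applies to the whole HHH:MM:SS value rather than to the hour alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the PR): fold a signed HHH:MM:SS time
 * into a single offset in seconds. The magnitude is at most
 * 167*3600 + 59*60 + 59 = 604799, which fits comfortably in int32_t. */
static int32_t
hms_to_offset(int sign, int hour, int minute, int second)
{
    assert(sign == 1 || sign == -1);
    assert(hour >= 0 && hour <= 167);
    assert(minute >= 0 && minute <= 59);
    assert(second >= 0 && second <= 59);
    /* Assumption: the sign applies to the whole time, so "-1:30:00"
     * means minus ninety minutes. */
    return (int32_t)(sign * (hour * 3600 + minute * 60 + second));
}
```

With a single stored offset, the factors 3600 and 60 would be needed only at parse time, not in every timestamp computation.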

Member

@vstinner vstinner left a comment

LGTM.

@serhiy-storchaka serhiy-storchaka added the type-feature A feature request or enhancement label May 6, 2022
@pganssle pganssle changed the title bpo-42663: Support full range of allowed transition hours in zipinfo. bpo-42663: Support full range of allowed transition hours in zoneinfo. May 6, 2022
@pganssle
Member

pganssle commented May 6, 2022

I haven't had a chance to look at the changes in any detail yet, but I think this needs tests, right? Or is the mobile view obscuring them or something?

@serhiy-storchaka
Member Author

Good point. I started this PR as a pure refactoring (it reduces the C code by 67 lines), but since it also changes the behavior, it needs tests. It also turns out that the Python code did not support a 24h offset, did not detect some errors, and could raise the wrong type of exception.

@serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error needs backport to 3.11 only security fixes needs backport to 3.12 bug and security fixes labels Sep 1, 2023
@serhiy-storchaka serhiy-storchaka changed the title bpo-42663: Support full range of allowed transition hours in zoneinfo. bpo-42663: Fix parsing TZ strings in zoneinfo module Sep 1, 2023
@serhiy-storchaka
Member Author

@pganssle, could you please take another look?

I added tests and found that the old implementation rejected some extreme values and accepted values outside the valid range. The Python implementation contained more bugs than the C implementation and raised incorrect types of exceptions on invalid data (e.g. IndexError).

I now consider this change a bugfix.

@serhiy-storchaka serhiy-storchaka merged commit ab08ff7 into python:main Oct 14, 2023
@miss-islington
Contributor

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11, 3.12.
🐍🍒⛏🤖

@serhiy-storchaka serhiy-storchaka deleted the zipinfo-parse branch October 14, 2023 20:24
@miss-islington
Contributor

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker ab08ff7882b6181fb785eed7410dbf8030aded70 3.12

@miss-islington
Contributor

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker ab08ff7882b6181fb785eed7410dbf8030aded70 3.11

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request Oct 14, 2023
…-23825)

zipinfo now supports the full range of values in the TZ string
determined by RFC 8536 and detects all invalid formats.
Both Python and C implementations now raise exceptions of the same
type on invalid data.
(cherry picked from commit ab08ff7)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@bedevere-app

bedevere-app bot commented Oct 14, 2023

GH-110882 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 bug and security fixes label Oct 14, 2023
serhiy-storchaka added a commit that referenced this pull request Oct 15, 2023
GH-110882)

zipinfo now supports the full range of values in the TZ string
determined by RFC 8536 and detects all invalid formats.
Both Python and C implementations now raise exceptions of the same
type on invalid data.
(cherry picked from commit ab08ff7)
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request Oct 15, 2023
…ythonGH-23825) (pythonGH-110882)

zipinfo now supports the full range of values in the TZ string
determined by RFC 8536 and detects all invalid formats.
Both Python and C implementations now raise exceptions of the same
type on invalid data.
(cherry picked from commit ab08ff7).
(cherry picked from commit 72b0f0e)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@serhiy-storchaka serhiy-storchaka removed their assignment Oct 15, 2023
@serhiy-storchaka serhiy-storchaka removed the needs backport to 3.11 only security fixes label Oct 15, 2023
serhiy-storchaka added a commit that referenced this pull request Oct 15, 2023
…H-23825) (GH-110882) (GH-110889)

zipinfo now supports the full range of values in the TZ string
determined by RFC 8536 and detects all invalid formats.
Both Python and C implementations now raise exceptions of the same
type on invalid data.
(cherry picked from commit ab08ff7)
(cherry picked from commit 72b0f0e)
aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
zipinfo now supports the full range of values in the TZ string
determined by RFC 8536 and detects all invalid formats.
Both Python and C implementations now raise exceptions of the same
type on invalid data.
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024
zipinfo now supports the full range of values in the TZ string
determined by RFC 8536 and detects all invalid formats.
Both Python and C implementations now raise exceptions of the same
type on invalid data.
Labels
type-bug An unexpected behavior, bug, or error type-feature A feature request or enhancement
7 participants