breaking,time: rewrite parse_rfc3339/1 to improve performance, reject partial timestamps, that miss date info like 22:47:08Z
#22585
Wow, huge increase in performance, great job!
vlib/time/parse_test.v
Outdated
['2015-01-06T15:47:32+01:00', '2015-01-06 16:47:32.000000'],
['2015-01-06T15:47:32-01:00', '2015-01-06 14:47:32.000000'],
['2015-01-06T15:47:32+01:10', '2015-01-06 16:57:32.000000'],
['2015-01-06T15:47:32-01:10', '2015-01-06 14:37:32.000000'],
Include test
['2015-01-06T22:59:59+00:10', '2015-01-06 23:00:09.000000'],
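For comparison only, here is a minimal Go sketch of how Go's standard library normalizes offset timestamps to UTC. It says nothing about what the V parser should return. The helper name `toUTC` is made up for this illustration. Per RFC 3339 section 5.8, a positive offset means local time is ahead of UTC, so converting to UTC subtracts the offset.

```go
package main

import (
	"fmt"
	"time"
)

// toUTC (hypothetical helper) parses an RFC 3339 timestamp and returns it
// normalized to UTC, formatted like the right-hand column of the test pairs.
func toUTC(s string) (string, error) {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		return "", err
	}
	return t.UTC().Format("2006-01-02 15:04:05.000000"), nil
}

func main() {
	for _, s := range []string{
		"2015-01-06T15:47:32+01:00",
		"2015-01-06T15:47:32-01:10",
	} {
		out, err := toUTC(s)
		fmt.Println(s, "=>", out, err)
	}
}
```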
Is that still needed?
Excellent work.
The failure in TOML seems to happen for values like this:
import time
t := time.parse_rfc3339('1979-05-27T07:32:00-08:00')!
dump(t)
On master, that prints:
On this PR, that prints:
Other examples, that work on master, but do not here:
import time
dump(time.parse_rfc3339('1979-05-27T07:32:00-08:00')!)
dump(time.parse_rfc3339('22:47:08Z')!)
dump(time.parse_rfc3339('01:47:08.981+03:00')!)
dump(time.parse_rfc3339('2024-10-19T22:47:08-00:00')!)
dump(time.parse_rfc3339('2024-10-19T22:47:08.9+00:00')!)
dump(time.parse_rfc3339('2024-10-20T01:47:08+03:00')!)
dump(time.parse_rfc3339('2024-10-20T01:47:08.981+03:00')!)
On master:
On the PR:
I think that code should work, as a first priority, and then it can be optimised. Breaking users of that parser should not happen, unless the parser has a bug that has to be fixed.
You can get more time examples from: https://ijmacd.github.io/rfc3339-iso8601/
@spytheman I think master is broken for
import time
fn main() {
	dump(time.parse_rfc3339('22:47:08Z')!)
}
time.parse_rfc3339('22:47:08Z')!: 0000-11-30 22:47:08
What do you think should be the result?
Only the time part should be set in this case, since the date is not specified. Note however, that on master, it does not return an error for this case, while it does here, which later leads to problems for the toml tests.
If only a time is specified, and a date is needed, then use the current date. This seems like a logical, reasonable alternative to trying to decide if |
No, it is not, since that is ambiguous and time dependent. Consider also the case, where you have just a date:
In contrast, setting the unknown fields to 0, is predictable, simple and unambiguous.
TOML will need a fix.
from https://docs.rs/toml/latest/toml/value/struct.Datetime.html
from https://toml.io/en/
The I suspect (but can be wrong of course), that the bug is more likely to be in the new parser here, since that is the most recent thing that is changed. |
Here is the actual spec: https://toml.io/en/v1.0.0#local-date-time
a8d9b2a to 1a930f9
So you're saying the time would be set to 00:00:00... which is exactly midnight. 00:00:00 is a valid time. 0000-00-00 is an invalid date (at least in any computer system I've ever seen; I suppose you could argue it was the exact instant of the Big Bang, but...). It doesn't make sense to set one to a valid value, and the other to an invalid value.
Is |
To me, |
No, the result should be a valid date; @JalonSolov has a point.
On master, the fixed date is |
It is interesting that Go, for example, refuses to parse
package main
import (
	"fmt"
	"time"
)
func main() {
	t, err := time.Parse(time.RFC3339, "22:47:08Z")
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println(t)
}
produces:
Rust also errors out for it:
use chrono::format::ParseError;
use chrono::{DateTime};
fn main() -> Result<(), ParseError> {
    let rfc3339 = DateTime::parse_from_rfc3339("2016-06-20T12:41:45.14Z")?;
    println!("{}", rfc3339);
    let rfc3339 = DateTime::parse_from_rfc3339("22:47:08Z")?;
    println!("{}", rfc3339);
    Ok(())
}
produces:
I could not find a good way to test this in Python (its standard library does not appear to have an RFC 3339 parser), or in JS.
https://it-tools.tech/date-converter could handle it, though not as RFC3339, but as something called "Mongo ObjectID", whatever that is 🤔.
from https://www.rfc-editor.org/rfc/rfc3339#section-4.3
4. Local Time
4.1. Coordinated Universal Time (UTC)
Because the daylight saving rules for local time zones are so
convoluted and can change based on local law at unpredictable times,
true interoperability is best achieved by using Coordinated Universal
Time (UTC). This specification does not cater to local time zone
rules.
I do not see, how that affects what the result of |
In reading rfc 3339 over (and over and over and...), there is no clear statement that a date needs to be included. HOWEVER, if you look at the ABNF for the format, you see
which at least implies that if you don't have ALL the digits, plus the
This also means that TOML does not support rfc 3339, whether it says so or not. rfc 3339 is very explicit that the time offset must be supplied, as well as the date, whereas the last 2 examples copied from the TOML docs in the screenshot above show just times, and without offsets.
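The relevant productions in RFC 3339 section 5.6 are `date-time = full-date "T" full-time` and `full-time = partial-time time-offset`. As a rough sketch of what they require, here is a Go shape check. The regular expression and the name `looksLikeRFC3339` are written for this illustration only; the check covers digit counts and separators, not value ranges such as month 13.

```go
package main

import (
	"fmt"
	"regexp"
)

// rfc3339Shape mirrors the date-time production of RFC 3339:
//
//	date-time = full-date "T" full-time
//	full-time = partial-time time-offset
//
// It checks only the shape (digit counts and separators), not value ranges.
var rfc3339Shape = regexp.MustCompile(
	`^\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})$`)

// looksLikeRFC3339 (hypothetical helper) reports whether s has a full date,
// a full time and a time offset.
func looksLikeRFC3339(s string) bool {
	return rfc3339Shape.MatchString(s)
}

func main() {
	for _, s := range []string{
		"2024-10-19T22:47:08Z",       // full date, time and offset
		"2024-10-19T22:47:08.9+00:00", // fractional seconds are allowed
		"22:47:08Z",                  // no date
		"2024-10-19T22:47:08",        // no offset
	} {
		fmt.Printf("%-30s %v\n", s, looksLikeRFC3339(s))
	}
}
```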
@JalonSolov from https://toml.io/en/v1.0.0#offset-date-time
[Offset Date-Time](https://toml.io/en/v1.0.0#offset-date-time)
To unambiguously represent a specific instant in time,
you *may* use an [RFC 3339](https://tools.ietf.org/html/rfc3339)
formatted date-time with offset.
...
[Local Date-Time](https://toml.io/en/v1.0.0#local-date-time)
If you omit the offset from an [RFC 3339](https://tools.ietf.org/html/rfc3339)
formatted date-time, it will represent the given date-time *without any relation
to an offset or timezone*. It cannot be converted to an instant in time without
additional information. Conversion to an instant, if required, is *implementation-specific*.
...
[Local Date](https://toml.io/en/v1.0.0#local-date)
If you include only the date portion of an [RFC 3339](https://tools.ietf.org/html/rfc3339)
formatted date-time, it will represent that entire day without any relation to an offset
or timezone.
...
[Local Time](https://toml.io/en/v1.0.0#local-time)
If you include *only the time portion* of an [RFC 3339](https://tools.ietf.org/html/rfc3339)
formatted date-time, it will represent that time of day *without any relation to a
specific day* or any offset or timezone.
i.e. in my interpretation, the TOML spec does not require a full RFC3339
The shown examples in the linked spec, and the test files in the official
I.e. TOML is very liberal in what it allows, while RFC3339 on the other hand, requires the time zone.
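The four kinds quoted from the TOML spec can be told apart by which parts are present. Here is a minimal Go sketch, assuming only the `T` separator and ignoring the other spellings TOML permits (such as a space between date and time). The `kinds` table and `classify` function are hypothetical names for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// kinds lists the four TOML date/time kinds with a Go layout for each,
// ordered from most to least specific.
var kinds = []struct{ name, layout string }{
	{"offset date-time", time.RFC3339},
	{"local date-time", "2006-01-02T15:04:05"},
	{"local date", "2006-01-02"},
	{"local time", "15:04:05"},
}

// classify returns the first kind whose layout parses s.
func classify(s string) string {
	for _, k := range kinds {
		if _, err := time.Parse(k.layout, s); err == nil {
			return k.name
		}
	}
	return "not a TOML date/time"
}

func main() {
	for _, s := range []string{
		"1979-05-27T07:32:00-08:00",
		"1979-05-27T07:32:00",
		"1979-05-27",
		"07:32:00",
	} {
		fmt.Printf("%-28s %s\n", s, classify(s))
	}
}
```

Only the first kind is a complete RFC 3339 `date-time`; the other three drop the offset, the time, or the date.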
I also found this specialized Python parser module: https://pypi.org/project/pyRFC3339/
I've just changed it, so that time.parse_rfc3339 now returns an error for an incomplete timestamp (one with a missing date), like all of the other existing rfc3339 parsers do. The TOML module now supplies the missing date part when needed, so that it can continue to conform to its own relaxed spec.
22:47:08Z
This C implementation, also errors on |
Before
Now
NOTE: I will use this code as the basis for a Parser struct, so that it can also be useful for other standards, like the ones listed at https://ijmacd.github.io/rfc3339-iso8601/