Canonical representation for 0000~0999 years in ISO8601 #2082
Comments
Thanks for noting this! It's true, ISO 8601 says (section 4.3.2, Calendar Year and Years Duration):
This behaviour of switching to the extended years at <1000 existed in the proposal since before I was involved. Maybe @pipobscure @maggiepint @mattjohnsonpint might know what the rationale is? If there are interoperability problems with legacy Date and with other languages that we previously weren't aware of, that might meet the bar for considering a change even during Stage 3. On the other hand, I'm not sure that a userland parser rejecting the extended year format is a strong enough motivation. Even if the 4-digit year was output for 0000–9999 instead of 1000–9999, that parser would still be unable to handle some of the output from JS (legacy Date as well as Temporal). "We use 6-digit extended years by agreement" means that you have to be able to handle 6-digit extended years if you expect to parse the output, or you will eventually get something that breaks your parser 😄 |
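To make the "you have to be able to handle 6-digit extended years" point concrete, here is a minimal sketch of a year extractor that accepts both the 4-digit form and the signed 6-digit form. `parseISOYear` is a hypothetical helper written for illustration; it is not part of Temporal, the polyfill, or any parser mentioned in this thread.

```javascript
// Hypothetical helper (an assumption for illustration, not a Temporal API):
// reads the year at the start of an ISO 8601 date string, accepting either
// the plain 4-digit form or the signed 6-digit extended form.
function parseISOYear(isoString) {
  const match = /^(?:(\d{4})|([+-]\d{6}))-/.exec(isoString);
  if (match === null) return null;
  // ECMAScript's date-time string format treats "-000000" as invalid.
  if (match[2] === '-000000') return null;
  // Exactly one of the two capture groups is defined.
  return Number(match[1] ?? match[2]);
}

console.log(parseISOYear('0913-09-16T12:44:26'));    // 913
console.log(parseISOYear('+000913-09-16T12:44:26')); // 913
console.log(parseISOYear('-000913-09-16T12:44:26')); // -913
console.log(parseISOYear('913-09-16T12:44:26'));     // null
```

A parser built this way is indifferent to whether years 0 through 999 arrive as `0YYY` or `+000YYY`, which is why the choice of output format mostly matters for consumers that only understand the 4-digit form.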
IMO, matching the output of legacy Date ISO strings seems important enough to justify a change. In some cases we must diverge (e.g. nanoseconds) but seems like we shouldn't diverge from legacy without a really good reason. |
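For reference, the legacy `Date` behaviour under discussion can be checked with a short round trip. This is only a sketch: the sample strings are chosen for illustration, and it relies on the engine parsing expanded years as described in the ECMAScript date-time string format.

```javascript
// Legacy Date emits 4-digit years for 0000 through 9999 and signed
// 6-digit years outside that range.
const samples = [
  '0000-01-01T00:00:00Z',
  '0913-09-16T12:44:26Z',
  '+000913-09-16T12:44:26Z', // extended input, normalized to 4 digits on output
  '-000913-09-16T12:44:26Z',
  '+010000-01-01T00:00:00Z',
];

const roundTripped = samples.map((s) => new Date(s).toISOString());

for (const [i, input] of samples.entries()) {
  console.log(`${input} -> ${roundTripped[i]}`);
}
```

Note that `toISOString()` always includes milliseconds, so the output differs from the input in the fractional seconds as well as, for the third sample, in the year format.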
Thanks for responses! I hope it was simply overlooked in the polyfill code, and intended to be more like:
ISOYearString: (year) => {
let yearString;
- if (year < 1000 || year > 9999) {
+ if (year < 0 || year > 9999) {
let sign = year < 0 ? '-' : '+';
let yearNumber = MathAbs(year);
yearString = sign + `000000${yearNumber}`.slice(-6);
} else {
- yearString = `${year}`;
+ yearString = `0000${year}`.slice(-4);
}
return yearString;
} |
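The proposed rule can be tried in isolation with a standalone sketch. `isoYearString` below is a hypothetical free function, not the polyfill's actual `ISOYearString`; it uses `Math.abs` and `padStart` in place of the polyfill's `MathAbs` and `slice`, and assumes the year is an integer with at most six digits.

```javascript
// Standalone sketch of the proposed rule (hypothetical name, not the polyfill code):
// 4 digits for years 0 through 9999, sign plus 6 digits otherwise.
function isoYearString(year) {
  if (year < 0 || year > 9999) {
    const sign = year < 0 ? '-' : '+';
    return sign + String(Math.abs(year)).padStart(6, '0');
  }
  return String(year).padStart(4, '0');
}

for (const year of [0, 913, 999, 1000, 9999, 10000, -1, -913]) {
  console.log(year, '->', isoYearString(year));
}
```

With this rule, 913 becomes `0913` and only years below 0 or above 9999 take the extended form, such as `-000913` or `+010000`.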
The reason for this was that ISO 8601 (4.1.2.1) specifies that years before 1582 require prior consent between platforms, so they aren't really canonical. Add to that that leading zeros in years give trouble to some platforms, and that 4 digits are required for the basic format. So we said let's maximise for interoperability, which turns out to mean outputting extended years before 1000 CE. So if you want to change this for better interop, then don't do this! If you want to go for strictness, then change it to "extended before 1582". And if you want to optimise for interop and humans, then leave it as is. |
I have no idea why that's there. My preference would be that years 0000-9999 are always represented with four digits. |
@pipobscure - which platforms can't parse years before 1000 with four digits? |
None. I was being unclear. I meant that because the year is the first thing, and leading zeros are something a lot of things don't deal well with, it's problematic. Just look at any CSV library or handling data in Excel. Leading zeros are likely to be dropped. By requiring extended format before 1000, dates are not led by zeroes anymore, since that requires a + or - in the lead position. |
Summarizing the discussion above, it sounds like the decision is this:
FWIW, it turns out that in Excel (at least on my Mac), (2) may actually be more compatible. When you paste But when you paste It's possible that modern versions of Excel recognize the ISO 8601 format (at least the 4-digit variant) and intentionally avoid converting it to a formula. I tried importing the same values in a CSV into Excel. Surprisingly, the behavior was the same: 4-digit years with no prefix imported fine, while 6-digit years with a plus prefix imported as a formula. Note that to get Excel to successfully import, I needed to wrap the values in quotes. Without quotes, the imported file had all blank cells! Given that the 4-digit format seems to be more compatible with legacy Date and also seems more compatible with Excel, my vote would be to switch to the 4-digit format for years between 1 BCE and 999 CE. |
Sorry for the misleading title: by "canonical" I meant the format that will be produced by the very final reference implementation of Temporal, and maybe explicitly described in the final specification. However, I don't see any rational connection between <1582 and expanded representation. As for Y1582 itself, I think the logic behind that is that only dates after Inter gravissimas are maintained as a legitimate part of the Gregorian calendar. Earlier dates are deprecated and may only occur in its extrapolation. So it's a sort of disclaimer: if a user implements ISO 8601 in a time machine and goes too far into the past, they may desync due to astronomical reasons. If they go to the future, time shall be patched by leap seconds.
Thanks a lot for checking this one. I suppose, most-likely-to-fail here are parsers with heuristics for 2-digit years and without highest priority check for machine-readable formats. But two extra leading zeros and a plus sign won't help them anyways: if they don't support simple version of standard, why expect support for expanded form? As an example of 2-digit oriented field, YEAR(4) type in MySQL interprets
CREATE TABLE gwak ( y YEAR, t TEXT );
INSERT INTO gwak VALUES ( '0000', '0000' ), ( '+000000', '+000000' );
SELECT * from gwak;
+------+---------+
| y | t |
+------+---------+
| 0000 | 0000 |
| 2000 | +000000 |
+------+---------+ |
I'm still in favor of using only 4 digits for years 0000-9999. I think having years 0-1000 behave differently is a trap that is too reminiscent of the legacy Also, I don't think it's our responsibility to avoid leading-zero pitfalls on other platforms. And as mentioned, the leading +/- can cause side effects in some systems. |
Given the lack of modern representation of dates older than 1022 years ago and the possibility that some parties may choke on leading zeroes, I don't see why the addition of 3 tiny characters ( |
Personally, I'm not yet convinced by those examples: those third parties still choke on other valid ISO 8601 strings that could just as easily be output by Temporal as well as legacy Date. For example, years before 0, or after 9999:
>>> from datetime import datetime
>>> datetime.fromisoformat('-000913-09-16T12:44:26')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Invalid isoformat string: '-000913-09-16T12:44:26'
irb(main):001:0> require 'time'
=> true
irb(main):002:0> Time.iso8601('+100000-09-16T12:44:26Z')
Traceback (most recent call last):
6: from /usr/local/bin/irb:23:in `<main>'
5: from /usr/local/bin/irb:23:in `load'
4: from /var/lib/gems/2.5.0/gems/irb-1.1.0/exe/irb:11:in `<top (required)>'
3: from (irb):3
2: from (irb):4:in `rescue in irb_binding'
1: from /usr/lib/ruby/2.5.0/time.rb:602:in `xmlschema'
ArgumentError (invalid date: "+100000-09-16T12:44:26Z")
(interestingly, Ruby's standard library does accept
$ date -d '-000913-09-16T12:44:26Z'
date: invalid date ‘-000913-09-16T12:44:26Z’
Sure, this change would make them accept some more ISO strings originating from Temporal that they previously would not, but there is still not "mutual agreement between the communicating parties" on the extended years. If there were a parser in widespread use that accepts |
The most important interop for Temporal is with other ECMAScript code. If there are two valid formats for the same date, but one of them is currently used by legacy From what I see from the examples above, some parsers fail on So I think the best thing we can do here is to match legacy
|
In this case, such values are outside of range, agreed between emitter and parser by supporting it on both sides.
I hope not. 😄
Yes. I don't have good examples of applications that would be affected, and very old dates are indeed rare in most "real life" data, but year 0 is affected, and it's natural to give it a special meaning (e.g. "date unspecified"). Complicated processing is not required to hit this problem: checking and normalizing existing dates is enough to break output in the Temporal version. An innocent function like this one might become a textbook footgun:
// Hypothetical wrapper class: static private fields are only valid inside a class body.
class DateChecks {
  static #MIN = Temporal.Instant.from('0000-01-01T00:00:00+0000');
  static #MAX = Temporal.Instant.from('9999-12-31T23:59:59.999999999+0000');
  // Parses the input, rejects instants outside [#MIN, #MAX], and returns the normalized string.
  static validateAndNormalizeAndCheckRange(dateString) {
    const tmp = Temporal.Instant.from(dateString);
    if (Temporal.Instant.compare(this.#MIN, tmp) > 0 || Temporal.Instant.compare(this.#MAX, tmp) < 0)
      throw new RangeError('Datetime outside of range');
    return tmp.toString();
  }
}
// or simply:
function validateAndNormalize(dateString) { return Temporal.Instant.from(dateString).toString(); }
/**
 * @deprecated and strongly discouraged due to browser differences and inconsistencies
 */
function normalizeOrNull(dateString) { return new Date(dateString).toJSON(); }
By the way, not only emitters and parsers are happier with 0YYY. Some storage engines may add weight as well.
SQLite version 3.38.0 2022-02-22 18:58:40
sqlite> CREATE TABLE gwak (d TEXT, t TEXT);
sqlite> INSERT INTO gwak VALUES ( datetime('0913-09-16T12:44:26'), '0913-09-16T12:44:26' );
sqlite> INSERT INTO gwak VALUES ( datetime('+000913-09-16T12:44:26'), '+000913-09-16T12:44:26' );
sqlite> INSERT INTO gwak VALUES ( datetime(-33333333333, 'unixepoch'), '-33333333333' );
sqlite> SELECT * FROM gwak;
0913-09-16 12:44:26|0913-09-16T12:44:26
|+000913-09-16T12:44:26
0913-09-16 12:44:27|-33333333333 |
Meeting 2022-03-17: Temporal will change The main reason for making this change is to maximize compatibility with We considered other platforms' challenges with parsing either format, but in the end we agreed that it's a much higher priority to be compatible with other ECMAScript code than to worry about non-spec-compliant parsers outside of ECMAScript. One could even argue that it's better for developers dealing with external parsers to only have one ECMAScript format to worry about, instead of two. Hopefully those other parsers will eventually become compliant in the future! |
Thanks everyone for agreeing on this despite Stage 3! |
Thanks @LiviaMedeiros! We're grateful that you found this soon enough so that it could be fixed. If you find other issues, feel free to let us know! |
…s, not 6 https://bugs.webkit.org/show_bug.cgi?id=240294 Reviewed by Yusuke Suzuki. This patch implements the spec change of tc39/proposal-temporal#2082: The range for 4-digit years in ISO8601 date strings should be 0-9999, not 1000-9999. * test262/expectations.yaml: Mark four test cases as passing. * runtime/TemporalInstant.cpp: Canonical link: https://commits.webkit.org/250456@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@294050 268f45cc-cd09-0410-ab3c-d52691b4dbfc
According to current ISOYearString implementation, non-negative years earlier than 1000 are represented in +000YYY format instead of 0YYY.
Is that intentional?
If yes, what is the rationale?
As far as I understand ISO 8601, expanded representation is always allowed to be used "by agreement", but is neither necessary nor preferable within the whole 0000~9999 range.
Current behaviour is not compatible with legacy Date:
It's not how it usually works in other languages:
And of course it breaks compatibility with a lot of userland parsers.