-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Bug Fix for 8 fractional digit checking on TimeSpan.Parse(string) #21077
Conversation
Fix the inconsistent and incorrect results given when parsing a TimeSpan with 8 fractional digits
So this is the test that fails:
It expects This might look it's simply rounding up from 0.1 ticks to 1 tick, but if you replaced the terminal ones with fives then you'll find it also "rounds up" from 0.5 ticks to 5 ticks, or from .00123456 seconds to .0123456 seconds. Which is plainly not how numbers work. Up to you guys if you want to maintain some form of rounding for <100ns to keep this test passing, but there's no code in this repo that does it correctly currently. |
@Quole thanks for submitting the fix, this looks reasonable to me, could you please create corresponding corefx PR with some new test cases covering broken scenario and test fixes? (it will fail until we merge this but you should be able to test it locally - https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/developer-guide.md#testing-with-private-coreclr-bits - you might want to build release coreclr - please let me know if you hit any issues) @tarekgh I think you're more familiar with this code, could you also take a look? |
I think @joshfree would be more familiar with this code so he can advise. I am wondering why in GetNextToken() we didn't just store the length of the parsed digits and check that length in the validation. |
Before writing test cases, I need more input on how this method should work: A) Follow the spec (also Mono?) B) Fix but keep maximum backwards compatibility, enshrining the bug For "maximum" backwards compatibility, 9-digit fractional seconds will still throw an Overflow. But when there's 8-digit with a value between 0 and 1 tick (0 to 100ns exclusive), it will be rounded up to 1 tick (emulating the 10ns -> 100ns bug. Other 8-digit fractions (>100ns) will be truncated to 7 digits (instead of giving incorrect results). This is only backwards compatible with 4.x .NET implementations of the System library. .NET 3.5 gave an entirely different, but more appropriate FormatException. Other implementations I've found (Mono?) didn't have this bug. So this solution would be enshrining a very specific odd behavior from a specific .NET implementation. C) Least surprise (while still having overflow exceptions) D) Most permissive (or "least frustrating") Could have a cut off and at nanosecond or picosecond precision to avoid arbitrarily long strings from parsing, but this would be counter to how all other numeric types are parsed (including floats, doubles and decimals), so I wouldn't recommend it. Those types do not throw an OverflowException even with 10 million extra decimal places beyond what they can store, so I'm not sure why TimeSpan should be any different. E) decimal.Parse() style — (Not listed in table) decimal.Parse("0.55555555555555555555555555555"); // returns 0.5555555555555555555555555556 Find "TryNumberToDecimal" in Number.Parsing.cs for implementation details. Actual and proposed results from parsing various values
I meant to just ask a simple question but have half written the test cases by the end anyway (except I haven't tested the above yet, so might contains errors). Anyway, an option needs to be chosen. I originally was going to recommend "Least surprise", but, after some consideration it seems clear this would still cause surprises. I'm changing my suggestion to most permissive or decimal.Parse() style. Either way requires a larger patch than what I've provided, as does maximum compatibility. But if you want to go with option A then it's quite straight forward, and this patch does that. edit: added column to show .NET <= 3.5 behavior, added option E, and expanded comments |
I believe we should go with either A or B - my gut tells me we should follow the spec. Assuming we went with B - I'm kinda curious why should not @tarekgh any thoughts? |
Honestly, I am seeing the current behavior without any change can be acceptable. I understand is not 100% follow the spec and sometimes is not accurate with the 8 digits but I am not seeing is causing much troubles. if we really want to do anything here I would prefer option B. |
The rule I went with is "< 1 tick always rounds up, more than 1 tick always truncates." This seemed to be simplest rule to preserve the old behavior (i.e. 0.1 tick "rounding" to 1 tick) while also fixing the bug. Also it limits the rounding behavior to only 9 values in total. Here's what those values do currently. The "maximum backwards compatibility" proposal is to have .1 to to .9 "round up" to 1. TimeSpan.Parse("0:00:00.00000000"); // OverflowException
TimeSpan.Parse("0:00:00.00000001").Ticks; // == 1 (actual: 0.1 tick)
TimeSpan.Parse("0:00:00.00000002").Ticks; // == 2
TimeSpan.Parse("0:00:00.00000003").Ticks; // == 3
TimeSpan.Parse("0:00:00.00000004").Ticks; // == 4
TimeSpan.Parse("0:00:00.00000005").Ticks; // == 5
TimeSpan.Parse("0:00:00.00000006").Ticks; // == 6
TimeSpan.Parse("0:00:00.00000007").Ticks; // == 7
TimeSpan.Parse("0:00:00.00000008").Ticks; // == 8
TimeSpan.Parse("0:00:00.00000009"); // OverflowException Applying ceiling-style rounding to all 8-digit values seems excessive but is also possible. |
I was curious to test what how other numeric types handle parsing more decimal places than they can store. Perhaps this is common knowledge but I thought I'd test it. I assumed there was a point where other types would throw an What I found is that while float.Parse("0.5555555"); // 7 × '5' ... == 0.5555555
float.Parse("0.55555555"); // 8 × '5' ... == 0.5555555 (same as above) // "same as above" means the parsed value == the above's parsed value
float.Parse("0.555555555"); // 9 × '5' ... == 0.5555556
float.Parse("0.5555555555"); // 10 × '5' ... == 0.5555556 (same as above)
float.Parse("0.55555555555"); // 11 × '5' ... == 0.5555556 (same as above)
float.Parse("0.5555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555"); // 100 × '5' ... == 0.5555556 (same as above)
float.Parse(...); // 100,000,000 × '5' ... == 0.5555556 (same as above)
double.Parse("0.555555555555555"); // 15 × '5' ... == 0.555555555555555
double.Parse("0.5555555555555555"); // 16 × '5' ... == 0.555555555555555
double.Parse("0.55555555555555555"); // 17 × '5' ... == 0.555555555555556
double.Parse("0.555555555555555555"); // 18 × '5' ... == 0.555555555555556 (same as above)
double.Parse("0.5555555555555555555"); // 19 × '5' ... == 0.555555555555556 (same as above)
double.Parse("0.55555555555555555555"); // 20 × '5' ... == 0.555555555555556 (same as above)
double.Parse("0.555555555555555555555"); // 21 × '5' ... == 0.555555555555556 (same as above)
decimal.Parse("0.555555555555555555555555555"); // 27 × '5' ... == 0.555555555555555555555555555
decimal.Parse("0.5555555555555555555555555555"); // 28 × '5' ... == 0.5555555555555555555555555555
decimal.Parse("0.55555555555555555555555555555"); // 29 × '5' ... == 0.5555555555555555555555555556
decimal.Parse("0.555555555555555555555555555555"); // 30 × '5' ... == 0.5555555555555555555555555556 (same as above)
decimal.Parse("0.5555555555555555555555555555555"); // 31 × '5' ... == 0.5555555555555555555555555556 (same as above)
decimal.Parse("0.55555555555555555555555555555555"); // 32 × '5' ... == 0.5555555555555555555555555556 (same as above)
decimal.Parse(...); // 100,000,000 × '5' ... == 0.5555555555555555555555555556 (same as above) "0.00000...001" gets rounded to zero. Also without ever throwing an exception, even with 100,000,000 zeroes. float.Parse("0.00000000000000000000000000000000000000000001"); // 43 × '0' ... == 9.809089E-45
float.Parse("0.000000000000000000000000000000000000000000001"); // 44 × '0' ... == 1.401298E-45
float.Parse("0.0000000000000000000000000000000000000000000001"); // 45 × '0' ... == 0
float.Parse(...); // 100,000,000 × '0' ... == 0 (same as above)
double.Parse(...); // 321 × '0' ... == 9.88131291682493E-323
double.Parse(...); // 322 × '0' ... == 9.88131291682493E-324
double.Parse(...); // 323 × '0' ... == 0
double.Parse(...); // 324 × '0' ... == 0 (same as above)
double.Parse(...); // 100,000,000 × '0' ... == 0 (same as above)
decimal.Parse("0.0000000000000000000000000001"); // 27 × '0' ... == 0.0000000000000000000000000001
decimal.Parse("0.00000000000000000000000000001"); // 28 × '0' ... == 0.0000000000000000000000000000
decimal.Parse(...); // 100,000,000 × '0' ... == 0.0000000000000000000000000000 (same as above) All the numeric types I tested could also handle at least 100 million redundant zeroes at the start of the number, and in the case of floating point types, at the end too. int.Parse("1"); // 0 × '0' ... == 1
int.Parse("01"); // 1 × '0' ... == 1 (same as above)
int.Parse("001"); // 2 × '0' ... == 1 (same as above)
int.Parse(...); // 100,000,000 × '0' ... == 1 (same as above)
long.Parse("1"); // 0 × '0' ... == 1
long.Parse("01"); // 1 × '0' ... == 1 (same as above)
long.Parse("001"); // 2 × '0' ... == 1 (same as above)
long.Parse(...); // 100,000,000 × '0' ... == 1 (same as above)
BigInteger.Parse("1"); // 0 × '0' ... == 1
BigInteger.Parse("01"); // 1 × '0' ... == 1 (same as above)
BigInteger.Parse("001"); // 2 × '0' ... == 1 (same as above)
BigInteger.Parse(...); // 100,000,000 × '0' ... == 1 (same as above)
float.Parse("5.5"); // 0 × '0' ... == 5.5
float.Parse("05.50"); // 1 × '0' ... == 5.5 (same as above)
float.Parse("005.500"); // 2 × '0' ... == 5.5 (same as above)
float.Parse(...); // 100,000,000 × '0' ... == 5.5 (same as above)
double.Parse("5.5"); // 0 × '0' ... == 5.5
double.Parse("05.50"); // 1 × '0' ... == 5.5 (same as above)
double.Parse("005.500"); // 2 × '0' ... == 5.5 (same as above)
double.Parse(...); // 100,000,000 × '0' ... == 5.5 (same as above)
decimal.Parse("5.5"); // 0 × '0' ... == 5.5
decimal.Parse("05.50"); // 1 × '0' ... == 5.50 (same as above)
decimal.Parse("005.500"); // 2 × '0' ... == 5.500 (same as above)
decimal.Parse(...); // 100,000,000 × '0' ... == 5.5000000000000000000000000000 (same as above)
Test source code: |
Online live demos of the bug:
using System;
public class Program
{
public static void Main()
{
Console.WriteLine(TimeSpan.Parse("0:00:00.01").TotalMilliseconds); // 10 ms
Console.WriteLine(TimeSpan.Parse("0:00:00.01000000").TotalMilliseconds); // 100 ms??!
// .NET version information
Console.WriteLine();
Console.WriteLine("Version: {0}", Environment.Version.ToString());
Console.WriteLine(typeof(TimeSpan).Assembly.FullName);
Console.WriteLine(typeof(TimeSpan).Assembly.ImageRuntimeVersion);
}
} |
I tried to work out which version of .NET introduced the bug by testing various versions. Here are my results so far: string a = "0:00:00.01";
string b = "0:00:00.01000000"; // 8 decimal digits
string c = "0:00:00.010000000"; // 9 decimal digits
In some .NET implementations, I don't know how to get a useful version information. Some "identical" versions give different behavior. |
@Quole did you try doing solution B you mentioned earlier? I think this would be the safest option. |
@tannergooding in case he is interested |
I'm sorry I suggested that solution. It was a mistake. It's far from optimal and could only lead to more user code bugs than it would possibly stave off. Extra decimal places and redundant zeroes are two things that do not cause OverflowExceptions; there was no reason to switch from Considering the existing behavior differences between .NET 3.5, .NET 4.0 and some other unknown implementation (possibly Mono—Sorry I'm at a loss to find any way to easily test code across different library versions), and how they all differ remarkably from the behavior of the other numeric parsers, I don't think it would be optimal solution to introduce an additional kludge mentioned at the bottom of some documentation that no one will read until it's pointed out to them on StackOverflow. I could not ethically implement that solution. As I've already written another library which already handles this correctly, my only purpose to do it, as an unpaid volunteer for a $850B company, would be to claim the version in my own TimeSpanParser library is superior. The patch I've submitted fixes the immediate problem, but you've got a long way to go to make this work in any sensible way. Good luck in fixing the code. |
@Quole I would to thank you for your detailed info and your tests across versions. I appreciate your effort here. For the difference in behavior between 3.5 and 4.x, this already stated in the documentation https://docs.microsoft.com/en-us/dotnet/api/system.timespan.parse?view=netframework-4.7.2#System_TimeSpan_Parse_System_String_System_IFormatProvider_ in the "Notes to callers"section. Thinking more in the issue, I believe we have limited the parsing (in the spec) to 7 digits that because our time handling in general in the .NET is using ticks (100-nanseconds). so we cannot go less than that. I think the best solution here, should be parsing the whole number (even if it is more than 7 digits) and then round it to the seven digits and use this value. I know we'll start not throwing in some cases we used to throw but I think this will be a limited breaking scenario and I believe the behavior would be more understandable and logical. @joshfree what you think about that? |
This is how decimal.Parse() works, and I agree it's the ideal solution. |
I am going to close this one and will open a new PR. I have looked more and here is what I am going to do, but before I spell it out I want to clarify some issues:
So what I am going to do is:
|
by the way, thanks @Quole for your effort here and I'll include you in the PR I am going to have. |
by the way I am doing more investigation to ensure what I am going to do is the best here. looks we are normalizing the leading zeros, https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/shared/System/Globalization/TimeSpanParse.cs#L575 which is need to be handled carefully (and maybe rounding will be needed) |
This patch fixes the inconsistent and incorrect results given when parsing a TimeSpan with 8 fractional digits.
I've made my own TimeSpanParser and have been testing it against
System.TimeSpan.Parse()
. I come up with some issues which turned out to be withTimeSpan.Parse(string)
.In brief, all these tests pass "successfully", showing incorrect behavior:
That is, a string with ~0.01 seconds creates the exact same TimeSpan as a string with ~0.001 seconds.
These tests also pass "successfully", showing inconsistent behavior:
In the above code we parse two strings with the same number of digits. The one ending in "98" creates a TimeSpan with 98 ticks, while the one ending in "99" overflows. Not only is this inconsistent, but the 98 value is incorrect: it's actually 9.8 ticks, so should round to either 9 (truncation) or 10 (if behaving like decimal.Parse), but parsing it as 98 ticks is clearly a bug. Though the point here is to illustrate that it has a different kind of behavior to the other string, which "overflows".
The problem is an off-by-one error in
IsInvalidFraction()
, which validates the fractional digits. The fractional digits are everything after the decimal place, which are stored as an integer in a token,TimeSpanToken
, used internally by when parsing a string.All the above examples have 8 digits, which
IsInvalidFraction()
fails to invalidate.I've found the line in the source code that's broken. It attempts to check the number of fractional digits, and does this in a slightly convoluted way. First It creates a number that the integer part can't be larger than (based on its number of leading zeroes (
_zeroes
) and the fact it can only have 7 decimal places), then checks that the fractional number (as an integer), isn't larger than that.coreclr/src/System.Private.CoreLib/shared/System/Globalization/TimeSpanParse.cs
Line 119 in ac09773
But it subtracts one from
_zeroes
, creating an off-by-one error, resulting in anything with 8 digits having to rely on the previous tests in function, which only catch certain 8-digit numbers, and let through many oddities which were meant to be caught by that one line:coreclr/src/System.Private.CoreLib/shared/System/Globalization/TimeSpanParse.cs
Lines 112 to 113 in ac09773
The problem is solved by deleting the
- 1
, which is all this patch does.I've made a walk-through of the this issue here in UnitTest format, as well as tests for the broken and patched version of
IsInvalidFraction()
:https://github.com/quole/TimeSpanParser/blob/master/TimeParser.Tests/NotWrittenHereUnderflowWeirdnessTests.cs
I have not yet created any unit tests suitable for this repo (dotnet/coreclr). I know you'll need some unit tests before accepting this but I thought I'd make the initial post now rather than later, in case there's discussion that needs to be had.
For completeness, here's some links to previous discussions: a casual wiki post made after discovering the issue, including a dump of test results, initial bug report on dotnet/corefx, previous PR I made to the wrong repo (dotnet/corefx)