-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Float printing and/or parsing is inaccurate #24557
Comments
potential dupe of #7030 and/or #7648 . I've been meaning to try to look at this for a long long time; for the most part we've always just triaged these issues as "just a bug", but that's not a reason to not try to fix them, if someone has time. I am very familiar with Will Clinger's paper, as he was my Ph.D advisor -- though floating-point parsing was not my topic area -- and I have seen him present the material more than once. Also relevant are the other papers I listed on my comment here: and possibly also relevant is Steele and White, also from PLDI 1990, "How to Print Floating Point Numbers Accurately" and possibly also this recent paper: |
Thanks for the dupes, I'm useless with GitHub's search. If you check out the companion #24556 you'll see that lifthrasiir has an implementation (combining Loitsch's algorithm and the algorithm of Steele and White) and IIUC is intending to contribute it. I'm very interested in seeing this fixed, so I'll try implementing Clinger's algorithm over the next weeks. |
I only just realized that there are actually functions in the standard libraries that display in bases other than ten. They all seem to be deprecated, though. Is this a thing that should be supported indefinitely, or is this slated for removal in the near future? That would be a bummer, since (as discussed in one of the other issues) a lot of the research focuses on base 10. |
cc @lifthrasiir, because you were working on making conversions correct AFAIR. |
I originally planned to work on @rkruppe Base 10 is not that special, but many practical algorithms rely on the precalculated table which is base-specific. It is not reasonable to ship all tables for possible bases. |
I just ran into this issue. Printing a |
@kballard that should be fixed by @lifthrasiir 's PR #24612, at least according to my understanding of the algorithms and the comment here |
@pnkfelix It certainly sounds like it should. |
This is a direct port of my prior work on the float formatting. The detailed description is available [here](https://github.com/lifthrasiir/rust-strconv#flt2dec). In brief, * This adds a new hidden module `core::num::flt2dec` for testing from `libcoretest`. Why is it in `core::num` instead of `core::fmt`? Because I envision that the table used by `flt2dec` is directly applicable to `dec2flt` (cf. #24557) as well, which exceeds the realm of "formatting". * This contains both Dragon4 algorithm (exact, complete but slow) and Grisu3 algorithm (exact, fast but incomplete). * The code is accompanied with a large amount of self-tests and some exhaustive tests. In particular, `libcoretest` gets a new dependency on `librand`. For the external interface it relies on the existing test suite. * It is known that, in the best case, the entire formatting code has about 30 KBs of binary overhead (judged from strconv experiments). Not too bad but there might be a potential room for improvements. This is rather large code. I did my best to comment and annotate the code, but you have been warned. For the maximal availability the original code was licensed in CC0, but I've also dual-licensed it in MIT/Apache as well so there should be no licensing concern. This is [breaking-change] as it changes the float output slightly (and it also affects the casing of `inf` and `nan`). I hope this is not a big deal though :) Fixes #7030, #18038 and #24556. Also related to #6220 and #20870. ## Known Issues - [x] I've yet to finish `make check-stage1`. It does pass main test suites including `run-pass` but there might be some unknown edges on the doctests. - [ ] Figure out how this PR affects rustc. - [ ] Determine which internal routine is mapped to the formatting specifier. Depending on the decision, some internal routine can be safely removed (for instance, currently `to_shortest_str` is unused).
triage: P-medium Adopting the priority from #7648. |
This remains an issue. My current theory is that while we greatly improved our float2decimal conversion, we have not yet done anything to address our float-parsing routines (i.e. decimal2float conversion). Here is some new demo code that I think makes this pretty clear. (I got rid of the random number generation because frankly I don't know how to do that on nightly anymore without using cargo -- and in any case, these deterministic tests should be easy to understand. https://play.rust-lang.org/?gist=7858323861f542d64e34 Update: I also need to review @kballard 's argument about about the printing of |
I am working on dec2flt. I have transcribed the algorithms (R and Bellerophon) from Clinger's paper and implemented most helper routines that were missing. It compiles and said helper routines pass some tests but the whole thing is still missing some pieces before I can start running it against well-behaved inputs. I hope to finish this part next weekend. Before I file a PR I also want to implement correct handling of overflowing, almost-overflowing, underflowing, subnormal and almost-subnormal numbers. This may turn out to be more tricky (the paper mentions this more as an afterthought) but I have a pretty clear plan of attack. Regarding 0.6: No, 0.60...0 is correct. It's not what Python outputs for |
I suspect that we don't handle any of these properly now, so I would be very happy with a PR where these weren't horribly-wrong (even if not fully correct). E.g. subnormals: fn main(){
let x = 1e-308 / 1e15;
println!("{}", x == 0.0);
println!("{:e}", x);
}
|
To be clear, the current state of my code is that things go horribly wrong when fed such numbers. There may be a relatively easy fix that's somewhat correct, but I'm not sure and (now that I've spent quite a bit of time thinking about proper support) I don't think the bandaid will be significantly faster to implement than a proper solution. I'll just have to implement bignum long division and Algorithm M from the paper — of which I have already written a reference implementation in Python — with checks for the minimum and maximum exponent. Well, and set the right cutoff for when to use Algorithm M. Aside: I'm sure you're aware but @lifthrasiir's code correctly handles subnormals, at least on this example. Only the old code has this problem and 1.0 stable (and thus play.rust-lang.org with default settings) still uses the old code. |
Executive summary: I mostly agree with @rukruppe about I wrote:
@rkruppe responded:
I agree that printing First, I wrote this Rust code to print out some ranges of decimal fractions corresponding to low and high points for the floating point values around the area of 0.6 https://play.rust-lang.org/?gist=a50a1eea69cc97be1d45 [deleted paragraph with reported conclusions that were based on code with unquantified floating point error] UPDATE: Argh, the technique in the Rust code above is flawed; casting the numerators and denominators to floats and then dividing is simply a bad way to try to my hands on the exact ranges that are involved here... I will go back later and do it again with proper bignum-based fractions in the future. Now, as for the question of what should be printed (since we seem to have a number of options available to us) ... obviously when you use Based on some interactions with a Scheme interpreter (my go-to choice for bignum stuff), the 64-bit float for 0.59999999999999997779553950749686919152736663818359375 From this we can see why @kballard might expect to see So: Its possible that some people may be annoyed by our current flt2dec handling of specifications like |
ps @rkruppe thanks for looking into this; I was about to roll up my sleeves to work on it yesterday, but i am happy to see you taking care of it. Let me know if you want a second set of eyes during debugging. |
any reason we cannot just cut-and-paste the old (slow-ish) bignum code that we used to have in the Rust repo into a private mod for these bignum implementation issues? The path that utilizes bignums is only exercised in exceptional cases, right?
I'm referring to the Update: Its possible I'm missing the point and that the abstractions provided by |
Decimal parsing and float formatting lives in libcore, i.e., can't use dynamic memory allocation. That alone already rules out using libnum verbatim. If I dug deeper, I'd probably find several other reasons, but it's a moot point since @lifthrasiir has already implemented almost all functionality I need (and all that flt2dec needs). FYI, while division is only needed in exceptional cases, most code paths do require a bignum to some degree (subtraction, comparison, multiplication by powers of two and ten). I realized just now how it can be avoided on some cases (fewer than 15 decimal digits and the absolute decimal exponent is less than 23) but it's still required even for some relatively common cases. Maybe libnum's code for division is a good starting point algorithmically, and I'll make sure to look at it in more depth before I dive into implementing division, but from a cursory glance it seems like importing the code and whipping it into something working may be more trouble than writing division from scratch. |
Just to note, I think that Which is to say, the behavior of printing some value that's not strictly accurate but evaluates to the same value when parsed as an IEEE floating-point value is behavior that's only appropriate when precision has not been specified. When output precision has been specified, it should always print based on the precise floating-point representation. |
Famous last words, heh. That said, I also did a lot of other clean up and refactoring and testing, confirmed a couple of bugs I'd long suspected, and addressed the underlying issues satisfactorily. For example, up-front input validation is now pretty comprehensive and prevents any way I can think of to cause integer (including bignum) overflow during the conversion. As for actual functionality: the code paths for nice inputs work now and passed a couple of simple tests. I'm not willing to declare it correct yet, but I'm hopeful. This is only for f64 as of now (of course I can parse into f64 and cast that to f32 and indeed I do that, and have plenty of test cases where it fails). Still missing, in order:
I won't make any predictions, but I feel I've solved all the hard problems and the rest is polishing and making sure I haven't overlooked any hard problems. |
@rkruppe can you share a pointer to your code? Maybe I can assist with some of the steps, like integration into libcore, or a libcore-based implementation of AlgorithmM/ |
My code is now at https://github.com/rkruppe/rust-dec2flt (virtually no git history though). The modifications to @lifthrasiir's bignum code are in my rust fork in the @pnkfelix Thanks for the offer, but splitting the work between two people seems tough. None of the steps are particularly hard in isolation and most depend on earlier steps. For example, half of the testing can't be done before f32 is properly supported, and integrating into libcore before the code is 99% done would just slow down the edit-compile-test cycle significantly. That said, additional test cases are always good and the optimizations should be independent from everything else. If either interests you, take a look at what's there and submit a PR. Since you'll be the ideal reviewer for the PR, even reading through the code will be valuable. And if you notice a bug, a possible optimization, a way to simplify the code, etc. — great! |
Quick FYI, AlgorithmM is implemented and everything works with |
Completely rewrite the conversion of decimal strings to `f64` and `f32`. The code is intended to be absolutely positively completely 100% accurate (when it doesn't give up). To the best of my knowledge, it achieves that goal. Any input that is not rejected is converted to the floating point number that is closest to the true value of the input. This includes overflow, subnormal numbers, and underflow to zero. In other words, the rounding error is less than or equal to 0.5 units in the last place. Half-way cases (exactly 0.5 ULP error) are handled with half-to-even rounding, also known as banker's rounding. This code implements the algorithms from the paper [How to Read Floating Point Numbers Accurately][paper] by William D. Clinger, with extensions to handle underflow, overflow and subnormals, as well as some algorithmic optimizations. # Correctness With such a large amount of tricky code, many bugs are to be expected. Indeed tracking down the obscure causes of various rounding errors accounts for the bulk of the development time. Extensive tests (taking in the order of hours to run through to completion) are included in `src/etc/test-float-parse`: Though exhaustively testing all possible inputs is impossible, I've had good success with generating millions of instances from various "classes" of inputs. These tests take far too long to be run by @bors so contributors who touch this code need the discipline to run them. There are `#[test]`s, but they don't even cover every stupid mistake I made in course of writing this. Another aspect is *integer* overflow. Extreme (or malicious) inputs could cause overflow both in the machine-sized integers used for bookkeeping throughout the algorithms (e.g., the decimal exponent) as well as the arbitrary-precision arithmetic. There is input validation to reject all such cases I know of, and I am quite sure nobody will *accidentally* cause this code to go out of range. Still, no guarantees. # Limitations Noticed the weasel words "(when it doesn't give up)" at the beginning? Some otherwise well-formed decimal strings are rejected because spelling out the value of the input requires too many digits, i.e., `digits * 10^abs(exp)` can't be stored in a bignum. This only applies if the value is not "obviously" zero or infinite, i.e., if you take a near-infinity or near-zero value and add many pointless fractional digits. At least with the algorithm used here, computing the precise value would require computing the full value as a fraction, which would overflow. The precise limit is `number_of_digits + abs(exp) > 375` but could be raised almost arbitrarily. In the future, another algorithm might lift this restriction entirely. This should not be an issue for any realistic inputs. Still, the code does reject inputs that would result in a finite float when evaluated with unlimited precision. Some of these inputs are even regressions that the old code (mostly) handled, such as `0.333...333` with 400+ `3`s. Thus this might qualify as [breaking-change]. # Performance Benchmarks results are... tolerable. Short numbers that hit the fast paths (`f64` multiplication or shortcuts to zero/inf) have performance in the same order of magnitude as the old code tens of nanoseconds. Numbers that are delegated to Algorithm Bellerophon (using floats with 64 bit significand, implemented in software) are slower, but not drastically so (couple hundred nanoseconds). Numbers that need the AlgorithmM fallback (for `f64`, roughly everything below 1e-305 and above 1e305) take far, far longer, hundreds of microseconds. Note that my implementation is not quite as naive as the expository version in the paper (it needs one to four division instead of ~1000), but division is fundamentally pretty expensive and my implementation of it is extremely simple and slow. All benchmarks run on a mediocre laptop with a i5-4200U CPU under light load. # Binary size Unfortunately the implementation needs to duplicate almost all code: Once for `f32` and once for `f64`. Before you ask, no, this cannot be avoided, at least not completely (but see the Future Work section). There's also a precomputed table of powers of ten, weighing in at about six kilobytes. Running a stage1 `rustc` over a stand-alone program that simply parses pi to `f32` and `f64` and outputs both results reveals that the overhead vs. the old parsing code is about 44 KiB normally and about 28 KiB with LTO. It's presumably half of that + 3 KiB when only one of the two code paths is exercised. | rustc options | old | new | delta | |--------------------------- |--------- |--------- |----------- | | [nothing] | 2588375 | 2633828 | 44.39 KiB | | -O | 2585211 | 2630688 | 44.41 KiB | | -O -C lto | 1026353 | 1054981 | 27.96 KiB | | -O -C lto -C link-args=-s | 414208 | 442368 | 27.5 KiB | # Future Work ## Directory layout The `dec2flt` code uses some types embedded deeply in the `flt2dec` module hierarchy, even though nothing about them it formatting-specific. They should be moved to a more conversion-direction-agnostic location at some point. ## Performance It could be much better, especially for large inputs. Some low-hanging fruit has been picked but much more work could be done. Some specific ideas are jotted down in `FIXME`s all over the code. ## Binary size One could try to compress the table further, though I am skeptical. Another avenue would be reducing the code duplication from basically everything being generic over `T: RawFloat`. Perhaps one can reduce the magnitude of the duplication by pushing the parts that don't need to know the target type into separate functions, but this is finicky and probably makes some code read less naturally. ## Other bases This PR leaves `f{32,64}::from_str_radix` alone. It only replaces `FromStr` (and thus `.parse()`). I am convinced that `from_str_radix` should not exist, and have proposed its [deprecation and speedy removal][deprecate-radix]. Whatever the outcome of that discussion, it is independent from, and out of scope for, this PR. Fixes #24557 Fixes #14353 r? @pnkfelix cc @lifthrasiir @huonw [paper]: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.4152 [deprecate-radix]: https://internals.rust-lang.org/t/deprecate-f-32-64-from-str-radix/2405
This example program produces a frighteningly large amount of numbers that, without even leaving the save haven of
[0.0; 1.0)
, change when printed out and read back in:Not all numbers fail to round trip, but in all my trials a majority did. The error is in the order of 1e-16 (much more, around 1e-6, for the default
{}
formatting) for all outputs I looked at, but that is more than IEEE-754 permits (it requires correct round trips when 17 decimal digits are printed) and certainly more than users should be forced to endure. Perfect round tripping is possible, useful, and important --- it's just a pain to implement.I have recently worked my way through float formatting, and at the very least it looks pretty naive (as in, code and comments read as if the authors didn't care at all about rounding error, or were entirely unaware that it's a thing). I also just skimmed over the
FromStr
implementation and it looks a bit smarter, but since there is no reference to any paper and it doesn't use bignums I have doubts about its accuracy as well.Many numbers reach a fixed point after one round trip, but there are also numbers which take hundreds, thousands, or even more round trips before they settle down --- if they settle down at all (some cycle). The largest I found was almost nine million round trips, which is almost absurd. For obvious reasons, the error changing (presumably, growing) over time is even worse than a one-time inaccurate rounding. Here's the program I used to find those.
For the formatting side of things, I also filed #24556
I know less about the parsing side of things. The topic seems to get less attention in general, but I found a paper by William Clinger which seems promising.
cc @rprichard
The text was updated successfully, but these errors were encountered: