Rust does not comply with IEEE 754 floats: arithmetic can produce signaling NaN #107247
Interestingly enough, MSVC does the same transformation even with I've access to the actual text of IEEE 754-2019 through my current university (I suppose I could've grabbed 754-2008 instead, but I grabbed the latest revision), so here's the actual relevant quotes:
§5.12 is about conversions between floating-point data and character sequences, so is not relevant. Multiplication is a "formatOf general-computational operation". Thus, I agree that the standard mandates that If we relax the language specification from "default floating point environment" to "implementation defined floating point environment" and allow the language to arbitrarily change the exception handling mode, then we can justify the transformation, by first setting the exception handling mode for invalid operation to FWIW, alive2 times out for
I tried using the (experimental?) constrained floating point intrinsics, but I don't think alive2 supports the Of note from the Rust docs, though, is that a bunch of
I think there's a fair chance that the Rust opsem will make no distinction between signaling and quiet NaNs. I can somewhat justify a perverse reading of IEEE 754-1985 that allows all binary encoded NaNs to be quiet; 754-2008 and 754-2019 of course mandate a specific interpretation of the mantissa MSB as indicating quiet (set) or signaling (cleared), but Rust does support targets such as MIPS which do not agree with the 2008 definition. |
I would cc T-opsem and/or A-mir-formality but I don't think either have a pingable role. For lack of a better classification, @rustbot modify labels +T-lang |
As much as I am usually in favor of pedantic adherence to IEEE 754, it is extremely dubious that we would want to guarantee that this transformation would not occur, as Rust does not meaningfully support floating point exceptions and thus does not support the utility of this exception to the identity. |
I think the argument would be that sNaN vs qNaN is still a meaningful difference even if exceptions don't exist. |
Plausible. |
Yes, that's exactly it. It's not observable just via It's very likely that the strictest bound we'll get for Rust is that floating point operations returning NaN return a nondeterministic quiet NaN. This is (almost) compliant with the IEEE spec in the We're not concerned about the rest of the NaN payload here (though that's also its own unresolved question), just the signaling bit. Unlike the rest of the NaN payload and even the sign, the IEEE standard does explicitly require that no operation produce a NaN with the signaling bit in its signaling state; all operations must produce quiet NaNs. Wasm is brought up as a target with nondeterministic NaNs, but it actually does only ever produce quiet NaNs. The spec states that "There is no observable difference between quiet and signalling NaNs," but this is only with respect to the fact that exception flags are not exposed in wasm. Separately, wasm defines the concept of an arithmetic NaN, which exactly matches the IEEE definition of quiet NaNs. It then requires all arithmetic operations to produce arithmetic NaNs (and, if all NaN inputs are canonical NaNs, for the output to be a canonical NaN). Speaking on behalf of myself only, I would place slim odds favoring LLVM and subsequently Rust deciding to treat all NaNs as being quiet, willfully ignoring the signaling bit and violating the spec in a minor way (producing sNaN from arithmetic operations) in the name of performance. This seems to me the simplest way to reconcile the current behavior with the IEEE spec. LLVM's documentation on the constrained floating-point operations states that mixing the unconstrained and constrained operations is not allowed, and that when in More recent discussion on the LLVM discourse forum: https://discourse.llvm.org/t/semantics-of-nan/66729
|
Floating-point operations returning a nondeterministically chosen quiet NaN is perfectly compliant. Also bear in mind that Rust itself does not implement the full scope (in particular, flags) of IEEE 754, it only claims that floating-point arithmetic operations follow it. I do not imagine there being a significant performance hit to disabling that "optimization". The only thing multiplication by 1 does is quiet a signaling NaN. Same with
That's so totalOrder can be implemented efficiently! The ordering you get from it is the same as the natural order from interpreting the float as a sign-magnitude integer (assuming you're using the recommended [but not mandated] choice of signaling bit). |
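The totalOrder point above can be sketched in Rust. This is only an illustration under stated assumptions: `total_order_key` is a hypothetical helper (not something from the thread), and the bit patterns assume the recommended encoding in which a set first significand bit means quiet. `f32::total_cmp` is the standard library's totalOrder comparison.

```rust
/// Hypothetical helper: maps an f32 to an i32 whose plain integer order
/// matches the order given by `f32::total_cmp`.
fn total_order_key(x: f32) -> i32 {
    let bits = x.to_bits() as i32;
    // For negative floats (sign bit set) flip the 31 magnitude bits, so that
    // larger magnitudes sort lower. Positive floats are left unchanged.
    bits ^ ((((bits >> 31) as u32) >> 1) as i32)
}

fn main() {
    let mut values = [
        f32::from_bits(0x7FC0_0000), // positive quiet NaN
        1.0,
        -0.0,
        f32::from_bits(0x7FA0_0000), // positive signaling NaN (assumed encoding)
        f32::INFINITY,
        0.0,
        -1.0,
        f32::from_bits(0xFFC0_0000), // negative quiet NaN
    ];

    // Sort with the standard library's totalOrder comparison.
    values.sort_by(|a, b| a.total_cmp(b));

    // The integer keys must then be strictly increasing as well.
    for pair in values.windows(2) {
        assert!(total_order_key(pair[0]) < total_order_key(pair[1]));
    }

    for v in values.iter() {
        println!("{:#010X}", v.to_bits());
    }
}
```

With this encoding a positive signaling NaN sorts below a positive quiet NaN, which is what makes the integer comparison line up with totalOrder.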
Yes, but it's been expressed a couple times that it'd be nice to have deterministic results.
The reason I expect it to be more expensive is not due to what we actually want to disable, but what will get prevented in the crossfire.
Huh yeah, you're right; I somehow got it in my head that the latter standard revisions prescribed the interpretation of the signaling bit rather than just providing a recommendation for interchange. This means the (questionable) interpretation where your binary format for floats contains no encodings for signaling NaN ("should"), but your operations do "support" signaling NaN ("shall") should they be given one (impossible) is still (arguably) somewhat remotely viable. Providing such an implementation of IEEE floating point would I think be an accurate description of the current behavior (modulo FFI visibility of floating point exception flags). |
It's not possible to guarantee that a specific qNaN is produced everywhere, even just on the current tier 1 platforms. In particular, the default qNaN on x86 always has the sign bit set (0xFFC00000 for f32), but it's always cleared on aarch64 (0x7FC00000 for f32).
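To make the two bit patterns concrete, here is a small sketch that splits them into their fields. The constant names and the `fields` helper are made up for illustration; the values are the ones quoted above.

```rust
// Default quiet NaN bit patterns as quoted in the comment above.
const X86_DEFAULT_QNAN: u32 = 0xFFC0_0000;
const AARCH64_DEFAULT_QNAN: u32 = 0x7FC0_0000;

/// Hypothetical helper: splits an f32 bit pattern into
/// (sign, biased exponent, fraction).
fn fields(bits: u32) -> (u32, u32, u32) {
    (bits >> 31, (bits >> 23) & 0xFF, bits & 0x007F_FFFF)
}

fn main() {
    for bits in [X86_DEFAULT_QNAN, AARCH64_DEFAULT_QNAN].iter().copied() {
        let (sign, exponent, fraction) = fields(bits);
        println!(
            "{:#010X}: sign={} exponent={:#04X} fraction={:#08X} is_nan={}",
            bits,
            sign,
            exponent,
            fraction,
            f32::from_bits(bits).is_nan()
        );
    }
    // The two defaults differ only in the sign bit.
    assert_eq!(X86_DEFAULT_QNAN ^ AARCH64_DEFAULT_QNAN, 0x8000_0000);
}
```

Both values are quiet NaNs with an empty payload, so code that only checks `is_nan()` cannot tell them apart; code that compares bits can.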
Firstly, we shouldn't consider ignoring sNaNs, because they are in fact something that exists and are in fact supported by the targets. This miscompilation causes the following to behave differently between debug and release:
That is, the first bit of the significand must always determine whether it is an sNaN or a qNaN, but the standard doesn't mandate that a set bit means it is a qNaN. |
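A rough sketch of what "the first bit of the significand decides" means for `f32`. The types `QuietBit` and `NanKind` and the function `classify` are hypothetical names for this illustration. The `Set` convention is the one recommended by recent revisions of the standard; the `Clear` convention is the reverse choice, which I believe is what legacy MIPS encodings use.

```rust
/// Which value of the first significand bit marks a quiet NaN.
#[derive(Clone, Copy, Debug, PartialEq)]
enum QuietBit {
    Set,   // recommended convention: bit set means quiet
    Clear, // reversed convention: bit clear means quiet
}

#[derive(Debug, PartialEq)]
enum NanKind {
    NotNan,
    Quiet,
    Signaling,
}

/// Classifies an f32 bit pattern under the given convention.
fn classify(bits: u32, convention: QuietBit) -> NanKind {
    let exponent = (bits >> 23) & 0xFF;
    let fraction = bits & 0x007F_FFFF;
    // A NaN has an all-ones exponent and a nonzero fraction.
    if exponent != 0xFF || fraction == 0 {
        return NanKind::NotNan;
    }
    let first_bit_set = fraction & 0x0040_0000 != 0;
    match (first_bit_set, convention) {
        (true, QuietBit::Set) | (false, QuietBit::Clear) => NanKind::Quiet,
        _ => NanKind::Signaling,
    }
}

fn main() {
    for bits in [0x7FC0_0000u32, 0x7FA0_0000, 0x7F80_0000].iter().copied() {
        println!(
            "{:#010X}: {:?} (bit set = quiet), {:?} (bit clear = quiet)",
            bits,
            classify(bits, QuietBit::Set),
            classify(bits, QuietBit::Clear)
        );
    }
}
```

The same bit pattern flips between quiet and signaling depending on the convention, which is why a target-independent definition is awkward.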
So is nondeterminism okay or not? By the standard, I'm not talking about whether results are portable between different targets; I'm talking about whether results are deterministic within a single execution. And this is in fact a very important condition to discuss. NaN selection being either "deterministic across all targets" or "nondeterministic on all targets" is fine for optimizations, because all targets have the same behavior, and thus our target independent IR has a single behavior we can optimize over. If NaN selection must be deterministic within a single target but changes between targets, this changes the set of optimizations we're able to do. Consider again the example of This is why I say that the strongest semantic I think Rust will get is that every time a NaN is produced by a computational operation, that NaN shall be nondeterministically selected from the set of all qNaN. This gives us a single semantic to optimize over which is validly refined by every correct implementation of the IEEE standard.
If you consider Rust as targeting LLVM, no, not really. LLVM makes no effort to preserve the signaling quality of NaN. Whether the disclaimer just covers fp exceptions or also covers the signaling bit is debatable, but in practice this miscompilation is exactly due to LLVM pretending sNaN doesn't exist. The
I'm not saying that this reading of the standard is good. But so long as no bit other than
|
Nondeterminism is okay; the standard does not in any way forbid it. However, the examples I gave do not rely on it. On a compliant implementation, they must succeed on all executions. The original example I gave simply iterated through every NaN until it found one which is changed by multiplication by 1, which must necessarily happen once it finds an sNaN. It fails to find one under optimizations. My second example generates a qNaN, turns it into an sNaN by toggling the signaling bit (this always works on any compliant platform), and then multiplies it by 1, and then asserts that it is in fact changed (it must be, since it must be quieted). This also fails under optimizations. Assuming that we're not promising anything more than IEEE 754 does, we can optimize
Yes, this is an LLVM bug, as I've mentioned previously. LLVM is non-compliant, which is a serious problem. |
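The second example described in the comment above might look roughly like the following. This is a reconstruction, not the commenter's actual code. Assumptions: the recommended encoding (first significand bit set means quiet), a hypothetical `snan_bits` helper that also sets a low payload bit so that clearing the quiet bit of the default NaN does not produce infinity, and `std::hint::black_box`, which is only a best-effort hint to the optimizer.

```rust
use std::hint::black_box;

// First significand bit of an f32 (assumed: set means quiet).
const QUIET_BIT: u32 = 0x0040_0000;

/// Hypothetical helper: bit pattern of a signaling NaN derived from f32::NAN.
fn snan_bits() -> u32 {
    // Clear the quiet bit and set the lowest payload bit so that the
    // fraction stays nonzero (otherwise the value would be an infinity).
    (f32::NAN.to_bits() & !QUIET_BIT) | 1
}

/// True if the bit pattern is a NaN with the quiet bit set.
fn is_quiet_nan(bits: u32) -> bool {
    f32::from_bits(bits).is_nan() && bits & QUIET_BIT != 0
}

fn main() {
    let input = snan_bits();
    // IEEE 754 requires this multiplication to return a quiet NaN.
    let product = black_box(f32::from_bits(input)) * black_box(1.0_f32);

    println!("input   = {:#010X}", input);
    println!("product = {:#010X}", product.to_bits());
    println!("product is a quiet NaN: {}", is_quiet_nan(product.to_bits()));
}
```

Whether the last line reports `true` depends on the target and on whether the compiler folds the multiplication away, which is exactly the behavior under discussion; dropping the `black_box` calls and comparing debug and release builds is one way to look for the difference.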
To my knowledge, LLVM makes no attempt to guarantee signaling NaN correctness. Is there a bug report on their side to confirm whether they even see this as a miscompilation? EDIT: Oh, the LangRef is actually explicit about this point. So yeah, it doesn't look like Rust has a lot of choice short-term here; this would require a long-term project to make LLVM IEEE-compliant and possibly expose more target-level NaN guarantees along the way. My current rationalization of this behavior is that Rust simply does not implement the parts of the spec that distinguish signaling and non-signaling NaNs. IOW, Yes, this means Rust would not be IEEE 754 compliant -- though only in ways that popular C compilers are also already non-compliant, for whatever that is worth.
Why is it a serious problem? (Not a rhetorical question. Given that this behavior is widespread among C compilers, and hard to observe in the floating point environment Rust programs pretty much have to run in, I do not see what serious practical issues are being caused by Rust implementing a weaker spec, where NaN outputs are picked non-deterministically from the set of all NaNs, signaling or quiet. The vast majority of floating point code does not care about signaling vs quiet NaNs, so making it pay for the niche code that does -- by having fewer optimizations -- is not an obvious win either.) |
That's At the moment it seems to be leaning slightly towards it being a documentation bug that the disclaimer about fp exceptions doesn't permit not distinguishing between sNaN and qNaN, but it's also possible that the other resolution of handling NaN more carefully will be taken. (Unfortunately, the |
Yes, it's an LLVM problem for now. I am not so sure how long-term it would be in terms of implementation. At the very least we should document our noncompliance.
Certainly. However, it is noncompliant.
It's a serious problem for a number of reasons. From a practical perspective, signaling NaNs (with trapping exceptions) are in principle very useful debugging tools, but no one can use them because compilers don't implement them properly. There are many times I have personally wanted to use them for their intended purpose (debugging and sentinel values), only to be thwarted by
As soon as you do anything else, you can delete the multiplication by 1 (or division by 1, or addition of -0.0). Also, C compilers are hardly a good role model for what to do with floating-point. They have been implementing invalid floating-point optimizations and causing numerical errors and reproducibility headaches since forever. It's only recently (with much suffering on part of my seniors in the verification arena) that they've been somewhat tamed. Surely Rust can do better? |
Even if the incorrect production of sNaN is fixed, Rust is unlikely to ever guarantee that floating point exceptions are not raised spuriously. Providing this guarantee is actually a significant deal, as it is required to allow speculative evaluation of floating point operations. Speculative evaluation is the simplest way to justify loop invariant code motion, where a computation invariant across loop iterations is hoisted outside the loop. This is one of the major benefits achieved by marking references as Clarifying whether we acknowledge that sNaN is a thing is a reasonable medium term goal. Supporting optional alternative floating point exception handling and environment access is a far longer-term aspiration that isn't really worth considering until LLVM has support for it. Yes, it'd be nice to have, but ultimately IEEE-754 is more a spec for chip FPU behavior (w.r.t. fp environment and exception status flags) than it is for higher level languages which want to do larger scale optimizations than are possible when the fp environment isn't static. |
Is there a language that is a good role model, in the sense that it guarantees full IEEE compliance and even supports fp environments and exceptions? |
Although I'm unsure under which circumstance a spurious FP exception would be raised, this isn't really a dealbreaker? Currently there's no handling for it at all.
All of the functionality of IEEE 754 is intended for use by software. What else would be using it? The FP environment only matters inasmuch as software is interacting with it.
Language? C99 and later support the whole thing with |
Fully agreed on that one. The way I view this, better support for these special float features is a feature addition to Rust that will need some design work, to be considered together with things that go beyond what IEEE promises (e.g. making NaNs less non-deterministic). rust-lang/unsafe-code-guidelines#237 attempts to keep an overview. Not sure what to do with all the individual issues that consider various aspects of this and are littered with confusion caused by a lack of overview... |
Yes, I have been told there is a C-or-Fortran-or-both compiler with software support that is fully FP compliant or at least near enough to return very precise errors (the "main purpose" of NaN existing is to allow you to record such errors and roughly where they are in software)... unfortunately, it's closed-source, as far as I am aware. |
My dumb question is why are we trying to fully support the IEEE floating point standard? PartialOrd and PartialEq were introduced just for floats, and most languages aren't fully IEEE float compatible, e.g. the C++ standard isn't. |
Closing as a duplicate of #73328, where we are tracking documenting our NaN guarantees. (We will almost certainly document that operations with sNaN inputs can produce sNaN outputs. That is LLVM semantics, and they are unlikely to be willing to change that without a solid use case.) |
The compiler optimizes x * 1.0 to x, which is incorrect whenever x is a signaling NaN. IEEE 754 mandates that the result is a quiet NaN.

Example (rustc 1.66, with -C opt-level=2): https://godbolt.org/z/5qa5Go17M

This is almost certainly an LLVM bug. It's been filed over there under llvm/llvm-project#43070. I don't know which optimization pass is responsible for it.