Add syntax for hexadecimal floating point values #562
Comments
I would like to second this. Rounding errors caused by converting decimal to float and vice versa have bothered me for ages. |
Ping? Is there anything I can do to speed this up? |
As I noted in #617, we won't introduce new syntax in TOML 1.0 on top of TOML 0.5.0. While I do see the benefits to this, I'm going to defer this until after TOML 1.0. |
Copied from #617 since this comment is more appropriate in this issue. Why not use decimal floating-point? It's kinder to humans (and isn't that the point of TOML?). Modern processors are now starting to support the decimal floating-point standard defined in IEEE 754-2008. For example, the Intel C compiler now has support for the data types and functions to check the support status. The C data types in the standard are _Decimal32, _Decimal64, and _Decimal128. Constants/literals in C use a suffix to denote which data type the value is to be represented in. I believe the suffixes are as per this proposal in 2008:
These denote 32/64/128 bit floating-point values, and with the 'd' or 'D', represent a decimal floating-point value. For TOML, I think this could go one of two ways. Either:
The first is backward compatible. The second is new-syntax. |
You can't represent the whole range of IEEE 754 floats accurately. I agree that it's nicer to humans, but you will lose the ability to accurately define numbers.
No desktop processor supports decimal floating point so far.
That's a very different use case than this issue (which is just another syntax for the already supported IEEE 754 floats; a program using a TOML parser would just get back a double and not a new type), and since a float is rather different than a "decimal" I would not pick 1. |
I guess I come from a billing and revenue management system background where server class processors are the norm, decimal values are required, but it would be nice to use floating point for performance. Binary floating-point values don't suit because they can't accurately represent decimal values. Having to convert strings to decimals is a nuisance when the file format could just support it directly. But I accept that my comment is off-topic. In domains other than my own, such as scientific processing, I'm sure hex floating-point would be very useful. |
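To illustrate the point about binary floats and decimal values, here is a minimal sketch using Python's standard `decimal` module. Note the assumption: `decimal` is a software implementation, not the hardware IEEE 754-2008 decimal types discussed above, so it only demonstrates the representation issue, not the performance argument.

```python
from decimal import Decimal

# Binary doubles cannot hold 0.1, 0.2 or 0.3 exactly, so the sum drifts.
binary_sum = 0.1 + 0.2
print(binary_sum == 0.3)  # False

# Decimal values built from strings keep the digits as written.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True

# Building a Decimal from a float exposes the binary value actually stored.
stored = Decimal(0.1)
print(stored)
```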
I am not from this background, but I would have expected that fractional fixed-point math (like using integers representing 1/1000 of a currency) would be used. You don't need huge exponents, so is there any real need for an exponential format?
That's nitpicking, but a conversion has to happen somewhere. The bigger issue is if the conversion is lossy (which is generally the case if you convert between base2 and base10).
You can't directly specify (base2) values otherwise, and base2 is what is used basically everywhere. |
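The fixed-point idea mentioned above (integers counting 1/1000 of a currency unit) can be sketched roughly as follows. The names `SCALE`, `to_units` and `to_text` are hypothetical and only serve this illustration; a real billing system would also need rounding rules and currency handling.

```python
SCALE = 1000  # hypothetical: one integer unit is 1/1000 of the currency


def to_units(text: str) -> int:
    """Parse a decimal string such as '19.995' into integer thousandths."""
    sign = -1 if text.startswith("-") else 1
    whole, _, frac = text.lstrip("+-").partition(".")
    if len(frac) > 3:
        raise ValueError("more than three fractional digits")
    # Pad the fraction to exactly three digits so '0.1' becomes 100 units.
    return sign * (int(whole or "0") * SCALE + int(frac.ljust(3, "0")))


def to_text(units: int) -> str:
    """Format integer thousandths back into a decimal string."""
    sign = "-" if units < 0 else ""
    whole, frac = divmod(abs(units), SCALE)
    return f"{sign}{whole}.{frac:03d}"


# Integer arithmetic is exact, so no base-2/base-10 conversion is involved.
total = to_units("0.1") + to_units("0.2")
print(to_text(total))
```

The trade-off is a fixed resolution and no exponent, which is fine for money but not for general floating-point data.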
There are some common languages that support this natively, and some that don't; it's about 50/50:
Support: C, C++, Java, Go, Zig, Swift, Lua, Perl, Python, Haskell (GHC extension)
No support: C#, Ruby, Rust, PHP, JavaScript, TypeScript (no spec, but I tried it and it didn't work)
This means that implementations for those languages will have to parse these strings to a I think it's a rare enough feature that the added complexity for implementations isn't worth it, but I'm not dead-set against it. |
TBH I'd say parsing hexfloats manually is trivial for people who are already writing a parser (given the necessary experience they'd already have); this is toml++'s implementation: It *looks* like there's a lot going on, but most of it is error handling - the actual hexfloat parse logic is only about 40 lines. I'd be happy for any implementers to use my implementation as a starting point. |
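For readers who want a feel for the size of such a parser, here is a rough sketch in Python. It is not toml++'s implementation; the grammar (optional sign, `0x`, hex digits with an optional fraction, mandatory `p` exponent) is an assumption modelled on C-style hexfloat literals, and `parse_hexfloat` is a hypothetical name.

```python
import math
import re

# Assumed grammar: [sign] 0x hexdigits [. hexdigits] p [sign] decimal-exponent
_HEXFLOAT = re.compile(
    r"([+-]?)0[xX]([0-9a-fA-F]*)(?:\.([0-9a-fA-F]*))?[pP]([+-]?[0-9]+)"
)


def parse_hexfloat(text: str) -> float:
    m = _HEXFLOAT.fullmatch(text)
    if m is None:
        raise ValueError(f"not a hexfloat: {text!r}")
    sign, int_digits, frac_digits, exponent = m.groups()
    frac_digits = frac_digits or ""
    if not int_digits and not frac_digits:
        raise ValueError("mantissa has no digits")
    # Treat all hex digits as one integer, then shift the binary exponent
    # down by 4 bits for every digit that sat after the point.
    mantissa = int(int_digits + frac_digits, 16)
    exp2 = int(exponent) - 4 * len(frac_digits)
    # Limitation of this sketch: mantissas wider than 53 bits are rounded
    # when converted to float, before the exponent is applied.
    value = math.ldexp(mantissa, exp2)
    return -value if sign == "-" else value


print(parse_hexfloat("0x1.8p1"))  # 3.0
```

The core is an integer parse plus one power-of-two scaling; no decimal conversion or big-integer arithmetic is needed for mantissas that fit in a double.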
That's roughly the number I had in mind, so nice to have that confirmed. It's not hard-hard, but it's not "just a few lines added in 10 minutes" either, which is what I meant with trivial. For comparison, the parsing code (excluding lexer) is about 800 lines for BurntSushi/toml right now, so ~40 lines is comparatively a lot for a little-used feature (will also need to tweak the lexer a bit). On the other hand: it's also the case that some values are hard or even impossible to express without this, but a number of popular programming languages seem to fare well enough without it, including some lower-level ones like Rust. Personally I've never needed this in my entire 25-year programming career, but I've also done limited low-level stuff. It's a trade-off; personally I'd lean towards "it's not worth it", but it's a small lean. Do you know of people using that feature in your library by the way? I couldn't really find anything on your issue tracker, or TOML files using it with GitHub code search: https://github.com/search?q=%2F0x%5Ba-fA-F0-9%5D%2B%3F%5C.%3Fp%5B-%2B%5D%2F&type=code |
Me, in a private project :) |
You'll want to move Python to the Support camp, because |
The smaller you get, the less likely it is that your library function will end up with the same value after parsing; see for example https://keithp.com/blogs/picolibc-string-float. I am not sure that even standards compliance will guarantee the same value after round trips, and "arbitrary-precision" implementations need a lot of code and RAM. Look at the popular STM32 MCUs (https://www.st.com/en/microcontrollers-microprocessors/stm32f3-series.html) for example: a performant FPU, but only a couple of KB of ROM/RAM. The feature you would be adding is lossless data exchange across platforms. |
@nolange the question isn't "is this useful in some cases?" but rather "is this useful enough"? Almost every feature is useful, but also comes with a cost for implementations, complexity, etc. TOML could easily be three times the size if we included every feature that was occasionally useful. It's not really clear to me that many people want to use TOML on STM32 systems, and everyone else on a less constrained system can use a string if they really need this. It's not clear to me what @marzer's use case is exactly, but thus far it seems you're the only one who really needs this.
Cheers, I updated it. Also turned out that Haskell supports it as a GHC extension. |
It's a serialization problem. Sometimes floats need to be exactly round-tripped via a TOML file, and TOML (or, more specifically, the float<->string utilities provided to the TOML implementer by their language) won't necessarily guarantee that since the conversion to decimal form during serialization (i.e. formatting the data as TOML), and then at the other end from string->float during deserialization, can lead to precision loss issues. Keeping it as hexfloats at every step mitigates that. You could argue that TOML isn't really the right tool for this, and you'd be right in my particular case (I could have handled it differently but re-using my existing TOML infrastructure kept things very simple). It's a tricky case because the majority of people will never use it, but those who do, tend to really need it. TOML's inclusion of dates and times also falls prey to this dichotomy, methinks. edit: wording |
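A small sketch of the round-trip concern, using only Python built-ins. The `%.15g` format is an assumption standing in for "a serializer that emits too few decimal digits"; Python's own `repr` does round-trip, which is the kind of guarantee not every language's float<->string utilities provide.

```python
x = 0.1 + 0.2  # stored as 0.30000000000000004

# A decimal serializer that keeps only 15 significant digits loses the value.
short_text = "%.15g" % x          # '0.3'
print(float(short_text) == x)     # False

# The hexfloat form encodes the stored bits directly, so it always comes back.
hex_text = x.hex()
print(float.fromhex(hex_text) == x)  # True

# Python's repr uses enough decimal digits to round-trip as well.
print(float(repr(x)) == x)        # True
```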
@marzer's use case is interesting. Until yesterday I considered this as "neat, but too specialized to warrant the additional complexity." However, while I know TOML is chiefly meant for configuration, I think it can be useful for other use cases too, and serialization is a compelling one. Moreover, it's a kinda logical extension that brings the ways of writing integers and floats into better alignment. These are not all that compelling arguments, and the "additional complexity" argument still weighs against it, but you can now count me in the "slightly in favor" rather than the "slightly against" camp. |
In terms of complexity, parsing/printing the "normal" float format will overshadow everything else. I get that this doesn't matter if those routines are available already, but if they aren't, the costs are high.
Everyone that cares that a float can be stored and retrieved without a loss. Naturally you would use a really simple one like Means you might get different results for reading a toml file containing decimal floats depending on the language/standard library used. I know you say this is an additional feature, but I am not sure it's understood that this is something functional and not just "cosmetic". |
As if by divine providence, here's an article published today by the author of C++'s https://www.zverovich.net/2023/06/04/printing-double.html (you'll note that towards the end of the article, hexfloats are listed as a more robust alternative in the absence of sensible |
You can use string representations for this (e.g. Imaginary numbers can't be represented in TOML either for example, other than using a string (or an [int, int] array, I guess). While this is certainly useful in some contexts, using strings is "good enough" for most use cases IMHO. My sense of "taste" says this doesn't need to be in TOML, which is not very tangible, and I can definitely understand the arguments in favour of it. If someone would make a concrete PR then I probably won't vote against it (probably! Not a promise!) |
Yeah, I guess, but then why not just do that for everything? Why have a bool type, when Plus, from a user perspective it's inconsistent and error-prone; the neighbouring int, float and boolean KVP's are happily quote-free, but don't forget the quotes around this special magic one! =/ |
Because lots more people want to use |
Personally, I would prefer to see literal decimal syntax be adopted first. But I wouldn't object to the introduction of hexfloats, given proper limits. We can only guarantee so many hexadecimal places are preserved in practice. At the barest minimum, three hex digits would allow "thousandths" (the standard we use elsewhere for decimal accuracy) to be represented accurately, but I'm assuming most use cases would use many more than three digits. |
I personally view TOML more as a user-friendly configuration format, distinct from XML or JSON. These latter formats tend to be challenging for humans to read and write without the aid of special tools, making them less ideal for configuration. While I appreciate the advantages of storing floating-point values, especially as a C++ and embedded developer, I believe this practice is not a common occurrence in a configuration file. Instead, it seems more applicable to data serialization. As I thought about situations where I might employ hex floats, I found the following:
From this list, I personally feel that only the first point holds minor relevance in the context of software configuration. The last three scenarios seem more related to code execution or debugging rather than configuration. I think rounding errors become highly significant when dealing with small floating-point values, such as a 32-bit In conclusion, from my perspective, for the few cases where precise representation of floating-point values is essential, storing these values as strings seems acceptable. Moreover, I believe it's crucial to remember the human-first nature of TOML. Personally, I find the hex float format challenging to comprehend. I struggle to convert the fractional part of a hex float into decimal form mentally. In my view, this lack of intuitive understanding contradicts the underlying premise of TOML, which prioritizes human-readability and interaction. |
Yet you support hex, octal and binary integers and Neither the 32bit floats nor the 'small' floating-point values are an argument, the problem is always:
Don't see how another way to write a float value is different to allowing an int to be specified as hex, you still can write your "readable" values and allow values that are way easier to read as base2/base16 mantissa / exponent too. |
We talked about this feature on and off for 5 years. Is it worthwhile? I could see a use for it. Is it minimal and obvious? For human beings, not so much, though for machine-generated TOML it's more obvious for accurate floats, lacking better alternatives. Should we include it as part of TOML v1.1.0? No. Let's defer it. Tag it "post-v1.1.0", and come back to it for a possible future version. |
We don't need a tag for this -- a PR for this would be welcome, but it likely won't be merged for a TOML 1.1.0 release. :) |
C11/C++11 added hexfloat as an alternative floating-point representation.
The underlying reason is that machines represent floats in base 2, while the other output formats are base 10. That is a significant issue, because you need to calculate with huge integers to guarantee correct results. Look at some implementations like double-conversion for reference; some even use heap allocations for the large integers required.
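To give a sense of the integer sizes involved, this sketch uses Python's `decimal` module to print how many significant decimal digits the exact value of the smallest subnormal double has, next to its hexfloat spelling. It only illustrates the size of the numbers; it does not model how double-conversion or any other library works.

```python
from decimal import Decimal

tiny = 5e-324  # smallest positive subnormal double, equal to 2**-1074

# Decimal(float) converts exactly, with no rounding to the context precision.
exact = Decimal(tiny)
digits = len(exact.as_tuple().digits)
print(digits)  # number of significant decimal digits in the exact value

# The same value as a hexfloat is a short string that round-trips exactly.
hex_text = tiny.hex()
print(hex_text)
print(float.fromhex(hex_text) == tiny)  # True
```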
The new hexfloats are trivial to parse.
TOML would be a nice option for small and embedded systems as well, and easily reading in float values would help a lot there. So please add them. I believe the addition to be conflict-free with existing TOML files (just as it was added conflict-free to the C/C++ standards).
Format description is taken from floating_literal.