Functions that depend on the current C locale #11952

Closed
HertzDevil opened this issue Mar 30, 2022 · 5 comments · Fixed by #15195
Comments

@HertzDevil
Contributor

HertzDevil commented Mar 30, 2022

Some LibC funs, like strtod in String#to_f64?, and snprintf in Float::Printer#internal, depend on the currently active C locale. This means some oddities could happen if a different locale is active:

lib LibC
  LC_ALL = 6

  fun setlocale(category : Int, locale : Char*) : Char*
end

"1,23".to_f64? # => nil
1.23.to_s      # => "1.23"
1.23.to_s.to_f # => 1.23
"%g" % 1.23    # => "1.23"                   # `String::Formatter#float` also uses `snprintf`
1e23.to_s      # => "9.9999999999999992e+22" # Grisu failure case, triggers the `snprintf` path

# German uses "," as the decimal separator and "." as the thousands separator
LibC.setlocale(LibC::LC_ALL, "de_DE.UTF-8")

"1,23".to_f64? # => 1.23
1.23.to_s      # => "1.23"
1.23.to_s.to_f # Invalid Float64: "1.23" (ArgumentError)
"%g" % 1.23    # => "1,23"
1e23.to_s      # => "9,9999999999999992e+22"

All programs start in the "C" locale, but third-party shards might nonetheless change it, leading to hard-to-debug scenarios like the ones above. To my understanding the entire Crystal standard library should be locale-independent.

Is there anything we could do here apart from reimplementing all of LibC's locale-dependent functions in Crystal? (There are probably other kinds of global state in the C runtime to avoid too.)
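
One stopgap that avoids reimplementing anything is to switch to the "C" locale only around the locale-sensitive call and restore the previous locale afterwards. The following is a minimal sketch, not something the standard library provides: `LibLocale` and `with_c_locale` are hypothetical names, and `LC_ALL = 6` is the Linux value used in the example above (it differs on other platforms). Because `setlocale` changes process-wide state and is not thread-safe, this is only reasonable in single-threaded programs.

```crystal
# Hypothetical binding; the value of LC_ALL is platform-specific (6 on Linux).
lib LibLocale
  LC_ALL = 6

  fun setlocale(category : LibC::Int, locale : LibC::Char*) : LibC::Char*
end

# Hypothetical helper: runs the block under the "C" locale, then restores
# whatever locale was active before.
def with_c_locale(&)
  # Passing a null locale only queries the current setting. The returned
  # buffer may be overwritten by later calls, so copy it into a String.
  previous = LibLocale.setlocale(LibLocale::LC_ALL, nil)
  saved = previous.null? ? nil : String.new(previous)

  LibLocale.setlocale(LibLocale::LC_ALL, "C")
  begin
    yield
  ensure
    LibLocale.setlocale(LibLocale::LC_ALL, saved) if saved
  end
end

# Usage: the conversion sees the "C" locale regardless of what a shard set.
value = with_c_locale { "1.23".to_f64? }
```

This still leaves every call site responsible for remembering the wrapper, so it does not make the standard library itself locale-independent.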

@HertzDevil HertzDevil changed the title Functions that depend on the C locale Functions that depend on the current C locale Mar 30, 2022
@HertzDevil
Contributor Author

HertzDevil commented Mar 31, 2022

Here is a list of non-Windows functions I could find that depend on the C locale:

  • dprintf: Affects floating-point formatting specifiers (%a %e %f %g). Appears in Crystal::System.print_error, which is used in many places but does not have any floats. Locale-dependent behavior might affect compiler specs but I don't think there is a big need to replace this one.
  • printf: Similar to above. Appears when failing to raise an exception and when the GC outputs a warning; formatting specifiers are not used at all. (dprintf is probably more suitable here, as those errors really should belong in STDERR rather than STDOUT.)
  • snprintf: Similar to above. Sees the most uses:
    • Crystal::System.print_error on Windows
    • Float::Printer#internal, when Grisu3 fails. Indirectly affects Float32#to_s and Float64#to_s. The PR "Implement the Dragonbox algorithm for Float#to_s" (#10913) removes this usage.
    • String::Formatter#float, whenever a floating-point formatting specifier is used. The PR above does not touch this yet (even then it would cover only %f and some cases of %g).
  • strtod, strtof: Affects String#to_f32? and #to_f64? respectively. As a result of using these functions, conversion from hexfloats is actually possible:
    "0xa.bp+5".to_f # => 342.0
    Some numeric specs in the standard library use a custom hexfloat parser instead of this undocumented feature.
  • strerror: Appears in Errno#message. Might affect specs, and is also a public API, unlike dprintf.
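
To illustrate what a hexfloat parser that does not go through strtod might look like, here is a rough sketch. `parse_hexfloat` is a hypothetical name and this is not the parser used in the standard library specs. It assumes at most 16 hexadecimal digits in total, requires the `p` exponent, and makes no attempt to round subnormal results correctly.

```crystal
# Hypothetical helper: parses strings such as "0xa.bp+5" without LibC.
# Returns nil for anything outside the small grammar handled here.
def parse_hexfloat(str : String) : Float64?
  m = /\A([+-])?0[xX]([0-9a-fA-F]*)(?:\.([0-9a-fA-F]*))?[pP]([+-]?[0-9]+)\z/.match(str)
  return nil unless m

  int_digits = m[2]
  frac_digits = m[3]? || ""
  digits = int_digits + frac_digits
  # Need at least one digit, and at most 16 so the mantissa fits in a UInt64.
  return nil if digits.empty? || digits.size > 16

  exponent = m[4].to_i?
  return nil unless exponent

  # Accumulate all digits as one integer, ignoring the radix point for now.
  mantissa = 0_u64
  digits.each_char { |c| mantissa = mantissa * 16 + c.to_i(16) }

  # Each fractional hex digit shifts the value by 4 bits, so fold that into
  # the binary exponent and scale by the resulting power of two.
  value = Math.ldexp(mantissa.to_f64, exponent - 4 * frac_digits.size)
  m[1]? == "-" ? -value : value
end

parse_hexfloat("0xa.bp+5") # => 342.0
```

Since only ASCII characters are matched explicitly, the result does not depend on the active locale.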

On Windows it has been noted that FormatMessageW, which appears in WinError#message, is also locale-dependent.

@phil294

phil294 commented Aug 16, 2022

It is at least questionable that "1,23".to_f could ever succeed based solely on the current user's locale. The developer never asked for locale-aware parsing and might never have expected it.

So for anyone else affected by this like me: LibC.setlocale(LibC::LC_ALL, "en_US.UTF-8") (code from the starting post) somehow did not fix this. Instead, this is what worked:

fun main(argc : Int32, argv : UInt8**) : Int32
	LibC.setenv("LC_ALL", "en_US.UTF-8", 1)
	Crystal.main(argc, argv)
end

so essentially just overwriting the user's locale.

@straight-shoota
Member

Some libc implementations provide variants like strtod_l that allow specifying a locale, but they're missing on many platforms. So it wouldn't be a real solution.

I suppose the best solution is to move to native implementations for locale-based algorithms.
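
As a rough illustration of what a native, locale-independent parser could start from, here is a sketch of the well-known exact fast path for decimal strings: when the digits fit in 53 bits and the power of ten is exactly representable, a single floating-point division gives the correctly rounded result. `parse_simple_decimal` is a hypothetical name. The sketch handles only plain `digits[.digits]` input with at most 15 significant digits and returns nil otherwise, so a real implementation would still need a slow path plus exponent, infinity and NaN handling.

```crystal
# Powers of ten up to 10^15, all exactly representable as Float64.
POW10 = (0..15).map { |i| (10_u64 ** i).to_f64 }

# Hypothetical helper: always treats "." as the decimal separator,
# whatever the C locale says.
def parse_simple_decimal(str : String) : Float64?
  m = /\A([+-])?([0-9]+)(?:\.([0-9]+))?\z/.match(str)
  return nil unless m

  frac_digits = m[3]? || ""
  digits = m[2] + frac_digits
  # At most 15 digits keeps the mantissa below 2^53, so it converts exactly.
  return nil if digits.size > 15

  mantissa = digits.to_u64?
  return nil unless mantissa

  # Both operands are exact, so the division rounds only once.
  value = mantissa.to_f64 / POW10[frac_digits.size]
  m[1]? == "-" ? -value : value
end

parse_simple_decimal("1.23") # => 1.23
parse_simple_decimal("1,23") # => nil
```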

@HertzDevil
Contributor Author

s2d.c and s2f.c from the Ryu repository seem to contain replacements for strtod and strtof. I do not know whether these algorithms have names of their own, or whether they differ from other C runtimes' implementations in any way other than locale support.

@HertzDevil
Contributor Author

HertzDevil commented Nov 8, 2024

Another alternative is https://github.com/fastfloat/fast_float, now part of GCC and llvm-libc
