Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Casting infinity to integer makes value 0 #6737

Open
cometkim opened this issue Apr 21, 2024 · 32 comments
Open

Casting infinity to integer makes value 0 #6737

cometkim opened this issue Apr 21, 2024 · 32 comments

Comments

@cometkim
Copy link
Member

let t1 = infinity->Belt.Float.toInt
let t2 = infinity->Float.toInt

generates

import * as PervasivesU from "./stdlib/pervasivesU.js";

var t1 = PervasivesU.infinity | 0;

var t2 = PervasivesU.infinity | 0;

both are 0. so we don't actually have a way to compare int value with the infinity without %raw. because infinity in pervasive is float

Infinity is often used in arithmetic operations as well as a value indicating "no limit" (e.g. pagination)

#6038 also mentioned

@cometkim
Copy link
Member Author

Maybe we could guide the user by adding infinity_int etc, but it doesn't seem intuitive. Honestly I just made a bug by this, and didn't suspect it for several minutes of debugging.

@mununki
Copy link
Member

mununki commented Apr 21, 2024

I guess the compiler uses the bitwise operation to convert float to int e.g. 1.1 | 0 I think we should handle differently with Infinity. What do you think?

@cometkim
Copy link
Member Author

Yes for sure. It would be nice to have poly infinity, since not, at least there should be something to avoid pitfalls

@mununki
Copy link
Member

mununki commented Apr 21, 2024

Indeed, since we don't have a polymorphic number type, we have to handle it to avoid pitfalls.

@cometkim
Copy link
Member Author

since we don't have a polymorphic number type

or can we make it a keyword? then it can act like a polymorphic

@mununki
Copy link
Member

mununki commented Apr 21, 2024

Another pitfalls:

let t4 = (22130921800000. -. 0.)->Belt.Float.toInt
let t5 = (22130921800000.)->Belt.Float.toInt

generates

var t4 = 22130921800000 - 0 | 0; // -1044676288
var t5 = 2147483647; // max value of Int32

@mununki
Copy link
Member

mununki commented Apr 21, 2024

since we don't have a polymorphic number type

or can we make it a keyword? then it can act like a polymorphic

This issue doesn't seem to be one that can be solved with types. I think we need to rethink the rules for converting Float to Int.

@cometkim
Copy link
Member Author

cometkim commented Apr 21, 2024

Another pitfalls:

let t4 = (22130921800000. -. 0.)->Belt.Float.toInt
let t5 = (22130921800000.)->Belt.Float.toInt

generates

var t4 = 22130921800000 - 0 | 0; // -1044676288
var t5 = 2147483647; // max value of Int32

Honestly, this example doesn't feel like a trap to me. That's a natural property of the int(32) type in arithmetic.

It looks like another kind of problem 😅

22130921800000 - 0 | 0; // -1044676288 seems to happen because the implementation of Belt.Float.toInt is specially stripped by the compiler instead of an actual runtime cast, as expected.

The reason infinity is confusing is that, as mentioned above, it can be used as a kind of flag.

@mununki
Copy link
Member

mununki commented Apr 21, 2024

let t5 = (22130921800000.)->Belt.Float.toInt

Shouldn't this also be generated like this, at least in terms of consistency?

var t5 = 22130921800000 | 0;

@mununki
Copy link
Member

mununki commented Apr 21, 2024

Side note: Why did the compiler try to optimize numeric operations with Int32 for the int type? Because OCaml's Int64 can handle JS's safe integer (2^53).

@cometkim
Copy link
Member Author

Yes, I expect both examples to be equally 2147483647. but anyway it's a different kind of issue we have

@mununki
Copy link
Member

mununki commented Apr 21, 2024

Yes, I expect both examples to be equally 2147483647. but anyway it's a different kind of issue we have

Or, Another option is to not optimize and not change the value in generated js.

@cometkim
Copy link
Member Author

cometkim commented Apr 21, 2024

Why did the compiler try to optimize numeric operations with Int32 for the int type? Because OCaml's Int64 can handle JS's safe integer (2^53).

No, it because JavaScript JIT engines are actually perform better with int32. Inside the JS optimizer, there is a clear distinction between numbers that exceed 32 bits and those that don't, and being able to guarantee 32-bit integers in a specific path has huge optimization benefits.

A bit off-topic, JavaScript's subtle number handling causes problems when interoperating with other systems. This often causes interoperability problems with most systems that make a stricter distinction between int32/int64/float32/float64/etc.

For example, GraphQL Int specifies 32-bit precision in the spec, but often larger values are passed by JS clients.

@cometkim
Copy link
Member Author

cometkim commented Apr 21, 2024

It might be "correct" for infinity to be of type float if we treat it as a part of the IEEE 754 specification. In that case, it also makes sense to explicitly restrict casting to prevent Infinity from being abused.

However, it still poses problems for interoperability with other JS libraries that use Infinity as a flag.

@mununki
Copy link
Member

mununki commented Apr 21, 2024

Okay, I'm still not sure about the rules for converting float values beyond int32 to int actually, but I think the inconsistency between t4 and t5 is another issue. I'll open another issue for it.

Another pitfalls:

let t4 = (22130921800000. -. 0.)->Belt.Float.toInt
let t5 = (22130921800000.)->Belt.Float.toInt

generates

var t4 = 22130921800000 - 0 | 0; // -1044676288
var t5 = 2147483647; // max value of Int32

I agree that int32 coercion process should not happen for infinity.

@mununki
Copy link
Member

mununki commented Apr 21, 2024

since we don't have a polymorphic number type

or can we make it a keyword? then it can act like a polymorphic

Can you elaborate this?

@cometkim
Copy link
Member Author

since we don't have a polymorphic number type

or can we make it a keyword? then it can act like a polymorphic

Can you elaborate this?

we can make it have an independent meaning on the context rather than just the value ident. but never mind, I've changed my mind. That wouldn't be a good idea. Because it's actually a value, if we don't treat it as a value it'll have other problems.

@glennsl
Copy link
Contributor

glennsl commented Apr 21, 2024

Infinity, and nan, are really no different than float values such as 1.1. They don't exist as int values, and Float.toInt is necessarily going to be lossy. If you need a type that can represent these values you either need to use float, or use a custom conversion function that, for example, returns an option<int> where Infinity and nan is translated to None.

@cometkim
Copy link
Member Author

cometkim commented Apr 21, 2024

Mergin #6736 to this.


I don't think there's a problem with the overall ReScript's semantic. However, it inherits some of the JS ecosystem where all numbers can potentially be converted to floating anytime.

When we write a library binding, we simply make an interface float to represent the JS numbers. I could say that is naive, but I think it's pretty conventional today. (even we use that example in the official blog post)

Actually, even though I'm pretty familiar with ReScript, at one point I made a bug (I simply made the initial value to infinity, because that's ok in JS) and didn't suspect it was the problem for a while.

I think we can at least generate a compiler warning like we do with the untagged variants.

Even further, maybe we can create a new number type that is explicitly selected for JS interop purposes.

@glennsl
Copy link
Contributor

glennsl commented Apr 21, 2024

I'm not sure what you're asking for here. As far as I'm aware, float already maps cleanly to JS numbers. It might be more intuitive if it was called number, but float is more accurate, and makes the distinction from int more clear.

@mununki
Copy link
Member

mununki commented Apr 22, 2024

I think that the comment by @glennsl is the good reference for this issue #6038 (comment)

I suppose the rationale can be given by the conversion function defined by EMCA-262 (i.e. The JavaScript language specification). This seems like the most natural and likely most performant conversion, given ReScript's goal of seamless integration with JavaScript.

I'll consider the issue resolved based on this.

@cometkim
Copy link
Member Author

cometkim commented Apr 22, 2024

float already maps cleanly to JS numbers. It might be more intuitive if it was called number, but float is more accurate, and makes the distinction from int more clear.

No, considering int and float as subsets of JS number is more practical from an implementation perspective and makes sense in the actual JS engines.

Infinity, and nan, are really no different than float values such as 1.1

And in the JS spec Infinity has specified behavior with all number operations. So there is actually a fast-path in operations between an integer and the Infinity. (sorry for confusing here, I'm checking docs again)

So I would not say that is completely wrong for all number values in JS to have to be mapped to float or for users to try to operate on int values and infinity.

Let me drop an example, imagine there is a JS animation library that doesn't support partial repetition. And has a repeat option as an integer.

FFI authors have no hesitation in modeling this as an int (or bigint). It's completely idiomatic in the JS world to assign infinity instead of an arbitrarily large value when they want an "infinite loop" instead.

Or if even something more "correct":

@unboxed
type repeat =
   // isn't supported
   | @as(infinity) Infinity
   // probably wrong for precision, but it's practical choice
   | Count(int)

type options = {
  repeat: repeat,
}

@glennsl
Copy link
Contributor

glennsl commented Apr 22, 2024

I'm not sure what you mean is contradictory with what I'm saying. As far as I can tell, float is not a subset of JS numbers, but entirely overlapping. +/-Infinity are values defined in IEEE-754.

The underlying difference between int and float is memory representation, which has a big impact on performance. ints are 32-bit and have a relatively straightforward memory representation, 1 in binary is 0...0001, 2 is 0...0010 and so on. floats are 64-bit and have a much more complex representation, as specified by IEEE-754. int does not have a representation for Infinity, it simply does not fit in that data type, therefore your repeat type would effectively be a float, i.e. the int count would also be 64-bit floating point numbers.

I can see why you would want to model it this way from a conceptual perspective, but I fear this would just further obscure what's actually going on and invite more confusion and trouble, for relatively little gain.

@cometkim
Copy link
Member Author

The underlying difference between int and float is memory representation

Yes for sure, and I thought that int in ReScript exactly mapped to V8's SMI type, a special form of JS number, the only value representation that doesn't create a heap object in V8. As far as I understand, ReScript was designed to be optimized for V8 and that's why the int type exists in the first place.

All Number operations defined in the ECMA specification refer to NaN and Infinity first. The SMI implementation doesn't behave any differently than the spec, and no additional casting happened in operations with special boundary values like Infinity, NaN, 0.

@glennsl
Copy link
Contributor

glennsl commented Apr 23, 2024

The int type comes from OCaml and is originally optimized for its binary representation, which is 31-bits because OCaml uses a tagged pointer representation like V8. I'm not aware of any work having been done to optimize ReScript's int representation for V8 specifically, and don't believe this was much of a factor in the choice of using OCaml as the underlying language. But of course the underlying reasons for them both being designed this way are likely very similar.

From what I can tell, SMI does not include any representations for Inifnity or NaN. If any such value is included along with int's, they will all be represented as floating point numbers on the heap, or if in an array, as doubles. See for example https://v8.dev/blog/react-cliff and https://v8.dev/blog/elements-kinds

@cometkim
Copy link
Member Author

cometkim commented Apr 23, 2024

From what I can tell, SMI does not include any representations for Inifnity or NaN. If any such value is included along with int's, they will all be represented as floating point numbers on the heap, or if in an array, as doubles.

I mean, there are no additional allocations in smi + Infinity because Infinity is statically allocated. It just returns Infinity right after the boundary check of params. (it's the standard behavior, not an optimization)

@glennsl
Copy link
Contributor

glennsl commented Apr 23, 2024

There isn't for Inifnity itself, but there is for every other value of the type, because the possibility of Infinity being present means they have to be represented as floating point numbers on the heap rather than as SMI's. Otherwise it wouldn't be able to fit all the possible values of the type.

@cometkim
Copy link
Member Author

I don't think it returns to a different values after the basic operations (except min/max) with +/-Infinity or NaN. It's a closed path. So I'm asking, are there any potential problems if we allowed operations with ints?

@cometkim
Copy link
Member Author

cometkim commented Apr 23, 2024

To clarify, I am not (yet) advocating here to make infinity a valid int value.

If our infinity is clearly a float, I think we should at least explicitly disallow downcasting (unlike other values) by runtime and compile time error, as Infinity and NaN have special meaning in ECMAScript spec unlike other numbers

@glennsl
Copy link
Contributor

glennsl commented Apr 23, 2024

I don't think it returns to a different values after the basic operations (except min/max) with +/-Infinity or NaN. It's a closed path. So I'm asking, are there any potential problems if we allowed operations with ints?

Depends how you define problems. I think it might lead to confusion about the underlying fundamentals, which might lead to mistakes that cause poor performance. Operations aren't determined by values directly, but by types inferred from the values used. If you have a box that mostly contains ints and then try to put Infinity into it, it will be converted to a pointer to a float in order to fit it, which will be allocated on the heap and cause the FPU to be used instead of the ALU, which in turn means values have to be pulled from slower memory and memory usage to increase significantly. Usually not a problem, but sometimes it is, and it's this kind of abstraction that causes poor general understanding of fundamental computing and makes the software we use unnecessarily slow.

If our infinity is clearly a float, I think we should at least explicitly disallow downcasting (unlike other values) by runtime and compile time error, as Infinity and NaN have special meaning in ECMAScript spec unlike other numbers

On a conceptual level I kind of agree, but this goes against JS semantics and will make float to int conversion significantly more expensive.

@cometkim
Copy link
Member Author

but this goes against JS semantics

What does that mean? Checking if values are Infinity or NaN first is exactly the JS semantic

And sure adding runtime check is redundant and unnecessarily expensive. But if we keep int as a representation for OCaml's rather than SMI (which is a valid JS number), we need some safeguards. (At least for direct usage of infinity literal)

@cometkim
Copy link
Member Author

There was too much divergence. Let me summarize the arguments:


infinity is correctly expressed as a float, and changing the value representation is dangerous, I agree with that.

But we have an actual problem here, given the following situation (pseudocode):

1: val page, max

2: max = infinity

3: while (page < max)

For JS users, rather than modeling the page as a float considering operations with infinity, it is natural to designate it as an int type and then find a way to compare infinity with int when necessary. Because it's completely allowed and even idiomatic in JS.

Users may reach Float.toInt, which never works as intended. Because ReScript treats Float.toInt as the ToInt32 abstract operation.

abstract operations in ECMAScript spec are not the JS semantic and are never directly exposed to the JS user. The only number operations that reference ToInt32 are Number::leftShift, Number::signedRightShift, and Number::bitwise*, all of which are never defined in ReScript int.

When infinity->toInt is encountered, it can always be replaced by just 0 or it will be a bug. But users won't be able to find it first until they look closely at the generated codes.

This is not a good thing for ReScript. This is just as dangerous as the non-exhaustive matches we care about.

I can suggest a few options that I explored here:

  1. Adds a compiler warning if the infinity literal bound to a well-known casting to int ops, such as toInt, Float.toInt, Belt.Float.toInt, Core.Float.toInt, fromFloat, Int.fromFloat, Belt.Int.fromFloat, Core.Int.fromFloat. This may be unnecessarily expensive, so it may need to be a separate process rather than the compilation. (e.g. reanalyze, dedicated linter)

  2. Allows infinity to have a special meaning in context or we can simply add something like the infinity_int literal. This means that int gets closer to the meaning of SMI (in V8) so is interoperable with the JS numbers for some specific cases.

  3. Add safer API to the stdlib to replace Float.toInt, Int.fromFloat like [TableCloth](https://github.com/darklang/tablecloth-rescript/blob/06cd102/src/TableclothFloat.resi#L752 -L753) does. But this could be a major breaking change to the Core. It imposes unnecessary migration costs on users even who have never used infinity value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants