Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invariance test fails on Apple arm+clang #1452

Closed
erikcs opened this issue Sep 25, 2024 · 3 comments
Closed

Invariance test fails on Apple arm+clang #1452

erikcs opened this issue Sep 25, 2024 · 3 comments
Labels
requires research An issue that needs additional thought and experimentation before it can be implemented.

Comments

@erikcs
Copy link
Member

erikcs commented Sep 25, 2024

On my m3 Sonoma, grf compiled with clang (1600.0.26.3) fails this invariance test.1

This is a bit absurd considering this is an invariance that passes on Azure with x86-based clang or gcc. It also passes if my m3 Sonoma macbook compiles grf with gcc-14 instead of clang.

The code in that invariance test rely on Eigen, maybe there is something strange going on with arm-clang + Eigen, or just clang. @jtibshirani what do you say, it this something we should pass on to a clang dev?

Footnotes

  1. It tests a property of a "complicated" forest algorithm which basically boils down to: "if I give you the duplicated stacked outcomes (Y,Y) the results is the same as if I'd given you just the single outcome (Y)"

@erikcs erikcs added the requires research An issue that needs additional thought and experimentation before it can be implemented. label Sep 25, 2024
@jtibshirani
Copy link
Member

That's indeed surprising! If you have the time, seems like a good thing to debug and pass on to whatever component ends up containing the issue. To help debug, you could work on reducing the scope -- for example can we reproduce the issue in a C++ only set-up, to rule out anything related to R?

@jtibshirani
Copy link
Member

One guess: we "vendored in" our own random number generator (#469). But that was before ARM was widely available ... I wonder if we need to take another look at that dependency and update the files?

@erikcs
Copy link
Member Author

erikcs commented Oct 3, 2024

False alarm(!), now that I've reloaded all the grf machinery into my brain I realized I've been a bit too naive in some invariance expectations. The test in question sometimes have different splits (the random samples drawn + everything else randomness related is luckily the same across architectures), and that is actually legal behavior. It's the same old CART + non-commutative computer instructions headache we all know by now. For some two variables x1 or x2, then I can get the same criterion by splitting on either one. Which one I end up splitting on depends on 1) which appears first in my sort order or 2) floating point behavior when calculating objective = a + b. In this case arm decides that b + a is slightly larger than a + b, and splits on x2 instead of x1. If we ever implement a new random forest from scratch it would be nice to think of general ways to translate algorithms into software that behaves exactly the same across all system flavors 🤔. For CART splitting in particular that would mean coming up with ways to ensure the split decision is identical regardless of operation order, which sounds challenging .I'll make some test updates and close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
requires research An issue that needs additional thought and experimentation before it can be implemented.
Projects
None yet
Development

No branches or pull requests

2 participants