Invariance test fails on Apple arm+clang #1452

erikcs · 2024-09-25T06:01:01Z

On my m3 Sonoma, grf compiled with clang (1600.0.26.3) fails this invariance test.¹

This is a bit absurd considering this is an invariance that passes on Azure with x86-based clang or gcc. It also passes if my m3 Sonoma macbook compiles grf with gcc-14 instead of clang.

The code in that invariance test rely on Eigen, maybe there is something strange going on with arm-clang + Eigen, or just clang. @jtibshirani what do you say, it this something we should pass on to a clang dev?

It tests a property of a "complicated" forest algorithm which basically boils down to: "if I give you the duplicated stacked outcomes (Y,Y) the results is the same as if I'd given you just the single outcome (Y)" ↩

jtibshirani · 2024-09-27T16:40:08Z

That's indeed surprising! If you have the time, seems like a good thing to debug and pass on to whatever component ends up containing the issue. To help debug, you could work on reducing the scope -- for example can we reproduce the issue in a C++ only set-up, to rule out anything related to R?

jtibshirani · 2024-09-27T16:43:49Z

One guess: we "vendored in" our own random number generator (#469). But that was before ARM was widely available ... I wonder if we need to take another look at that dependency and update the files?

erikcs · 2024-10-03T05:15:21Z

False alarm(!), now that I've reloaded all the grf machinery into my brain I realized I've been a bit too naive in some invariance expectations. The test in question sometimes have different splits (the random samples drawn + everything else randomness related is luckily the same across architectures), and that is actually legal behavior. It's the same old CART + non-commutative computer instructions headache we all know by now. For some two variables x1 or x2, then I can get the same criterion by splitting on either one. Which one I end up splitting on depends on 1) which appears first in my sort order or 2) floating point behavior when calculating objective = a + b. In this case arm decides that b + a is slightly larger than a + b, and splits on x2 instead of x1. If we ever implement a new random forest from scratch it would be nice to think of general ways to translate algorithms into software that behaves exactly the same across all system flavors 🤔. For CART splitting in particular that would mean coming up with ways to ensure the split decision is identical regardless of operation order, which sounds challenging .I'll make some test updates and close this issue.

erikcs added the requires research An issue that needs additional thought and experimentation before it can be implemented. label Sep 25, 2024

erikcs closed this as completed Oct 3, 2024

erikcs mentioned this issue Oct 3, 2024

Relax lm_forest invariance #1458

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Invariance test fails on Apple arm+clang #1452

Invariance test fails on Apple arm+clang #1452

erikcs commented Sep 25, 2024 •

edited

Loading

jtibshirani commented Sep 27, 2024

jtibshirani commented Sep 27, 2024

erikcs commented Oct 3, 2024

Invariance test fails on Apple arm+clang #1452

Invariance test fails on Apple arm+clang #1452

Comments

erikcs commented Sep 25, 2024 • edited Loading

Footnotes

jtibshirani commented Sep 27, 2024

jtibshirani commented Sep 27, 2024

erikcs commented Oct 3, 2024

erikcs commented Sep 25, 2024 •

edited

Loading