feature_fraction RNG looks broken #4371
Comments
Linking #4134 (comment) as related.
@DexGroves Thanks for using LightGBM. After a careful check, I think it should be (LightGBM/include/LightGBM/utils/random.h, line 88 in 1b5bec0), otherwise the last element N-1 will always be left out. Thanks for pointing that out.
The reason to use the second branch in With
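To make the off-by-one concrete, here is a minimal Python sketch of a Floyd-style sampling loop. It is a model, not the code in random.h: the function names, the `inclusive` switch, and the collision rule (insert r when the roll is already in the set) are assumptions chosen to match the rolls described in this thread. Python's own `random` module stands in for the RNG so that only the effect of the upper bound is visible.

```python
import random
from collections import Counter


def floyd_sample(n, k, rng, inclusive):
    """Pick k distinct indices from range(n), Floyd-style (assumes 0 < k < n).

    inclusive=False rolls v in [0, r), the bound discussed in this issue.
    inclusive=True  rolls v in [0, r], i.e. the "r+1" variant.
    """
    chosen = set()
    for r in range(n - k, n):
        upper = r + 1 if inclusive else r  # exclusive upper bound of the roll
        v = rng.randrange(0, upper)
        # Assumed collision rule: if v was already picked, take r instead.
        chosen.add(r if v in chosen else v)
    return sorted(chosen)


def inclusion_counts(n, k, inclusive, trials=20000, seed=0):
    """Count how often each index is selected over many independent draws."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(floyd_sample(n, k, rng, inclusive))
    return [counts[i] for i in range(n)]


if __name__ == "__main__":
    print("1 of 3, roll in [0, r):", inclusion_counts(3, 1, inclusive=False))
    print("2 of 3, roll in [0, r):", inclusion_counts(3, 2, inclusive=False))
    print("2 of 3, roll in [0, r]:", inclusion_counts(3, 2, inclusive=True))
```

Under these assumptions, the narrower bound never selects index N-1 when one element is drawn from three, and always selects index 0 when two are drawn from three. With the wider bound every index is selected about two thirds of the time in the 2-of-3 case.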
Fixed via #4450. I can confirm that the original reproducible example now returns something like the following:
Thanks for the quick fix @shiyu1994!
Hi!
First up, thanks for maintaining this amazing package!
I noticed something weird while training a model with three predictors and 1/2 < feature_fraction < 5/6. One feature always gets ignored. I think this is because of the per-tree RNG that determines which features are available. As far as I can tell:
- NextInt in random.h gives alternating even and odd numbers, which gets weird when we take it modulo an even number, particularly modulo 2.
- r+1, but I could be wrong.
Repro
Why I think this is happening
In the select-two-from-three case, we go around this loop exactly twice per tree, and two bad things happen. The first go-around rolls NextInt(0, 1), which returns 0 in all cases, so the zeroth variable always gets included. I think the roll should be NextInt(0, 2), so the zeroth variable has a chance to be left out.
The second go-around rolls NextInt(0, 2), which returns either 0 or 1 depending on the seed. This is doing RandInt32() % 2 under the hood. Since this is the 2nd, 4th, 6th, 8th, ... call to the RNG, and RandInt32 alternates between even and odd numbers, it hits the same value for every tree.
This doesn't just affect the select-two-from-three case, but it gets more complicated to think about. Even-indexed variables get interesting if they get to roll something_always_even % another_small_even_number. You can construct other combinations of number of features and feature_fraction where variables in the last or second-to-last position are always excluded, or where the model has a significant bias towards using even-indexed variables, etc.
Fixes?
- RandInt32 so it doesn't have even/odd alternation?
- RandInt16 for NextInt?
Happy to submit a PR for the first or third one if you think they're OK solutions.
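For anyone who wants to check the even/odd argument without building LightGBM, here is a rough Python model of it. Everything in it is an assumption made for illustration: the `Lcg` class, the constants 214013 and 2531011 (a well-known LCG pair), the choice to return the low 31 bits from `rand_int32` and bits 16 to 30 from `rand_int16`, and the collision rule in `two_of_three`. The parity argument itself only needs an odd multiplier and an odd increment modulo a power of two, because then the lowest bit of the state flips on every step.

```python
MASK32 = 0xFFFFFFFF


class Lcg:
    """Toy 32-bit linear congruential generator (assumed constants)."""

    def __init__(self, seed):
        self.x = seed & MASK32

    def _step(self):
        # Odd multiplier and odd increment: the low bit of x flips each call.
        self.x = (214013 * self.x + 2531011) & MASK32
        return self.x

    def rand_int32(self):
        return self._step() & 0x7FFFFFFF  # keeps the alternating low bit

    def rand_int16(self):
        return (self._step() >> 16) & 0x7FFF  # drops the low bits

    def next_int(self, lo, hi, use_high_bits=False):
        raw = self.rand_int16() if use_high_bits else self.rand_int32()
        return raw % (hi - lo) + lo


def two_of_three(rng, use_high_bits=False):
    """Select 2 of 3 features with the rolls NextInt(0, 1), then NextInt(0, 2)."""
    chosen = set()
    for r in (1, 2):
        v = rng.next_int(0, r, use_high_bits)
        chosen.add(r if v in chosen else v)  # assumed collision rule
    return tuple(sorted(chosen))


def distinct_subsets(seed, trees=1000, use_high_bits=False):
    """Feature subsets seen across many trees that share one RNG stream."""
    rng = Lcg(seed)
    return {two_of_three(rng, use_high_bits) for _ in range(trees)}


if __name__ == "__main__":
    for seed in (4, 7):
        print(seed, distinct_subsets(seed))
```

In this model each tree consumes exactly two RNG calls, so the roll taken modulo 2 always lands on a call with the same parity and every tree gets the same pair of features: an even seed gives (0, 2) and an odd seed gives (0, 1). Passing use_high_bits=True is one way to try out the RandInt16 idea from the list above, though this sketch makes no claim about how well that variant behaves.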