-
-
Notifications
You must be signed in to change notification settings - Fork 437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SmallRng: Replace PCG algorithm with xoshiro{128,256}++ #1038
Conversation
I decided to use the default impl of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I like the simplified implementations and agree with the decision not to use SplitMix for seed_from_u64
(IIRC the official advice is simply to use a significantly different PRNG, so our existing PCG32 impl should be fine).
src/rngs/xoshiro256plusplus.rs
Outdated
impl RngCore for Xoshiro256PlusPlus { | ||
#[inline] | ||
fn next_u32(&mut self) -> u32 { | ||
self.next_u64() as u32 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This takes the low-order bytes. @vigna is it preferable to take the high-order bytes given that we're discarding half anyway? Or is there no point caring for the ++ variant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I'd take the high bits (longer carry chain propagation, which means higher linear complexity), but, really, the difference in quality is not detectable.
Note that if you use PCG32 to initialize the state usinga 64-bit seed, seeds with a large number of lower bits in common will initialize the state in a similar way (try with 0 and 2^63), and consequently the very first outputs will be correlated. The effect might be however very mild in practice.
You can avoid this problem by mixing the seed with a MurmurHash mix (or any other quick bijective mixing function) before using it as a seed for PCG32 (maybe you're already doing something like that, in which case forget this comment).
Of course there will be always seeds leading to similar initialization if you use PCG32, but at least they will be random-looking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking the higher bits makes sense (this should also be done for the implementations in rand_xoshiro
).
@vigna Would switching to splitmix for 64-bit seeding also avoid this problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, using SplitMix avoids visible dependence on the 64-bit seed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's an issue with our seed_from_u64
method, I'd prefer we deal with that directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. So the issue propagates to every generator using that method. I would have used something with less dependence from the initial value, but I agree that it's unlikely significant problems might surface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tempted to ignore it: it's not very significant.
What is the argument for continuing to use PCG? Avoiding churn? I think it would be nice to avoid issues such as #905, even if it does not matter that much in this case.
We could just use a cryptographic hash function here; it's not performance critical, though we prefer to minimise code size.
Switching to splitmix is approximately the same number of lines and does not suffer from the correlation problem. As far as I can tell, the only downside is that this is a value-breaking change for everyone using seed_from_u64
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vks #905 is not an "issue" so much as incorrect documentation. To quote @vigna:
Personally, I don't believe in the "middle ground" of "difficult-to-predict-but-not-crypto": usually it just means it is so easy to break that nobody tries. I mean, I can claim that xoshiro256++ is "difficult to predict"—and maybe that's even true. But until someone tries for real, that's just a vague and unsubstatiated claim.
The main issue with PCG, as far as I am aware, is that streams tend to be quite similar; this is (part) of the issue in #1032 for example. This really doesn't matter for seed_from_u64
since it only uses a single stream.
If we really want a perfect initialiser we should use a proven cryptographic PRNG or hash function, but to some extent this is overkill considering we only start with (a maximum of) 64-bits of entropy. It seems to me it is "good enough" for intended uses. If @vigna strongly disagrees, I will listen to his advice (though what I understood from the above is that we should use a hash function such as murmurhash and a PRNG).
We also are trying to keep the code-size of rand_core
minimal however. I'm not entirely convinced the split between rand
and rand_core
still makes sense, but changing that now is more churn than justified in a mostly-mature library.
Between SplitMix64 and PCG32, I remain unconvinced that either is significantly better than the other. PCG does more mixing overall (since it emits only half the state per round, and does both multiplication and addition vs just addition), while SplitMix may have a stronger output function. Either way, I think getting good avalanche from low-order bits is much more important than from high-order bits due to the way users are likely to use this, and in this case I think both PRNGs are acceptable. But if @vigna knows better, please do correct me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#905 is not an "issue" so much as incorrect documentation.
I was referring to the issue with similar sequences for different seeds.
This really doesn't matter for seed_from_u64 since it only uses a single stream.
Yes, but a similar issue persists anyway: Different (but similar) seeds give very similar sequences, as discussed above.
I'm not entirely convinced the split between rand and rand_core still makes sense, but changing that now is more churn than justified in a mostly-mature library.
I agree. For Rand 1.0 I could imagine merging the two crates and making rand
without default features equivalent to the old rand_core
.
Between SplitMix64 and PCG32, I remain unconvinced that either is significantly better than the other. PCG does more mixing overall (since it emits only half the state per round, and does both multiplication and addition vs just addition), while SplitMix may have a stronger output function. Either way, I think getting good avalanche from low-order bits is much more important than from high-order bits due to the way users are likely to use this, and in this case I think both PRNGs are acceptable. But if @vigna knows better, please do correct me.
Alright, I will revert the splitmix commit. If necessary, we can perform this change in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the issues with PCG is that some subsequences of the same generator are very correlated (similar bit patterns), as shown above. You're confusing this issue with the issue that sequences from different streams are similar. The first one is obviously proven by the example you're showing. This is the issue at hand.
The persistent bit patterns of PCG32 for correlated seed cannot happen with SplitMix, so, yes, there is a significant and measurable difference. You can take two states of PCG32 and get highly correlated subsequences in which repeated bit patterns appear. You cannot do the same with SplitMix. You might be unconvinced, but it's mathematics. I'm not claiming that SplitMix will have uncorrelated subsequences, because the state mixing happens only leftward, as with PCG, but the problem is mitigated enormously by the powerful mix function.
If you're OK with repeated bit patterns in the initial state of your generators, that's another question—it's your library, if you want them, keep them.
@dhardy I restored the PCG32-based implementation of |
@vks thanks for the quick changes, but we should finish planning first. @vigna has a good argument that SplitMix is slightly better for
Perhaps, then, the best option is to override |
Yes, that's probably the best solution. Fix it locally for now, put together a more comprehensive solution for the future. Note that if you use a 64-bit seed to initialize a larger state, there are some things which are mathematically impossible to avoid. For example, there will be always seeds leading to a similar first word of state (or any fixed word of state): if the initializing generator you are using does not output every possible value (e.g., PCG32), there will be even some seeds leading to an identical first word of state; and even if the initializing generator outputs exactly once every possible 64-bit value, there will be first words of state sharing, say, the lower 63 bits. This is a fact of life—you cannot avoid this even with a crypto-strength initializing generator. What you can avoid is that more than one word of state is correlated, because you are tapping into a much larger state space. A crypto-strength initializing generator will do that for you. SplitMix will provide words that have no visible similarity artifacts, and that are sufficiently uncorrelated to initialize a generator. It is true, however, that if you use two seeds differing only in the higher bit the difference in initialization will be only due to the spreading of the influence of that bit by the mixing function (which, once again, is more than sufficient IMHO). |
Alright, I migrated |
src/rngs/small.rs
Outdated
/// Note that depending on the application, [`StdRng`] is faster on many modern | ||
/// platforms while providing higher-quality randomness. Furthermore, `SmallRng` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we really say it is faster on many platforms? Micro-benchmarks aren't fully representative, and I don't think we have enough other data to go on, so better simply to say that it may be faster?
src/rngs/small.rs
Outdated
/// - Security against prediction or reproducibility are important. | ||
/// Use [`StdRng`] instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is two separate things: don't recommend StdRng
for reproducibility!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just remove "reproducibility", because this is discussed in detail below.
Due to close correlations of PCG streams (rust-random#907) and lack of right-state propagation (rust-random#905), the `SmallRng` algorithm is switched to xoshiro{128,256}++. The implementation is taken from the `rand_xoshiro` crate and slightly simplified. Fixes rust-random#910.
Also fix a wrong link.
This update in particular changes the SmallRng to xoshiro, which I explored in an experimental branch. Benchmarking shows that this change is performance-neutral and the output looks the same. See rust-random/rand#1038 for more. The experimental branch was more complicated because it replaced the use of thread_rng with a custom rayon pool. I may eventually pick up that branch again because it offers more control, but for now this new rand version gets me the better PRNG. AMD Ryzen 9 3900X 12-Core Processor (AMD64 Family 23 Model 113 Stepping 0) tracescene/10x10x4 time: [295.42 us 295.79 us 296.17 us] change: [-1.4129% -1.1793% -0.9318%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low severe 5 (5.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe
This update in particular changes `SmallRng` to xoshiro. `SmallRng` is only used for scene construction, as the core continues to use `thread_rng`, which wraps `StdRng`. See rust-random/rand#1038 for more. As expected, benchmarking shows that this change is performance-neutral and the output looks correct. There is an ongoing experimental branch that effectively implements a custom thread-local RNG solution using a custom rayon pool. This will in particular allow using xoshiro in the core trace loop. Benchmarking: AMD Ryzen 9 3900X 12-Core Processor (AMD64 Family 23 Model 113 Stepping 0) tracescene/10x10x4 time: [295.42 us 295.79 us 296.17 us] change: [-1.4129% -1.1793% -0.9318%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low severe 5 (5.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe
Due to close correlations of PCG streams (#907) and lack of right-state propagation (#905), the
SmallRng
algorithm is switched to xoshiro{128,256}++. The implementation is taken from therand_xoshiro
crate and slightly simplified.Fixes #910.