Remove `probe_loop` in favor of Iterator #265

Conversation
This tries to reduce compile times and slightly improve the code by avoiding the closure-based `probe_loop` fn and related control-flow types/matches. Instead this uses an Iterator with a special `reload` flag.
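As a hypothetical sketch of the idea (not moka's actual code; all names here are illustrative), probing can become a plain `Iterator` over bucket indices instead of a closure-based helper with control-flow enums, with a `reload` flag that restarts the probe sequence, e.g. after observing that the bucket array was swapped out by a concurrent resize:

```rust
/// Illustrative probe iterator: yields bucket indices in linear-probe order.
struct ProbeIter {
    start: usize,     // first probe position
    offset: usize,    // next position to yield
    remaining: usize, // probes left before giving up
    len: usize,       // number of buckets
    reload: bool,     // when set, the next `next()` call restarts probing
}

impl ProbeIter {
    fn new(hash: usize, len: usize) -> Self {
        let start = hash % len;
        ProbeIter { start, offset: start, remaining: len, len, reload: false }
    }
}

impl Iterator for ProbeIter {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.reload {
            // Restart the probe sequence from the initial position.
            self.reload = false;
            self.offset = self.start;
            self.remaining = self.len;
        }
        if self.remaining == 0 {
            return None; // probed every bucket without success
        }
        let i = self.offset;
        self.offset = (self.offset + 1) % self.len;
        self.remaining -= 1;
        Some(i)
    }
}

fn main() {
    // Linear probing over 4 buckets, starting at 6 % 4 == 2.
    let mut it = ProbeIter::new(6, 4);
    assert_eq!(it.next(), Some(2));
    assert_eq!(it.next(), Some(3));
    // Signal a reload: probing starts over from the initial position.
    it.reload = true;
    assert_eq!(it.next(), Some(2));
}
```

The caller can then drive the probe with an ordinary `for` loop and `break`/`continue`, instead of returning control-flow enum values from a closure.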
Hi. Thank you so much for working on this. I am thinking of accepting this PR for v0.11.2 for the following reasons:
As for runtime performance, this change made the benchmark result a bit worse. The new code is between ~5% and ~7% slower than the original. I think this is because the new code prevents the compiler from inlining some function calls (which is also why the size of the generated code has shrunk). It is a trade-off between the size of the generated code and the performance. I like the new code better because it's clearer and easier to understand, so it will be a good trade-off. I also think that the current probing algorithm can be improved, which could win back more performance than we lose now. So I would rather keep the code simple and improve the probing algorithm later.
All checks should pass after merging into the `probe-iter-2` branch: https://github.com/moka-rs/moka/compare/probe-iter-2 (see that the last commit passed).
That is a very thorough analysis; I'm always surprised how much of a difference such small changes can make. But at the same time I'm also curious how stable those numbers are.
// FIXME: this will panic if `len() == 0`
let this_bucket = (offset, &buckets[offset]);
This is a change I'm a bit uncertain about. Maybe using an `Option` for `this_bucket` could avoid that, and also remove some panic code as well. Since you mention you want to optimize the probing itself either way, it might not be worth doing this immediately, if indeed `len` will never be 0.
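A minimal sketch of that suggestion (stand-in types, not moka's actual code): `slice::get` returns an `Option`, so an out-of-range or empty bucket array yields `None` instead of panicking on `buckets[offset]`.

```rust
// `Bucket` is a hypothetical stand-in for the real bucket type.
#[derive(Debug, PartialEq)]
struct Bucket(u32);

/// Indexing via `get` instead of `buckets[offset]` removes the panic path.
fn this_bucket(buckets: &[Bucket], offset: usize) -> Option<(usize, &Bucket)> {
    buckets.get(offset).map(|b| (offset, b))
}

fn main() {
    let buckets = vec![Bucket(10), Bucket(20)];
    assert_eq!(this_bucket(&buckets, 1), Some((1, &Bucket(20))));
    // Empty slice: no panic, just `None`.
    assert_eq!(this_bucket(&[], 0), None);
}
```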
I reviewed the `cht` code and confirmed that `len` will never be 0, so this line will never cause a panic. A `BucketArray` can be created by the `default` method or the `with_length` function. `default` uses 128 as `len`:
Line 28 in e5690f8
Self::with_length(0, BUCKET_ARRAY_DEFAULT_LENGTH)
`with_length` takes an arbitrary `len` in `usize`, but its caller (below) will never call it with 0:
Lines 170 to 184 in e5690f8
if capacity == 0 {
    unsafe {
        ptr::write_bytes(segments.as_mut_ptr(), 0, actual_num_segments);
        segments.set_len(actual_num_segments);
    }
} else {
    let actual_capacity = (capacity * 2 / actual_num_segments).next_power_of_two();
    for _ in 0..actual_num_segments {
        segments.push(Segment {
            bucket_array: Atomic::new(BucketArray::with_length(0, actual_capacity)),
            len: AtomicUsize::new(0),
        });
    }
}
capacity | actual_num_segments | actual_capacity (len)
---|---|---
0 | any | n/a (does not call `with_length`)
1 | 1 | 2
1 | 2 | 1
1 | 4 | 1
2 | 1 | 4
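The table above can be reproduced with a small sketch of the `actual_capacity` computation from the snippet; the function name and the `Option` return are mine, with `None` standing for the `capacity == 0` branch where `with_length` is never called.

```rust
/// Mirrors the caller's logic: for a non-zero capacity, the per-segment
/// bucket-array length is (capacity * 2 / segments) rounded up to a power
/// of two. Note that `0usize.next_power_of_two()` is 1, which is why the
/// (1, 4) row yields 1 rather than 0.
fn actual_capacity(capacity: usize, actual_num_segments: usize) -> Option<usize> {
    if capacity == 0 {
        None // this branch never calls with_length
    } else {
        Some((capacity * 2 / actual_num_segments).next_power_of_two())
    }
}

fn main() {
    // The rows of the table above.
    assert_eq!(actual_capacity(0, 8), None);
    assert_eq!(actual_capacity(1, 1), Some(2));
    assert_eq!(actual_capacity(1, 2), Some(1));
    assert_eq!(actual_capacity(1, 4), Some(1));
    assert_eq!(actual_capacity(2, 1), Some(4));
}
```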
I am not sure what you mean by stable. If it is about the stability of the numbers between different runs of the same benchmark, then I would say those numbers are stable. However, the numbers do not mean much for other workloads, especially ones with both a high cache miss rate and a large miss penalty (the time spent to calculate/get the value). For such workloads, latency differences in a cache operation will be negligible because the miss penalty is much larger than the latency difference.

I think we will need to talk about the direction of the optimization: smaller code size/shorter compile time vs. runtime performance. It seems many cases are trade-offs between them. These benchmark numbers may be meaningful for some workloads (e.g., with very high hit rates) but not for others. So I think it is okay to give runtime performance a lower priority than code size/compile time. (I will accept this PR anyway)
It looks good to me. Thank you for working on this!
This tries to reduce compile times and slightly improve the code by avoiding the closure-based `probe_loop` fn and related control-flow types/matches. Instead this uses an Iterator with a special `reload` flag. It's another attempt to slightly improve upon #203.
As a testcase, I used https://github.com/getsentry/symbolicator, and here is the difference in the output of `cargo llvm-lines -p symbolicator-service`:

before:

after:
I focused this on two of the functions that were touched. It's certainly possible that going by this measure alone does not paint an exact picture either. Maybe I am focusing on something that does not matter at all in the end.