Remove `probe_loop` in favor of Iterator #265

Conversation
This tries to reduce compile times and slightly improve the code by avoiding the closure-based `probe_loop` fn and related control-flow types/matches. Instead this uses an Iterator with a special `reload` flag.
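As a hypothetical sketch of the idea (not moka's actual code; all names here are illustrative), probing can become a plain `Iterator` over bucket indices instead of a closure-based helper with control-flow enums, with a `reload` flag that restarts the probe sequence, e.g. after observing that the bucket array was swapped out by a concurrent resize:

```rust
/// Illustrative probe iterator: yields bucket indices in linear-probe order.
struct ProbeIter {
    start: usize,     // first probe position
    offset: usize,    // next position to yield
    remaining: usize, // probes left before giving up
    len: usize,       // number of buckets
    reload: bool,     // when set, the next `next()` call restarts probing
}

impl ProbeIter {
    fn new(hash: usize, len: usize) -> Self {
        let start = hash % len;
        ProbeIter { start, offset: start, remaining: len, len, reload: false }
    }
}

impl Iterator for ProbeIter {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.reload {
            // Restart the probe sequence from the initial position.
            self.reload = false;
            self.offset = self.start;
            self.remaining = self.len;
        }
        if self.remaining == 0 {
            return None; // probed every bucket without success
        }
        let i = self.offset;
        self.offset = (self.offset + 1) % self.len;
        self.remaining -= 1;
        Some(i)
    }
}

fn main() {
    // Linear probing over 4 buckets, starting at 6 % 4 == 2.
    let mut it = ProbeIter::new(6, 4);
    assert_eq!(it.next(), Some(2));
    assert_eq!(it.next(), Some(3));
    // Signal a reload: probing starts over from the initial position.
    it.reload = true;
    assert_eq!(it.next(), Some(2));
}
```

The caller can then drive the probe with an ordinary `for` loop and `break`/`continue`, instead of returning control-flow enum values from a closure.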
Hi. Thank you so much for working on this. I am thinking of accepting this PR for v0.11.2 for the following reasons:
As for runtime performance, this change made the benchmark result a bit worse. The new code is between ~5% and ~7% slower than the original. I think this is because the new code prevents the compiler from inlining some function calls (which is also why the size of the generated code has shrunk). It is a trade-off between the size of the generated code and the performance. I like the new code better because it's clearer and easier to understand, so it will be a good trade-off. I also think that the current probing algorithm can be improved, which could win back more performance than we lose now. So I would rather keep the code simple and improve the probing algorithm later.
All checks should pass after merging into the `probe-iter-2` branch: https://github.com/moka-rs/moka/compare/probe-iter-2 (see that the last commit passed).
That is a very thorough analysis; I'm always surprised how much of a difference such small changes can make. But at the same time I'm also curious how stable those numbers are.
// FIXME: this will panic if `len() == 0`
let this_bucket = (offset, &buckets[offset]);
This is a change I'm a bit uncertain about. Maybe using an `Option` for `this_bucket` could avoid that, and also remove some panic code as well. Since you mention you want to optimize the probing itself either way, it might not be worth doing this immediately, if indeed `len` will never be 0.
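A minimal sketch of that suggestion (stand-in types, not moka's actual code): `slice::get` returns an `Option`, so an out-of-range or empty bucket array yields `None` instead of panicking on `buckets[offset]`.

```rust
// `Bucket` is a hypothetical stand-in for the real bucket type.
#[derive(Debug, PartialEq)]
struct Bucket(u32);

/// Indexing via `get` instead of `buckets[offset]` removes the panic path.
fn this_bucket(buckets: &[Bucket], offset: usize) -> Option<(usize, &Bucket)> {
    buckets.get(offset).map(|b| (offset, b))
}

fn main() {
    let buckets = vec![Bucket(10), Bucket(20)];
    assert_eq!(this_bucket(&buckets, 1), Some((1, &Bucket(20))));
    // Empty slice: no panic, just `None`.
    assert_eq!(this_bucket(&[], 0), None);
}
```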
I reviewed the `cht` code and confirmed that `len` will never be 0, so this line will never cause a panic. A `BucketArray` can be created by the `default` method or the `with_length` function. `default` uses 128 as `len`:
Line 28 in e5690f8
Self::with_length(0, BUCKET_ARRAY_DEFAULT_LENGTH)
`with_length` takes an arbitrary `len` in `usize`, but its caller (below) will never call it with 0:
Lines 170 to 184 in e5690f8
if capacity == 0 {
    unsafe {
        ptr::write_bytes(segments.as_mut_ptr(), 0, actual_num_segments);
        segments.set_len(actual_num_segments);
    }
} else {
    let actual_capacity = (capacity * 2 / actual_num_segments).next_power_of_two();
    for _ in 0..actual_num_segments {
        segments.push(Segment {
            bucket_array: Atomic::new(BucketArray::with_length(0, actual_capacity)),
            len: AtomicUsize::new(0),
        });
    }
}
capacity | actual_num_segments | actual_capacity (len)
---|---|---
0 | any | n/a (does not call `with_length`)
1 | 1 | 2
1 | 2 | 1
1 | 4 | 1
2 | 1 | 4
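The table above can be reproduced with a small sketch of the `actual_capacity` computation from the snippet; the function name and the `Option` return are mine, with `None` standing for the `capacity == 0` branch where `with_length` is never called.

```rust
/// Mirrors the caller's logic: for a non-zero capacity, the per-segment
/// bucket-array length is (capacity * 2 / segments) rounded up to a power
/// of two. Note that `0usize.next_power_of_two()` is 1, which is why the
/// (1, 4) row yields 1 rather than 0.
fn actual_capacity(capacity: usize, actual_num_segments: usize) -> Option<usize> {
    if capacity == 0 {
        None // this branch never calls with_length
    } else {
        Some((capacity * 2 / actual_num_segments).next_power_of_two())
    }
}

fn main() {
    // The rows of the table above.
    assert_eq!(actual_capacity(0, 8), None);
    assert_eq!(actual_capacity(1, 1), Some(2));
    assert_eq!(actual_capacity(1, 2), Some(1));
    assert_eq!(actual_capacity(1, 4), Some(1));
    assert_eq!(actual_capacity(2, 1), Some(4));
}
```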
I am not sure what you mean by stable. If it is about the stability of the numbers between different runs of the same benchmark, then I would say those numbers are stable. However, the numbers do not mean much for other workloads, especially ones with both a high cache miss rate and a large miss penalty (the time spent to calculate/get the value). For such workloads, latency differences in a cache operation will be negligible because the miss penalty is much larger than the latency difference.

I think we will need to talk about the direction of the optimization: smaller code size/shorter compile time vs. runtime performance. It seems many cases are trade-offs between them. These benchmark numbers may be meaningful for some workloads (e.g., with very high hit rates) but not for others. So I think it is okay to give runtime performance a lower priority than code size/compile time. (I will accept this PR anyway)
It looks good to me. Thank you for working on this!
This tries to reduce compile times and slightly improve the code by avoiding the closure-based `probe_loop` fn and related control-flow types/matches. Instead this uses an Iterator with a special `reload` flag. It's another attempt to slightly improve upon #203.
As a testcase, I used https://github.com/getsentry/symbolicator, and here is the difference in the output of `cargo llvm-lines -p symbolicator-service`:

before:

after:
I focused this on two of the functions that were touched. It's certainly possible that going by this measure alone does not paint an exact picture either. Maybe I am focusing on something that does not matter at all in the end.