-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize common path of Once::doit #14174
Conversation
The message on this PR doesn't describe what's actually being landed (both the PR description and the commit message). Can you please rewrite it to actually describe what's going on? You can put the extra information back in a comment for posterity's sake, but I'd rather not have the message suggest that it's being marked as |
Optimize `Once::doit`: perform optimistic check that initializtion is already completed. `load` is much cheaper than `fetch_add` at least on x86_64. Verified with this test: ``` static mut o: one::Once = one::ONCE_INIT; unsafe { loop { let start = time::precise_time_ns(); let iters = 50000000u64; for _ in range(0, iters) { o.doit(|| { println!("once!"); }); } let end = time::precise_time_ns(); let ps_per_iter = 1000 * (end - start) / iters; println!("{} ps per iter", ps_per_iter); // confuse the optimizer o.doit(|| { println!("once!"); }); } } ``` Test executed on Mac, Intel Core i7 2GHz. Result is: * 20ns per iteration without patch * 4ns per iteration with this patch applied Once.doit could be even faster (800ps per iteration), if `doit` function was split into a pair of `doit`/`doit_slow`, and `doit` marked as `#[inline]` like this: ``` #[inline(always)] pub fn doit(&self, f: ||) { if self.cnt.load(atomics::SeqCst) < 0 { return } self.doit_slow(f); } fn doit_slow(&self, f: ||) { ... } ```
Updated the patch and PR description to make it clear, that |
Closes rust-lang#14210 (Make Vec.truncate() resilient against failure in Drop) Closes rust-lang#14206 (Register new snapshots) Closes rust-lang#14205 (use sched_yield on linux and freebsd) Closes rust-lang#14204 (Add a crate for missing stubs from libcore) Closes rust-lang#14201 (Render not_found with an absolute path to the rust stylesheet) Closes rust-lang#14198 (update valgrind headers) Closes rust-lang#14174 (Optimize common path of Once::doit) Closes rust-lang#14162 (Print 'rustc' and 'rustdoc' as the command name for --version) Closes rust-lang#14145 (Better strict version hash (SVH) computation)
Submitting PR again, because I cannot reopen #13349, and github does not attach new patch to that PR. ======= Optimize `Once::doit`: perform optimistic check that initializtion is already completed. `load` is much cheaper than `fetch_add` at least on x86_64. Verified with this test: ``` static mut o: one::Once = one::ONCE_INIT; unsafe { loop { let start = time::precise_time_ns(); let iters = 50000000u64; for _ in range(0, iters) { o.doit(|| { println!("once!"); }); } let end = time::precise_time_ns(); let ps_per_iter = 1000 * (end - start) / iters; println!("{} ps per iter", ps_per_iter); // confuse the optimizer o.doit(|| { println!("once!"); }); } } ``` Test executed on Mac, Intel Core i7 2GHz. Result is: * 20ns per iteration without patch * 4ns per iteration with this patch applied Once.doit could be even faster (800ps per iteration), if `doit` function was split into a pair of `doit`/`doit_slow`, and `doit` marked as `#[inline]` like this: ``` #[inline(always)] pub fn doit(&self, f: ||) { if self.cnt.load(atomics::SeqCst) < 0 { return } self.doit_slow(f); } fn doit_slow(&self, f: ||) { ... } ```
Use sync::one::Once to fetch the mach_timebase_info only once when running precise_time_ns(). This helps because mach_timebase_info() is surprisingly inefficient. Also fix the order of operations when applying the timebase to the mach absolute time value. This improves the time on my machine from ``` test tests::bench_precise_time_ns ... bench: 157 ns/iter (+/- 4) ``` to ``` test tests::bench_precise_time_ns ... bench: 38 ns/iter (+/- 3) ``` and it will get even faster once rust-lang#14174 lands.
…chton Use sync::one::Once to fetch the mach_timebase_info only once when running precise_time_ns(). This helps because mach_timebase_info() is surprisingly inefficient. Also fix the order of operations when applying the timebase to the mach absolute time value. This improves the time on my machine from ``` test tests::bench_precise_time_ns ... bench: 157 ns/iter (+/- 4) ``` to ``` test tests::bench_precise_time_ns ... bench: 38 ns/iter (+/- 3) ``` and it will get even faster once #14174 lands.
Submitting PR again, because I cannot reopen #13349, and github does not attach new patch to that PR.
Optimize
Once::doit
: perform optimistic check that initializtion isalready completed.
load
is much cheaper thanfetch_add
at leaston x86_64.
Verified with this test:
Test executed on Mac, Intel Core i7 2GHz. Result is:
Once.doit could be even faster (800ps per iteration), if
doit
functionwas split into a pair of
doit
/doit_slow
, anddoit
marked as#[inline]
like this: