Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize common path of Once::doit #14174

Merged
merged 1 commit into from
May 15, 2014
Merged

Optimize common path of Once::doit #14174

merged 1 commit into from
May 15, 2014

Conversation

stepancheg
Copy link
Contributor

Submitting PR again, because I cannot reopen #13349, and github does not attach new patch to that PR.

Optimize Once::doit: perform optimistic check that initializtion is
already completed. load is much cheaper than fetch_add at least
on x86_64.

Verified with this test:

static mut o: one::Once = one::ONCE_INIT;
unsafe {
    loop {
        let start = time::precise_time_ns();
        let iters = 50000000u64;
        for _ in range(0, iters) {
            o.doit(|| { println!("once!"); });
        }
        let end = time::precise_time_ns();
        let ps_per_iter = 1000 * (end - start) / iters;
        println!("{} ps per iter", ps_per_iter);

        // confuse the optimizer
        o.doit(|| { println!("once!"); });
    }
}

Test executed on Mac, Intel Core i7 2GHz. Result is:

  • 20ns per iteration without patch
  • 4ns per iteration with this patch applied

Once.doit could be even faster (800ps per iteration), if doit function
was split into a pair of doit/doit_slow, and doit marked as
#[inline] like this:

#[inline(always)]
pub fn doit(&self, f: ||) {
    if self.cnt.load(atomics::SeqCst) < 0 {
        return
    }

    self.doit_slow(f);
}

fn doit_slow(&self, f: ||) { ... }

@lilyball
Copy link
Contributor

The message on this PR doesn't describe what's actually being landed (both the PR description and the commit message). Can you please rewrite it to actually describe what's going on? You can put the extra information back in a comment for posterity's sake, but I'd rather not have the message suggest that it's being marked as #[inline] or being split.

Optimize `Once::doit`: perform optimistic check that initializtion is
already completed.  `load` is much cheaper than `fetch_add` at least
on x86_64.

Verified with this test:

```
static mut o: one::Once = one::ONCE_INIT;
unsafe {
    loop {
        let start = time::precise_time_ns();
        let iters = 50000000u64;
        for _ in range(0, iters) {
            o.doit(|| { println!("once!"); });
        }
        let end = time::precise_time_ns();
        let ps_per_iter = 1000 * (end - start) / iters;
        println!("{} ps per iter", ps_per_iter);

        // confuse the optimizer
        o.doit(|| { println!("once!"); });
    }
}
```

Test executed on Mac, Intel Core i7 2GHz. Result is:
* 20ns per iteration without patch
*  4ns per iteration with this patch applied

Once.doit could be even faster (800ps per iteration), if `doit` function
was split into a pair of `doit`/`doit_slow`, and `doit` marked as
`#[inline]` like this:

```
#[inline(always)]
pub fn doit(&self, f: ||) {
    if self.cnt.load(atomics::SeqCst) < 0 {
        return
    }

    self.doit_slow(f);
}

fn doit_slow(&self, f: ||) { ... }
```
@stepancheg
Copy link
Contributor Author

Updated the patch and PR description to make it clear, that doit is not split and not inlined.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request May 15, 2014
Closes rust-lang#14210 (Make Vec.truncate() resilient against failure in Drop)
Closes rust-lang#14206 (Register new snapshots)
Closes rust-lang#14205 (use sched_yield on linux and freebsd)
Closes rust-lang#14204 (Add a crate for missing stubs from libcore)
Closes rust-lang#14201 (Render not_found with an absolute path to the rust stylesheet)
Closes rust-lang#14198 (update valgrind headers)
Closes rust-lang#14174 (Optimize common path of Once::doit)
Closes rust-lang#14162 (Print 'rustc' and 'rustdoc' as the command name for --version)
Closes rust-lang#14145 (Better strict version hash (SVH) computation)
bors added a commit that referenced this pull request May 15, 2014
Submitting PR again, because I cannot reopen #13349, and github does not attach new patch to that PR. 

=======

Optimize `Once::doit`: perform optimistic check that initializtion is
already completed.  `load` is much cheaper than `fetch_add` at least
on x86_64.

Verified with this test:

```
static mut o: one::Once = one::ONCE_INIT;
unsafe {
    loop {
        let start = time::precise_time_ns();
        let iters = 50000000u64;
        for _ in range(0, iters) {
            o.doit(|| { println!("once!"); });
        }
        let end = time::precise_time_ns();
        let ps_per_iter = 1000 * (end - start) / iters;
        println!("{} ps per iter", ps_per_iter);

        // confuse the optimizer
        o.doit(|| { println!("once!"); });
    }
}
```

Test executed on Mac, Intel Core i7 2GHz. Result is:
* 20ns per iteration without patch
*  4ns per iteration with this patch applied

Once.doit could be even faster (800ps per iteration), if `doit` function
was split into a pair of `doit`/`doit_slow`, and `doit` marked as
`#[inline]` like this:

```
#[inline(always)]
pub fn doit(&self, f: ||) {
    if self.cnt.load(atomics::SeqCst) < 0 {
        return
    }

    self.doit_slow(f);
}

fn doit_slow(&self, f: ||) { ... }
```
@bors bors closed this May 15, 2014
@bors bors merged commit f853cf7 into rust-lang:master May 15, 2014
@stepancheg stepancheg deleted the once branch May 15, 2014 13:44
lilyball added a commit to lilyball/rust that referenced this pull request May 16, 2014
Use sync::one::Once to fetch the mach_timebase_info only once when
running precise_time_ns(). This helps because mach_timebase_info() is
surprisingly inefficient. Also fix the order of operations when applying
the timebase to the mach absolute time value.

This improves the time on my machine from

```
test tests::bench_precise_time_ns ... bench:       157 ns/iter (+/- 4)
```

to

```
test tests::bench_precise_time_ns ... bench:        38 ns/iter (+/- 3)
```

and it will get even faster once rust-lang#14174 lands.
bors added a commit that referenced this pull request May 16, 2014
…chton

Use sync::one::Once to fetch the mach_timebase_info only once when
running precise_time_ns(). This helps because mach_timebase_info() is
surprisingly inefficient. Also fix the order of operations when applying
the timebase to the mach absolute time value.

This improves the time on my machine from

```
test tests::bench_precise_time_ns ... bench:       157 ns/iter (+/- 4)
```

to

```
test tests::bench_precise_time_ns ... bench:        38 ns/iter (+/- 3)
```

and it will get even faster once #14174 lands.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants