Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize vec::retain performance #91527

Merged
merged 2 commits into from
Dec 16, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 17 additions & 2 deletions library/alloc/benches/vec.rs
Original file line number Diff line number Diff line change
Expand Up @@ -733,11 +733,26 @@ fn bench_flat_map_collect(b: &mut Bencher) {
b.iter(|| v.iter().flat_map(|color| color.rotate_left(8).to_be_bytes()).collect::<Vec<_>>());
}

/// Reference benchmark that `retain` has to compete with.
#[bench]
fn bench_retain_iter_100000(b: &mut Bencher) {
let mut v = Vec::with_capacity(100000);

b.iter(|| {
let mut tmp = std::mem::take(&mut v);
tmp.clear();
tmp.extend(black_box(1..=100000));
v = tmp.into_iter().filter(|x| x & 1 == 0).collect();
});
}

#[bench]
fn bench_retain_100000(b: &mut Bencher) {
let v = (1..=100000).collect::<Vec<u32>>();
let mut v = Vec::with_capacity(100000);

b.iter(|| {
let mut v = v.clone();
v.clear();
v.extend(black_box(1..=100000));
v.retain(|x| x & 1 == 0)
});
}
Expand Down
61 changes: 29 additions & 32 deletions library/alloc/src/vec/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1520,49 +1520,46 @@ impl<T, A: Allocator> Vec<T, A> {

let mut g = BackshiftOnDrop { v: self, processed_len: 0, deleted_cnt: 0, original_len };

// process_one return a bool indicates whether the processing element should be retained.
#[inline(always)]
fn process_one<F, T, A: Allocator, const DELETED: bool>(
fn process_loop<F, T, A: Allocator, const DELETED: bool>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, dropping #[inline] modifier should not be done before New Pass Manager stabilizes because this affects how early function would be inlined.
Also, fn drop can have this modifier too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing I see that fn drop always do a memmove call despite the fact that almost always there is 0 items to copy (this is not true only if panic occurs).
LLVM doesn't do any checks before memmove call: godbolt

Maybe condition must look like if self.deleted_cnt > 0 && self.original_len != self.processed_len {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, dropping #[inline] modifier should not be done before New Pass Manager stabilizes because this affects how early function would be inlined.

Is it relevant though? I think most of the performance comes from the loops themselves being optimized properly, not from inlining the loops into the larger function. They only do unchecked pointer arithmetic anyway, so it's not like they benefit from bounds check elimination.

original_len: usize,
f: &mut F,
g: &mut BackshiftOnDrop<'_, T, A>,
) -> bool
where
) where
F: FnMut(&mut T) -> bool,
{
// SAFETY: Unchecked element must be valid.
let cur = unsafe { &mut *g.v.as_mut_ptr().add(g.processed_len) };
if !f(cur) {
// Advance early to avoid double drop if `drop_in_place` panicked.
g.processed_len += 1;
g.deleted_cnt += 1;
// SAFETY: We never touch this element again after dropped.
unsafe { ptr::drop_in_place(cur) };
// We already advanced the counter.
return false;
}
if DELETED {
// SAFETY: `deleted_cnt` > 0, so the hole slot must not overlap with current element.
// We use copy for move, and never touch this element again.
unsafe {
let hole_slot = g.v.as_mut_ptr().add(g.processed_len - g.deleted_cnt);
ptr::copy_nonoverlapping(cur, hole_slot, 1);
while g.processed_len != original_len {
// SAFETY: Unchecked element must be valid.
let cur = unsafe { &mut *g.v.as_mut_ptr().add(g.processed_len) };
if !f(cur) {
// Advance early to avoid double drop if `drop_in_place` panicked.
g.processed_len += 1;
g.deleted_cnt += 1;
// SAFETY: We never touch this element again after dropped.
unsafe { ptr::drop_in_place(cur) };
// We already advanced the counter.
if DELETED {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we write this more clearly? It is a quite complex, imho.

Maybe this would more readable:

match (DELETED, f(cur)){
    (true, true) =>{
          // move current item to freed hole
    },
    (true, false) => {
          // increment cnts, drop in place, 
          // move items
    }
    (false, true) => {
        // increase processed len, do not move because no items dropped yet
    }
    (false, false) => {
        // increase cnts, drop in place, return
    }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

                // SAFETY: We never touch this element again after dropped.
                    g.deleted_cnt += 1;
                unsafe { ptr::drop_in_place(cur) };
                    // SAFETY: We never touch this element again after dropped.
                // We already advanced the counter.
                    unsafe { ptr::drop_in_place(cur) };

This code block would happen on two branches and thus would have to be duplicated.

Well, I guess processed_len can be hoisted out of all the conditionals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really think that duplicate code is a big issue because it is very short. For me, control flow is not easy to follow but maybe it is something wrong with me and the code is OK.

continue;
} else {
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return shows our intent more clearly.

}
}
if DELETED {
// SAFETY: `deleted_cnt` > 0, so the hole slot must not overlap with current element.
// We use copy for move, and never touch this element again.
unsafe {
let hole_slot = g.v.as_mut_ptr().add(g.processed_len - g.deleted_cnt);
ptr::copy_nonoverlapping(cur, hole_slot, 1);
}
}
g.processed_len += 1;
}
g.processed_len += 1;
return true;
}

// Stage 1: Nothing was deleted.
while g.processed_len != original_len {
if !process_one::<F, T, A, false>(&mut f, &mut g) {
break;
}
}
process_loop::<F, T, A, false>(original_len, &mut f, &mut g);

// Stage 2: Some elements were deleted.
while g.processed_len != original_len {
process_one::<F, T, A, true>(&mut f, &mut g);
}
process_loop::<F, T, A, true>(original_len, &mut f, &mut g);

// All item are processed. This can be optimized to `set_len` by LLVM.
drop(g);
Expand Down