Slice equality is slow #16913
Comments
the |
The call to
match_case.i: ; preds = %"_ZN5slice57Items$LT$$x27a$C$$x20T$GT$.Iterator$LT$$BP$$x27a$x20T$GT$4next21h11506221690941188551E.exit"
%sret_slot.sroa.0.0.i.lcssa310 = phi i8* [ %sret_slot.sroa.0.0.i, %"_ZN5slice57Items$LT$$x27a$C$$x20T$GT$.Iterator$LT$$BP$$x27a$x20T$GT$4next21h11506221690941188551E.exit" ]
%60 = icmp eq i8* %sret_slot.sroa.0.0.i.lcssa310, null
br i1 %60, label %.noexc76, label %then-block-191-.i.loopexit309
match_case6.i: ; preds = %"_ZN5slice57Items$LT$$x27a$C$$x20T$GT$.Iterator$LT$$BP$$x27a$x20T$GT$4next21h11506221690941188551E.exit"
%61 = icmp eq i8* %sret_slot.sroa.0.0.i, null
br i1 %61, label %then-block-191-.i.loopexit, label %.noexc197
.noexc197: ; preds = %match_case6.i
%62 = bitcast i8* %sret_slot.sroa.0.0.i to i64*
%63 = bitcast i8* %sret_slot.sroa.0.0.i201 to i64*
%64 = load i64* %63, align 8
%65 = load i64* %62, align 8
%66 = icmp eq i64 %64, %65
br i1 %66, label %loop_body.i, label %then-block-191-.i.loopexit
cc @zwarich -- Could the NullCheckElim pass handle that? That aside, even the
AFAICT, LLVM does not yet have an optimization that turns loops into calls to memcmp (it's a TODO in the LoopIdiomRecognize transform), so we probably lose out because of that. |
The custom_eq method can be improved slightly by using raw pointers (C++ style iteration). But even then, the distance to memcmp is vast. For u8 elements it's 10x. |
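For illustration, raw-pointer ("C++ style") iteration over two slices might look like the sketch below. The original custom_eq is not shown in this thread, so the name raw_ptr_eq and its exact shape are assumptions, not the code that was benchmarked.

```rust
/// Hypothetical sketch: element-wise equality using raw pointers and a
/// one-past-the-end sentinel, in the style of C++ iterators.
pub fn raw_ptr_eq<T: PartialEq>(a: &[T], b: &[T]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Pointer arithmetic cannot distinguish elements of zero-sized types,
    // so fall back to the built-in comparison for them.
    if std::mem::size_of::<T>() == 0 {
        return a == b;
    }
    let mut pa = a.as_ptr();
    let mut pb = b.as_ptr();
    // SAFETY: `a.len()` elements past the start is one past the end of `a`,
    // which is a valid pointer to compute (but not to dereference).
    let end = unsafe { pa.add(a.len()) };
    while pa != end {
        // SAFETY: `pa` is strictly before `end`, and `b` has the same
        // length as `a`, so both pointers refer to live elements.
        unsafe {
            if *pa != *pb {
                return false;
            }
            pa = pa.add(1);
            pb = pb.add(1);
        }
    }
    true
}
```

The loop still has an early exit on the first mismatch, so it removes bounds checks and index arithmetic but is not expected to close the gap to memcmp on its own.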
Hm, I don't see that much of a difference for u8 elements. The (unmodified) custom_eq is about as fast as memcmp for me.
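I implemented memcmp_eq like this:
fn memcmp_eq<'a, T: PartialEq>(a: &'a [T], b: &'a [T]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // memcmp takes a length in bytes, not in elements, so use the byte size of the slice.
    let n = std::mem::size_of_val(a);
    unsafe {
        rlibc::memcmp(a.as_ptr() as *const _, b.as_ptr() as *const _, n) == 0
    }
} |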
try libc memcmp instead |
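A minimal sketch of that suggestion, assuming a platform whose C library exports memcmp and a toolchain that accepts `unsafe extern` blocks (Rust 1.82 or later). The wrapper name memcmp_eq_bytes is hypothetical, and it is restricted to byte slices because a bytewise comparison does not match PartialEq for every element type.

```rust
use std::os::raw::{c_int, c_void};

// Declaration of the C library's memcmp; the length is in bytes.
unsafe extern "C" {
    fn memcmp(a: *const c_void, b: *const c_void, n: usize) -> c_int;
}

/// Hypothetical wrapper: byte-slice equality via the platform memcmp.
pub fn memcmp_eq_bytes(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Avoid handing the dangling pointers of empty slices to C.
    if a.is_empty() {
        return true;
    }
    // SAFETY: both pointers are valid for reads of `a.len()` bytes,
    // because the slices are non-empty and have equal length.
    unsafe { memcmp(a.as_ptr() as *const c_void, b.as_ptr() as *const c_void, a.len()) == 0 }
}
```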
The LLVM loop idiom pass doesn't know how to generate |
Is this fixed by #26884? |
It seems to be fixed in the sense of the original reporter? But, slice equality still doesn't vectorize properly or compare well to glibc's memcmp (for byte slices), so improvements remain. |
@dotdash Not sure why it doesn't vectorize in nightly: https://play.rust-lang.org/?gist=38c5ef4ccf66898cc261&version=nightly
pub fn compare(a: &[u8], b: &[u8]) -> bool {
    a == b
} |
As of today this is still true. |
Related bug in LLVM: https://llvm.org/bugs/show_bug.cgi?id=16332 Basically the problem is that the LLVM vectoriser does not know how to optimise loops with multiple exits or (which is the same) with termination guards that are not simple constraints on the index variable. |
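To make the "multiple exits" point concrete, here is a sketch of the two loop shapes for byte slices. Both function names are hypothetical. Whether either loop is actually vectorized depends on the compiler and LLVM version, so treat this as an illustration of the loop structure, not as a benchmark claim.

```rust
/// Two exits: the loop ends either when the index runs out or when a
/// mismatch is found. This is the shape described in the comment above.
pub fn eq_early_exit(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    for i in 0..a.len() {
        if a[i] != b[i] {
            return false; // second exit, depends on the data
        }
    }
    true
}

/// Single exit: differences are accumulated and checked once after the
/// loop, so the only termination condition is the iteration count.
pub fn eq_single_exit(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // non-zero as soon as any pair of bytes differs
    }
    diff == 0
}
```

The single-exit form always scans the whole input, so it trades the early return on a mismatch for a simpler loop.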
Seems to be fixed now with some specialisation (the slow comparison, not LLVM bug)
|
Thanks @nagisa, closing! |
If I run the following code with
rustc -O --test src/slice_equality_slow.rs && ./slice_equality_slow --bench
then I get:
I ran into this because I was able to speed up the naive string matching algorithm in core::str by replacing the slice equality with a for loop and comparing each component manually.
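The kind of change described here can be sketched as follows. This is not the core::str code, which is not shown in the issue; both functions and their names are hypothetical, written only to contrast slice equality with a manual element loop inside a naive search.

```rust
/// Naive search that compares each window with slice equality.
pub fn find_with_slice_eq(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.len() > haystack.len() {
        return None;
    }
    let last = haystack.len() - needle.len();
    (0..=last).find(|&i| &haystack[i..i + needle.len()] == needle)
}

/// The same search with the window comparison written as a manual loop.
pub fn find_with_manual_loop(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.len() > haystack.len() {
        return None;
    }
    let last = haystack.len() - needle.len();
    'outer: for i in 0..=last {
        for j in 0..needle.len() {
            if haystack[i + j] != needle[j] {
                continue 'outer; // mismatch: move on to the next window
            }
        }
        return Some(i);
    }
    None
}
```

Both functions return the index of the first match, and an empty needle matches at index 0.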