Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed optimization when looping over bytes of a value #133528

Closed
theemathas opened this issue Nov 27, 2024 · 10 comments
Closed

Missed optimization when looping over bytes of a value #133528

theemathas opened this issue Nov 27, 2024 · 10 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@theemathas
Copy link
Contributor

I tried this code, which contains 3 functions which check if all the bits in a u64 are all ones:

#[no_mangle]
fn ne_bytes(input: u64) -> bool {
    let bytes = input.to_ne_bytes();
    bytes.iter().all(|x| *x == !0)
}

#[no_mangle]
fn black_box_ne_bytes(input: u64) -> bool {
    let bytes = input.to_ne_bytes();
    let bytes = std::hint::black_box(bytes);
    bytes.iter().all(|x| *x == !0)
}

#[no_mangle]
fn direct(input: u64) -> bool {
    input == !0
}

I expected to see this happen: ne_bytes() should be optimized to the same thing as direct(), while black_box_ne_bytes() should be optimized slightly worse

Instead, this happened: I got the following assembly, where ne_bytes() is somehow optimized worse than black_box_ne_bytes()

ne_bytes:
        mov     rax, rdi
        not     rax
        shl     rax, 8
        sete    cl
        shr     rdi, 56
        cmp     edi, 255
        setae   al
        and     al, cl
        ret

black_box_ne_bytes:
        mov     qword ptr [rsp - 8], rdi
        lea     rax, [rsp - 8]
        cmp     qword ptr [rsp - 8], -1
        sete    al
        ret

direct:
        cmp     rdi, -1
        sete    al
        ret

Godbolt

Meta

Reproducible on godbolt with stable rustc 1.82.0 (f6e511eec 2024-10-15) and nightly rustc 1.85.0-nightly (7db7489f9 2024-11-25)

@theemathas theemathas added the C-bug Category: This is a bug. label Nov 27, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Nov 27, 2024
@workingjubilee workingjubilee added C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed C-bug Category: This is a bug. labels Nov 27, 2024
@purplesyringa
Copy link
Contributor

purplesyringa commented Nov 27, 2024

As far as I can see, something very similar is at least partially fixed on LLVM trunk: https://godbolt.org/z/MzqG7rf9d. There's also another similar issue: llvm/llvm-project#117853, but I'm not sure if it's relevant to this particular issue.

@purplesyringa

This comment has been minimized.

@rustbot rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Nov 27, 2024
@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Nov 27, 2024
@clubby789
Copy link
Contributor

Looks like all 3 functions optimize to the same thing on LLVM trunk opt

@clubby789 clubby789 added the llvm-fixed-upstream Issue expected to be fixed by the next major LLVM upgrade, or backported fixes label Nov 27, 2024
@DianQK DianQK added E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. and removed llvm-fixed-upstream Issue expected to be fixed by the next major LLVM upgrade, or backported fixes labels Feb 21, 2025
@madhav-madhusoodanan
Copy link
Contributor

Hi
I was learning how LLVM integrates with rustc when I came across this issue.

Just wanted to know if this issue depends on LLVM releasing a new version that optimizes per-byte comparisons.

@DianQK
Copy link
Member

DianQK commented Feb 25, 2025

Just wanted to know if this issue depends on LLVM releasing a new version that optimizes per-byte comparisons.

The master branch has included the new version (LLVM 20).

@madhav-madhusoodanan
Copy link
Contributor

Got it, thank you!

@karolzwolak
Copy link
Contributor

On the nightly, direct and ne_bytes produce identical code, and black_box_ne_bytes is not too far off.
current output from godbolt

black_box_ne_bytes:
        mov     qword ptr [rsp - 8], rdi
        lea     rax, [rsp - 8]
        cmp     qword ptr [rsp - 8], -1
        sete    al
        ret

direct:
        cmp     rdi, -1
        sete    al
        ret

I'd like to work on this issue, but I'm not sure how to go about writing good tests for this. I guess we could just test that direct and ne_bytes produce identical assembly, but what about black_box_ne_bytes?
Could someone guide me on how to implement in a sound way?

@clubby789
Copy link
Contributor

@karolzwolak You should check the codegen tests (https://github.com/rust-lang/rust/blob/master/tests/codegen/) for example. You'll want to test that ne_bytes compiles to

define noundef zeroext i1 @ne_bytes(i64 noundef %input) unnamed_addr {
start:
  %_0 = icmp eq i64 %input, -1
  ret i1 %_0
}

Which will need a CHECK-LABEL, a CHECK for the icmp then a CHECK-NEXT for the ret

black_box_ne_bytes can be ignored - it has the same optimization, just with some extra stack moves

@karolzwolak
Copy link
Contributor

Thanks, really appreciate the help.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Mar 1, 2025
…133528, r=DianQK

test(codegen): add looping_over_ne_bytes test for rust-lang#133528

Adds test for rust-lang#133528.
I renamed the function to `looping_over_ne_bytes` to better reflect that it is doing.
I also set the min llvm version to 20 as this was presumably a llvm bug that was fixed in version 20.
I didn't tie the test to any specific architecture, as we are testing llvm output.
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 2, 2025
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#136865 (Perform deeper compiletest path normalization for `$TEST_BUILD_DIR` to account for compare-mode/debugger cases, and normalize long type file filename hashes)
 - rust-lang#136922 (Pattern types: Avoid having to handle an Option for range ends in the type system or the HIR)
 - rust-lang#137081 (change config.toml to bootstrap.toml)
 - rust-lang#137103 ({json|html}docck: catch and error on deprecated syntax)
 - rust-lang#137632 (rustdoc: when merging target features, keep the highest stability)
 - rust-lang#137684 (Add rustdoc support for `--emit=dep-info[=path]`)
 - rust-lang#137794 (make qnx pass a test)
 - rust-lang#137801 (tests: Unignore target modifier tests on all platforms)
 - rust-lang#137826 (test(codegen): add looping_over_ne_bytes test for rust-lang#133528)

Failed merges:

 - rust-lang#137147 (Add exclude to config.toml)

r? `@ghost`
`@rustbot` modify labels: rollup
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Mar 2, 2025
…133528, r=DianQK

test(codegen): add looping_over_ne_bytes test for rust-lang#133528

Adds test for rust-lang#133528.
I renamed the function to `looping_over_ne_bytes` to better reflect that it is doing.
I also set the min llvm version to 20 as this was presumably a llvm bug that was fixed in version 20.
I didn't tie the test to any specific architecture, as we are testing llvm output.
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 2, 2025
…iaskrgr

Rollup of 8 pull requests

Successful merges:

 - rust-lang#136865 (Perform deeper compiletest path normalization for `$TEST_BUILD_DIR` to account for compare-mode/debugger cases, and normalize long type file filename hashes)
 - rust-lang#137081 (change config.toml to bootstrap.toml)
 - rust-lang#137103 ({json|html}docck: catch and error on deprecated syntax)
 - rust-lang#137632 (rustdoc: when merging target features, keep the highest stability)
 - rust-lang#137684 (Add rustdoc support for `--emit=dep-info[=path]`)
 - rust-lang#137794 (make qnx pass a test)
 - rust-lang#137801 (tests: Unignore target modifier tests on all platforms)
 - rust-lang#137826 (test(codegen): add looping_over_ne_bytes test for rust-lang#133528)

r? `@ghost`
`@rustbot` modify labels: rollup
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Mar 2, 2025
…133528, r=DianQK

test(codegen): add looping_over_ne_bytes test for rust-lang#133528

Adds test for rust-lang#133528.
I renamed the function to `looping_over_ne_bytes` to better reflect that it is doing.
I also set the min llvm version to 20 as this was presumably a llvm bug that was fixed in version 20.
I didn't tie the test to any specific architecture, as we are testing llvm output.
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 3, 2025
…iaskrgr

Rollup of 7 pull requests

Successful merges:

 - rust-lang#136865 (Perform deeper compiletest path normalization for `$TEST_BUILD_DIR` to account for compare-mode/debugger cases, and normalize long type file filename hashes)
 - rust-lang#137103 ({json|html}docck: catch and error on deprecated syntax)
 - rust-lang#137632 (rustdoc: when merging target features, keep the highest stability)
 - rust-lang#137684 (Add rustdoc support for `--emit=dep-info[=path]`)
 - rust-lang#137794 (make qnx pass a test)
 - rust-lang#137801 (tests: Unignore target modifier tests on all platforms)
 - rust-lang#137826 (test(codegen): add looping_over_ne_bytes test for rust-lang#133528)

r? `@ghost`
`@rustbot` modify labels: rollup
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Mar 3, 2025
…133528, r=DianQK

test(codegen): add looping_over_ne_bytes test for rust-lang#133528

Adds test for rust-lang#133528.
I renamed the function to `looping_over_ne_bytes` to better reflect that it is doing.
I also set the min llvm version to 20 as this was presumably a llvm bug that was fixed in version 20.
I didn't tie the test to any specific architecture, as we are testing llvm output.
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 3, 2025
…iaskrgr

Rollup of 6 pull requests

Successful merges:

 - rust-lang#137103 ({json|html}docck: catch and error on deprecated syntax)
 - rust-lang#137632 (rustdoc: when merging target features, keep the highest stability)
 - rust-lang#137684 (Add rustdoc support for `--emit=dep-info[=path]`)
 - rust-lang#137794 (make qnx pass a test)
 - rust-lang#137801 (tests: Unignore target modifier tests on all platforms)
 - rust-lang#137826 (test(codegen): add looping_over_ne_bytes test for rust-lang#133528)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 3, 2025
…iaskrgr

Rollup of 6 pull requests

Successful merges:

 - rust-lang#137103 ({json|html}docck: catch and error on deprecated syntax)
 - rust-lang#137632 (rustdoc: when merging target features, keep the highest stability)
 - rust-lang#137684 (Add rustdoc support for `--emit=dep-info[=path]`)
 - rust-lang#137794 (make qnx pass a test)
 - rust-lang#137801 (tests: Unignore target modifier tests on all platforms)
 - rust-lang#137826 (test(codegen): add looping_over_ne_bytes test for rust-lang#133528)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 3, 2025
…iaskrgr

Rollup of 6 pull requests

Successful merges:

 - rust-lang#137103 ({json|html}docck: catch and error on deprecated syntax)
 - rust-lang#137632 (rustdoc: when merging target features, keep the highest stability)
 - rust-lang#137684 (Add rustdoc support for `--emit=dep-info[=path]`)
 - rust-lang#137794 (make qnx pass a test)
 - rust-lang#137801 (tests: Unignore target modifier tests on all platforms)
 - rust-lang#137826 (test(codegen): add looping_over_ne_bytes test for rust-lang#133528)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 3, 2025
…iaskrgr

Rollup of 6 pull requests

Successful merges:

 - rust-lang#137103 ({json|html}docck: catch and error on deprecated syntax)
 - rust-lang#137632 (rustdoc: when merging target features, keep the highest stability)
 - rust-lang#137684 (Add rustdoc support for `--emit=dep-info[=path]`)
 - rust-lang#137794 (make qnx pass a test)
 - rust-lang#137801 (tests: Unignore target modifier tests on all platforms)
 - rust-lang#137826 (test(codegen): add looping_over_ne_bytes test for rust-lang#133528)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 3, 2025
…iaskrgr

Rollup of 6 pull requests

Successful merges:

 - rust-lang#137103 ({json|html}docck: catch and error on deprecated syntax)
 - rust-lang#137632 (rustdoc: when merging target features, keep the highest stability)
 - rust-lang#137684 (Add rustdoc support for `--emit=dep-info[=path]`)
 - rust-lang#137794 (make qnx pass a test)
 - rust-lang#137801 (tests: Unignore target modifier tests on all platforms)
 - rust-lang#137826 (test(codegen): add looping_over_ne_bytes test for rust-lang#133528)

r? `@ghost`
`@rustbot` modify labels: rollup
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Mar 3, 2025
Rollup merge of rust-lang#137826 - karolzwolak:looping_over_ne_bytes_133528, r=DianQK

test(codegen): add looping_over_ne_bytes test for rust-lang#133528

Adds test for rust-lang#133528.
I renamed the function to `looping_over_ne_bytes` to better reflect that it is doing.
I also set the min llvm version to 20 as this was presumably a llvm bug that was fixed in version 20.
I didn't tie the test to any specific architecture, as we are testing llvm output.
@theemathas
Copy link
Contributor Author

Closed by #137826

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants