Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

included range in for loop gives more assembly code than excluded range #102462

Open
melonges opened this issue Sep 29, 2022 · 9 comments
Open

included range in for loop gives more assembly code than excluded range #102462

melonges opened this issue Sep 29, 2022 · 9 comments
Assignees
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@melonges
Copy link

melonges commented Sep 29, 2022

Given the following code where source will be excluded

pub fn store(source: &i32, dest: &mut i32) -> i32 {
    let mut accum = 0;
    for _ in 0..*source {  // source will be excluded
        accum += *dest;
    }
    accum
}

The assembly output is:

example::store:
        mov     ecx, dword ptr [rdi]
        mov     edx, dword ptr [rsi]
        imul    edx, ecx
        xor     eax, eax
        test    ecx, ecx
        cmovg   eax, edx
        ret

if source will be included the assembly code will increase more than

pub fn store(source: &i32, dest: &mut i32) -> i32 {
    let mut accum = 0;
    for _ in 0..=*source { //  source will be included
        accum += *dest;
    }
    accum
}

The assembly output is:

example::store:
        mov     ecx, dword ptr [rdi]
        test    ecx, ecx
        js      .LBB0_1
        mov     r8d, dword ptr [rsi]
        xor     eax, eax
        xor     esi, esi
.LBB0_3:
        xor     edi, edi
        cmp     esi, ecx
        setl    dl
        add     eax, r8d
        cmp     esi, ecx
        jge     .LBB0_5
        mov     dil, dl
        add     esi, edi
        cmp     esi, ecx
        jle     .LBB0_3
.LBB0_5:
        ret
.LBB0_1:
        xor     eax, eax
        ret

I tested on rustc 1.64.0 with -C opt-level=3

@melonges melonges added A-diagnostics Area: Messages for errors, warnings, and lints T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 29, 2022
@Rageking8
Copy link
Contributor

@rustbot label -A-diagnostics +A-codegen +I-heavy

@rustbot rustbot added A-codegen Area: Code generation I-heavy Issue: Problems and improvements with respect to binary size of generated code. and removed A-diagnostics Area: Messages for errors, warnings, and lints labels Sep 29, 2022
@lukas-code
Copy link
Member

This regressed between 1.42 and 1.43: Godbolt link

related: #45222

@leonardo-m
Copy link

(If you have two nested loop, the code is bad even in older compilers).

@jdahlstrom
Copy link

jdahlstrom commented Oct 1, 2022

This regressed between 1.42 and 1.43: Godbolt link

The 1.42 codegen is tricky, as it seems at first that it doesn't account for the case where *source == i32::MAX and just goes add eax, 1. But it ends up giving the expected result, which is 0 (mod 2^31) no matter the value of *dest. But in many other cases inclusive ranges necessarily produce suboptimal code because the T::MAX case requires special handling.

@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Oct 1, 2022
@nikic
Copy link
Contributor

nikic commented Oct 1, 2022

Simpler example without the unnecessary memory indirections: https://rust.godbolt.org/z/GPvrbvbYY

External iteration over inclusive ranges is well known to optimize badly. Optimization may be viable in this particular case though.

@nikic
Copy link
Contributor

nikic commented Dec 21, 2022

This is the fold we'd need: https://alive2.llvm.org/ce/z/JsWRvT

@melonges
Copy link
Author

how is the problem right now guys?

@ilyvion
Copy link

ilyvion commented Dec 22, 2023

Came across this weirdness today; it's still a problem in December 2023:

pub fn inclusive() -> usize {
    (2..=100).step_by(2).sum::<usize>()
}

pub fn exclusive() -> usize {
    (2..101).step_by(2).sum::<usize>()
}

produces assembly with 46 lines and 3 lines, respectively. Was going to create a new bug for it, but I'm pretty sure it's just another version of this one.

@veera-sivarajan
Copy link
Contributor

@rustbot claim

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants