Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Zen5 tuning part 3: scheduler tweaks
this patch adds support for new fussion in znver5 documented in the optimization manual: The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions with certain ALU instructions. The following conditions need to be met for fusion to happen: - The MOV should be reg-reg mov with Opcode 0x89 or 0x8B - The MOV is followed by an ALU instruction where the MOV and ALU destination register match. - The ALU instruction may source only registers or immediate data. There cannot be any memory source. - The ALU instruction sources either the source or dest of MOV instruction. - If ALU instruction has 2 reg sources, they should be different. - The following ALU instructions can fuse with an older qualified MOV instruction: ADD ADC AND XOR OP SUB SBB INC DEC NOT SAL / SHL SHR SAR (I assume OP is OR) I also increased issue rate from 4 to 6. Theoretically znver5 can do more, but with our model we can't realy use it. Increasing issue rate to 8 leads to infinite loop in scheduler. Finally, I also enabled fuse_alu_and_branch since it is supported by znver5 (I think by earlier zens too). New fussion pattern moves quite few instructions around in common code: @@ -2210,13 +2210,13 @@ .cfi_offset 3, -32 leaq 63(%rsi), %rbx movq %rbx, %rbp + shrq $6, %rbp + salq $3, %rbp subq $16, %rsp .cfi_def_cfa_offset 48 movq %rdi, %r12 - shrq $6, %rbp - movq %rsi, 8(%rsp) - salq $3, %rbp movq %rbp, %rdi + movq %rsi, 8(%rsp) call _Znwm movq 8(%rsp), %rsi movl $0, 8(%r12) @@ -2224,8 +2224,8 @@ movq %rax, (%r12) movq %rbp, 32(%r12) testq %rsi, %rsi - movq %rsi, %rdx cmovns %rsi, %rbx + movq %rsi, %rdx sarq $63, %rdx shrq $58, %rdx sarq $6, %rbx which should help decoder bandwidth and perhaps also cache, though I was not able to measure off-noise effect on SPEC. gcc/ChangeLog: * config/i386/i386.h (TARGET_FUSE_MOV_AND_ALU): New tune. * config/i386/x86-tune-sched.cc (ix86_issue_rate): Updat for znver5. (ix86_adjust_cost): Add TODO about znver5 memory latency. (ix86_fuse_mov_alu_p): New. (ix86_macro_fusion_pair_p): Use it. * config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): Add ZNVER5. (X86_TUNE_FUSE_MOV_AND_ALU): New tune;
- Loading branch information