Improve F14find* by 5%-10% on Aarch64 #2378
Closed
Summary:
The diff simplifies the loop within F14find* by moving a shift operation from the condition to initialization.
This removes the need to perform a shift on each iteration. It also reduces the number of values that must be live simultaneously, potentially lowering register pressure. Additionally, on aarch64 it allows the compiler to use the subs instruction, which decrements the counter and sets the condition flags in a single step.
The following disassembly shows all of these theoretical benefits being realized:
before:
2dcd54: 91000508 add x8, x8, #0x1
2dcd58: 9ac9250f lsr x15, x8, x9
2dcd5c: 8b1001ce add x14, x14, x16
2dcd60: b4fffccf cbz x15, 2dccf8 <_ZN30F14Map_equalityRefinement_Test8TestBodyEv+0x2f4>
after:
2dce14: f100054a subs x10, x10, #0x1
2dce18: 8b0e01ad add x13, x13, x14
2dce1c: 54fffce1 b.ne 2dcdb8 <_ZN30F14Map_equalityRefinement_Test8TestBodyEv+0x2f4> // b.any
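For readers who want to see the shape of the change in source form, here is a minimal C++ sketch of the two loop styles. It is not the actual F14 code: the function names, parameters, and loop body are hypothetical stand-ins, and the real loop inspects a chunk on each iteration and can return early. The sketch only assumes what the disassembly shows, namely that the old loop continues while `tries >> shift` is zero and the new loop counts down to zero.

```cpp
#include <cstddef>

// Hypothetical stand-in for the probe loop, NOT the real F14 implementation.
// Old shape: the bound is re-derived with a shift on every iteration, so
// tries, shift, index and step must all be live inside the loop.
std::size_t probeShiftInCondition(std::size_t index, std::size_t step,
                                  unsigned shift) {
  for (std::size_t tries = 0; (tries >> shift) == 0; ++tries) {
    index += step;  // placeholder for "advance to the next chunk"
  }
  return index;
}

// New shape: shift once during initialization, then count down to zero.
// Only the remaining count, index and step are live inside the loop, and a
// decrement-and-test maps naturally onto subs followed by b.ne.
std::size_t probeCountDown(std::size_t index, std::size_t step,
                           unsigned shift) {
  for (std::size_t remaining = std::size_t{1} << shift; remaining != 0;
       --remaining) {
    index += step;  // same placeholder body as above
  }
  return index;
}
```

Both functions run exactly 2^shift iterations (for shift smaller than the bit width of std::size_t), so they are interchangeable as long as the loop body does not depend on the direction of the counter.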
Reviewed By: Gownta, embg
Differential Revision: D69056923