Improve F14find* by 5%-10% on Aarch64 #2378

Nicoshev · 2025-02-03T20:26:57Z

Summary:
This diff simplifies the loop within F14find* by moving a shift operation out of the loop condition and into the loop initialization.

This removes the need to perform a shift on each iteration. It also reduces the number of values that must be live at the same time, which may improve CPU register usage. Additionally, on aarch64 it allows the compiler to use the subs instruction, which decrements the counter and sets the condition flags in a single step.

The following disassembly shows all of these theoretical benefits being realized:

before:

2dcd54: 91000508 add x8, x8, #0x1
2dcd58: 9ac9250f lsr x15, x8, x9
2dcd5c: 8b1001ce add x14, x14, x16
2dcd60: b4fffccf cbz x15, 2dccf8 <_ZN30F14Map_equalityRefinement_Test8TestBodyEv+0x2f4>

after:

2dce14: f100054a subs x10, x10, #0x1
2dce18: 8b0e01ad add x13, x13, x14
2dce1c: 54fffce1 b.ne 2dcdb8 <_ZN30F14Map_equalityRefinement_Test8TestBodyEv+0x2f4> // b.any
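
For readers who prefer source to disassembly, the change can be sketched roughly as below. This is a minimal illustration and not the actual F14 code: the function names `probeBefore` and `probeAfter`, their parameters, and the checksum standing in for the per-chunk work are all hypothetical. Both loops run 2^shift times; only the form of the loop counter differs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the loop shape before the change: the counter
// counts up and the shift is re-evaluated in the condition on every iteration.
std::size_t probeBefore(std::size_t index, std::size_t step, unsigned shift) {
  assert(shift < sizeof(std::size_t) * 8);
  std::size_t checksum = 0;
  for (std::size_t tries = 0; (tries >> shift) == 0; ++tries) {
    checksum += index;  // placeholder for inspecting the chunk at `index`
    index += step;
  }
  return checksum;
}

// Hypothetical stand-in for the loop shape after the change: the shift is
// done once at initialization and the counter counts down to zero, which a
// compiler may lower to a single subs plus b.ne on aarch64.
std::size_t probeAfter(std::size_t index, std::size_t step, unsigned shift) {
  assert(shift < sizeof(std::size_t) * 8);
  std::size_t checksum = 0;
  for (std::size_t tries = std::size_t{1} << shift; tries != 0; --tries) {
    checksum += index;  // same placeholder work as above
    index += step;
  }
  return checksum;
}
```

The count-down form needs only one live counter register, whereas the count-up form keeps the counter, the shift amount, and the shifted temporary live inside the loop.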

Reviewed By: Gownta, embg

Differential Revision: D69056923


facebook-github-bot · 2025-02-03T20:27:07Z

This pull request was exported from Phabricator. Differential Revision: D69056923

facebook-github-bot · 2025-02-04T04:49:27Z

This pull request was exported from Phabricator. Differential Revision: D69056923

facebook-github-bot · 2025-02-04T16:16:24Z

This pull request has been merged in 9d0b066.

X-link: facebook/folly#2378

Reviewed By: Gownta, embg

Differential Revision: D69056923

fbshipit-source-id: 2e7216986a751aade943985f2b43ee4e7edda4fa

facebook-github-bot added the CLA Signed label Feb 3, 2025

facebook-github-bot added the fb-exported label Feb 3, 2025

facebook-github-bot closed this in 9d0b066 Feb 4, 2025

facebook-github-bot added the Merged label Feb 4, 2025
