Improve F14find* by 5%-10% on Aarch64 #2378

Closed


@Nicoshev Nicoshev commented Feb 3, 2025

Summary:
The diff simplifies the loop within F14find* by moving a shift operation from the condition to initialization.

This removes the need to perform a shift on each iteration. It also reduces the number of values that must be live at the same time, which may lower register pressure. Additionally, on aarch64 it allows the compiler to use the subs instruction, which decrements the counter and sets the condition flags in one step.

The following disassembly shows all of these theoretical benefits being realized:

before:

    2dcd54:  91000508  add   x8, x8, #0x1
    2dcd58:  9ac9250f  lsr   x15, x8, x9
    2dcd5c:  8b1001ce  add   x14, x14, x16
    2dcd60:  b4fffccf  cbz   x15, 2dccf8 <_ZN30F14Map_equalityRefinement_Test8TestBodyEv+0x2f4>

after:

    2dce14:  f100054a  subs  x10, x10, #0x1
    2dce18:  8b0e01ad  add   x13, x13, x14
    2dce1c:  54fffce1  b.ne  2dcdb8 <_ZN30F14Map_equalityRefinement_Test8TestBodyEv+0x2f4>  // b.any
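
For readers who want the source-level shape of this change, here is a minimal sketch of the two loop forms. It is not the actual folly code: the function names, the `visited` vector, and the masking of `index` are assumptions made only so the two variants can be compared. The first loop counts up and shifts in the condition; the second computes the iteration count once in the initializer and counts down to zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a probe loop; names are illustrative only.
// "Before" shape: the shift is re-evaluated in the condition on every pass.
std::vector<std::size_t> probeShiftInCondition(
    std::size_t index, std::size_t step, unsigned chunkShift) {
  assert(chunkShift < sizeof(std::size_t) * 8);
  const std::size_t mask = (std::size_t{1} << chunkShift) - 1;
  std::vector<std::size_t> visited;
  for (std::size_t tries = 0; (tries >> chunkShift) == 0; ++tries) {
    visited.push_back(index & mask);
    index += step;
  }
  return visited;
}

// "After" shape: the shift runs once in the initializer, and the loop
// counts down to zero, which a compiler can lower to subs + b.ne on aarch64.
std::vector<std::size_t> probeShiftInInit(
    std::size_t index, std::size_t step, unsigned chunkShift) {
  assert(chunkShift < sizeof(std::size_t) * 8);
  const std::size_t mask = (std::size_t{1} << chunkShift) - 1;
  std::vector<std::size_t> visited;
  for (std::size_t tries = std::size_t{1} << chunkShift; tries != 0; --tries) {
    visited.push_back(index & mask);
    index += step;
  }
  return visited;
}
```

Both forms run exactly `1 << chunkShift` iterations and visit the same sequence of indices. The second form no longer needs `chunkShift` inside the loop, which matches the disappearance of the lsr instruction in the disassembly above.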

Reviewed By: Gownta, embg

Differential Revision: D69056923

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D69056923

@facebook-github-bot

This pull request has been merged in 9d0b066.

facebook-github-bot pushed a commit to facebook/hhvm that referenced this pull request Feb 4, 2025
Summary:
X-link: facebook/folly#2378

fbshipit-source-id: 2e7216986a751aade943985f2b43ee4e7edda4fa