Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal: Speed up line index calculation via NEON for aarch64 #16350

Merged
merged 2 commits into from
Jan 16, 2024

Conversation

roife
Copy link
Member

@roife roife commented Jan 12, 2024

This commit provides SIMD acceleration (via NEON) for line-index library on aarch64 architecture, which improves performance for Apple Silicon users (and potentially for future aarch64-based chips).

The algorithm used here follows the same process as the original implementation using SSE2. Most of the vector instructions in SSE2 have corresponding parts in neon. The only issue is that there is no corresponding instruction for _mm_movemask_epi8 in neon. To address this problem, I referred to the article at https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon.

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 12, 2024
@roife
Copy link
Member Author

roife commented Jan 13, 2024

Similarly, LineEndings::normalize in global_state.rs (replacing \r\n with \n) is called every time process_change is invoked, can we attempt to optimize this using SIMD technology? If this is feasible, I would like to give it a try.

@Young-Flash
Copy link
Member

curious about the performance improvement of the modified version, have you done any comparison testing?

cargo run --release -p rust-analyzer -- analysis-stats project-to-analyze-path

@lnicola
Copy link
Member

lnicola commented Jan 13, 2024

I don't think this is going to be visible in analysis-stats, LineIndex is only used while typing.

can we attempt to optimize this using SIMD technology

I think so, but I don't know if it's worth it, as lines are on average, what, 20 characters long? So if the file uses \r\n we're not going to stay on the SIMD path for long. There might be a better way, but I don't know it.

Of course, we could still have a fast check for \r (even using memchr).

@roife
Copy link
Member Author

roife commented Jan 13, 2024

So if the file uses \r\n we're not going to stay on the SIMD path for long.

You are right, the scenarios handled by LineEndings::normalize and LineIndex are quite different, and it may not necessarily benefit from SIMD.


Regarding the acceleration of neon for LineIndex calculation, I conducted tests on a large source code file I constructed myself, and the improvement compared to the original method is about 20% -- Of course, due to the overall short execution time, this improvement is in the millisecond range. However, because LineIndex calculation may occur every time there is a TextChange notification (and may even happen multiple times in one notification handling), the cumulative improvement can be more significant.

Additionally, using SIMD might also enhance the energy efficiency of r-a -- according to my friend, he found that 10% of the power consumption in r-a is spent on calculating LineIndex after his tests.

@lnicola
Copy link
Member

lnicola commented Jan 13, 2024

Regarding the acceleration of neon for LineIndex calculation, I conducted tests on a large source code file I constructed myself, and the improvement compared to the original method is about 20%

Ouch, 20% for what, 8 chars at a time is pretty sad. Not your fault, of course.

Additionally, using SIMD might also enhance the energy efficiency of r-a -- according to my friend, he found that 10% of the power consumption in r-a is spent on calculating LineIndex after his tests.

Wow, that's impressive!

@roife
Copy link
Member Author

roife commented Jan 13, 2024

Ouch, 20% for what, 8 chars at a time is pretty sad. Not your fault, of course.

Oh, sorry, I noticed a flaw in my previous testing: I didn't conduct the tests in --release. I repeated the experiment with --release, using several files from rust-analyzer. To make the test results more significant, I concatenated multiple files for testing.

  • Without using neon: 11.222584ms
  • With neon: 1.744458ms (speed up ~ 6.5x)

So, this is a significant improvement - in terms of LineIndex calculation.

@Veykril
Copy link
Member

Veykril commented Jan 16, 2024

Thanks!
@bors r+

@bors
Copy link
Contributor

bors commented Jan 16, 2024

📌 Commit df53828 has been approved by Veykril

It is now in the queue for this repository.

@bors
Copy link
Contributor

bors commented Jan 16, 2024

⌛ Testing commit df53828 with merge 18abb12...

@bors
Copy link
Contributor

bors commented Jan 16, 2024

☀️ Test successful - checks-actions
Approved by: Veykril
Pushing 18abb12 to master...

@bors bors merged commit 18abb12 into rust-lang:master Jan 16, 2024
10 checks passed
@lnicola lnicola changed the title internal: Speedup line index calculation via NEON for aarch64 internal: Speed up line index calculation via NEON for aarch64 Jan 20, 2024
@roife roife deleted the neon-support-for-line-index branch February 27, 2024 06:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants