internal: Speed up line index calculation via NEON for aarch64 #16350

roife · 2024-01-12T08:04:28Z

This commit provides SIMD acceleration (via NEON) for the line-index library on the aarch64 architecture, which improves performance for Apple Silicon users (and potentially for future aarch64-based chips).

The algorithm used here follows the same process as the original SSE2 implementation. Most of the SSE2 vector instructions have direct counterparts in NEON. The only issue is that NEON has no equivalent of _mm_movemask_epi8. To address this problem, I referred to the article at https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon.
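For readers unfamiliar with the workaround, here is a minimal sketch of the narrowing-shift idea as I understand it from the linked article: instead of one bit per byte (as _mm_movemask_epi8 gives), you get one 4-bit nibble per byte packed into a u64. The function names `newline_nibble_mask` and `newline_offsets` are hypothetical and this is not necessarily the code in this PR. The aarch64 path assumes NEON is available (it is on the standard aarch64 targets); a scalar fallback with the same result is included so the sketch builds elsewhere.

```rust
/// Hypothetical helper: nibble `i` of the result is 0xF if `chunk[i]` is a
/// newline and 0x0 otherwise (4 bits per byte instead of SSE2's 1 bit).
#[cfg(target_arch = "aarch64")]
fn newline_nibble_mask(chunk: &[u8; 16]) -> u64 {
    use std::arch::aarch64::*;
    // SAFETY: assumes NEON is enabled for this target; `chunk` is exactly
    // 16 readable bytes, which is what `vld1q_u8` loads.
    unsafe {
        let v = vld1q_u8(chunk.as_ptr());
        // 0xFF in every lane holding b'\n', 0x00 in all other lanes.
        let eq = vceqq_u8(v, vdupq_n_u8(b'\n'));
        // View as 8 x u16, shift right by 4 and narrow to 8 x u8:
        // this keeps 4 bits for each of the 16 input bytes.
        let narrowed = vshrn_n_u16::<4>(vreinterpretq_u16_u8(eq));
        vget_lane_u64::<0>(vreinterpret_u64_u8(narrowed))
    }
}

/// Scalar fallback that produces the same nibble mask on other targets.
#[cfg(not(target_arch = "aarch64"))]
fn newline_nibble_mask(chunk: &[u8; 16]) -> u64 {
    let mut mask = 0u64;
    for (i, &b) in chunk.iter().enumerate() {
        if b == b'\n' {
            mask |= 0xFu64 << (4 * i);
        }
    }
    mask
}

/// Decodes the nibble mask into the positions of the newlines in the chunk.
fn newline_offsets(chunk: &[u8; 16]) -> Vec<usize> {
    let mut mask = newline_nibble_mask(chunk);
    let mut offsets = Vec::new();
    while mask != 0 {
        // Each byte owns 4 bits, so the byte index is trailing_zeros / 4.
        let idx = (mask.trailing_zeros() / 4) as usize;
        offsets.push(idx);
        // Clear the whole nibble that belongs to this byte.
        mask &= !(0xFu64 << (4 * idx));
    }
    offsets
}
```

The only change on the consumer side compared with a 16-bit SSE2 mask is the division by 4 when turning a bit position into a byte index.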

roife · 2024-01-13T07:56:50Z

Similarly, LineEndings::normalize in global_state.rs (which replaces \r\n with \n) is called every time process_change is invoked. Can we attempt to optimize this using SIMD technology? If this is feasible, I would like to give it a try.

Young-Flash · 2024-01-13T10:13:41Z

I'm curious about the performance improvement of the modified version. Have you done any comparison testing?

cargo run --release -p rust-analyzer -- analysis-stats project-to-analyze-path

lnicola · 2024-01-13T10:32:32Z

I don't think this is going to be visible in analysis-stats, LineIndex is only used while typing.

> can we attempt to optimize this using SIMD technology

I think so, but I don't know if it's worth it, as lines are on average, what, 20 characters long? So if the file uses \r\n we're not going to stay on the SIMD path for long. There might be a better way, but I don't know it.

Of course, we could still have a fast check for \r (even using memchr).
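A minimal sketch of what such a fast check could look like, assuming a simplified signature; `normalize_crlf` is a hypothetical name and this is not the actual LineEndings::normalize. The byte scan uses only the standard library here; the memchr crate mentioned above could replace it.

```rust
/// Hypothetical sketch: returns the text with every `\r\n` replaced by `\n`,
/// plus a flag saying whether anything was replaced.
fn normalize_crlf(src: String) -> (String, bool) {
    // Fast path: without any carriage return there is nothing to rewrite,
    // so the original allocation is handed back untouched.
    if !src.as_bytes().contains(&b'\r') {
        return (src, false);
    }
    // Slow path: at least one `\r` exists. A lone `\r` that is not followed
    // by `\n` is left as-is.
    let had_crlf = src.contains("\r\n");
    (src.replace("\r\n", "\n"), had_crlf)
}
```

For files that already use \n, the cost is a single pass over the bytes with no copying.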

roife · 2024-01-13T10:41:21Z

> So if the file uses \r\n we're not going to stay on the SIMD path for long.

You are right, the scenarios handled by LineEndings::normalize and LineIndex are quite different, and it may not necessarily benefit from SIMD.

Regarding the NEON acceleration of the LineIndex calculation, I conducted tests on a large source code file I constructed myself, and the improvement compared to the original method is about 20%. Of course, because the overall execution time is short, this improvement is in the millisecond range. However, because LineIndex may be recalculated on every TextChange notification (possibly several times while handling a single notification), the cumulative improvement can be more significant.

Additionally, using SIMD might also improve the energy efficiency of r-a. According to a friend's tests, 10% of r-a's power consumption is spent on calculating LineIndex.

lnicola · 2024-01-13T11:04:32Z

> Regarding the NEON acceleration of the LineIndex calculation, I conducted tests on a large source code file I constructed myself, and the improvement compared to the original method is about 20%

Ouch, 20% for what, 8 chars at a time is pretty sad. Not your fault, of course.

> Additionally, using SIMD might also improve the energy efficiency of r-a. According to a friend's tests, 10% of r-a's power consumption is spent on calculating LineIndex.

Wow, that's impressive!

roife · 2024-01-13T11:15:33Z

> Ouch, 20% for what, 8 chars at a time is pretty sad. Not your fault, of course.

Oh, sorry, I noticed a flaw in my previous testing: I didn't run the tests with --release. I repeated the experiment with --release, using several files from rust-analyzer. To make the results more significant, I concatenated multiple files for testing.

Without NEON: 11.222584ms
With NEON: 1.744458ms (speedup ~6.4x)

So this is a significant improvement, at least in terms of the LineIndex calculation.
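For anyone who wants to reproduce this kind of measurement, here is a rough timing sketch. It is not the harness used for the numbers above: `line_starts` is a hypothetical scalar stand-in for the routine under test, and `mean_time` is a hypothetical helper. As noted above, the numbers only mean something when built with --release.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the routine under test: collects the byte
/// offset at which each line starts (offset 0 plus one past every '\n').
fn line_starts(text: &str) -> Vec<usize> {
    let mut starts = vec![0];
    starts.extend(
        text.bytes()
            .enumerate()
            .filter(|&(_, b)| b == b'\n')
            .map(|(i, _)| i + 1),
    );
    starts
}

/// Runs `f` `iters` times and returns the mean wall-clock time of one run.
/// `iters` must be non-zero, otherwise the division panics.
fn mean_time<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the unused result.
        std::hint::black_box(f());
    }
    start.elapsed() / iters
}
```

A typical use would be to concatenate a few source files into one string and compare `mean_time(100, || line_starts(&text))` between the two implementations.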

Veykril · 2024-01-16T09:11:02Z

Thanks!
@bors r+

bors · 2024-01-16T09:11:05Z

📌 Commit df53828 has been approved by Veykril

It is now in the queue for this repository.

bors · 2024-01-16T09:12:13Z

⌛ Testing commit df53828 with merge 18abb12...

bors · 2024-01-16T09:22:50Z

☀️ Test successful - checks-actions
Approved by: Veykril
Pushing 18abb12 to master...

internal: Speedup line index calculation via NEON for aarch64

7c3744e

rustbot added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) Jan 12, 2024

internal: add inline for move_mask in line-index

df53828

bors merged commit 18abb12 into rust-lang:master Jan 16, 2024
10 checks passed

lnicola changed the title ~~internal: Speedup line index calculation via NEON for aarch64~~ internal: Speed up line index calculation via NEON for aarch64 Jan 20, 2024

roife deleted the neon-support-for-line-index branch February 27, 2024 06:20

