internal: Speed up line index calculation via NEON for aarch64 #16350
Similarly, |
I'm curious about the performance improvement of the modified version. Have you done any comparison testing?
|
I don't think this is going to be visible in
I think so, but I don't know if it's worth it, as lines are on average, what, 20 characters long? So if the file uses Of course, we could still have a fast check for |
You are right, the scenarios handled by Regarding the acceleration of NEON for LineIndex calculation, I ran tests on a large source file I constructed myself, and the improvement over the original method is about 20%. Because the overall execution time is short, this improvement is in the millisecond range. However, since LineIndex calculation may occur every time there is a TextChange notification (and may even happen multiple times while handling one notification), the cumulative improvement can be more significant. Additionally, using SIMD might also improve the energy efficiency of r-a: according to a friend's tests, 10% of the power consumption in r-a is spent on calculating LineIndex. |
Ouch, 20% for what, 8 chars at a time is pretty sad. Not your fault, of course.
Wow, that's impressive! |
Oh, sorry, I noticed a flaw in my previous testing: I didn't conduct the tests in
So, this is a significant improvement - in terms of |
Thanks! |
☀️ Test successful - checks-actions |
This commit provides SIMD acceleration (via NEON) for the `line-index` library on the aarch64 architecture, which improves performance for Apple Silicon users (and potentially for future aarch64-based chips).

The algorithm used here follows the same process as the original implementation using SSE2. Most of the SSE2 vector instructions have counterparts in NEON. The only issue is that NEON has no counterpart for `_mm_movemask_epi8`. To address this problem, I referred to the article at https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon.
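
For readers unfamiliar with the workaround, here is a minimal sketch of the kind of movemask substitute that article describes: a narrowing right shift (`vshrn_n_u16`) that packs a 16-byte comparison result into a 64-bit value with one 4-bit nibble per input byte. This is an illustration, not the code from this PR. The function names `newline_nibble_mask` and `newline_offsets` are hypothetical, the newline-only comparison is an assumption made to keep the example short, and the scalar fallback exists only so the sketch also builds on non-aarch64 machines.

```rust
/// Returns a 64-bit mask with one nibble per byte of `chunk`:
/// nibble `i` is 0xF if `chunk[i]` is a newline, 0x0 otherwise.
/// (Hypothetical helper, aarch64 NEON path.)
#[cfg(target_arch = "aarch64")]
fn newline_nibble_mask(chunk: &[u8; 16]) -> u64 {
    use core::arch::aarch64::*;
    // SAFETY: `chunk` points to 16 readable bytes, and NEON is a baseline
    // feature on standard aarch64 targets.
    unsafe {
        let data = vld1q_u8(chunk.as_ptr());
        // Each lane becomes 0xFF where the byte is '\n', 0x00 elsewhere.
        let eq = vceqq_u8(data, vdupq_n_u8(b'\n'));
        // View the result as eight u16 lanes, shift each right by 4 and
        // keep the low 8 bits: this leaves 4 bits per original byte.
        let narrowed = vshrn_n_u16::<4>(vreinterpretq_u16_u8(eq));
        vget_lane_u64::<0>(vreinterpret_u64_u8(narrowed))
    }
}

/// Scalar model of the same mask layout, used on other architectures.
#[cfg(not(target_arch = "aarch64"))]
fn newline_nibble_mask(chunk: &[u8; 16]) -> u64 {
    let mut mask = 0u64;
    for (i, &b) in chunk.iter().enumerate() {
        if b == b'\n' {
            mask |= 0xFu64 << (4 * i);
        }
    }
    mask
}

/// Decodes the nibble mask into byte offsets within the 16-byte chunk.
fn newline_offsets(mut mask: u64) -> Vec<usize> {
    let mut out = Vec::new();
    while mask != 0 {
        // Four mask bits per byte, so divide the bit index by 4.
        let pos = (mask.trailing_zeros() / 4) as usize;
        out.push(pos);
        // Clear the whole nibble for the byte just reported.
        mask &= !(0xFu64 << (4 * pos));
    }
    out
}
```

Compared with `_mm_movemask_epi8`, which yields one bit per byte, this layout yields four bits per byte, so the only change on the consuming side is dividing the trailing-zero count by 4.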