I revisited the SIMD-optimized code for `VirtualMachine` and failed to find a way to improve it significantly for ARM. A few observations that sum up my reasoning:
The current condition is `Vector<byte>.Count == 32`, which fails on ARM, where `Vector<byte>` is only 128 bits (16 bytes) wide.
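For illustration, here is a minimal sketch of what a width-aware fast path could look like. The `Word256.And` helper and its shape are my own assumptions for this example, not the actual `VirtualMachine` code:

```csharp
using System;
using System.Numerics;

static class Word256
{
    // Hypothetical helper: bitwise AND of two 32-byte (256-bit) words, in place.
    public static void And(Span<byte> a, ReadOnlySpan<byte> b)
    {
        if (Vector<byte>.Count == 32)
        {
            // x64 with AVX2: one 256-bit operation covers the whole word.
            (new Vector<byte>(a) & new Vector<byte>(b)).CopyTo(a);
        }
        else if (Vector<byte>.Count == 16)
        {
            // ARM NEON (128-bit vectors): process the word as two 128-bit halves.
            (new Vector<byte>(a) & new Vector<byte>(b)).CopyTo(a);
            (new Vector<byte>(a.Slice(16)) & new Vector<byte>(b.Slice(16))).CopyTo(a.Slice(16));
        }
        else
        {
            // Fallback: the existing unrolled 64-bit path (sketched at the end of this comment).
            throw new NotSupportedException("Unexpected vector width");
        }
    }
}
```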
That mismatch brought me to a stalemate. What I could do now is refactor `VirtualMachine` so that it also supports `Vector<byte>.Count == 16`. This would require at least some testing or inspection of the generated code. I could use https://github.com/EgorBo/Disasmo for ARM; unfortunately, due to dotnet/runtime#41518, it would fail because:
> However, certain JIT settings, such as the ISAs used, are largely dependent on the machine being run against and so you cannot currently check codegen for something like X86.Avx2 if your underlying hardware doesn't support it. Nor can you check for something like Arm.Dp from a x64 machine.
This means that without physical ARM hardware I could not look at the generated code. I dug a bit to find out more about intrinsics and their actual gains in the following:
Later I moved on to compare the output and the overhead of the vectorized and non-vectorized code for a specific bitwise case:
| Instruction group | Instructions | Latency | Execution Throughput | Pipelines |
| --- | --- | --- | --- | --- |
| ASIMD logical | AND, BIC, EOR, MOV, MVN, ORN, ORR, NOT | 2 | 2 | V |
| Logical, shift, no flagset | AND, BIC, EON, EOR, ORN, ORR | 1 | | I |
where:

- **Latency** is defined as the minimum latency seen by an operation dependent on an instruction in the described group.
- **Execution Throughput** is defined as the maximum throughput (in instructions per cycle) of the specified instruction group that can be achieved in the entirety of the Cortex-A76 microarchitecture.
Having these numbers, I can't see a clear win scenario for simple bitwise operations. Currently they are unrolled as `long` operations, and moving them to vectors does not seem beneficial at all.
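To make the comparison concrete, here is a rough sketch of the scalar side (my own illustration, assuming `ulong`-based unrolling rather than the exact existing code): four independent 64-bit ANDs, which on Cortex-A76 are latency-1 instructions on the integer (I) pipelines and need no moves through SIMD registers, whereas the NEON version above costs two latency-2 ASIMD ANDs plus the vector loads/stores around them:

```csharp
using System;
using System.Runtime.InteropServices;

static class Word256Scalar
{
    // Hypothetical scalar fallback: a 32-byte AND done as four unrolled 64-bit ANDs.
    // The four operations are independent of each other, so they can largely issue
    // in parallel on the Cortex-A76 integer pipelines.
    public static void And(Span<byte> a, ReadOnlySpan<byte> b)
    {
        Span<ulong> x = MemoryMarshal.Cast<byte, ulong>(a);
        ReadOnlySpan<ulong> y = MemoryMarshal.Cast<byte, ulong>(b);
        x[0] &= y[0];
        x[1] &= y[1];
        x[2] &= y[2];
        x[3] &= y[3];
    }
}
```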