Commit
JIT: Add a (disabled) prototype for a generalized promotion pass (#83388)

Introduce a "physical" promotion pass that generalizes the existing promotion. More specifically, it does not have restrictions on field count and it can handle arbitrary recursive promotion.

The pass is physical in the sense that it does not rely on any field metadata for structs. Instead, it works in two separate passes over the IR:

1. In the first pass we find and analyze how unpromoted struct locals are accessed. For example, for a simple program like:

```
public static void Main()
{
    S s = default;
    Call(s, s.C);
    Console.WriteLine(s.B + s.C);
}

[MethodImpl(MethodImplOptions.NoInlining)]
private static void Call(S s, byte b)
{
}

private struct S
{
    public byte A, B, C, D, E;
}
```

we see IR like:

```
***** BB01
STMT00000 ( 0x000[E-] ... 0x003 )
  [000003] IA---------       ▌  ASG       struct (init)
  [000001] D------N---       ├──▌  LCL_VAR   struct<Program+S, 5> V00 loc0
  [000002] -----------       └──▌  CNS_INT   int    0

***** BB01
STMT00001 ( 0x008[E-] ... 0x026 )
  [000008] --C-G------       ▌  CALL      void   Program:Call(Program+S,ubyte)
  [000004] ----------- arg0  ├──▌  LCL_VAR   struct<Program+S, 5> V00 loc0
  [000007] ----------- arg1  └──▌  LCL_FLD   ubyte  V00 loc0 [+2]

***** BB01
STMT00002 ( 0x014[E-] ... ??? )
  [000016] --C-G------       ▌  CALL      void   System.Console:WriteLine(int)
  [000015] ----------- arg0  └──▌  ADD       int
  [000011] -----------          ├──▌  LCL_FLD   ubyte  V00 loc0 [+1]
  [000014] -----------          └──▌  LCL_FLD   ubyte  V00 loc0 [+2]
```

and the analysis produces

```
Accesses for V00
  [000..005)
    #:                             (2, 200)
    # assigned from:               (0, 0)
    # assigned to:                 (1, 100)
    # as call arg:                 (1, 100)
    # as implicit by-ref call arg: (1, 100)
    # as on-stack call arg:        (0, 0)
    # as retbuf:                   (0, 0)
    # as returned value:           (0, 0)

    ubyte @ 001
      #:                             (1, 100)
      # assigned from:               (0, 0)
      # assigned to:                 (0, 0)
      # as call arg:                 (0, 0)
      # as implicit by-ref call arg: (0, 0)
      # as on-stack call arg:        (0, 0)
      # as retbuf:                   (0, 0)
      # as returned value:           (0, 0)

    ubyte @ 002
      #:                             (2, 200)
      # assigned from:               (0, 0)
      # assigned to:                 (0, 0)
      # as call arg:                 (1, 100)
      # as implicit by-ref call arg: (0, 0)
      # as on-stack call arg:        (0, 0)
      # as retbuf:                   (0, 0)
      # as returned value:           (0, 0)
```

Here the pairs are (#ref counts, wtd ref counts).

Based on this accounting, the analysis estimates the profitability of replacing some of the accessed parts of the struct with a local. This may be costly because overlapping struct accesses (e.g. passing the whole struct as an argument) may require more expensive codegen after promotion. And of course, creating new locals introduces more register pressure. Currently the profitability analysis is very crude. In this case the logic decides that promotion is not worth it:

```
Evaluating access ubyte @ 001
  Single write-back cost: 5
  Write backs: 100
  Read backs: 100
  Cost with: 1350
  Cost without: 650
  Disqualifying replacement
Evaluating access ubyte @ 002
  Single write-back cost: 5
  Write backs: 100
  Read backs: 100
  Cost with: 1700
  Cost without: 1300
  Disqualifying replacement
```

2. In the second pass the field accesses are replaced with new locals for the profitable cases. For overlapping accesses that currently involves writing back replacements to the struct local first. For arguments/OSR locals, it involves reading them back from the struct first.

In the above case we can override the profitability analysis with stress mode STRESS_PHYSICAL_PROMOTION_COST and we get:

```
Evaluating access ubyte @ 001
  Single write-back cost: 5
  Write backs: 100
  Read backs: 100
  Cost with: 1350
  Cost without: 650
  Promoting replacement due to stress

lvaGrabTemp returning 2 (V02 tmp1) (a long lifetime temp) called for V00.[001..002).
Evaluating access ubyte @ 002
  Single write-back cost: 5
  Write backs: 100
  Read backs: 100
  Cost with: 1700
  Cost without: 1300
  Promoting replacement due to stress

lvaGrabTemp returning 3 (V03 tmp2) (a long lifetime temp) called for V00.[002..003).
V00 promoted with 2 replacements
  [001..002) promoted as ubyte V02
  [002..003) promoted as ubyte V03

...

***** BB01
STMT00000 ( 0x000[E-] ... 0x003 )
  [000003] IA---------       ▌  ASG       struct (init)
  [000001] D------N---       ├──▌  LCL_VAR   struct<Program+S, 5> V00 loc0
  [000002] -----------       └──▌  CNS_INT   int    0

***** BB01
STMT00001 ( 0x008[E-] ... 0x026 )
  [000008] -ACXG------       ▌  CALL      void   Program:Call(Program+S,ubyte)
  [000004] ----------- arg0  ├──▌  LCL_VAR   struct<Program+S, 5> V00 loc0
  [000022] -A--------- arg1  └──▌  COMMA     ubyte
  [000021] -A---------          ├──▌  ASG       ubyte
  [000019] D------N---          │  ├──▌  LCL_VAR   ubyte  V03 tmp2
  [000020] -----------          │  └──▌  LCL_FLD   ubyte  V00 loc0 [+2]
  [000018] -----------          └──▌  LCL_VAR   ubyte  V03 tmp2

***** BB01
STMT00002 ( 0x014[E-] ... ??? )
  [000016] -ACXG------       ▌  CALL      void   System.Console:WriteLine(int)
  [000015] -A--------- arg0  └──▌  ADD       int
  [000027] -A---------          ├──▌  COMMA     ubyte
  [000026] -A---------          │  ├──▌  ASG       ubyte
  [000024] D------N---          │  │  ├──▌  LCL_VAR   ubyte  V02 tmp1
  [000025] -----------          │  │  └──▌  LCL_FLD   ubyte  V00 loc0 [+1]
  [000023] -----------          │  └──▌  LCL_VAR   ubyte  V02 tmp1
  [000028] -----------          └──▌  LCL_VAR   ubyte  V03 tmp2
```

The pass still only has rudimentary support and is missing many basic CQ optimizations. For example, it does not make use of any liveness yet and it does not have any decomposition support for assignments. Yet, it already shows good potential in user benchmarks. I have listed some follow-up improvements in #76928.

This PR adds the pass, but it is disabled by default. It can be enabled by setting DOTNET_JitStressModeNames=STRESS_PHYSICAL_PROMOTION. Two new scenarios that enable it have been added to jit-experimental for testing purposes.
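The access accounting done by the first pass can be pictured with a small standalone sketch. This is only an illustration, not the JIT's implementation: the names `AccessCounts`, `AccessTable`, and `buildExampleTable` are hypothetical, and the block weight of 100 is inferred from the (#ref counts, wtd ref counts) pairs in the dump above. It replays the five references to V00 from the example IR and tallies them per byte range.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical illustration types; these are NOT the JIT's actual data structures.
struct AccessCounts
{
    unsigned count         = 0; // number of references seen for this range
    unsigned weightedCount = 0; // references scaled by the weight of their block
    unsigned assignedTo    = 0; // references that store to the range
    unsigned asCallArg     = 0; // references that pass the range as a call argument
};

// Half-open byte range [start, end) inside the struct local.
using ByteRange = std::pair<unsigned, unsigned>;

class AccessTable
{
public:
    // Record one reference to [start, end) seen in a block of the given weight.
    void record(unsigned start, unsigned end, unsigned blockWeight,
                bool isAssignedTo = false, bool isCallArg = false)
    {
        assert(start < end);
        AccessCounts& c = m_accesses[ByteRange{start, end}];
        c.count++;
        c.weightedCount += blockWeight;
        if (isAssignedTo) c.assignedTo++;
        if (isCallArg)    c.asCallArg++;
    }

    // Look up the tally for a range that was previously recorded.
    const AccessCounts& get(unsigned start, unsigned end) const
    {
        return m_accesses.at(ByteRange{start, end});
    }

private:
    std::map<ByteRange, AccessCounts> m_accesses;
};

// Replays the references to V00 from the example IR (all in BB01).
inline AccessTable buildExampleTable()
{
    const unsigned weight = 100; // assumed weight of BB01, inferred from the dump
    AccessTable table;
    table.record(0, 5, weight, /* isAssignedTo */ true);         // ASG struct (init)
    table.record(0, 5, weight, false, /* isCallArg */ true);     // arg0: whole struct
    table.record(2, 3, weight, false, /* isCallArg */ true);     // arg1: LCL_FLD [+2]
    table.record(1, 2, weight);                                  // ADD operand: LCL_FLD [+1]
    table.record(2, 3, weight);                                  // ADD operand: LCL_FLD [+2]
    return table;
}
```

Calling `buildExampleTable()` and reading `get(2, 3)` gives a count of 2 and a weighted count of 200, matching the `ubyte @ 002` entry in the dump. The cost figures ("Cost with" / "Cost without") are deliberately not modeled here, since the commit message does not spell out the formula.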