Better default for L3 cache size on win-arm64 and lin-arm64 #64645
Conversation
Tagging subscribers to this area: @dotnet/gc
(The full issue details are quoted at the end of this thread.)
This looks good for now, but based on your measurements, should we default to at least 4MB? @Maoni0?
logicalCPUs * Math.Min(1536, Math.Max(256, (int)logicalCPUs * 128)) * 1024 on a 64 proc machine would return 96mb.. that's way too large. Why are we doing the leading logicalCPUs * multiplication?
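Spelling that arithmetic out in a tiny standalone sketch (illustrative only, not runtime code):

```cpp
#include <algorithm>
#include <cstdio>

int main()
{
    // 64 procs: max(256, 64*128) = 8192; min(1536, 8192) = 1536 KB.
    // Scaled by core count: 64 * 1536 KB = 96 MB -- the value objected to above.
    int logicalCPUs = 64;
    long long cacheSize =
        (long long)logicalCPUs * std::min(1536, std::max(256, logicalCPUs * 128)) * 1024;
    printf("%lld bytes = %lld MB\n", cacheSize, cacheSize / (1024 * 1024));
    return 0;
}
```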
Added database-fortunes aspnet benchmark. Will run more.
I think it's just a heuristic like "if a cpu has more than 8 cores it's most likely something powerful"
Oops, in my formula in the issue I forgot about the leading logicalCPUs * multiplier.
@Maoni0 I've just changed the formula; the predicted cache size is now capped at 4Mb, so the max gen0 size is 7.5Mb (for systems with > 30 cores). Let me know if you want a smaller value.
I am currently running all the aspnet benchmarks we have in PerfLab via crank; so far the best (or rather optimal) results are when Gen0 is between 6Mb and 16Mb.
I'm not familiar with the logic here, so apologies in advance if this has already been considered... Why are we predicting the L3 size rather than just getting it from the OS? There is a range of hardware and configurations here, and as core counts and layouts increase there are a lot more interesting details than just "how much L3 exists". So it seems we are potentially missing loads of important information by not pulling the relevant info from the OS/hardware. For example, let's consider just two CPUs: the AMD Ryzen 9 3950X and the Ryzen 9 5950X.
The Ryzen CPUs are composed of CCX modules where each CCX has its own share of the cores, L1/L2/L3 cache, etc. The CCXs are technically distinct units and communicate with each other over the Infinity Fabric. While communicating over the Infinity Fabric is possible and fast, it's also slower than accessing resources on the same CCX. Likewise, while two separate cores on the same CCX can communicate, it is slower than accessing the resources that are directly meant for that core. And finally, hyperthreading works by basically splitting the resources of a single core in half, with each thread getting roughly half of the resources available to it, so this can be important to consider as well.

So while both of these CPUs provide 64MB of L3 cache and both have 16 cores / 32 threads, the performance and considerations for the L3 cache are quite a bit different. In both setups, each core has roughly 4MB of L3 to itself and each thread roughly 2MB. However, on the 3950X each core has access to an additional 12MB of L3 at a "medium" speed and the other 48MB of L3 over Infinity Fabric at an additional cost ("slow" speed). The 5950X, on the other hand, has access to 28MB of L3 in the same CCX at a "medium" speed and the other 32MB over Infinity Fabric at an additional cost ("slow" speed).

There have been several articles and deep dives on the Ryzen architecture, including https://www.anandtech.com/show/16214/amd-zen-3-ryzen-deep-dive-review-5950x-5900x-5800x-and-5700x-tested/5, which shows some of the profiled core-to-core latencies: same-core access is extremely fast (~6ns), accessing other cores on the same CCX is about 2-3x slower (~17ns), and accessing over Infinity Fabric is up to 4-5x slower than that (~80ns).
There are similar considerations in Intel's Alder Lake technology with its performance/efficiency core split, and in other upcoming CPUs like the Zen "3D" parts which will have up to 192MB of L3 accessible. With the introduction of the P/E-core split and CPUs with many cores/threads, there are also a lot of considerations that come into play around thread scheduling that I think it would be good for us to be considering and designing around. https://www.intel.com/content/www/us/en/developer/articles/guide/alder-lake-developer-guide.html goes a bit in depth on some of these. That specific article is somewhat game-focused, but many of the rules/guidelines are reiterated in the Intel and AMD optimization guides and are more generally applicable. It calls out a lot of things that I don't believe we are accounting for today: how caches are split/accessible by resources (called out above), or how hyper-threads share resources and so scheduling threads to the main thread of each core before scheduling to the secondary threads is important (some of which is expected to be handled by the OS, but which advanced usage scenarios may also take advantage of or provide additional hints around).
I agree that the L3 size alone is a questionable metric without additional context like how many cores share it, etc., but the current problem is that on Windows-ARM64 and Linux-ARM64 there is no way (that we're aware of) to get any information about L3 at all. E.g. on Windows, GetLogicalProcessorInformation only reports L1-L2 (the Windows team is helping us atm), and the same happens on Linux. On macOS we have everything we need from sysctl: L3 size, how many performance cores share it, etc.
Sorry, are you saying that GetLogicalProcessorInformation doesn't report L3 on ARM64? I'm notably not seeing the same on any of my 3 ARM64 devices (Surface Pro X, Samsung GalaxyBook2, or the Qualcomm ECS Liva dev box). A simple C++ app using that API reports the L3 cache info for me.
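For context, a probe along those lines might look like this (a minimal sketch using the plain GetLogicalProcessorInformation API mentioned above; it prints one entry per cache per group of cores):

```cpp
#include <windows.h>
#include <cstdio>
#include <vector>

int main()
{
    // First call with a null buffer just asks for the required size.
    DWORD len = 0;
    GetLogicalProcessorInformation(nullptr, &len);
    std::vector<SYSTEM_LOGICAL_PROCESSOR_INFORMATION> info(
        len / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
    if (!GetLogicalProcessorInformation(info.data(), &len))
        return 1;
    // Print every cache relationship the OS reports; on the machines discussed
    // in this thread, the L3 entries are missing entirely.
    for (const auto& e : info)
        if (e.Relationship == RelationCache)
            printf("L%u cache: %lu bytes\n", e.Cache.Level, (unsigned long)e.Cache.Size);
    return 0;
}
```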
Exactly, it doesn't report L3 on our Windows11-arm64 machines with lots-of-cores hardware; the Windows team is aware. So it's a reasonable workaround till we find a 100% reliable way to get the cache size or switch to some other method to calculate Gen0 size.
More aspnet/TechEmpower benchmarks from PerfLab (vertical axis: RPS or P90 latency in ms). So far the optimal results are between 6Mb and 16Mb for Gen0, so the 7.5Mb this PR proposes sounds like a good default while we're looking for a better solution. @Maoni0 does it look good now? Plaintext-MVC baseline vs this PR (tested binaries):
What is the actual L3 size on those machines? If it is 32 MiB, then our 5/8 factor may be inadequate even if we remove the 3x scaling. I am afraid you are optimizing for very specific hardware. To change the formula, we need to run tests on more than one type of hardware.
The machines have 32Mb of L3; the heuristic reports 4Mb, which results in 7.5Mb for Gen0 (the max possible size for this heuristic). For these benchmarks on this CPU it produces the best "RPS/working set size" ratio. It can be decreased down to 2Mb L3 (3.75Mb Gen0) without losing much benefit (~10%) if 7.5Mb is too much. This PR is not a scientific paper; it just tries to use a reasonable default which is much better than what we have now - 256Kb (480Kb gen0). It noticeably improves all GC-intensive benchmarks, even for desktop scenarios. I propose we merge it so we have better ground for the upcoming Preview 2; the L3 cache issue was found ~3 months ago.
This PR increases working set from ~170Mb to ~370Mb, while the same benchmark reports 440Mb on our Xeon. Values past 8Mb gen0 dramatically increase working set without much benefit (e.g. Gen0=28Mb == 1Gb of working set).
How do we know this formula change does not negatively affect other types of hardware? For instance, in the case of 8 cores the reported L3 size is changed by this PR from 8 MiB to just 1 MiB, which is a significant reduction. Your finding that the optimal Gen0 size is between 1/5 and 1/2 of the L3 size (instead of the currently used 3*5/8 factor) for this particular hardware is quite interesting; however, I think we should also test some other types of hardware before changing the general formula.
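To make those 8-core numbers concrete, here is a sketch of the before/after values (assuming, per the discussion above, that the updated heuristic drops the core-count scaling and caps the prediction at 4096 KB -- the shape that yields the 7.5 MB max Gen0 quoted earlier):

```cpp
#include <algorithm>
#include <cstdio>

int main()
{
    int logicalCPUs = 8;
    // Old fallback: scaled by core count -> 8 * 1024 KB = 8192 KB (8 MiB).
    long oldKB = (long)logicalCPUs * std::min(1536, std::max(256, logicalCPUs * 128));
    // Updated heuristic (assumed shape): no scaling, capped -> 1024 KB (1 MiB).
    long newKB = std::min(4096, std::max(256, logicalCPUs * 128));
    printf("old = %ld KB, new = %ld KB\n", oldKB, newKB);
    return 0;
}
```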
I think we currently never use that formula at all and just rely on whatever comes from the API, which is mostly something small (100% small on Linux-arm64).
Where is the logic for reading the cache hierarchy from sysfs? I only see logic that queries a single cache-size value.
I'd expect the logic to actually enumerate the per-CPU cache directories and read the level, size, and sharing information of each cache.
For reference, this is all documented here: https://github.com/torvalds/linux/blob/master/Documentation/ABI/testing/sysfs-devices-system-cpu
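A minimal sketch of what such sysfs enumeration could look like, assuming the standard layout from the document linked above (single CPU, cpu0, for brevity):

```cpp
#include <cstdio>
#include <fstream>
#include <string>

int main()
{
    // Walk /sys/devices/system/cpu/cpu0/cache/indexN until an index is missing.
    for (int i = 0; ; i++)
    {
        std::string dir = "/sys/devices/system/cpu/cpu0/cache/index" + std::to_string(i);
        std::ifstream level(dir + "/level");
        std::ifstream size(dir + "/size");
        if (!level || !size)
            break; // no more cache indices
        std::string lvl, sz;
        level >> lvl; // e.g. "3"
        size >> sz;   // e.g. "32768K"
        printf("L%s: %s\n", lvl.c_str(), sz.c_str());
    }
    return 0;
}
```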
The problem is that L3 is simply not reported there on the hardware we're targeting.
And we can't rely on it not being reported as the heuristic that says "do the fallback"? This is another case where on my own boxes (both WSL and directly running Linux natively -- Ubuntu 20.04.3 LTS) I am seeing the numbers accurately reported for ARM64.
Also noting that there are some chips, such as the Raspberry Pi, which have no L3, and assuming one exists may also be incorrect/suboptimal.
Interesting, what kind of hardware do you use for it? Also, if it reports L3 correctly then the heuristic won't be used; it is highly unlikely its value will be bigger than the real one (maybe +/- 0.5Mb).
Let's not keep the current value for the sake of the Raspberry Pi ;-)
Raspberry Pi (booting Ubuntu) - reports no L3, because it doesn't have an L3.
Thanks for the data. I assume all of them (except the Pi) use popular Qualcomm chips where the cache is reported via a special register accessible by the kernel; I even have a snippet somewhere with raw arm asm. Meanwhile, we mostly care about custom server/cloud hardware in this issue. The heuristic won't hurt any of the devices you listed - if L3 is reported correctly then it will be bigger than what the heuristic predicts.
@EgorBo Have you been testing server GC only? I am wondering whether the optimal range for workstation GC might be different.
@EgorBo does this need more thought or is it ready to merge?
cacheSize = logicalCPUs * std::min(1536, std::max(256, (int)logicalCPUs * 128)) * 1024;
}
// It is currently expected to be missing cache size info
A lot of information in this comment is no longer relevant. Could you please update this comment to only include what is still relevant?
Closing since, apparently, this was overtaken by #71029.
While we're trying to address the L3 cache issue (#60166) for both Win-arm64 and Linux-arm64 (osx-arm64 is fine already #64576), I think it makes sense to at least use the existing heuristic as a good default (based on the logical core count) if it's bigger than what we found (e.g. L2). This heuristic looks like this:
int predictedCacheSize = Math.Min(1536, Math.Max(256, (int)logicalCPUs * 128)) * 1024;
(it's not mine, it existed in the code for cases when we can't even get any cache info at all)
Same heuristic but visualized:

It doesn't predict 32Mb for our 30-core eMAG, but it's better than the current default; it won't hurt small CPUs, and it won't report something gigantic for some 80-core CPU, etc.
Gen0 size is currently calculated like this:
(L3size * 3) * 5 / 8
where 5/8 is a general heuristic and the * 3 is arm64-specific; see runtime/src/coreclr/vm/gcenv.os.cpp lines 649 to 652 in 51056be.
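To make the arithmetic concrete, a standalone sketch combining the heuristic above with this Gen0 formula (illustrative only, not the actual runtime code path):

```cpp
#include <algorithm>
#include <cstdio>

int main()
{
    int logicalCPUs = 12; // any count >= 12 hits the 1536 KB cap
    // Heuristic from above: predicted cache size in bytes.
    int predictedCacheSize = std::min(1536, std::max(256, logicalCPUs * 128)) * 1024;
    // Gen0 budget: (L3size * 3) * 5 / 8.
    long gen0 = (long)predictedCacheSize * 3 * 5 / 8;
    // 1536 KB predicted -> 2880 KB, i.e. the "~2.8Mb Gen0" green line quoted below.
    printf("cache = %d KB, gen0 = %ld KB\n", predictedCacheSize / 1024, gen0 / 1024);
    return 0;
}
```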
Also, here is a graph for Gen0 size -- RPS for Plaintext-MVC benchmark (the most GC-bound we have in PerfLab currently):

The red line is where we're now: 256Kb L3 -> 480Kb Gen0 -> 380k RPS.
The green line is what we'll have with this heuristic: 1.5Mb L3 -> ~2.8Mb Gen0 -> 920k RPS
For this specific benchmark the best RPS (1086k RPS) corresponds to ~16Mb Gen0 (the L3 cache function should report something between 12Mb and 16Mb).
database-fortunes benchmark:

I also ran a couple of simple micro-benchmarks locally on a workstation GC, and it seems like performance gets steady once gen0 is at least 4Mb.
@Maoni0 @mangod9 @jkotas