JIT: Limit 3-opt to 1000 swaps per run #112259
/azp run runtime-coreclr jitstress-random
Azure Pipelines successfully started running 1 pipeline(s).
Seems like the jitstress-random failures are unrelated, but can you dig in? I'm OK with having a limit, but I'm still curious what exactly was happening in the example. It looked like it had found a cyclic sequence of very profitable moves, which should not be possible. Was the profile inconsistent? Were we saturating/overflowing computations?
Sure. arm64 legs are hitting #112278 in SVE tests; Cobalt machines were just turned on in CI, so I expect we'll see more bugs come up. The remaining failure is tracked by #112281, and might be bad codegen. Neither looks related to 3-opt.
Both are true in this case, but the latter is problematic (and I suppose we cannot enforce profile consistency when the latter is true). Some loop bodies in the method have weights exceeding
Those block ranges in particular have flow exceeding
FWIW, in Debug builds, 3-opt recomputes the overall layout cost after each swap, and asserts that the move improved the cost. For excessively large costs, we skip the assertion to avoid false alarms from floating-point imprecision, and for profiles of "normal" size, I've yet to see this assert fail, so I haven't seen 3-opt's greediness invariant fail yet.
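As a rough illustration of the behavior described above, here is a minimal Python sketch of a greedy segment-swapping pass with a per-run swap cap and a Debug-style cost check. It is not the JIT's implementation: the cost model (sum of weights of non-fall-through edges), the brute-force candidate search, and the names `layout_cost`, `three_opt`, and `COST_CHECK_LIMIT` are all assumptions made for this example. Only the 1000-swap cap comes from the PR title. Because this sketch evaluates candidates by full recomputation, the check is redundant here; it only shows where such a check would sit.

```python
MAX_SWAPS = 1000         # per-run cap, taken from the PR title
COST_CHECK_LIMIT = 1e15  # hypothetical cutoff; the real threshold is not stated in the thread


def layout_cost(order, edges):
    """Sum the weights of edges whose target is not the fall-through successor."""
    pos = {block: i for i, block in enumerate(order)}
    return sum(w for (src, dst), w in edges.items() if pos[dst] != pos[src] + 1)


def three_opt(order, edges, max_swaps=MAX_SWAPS, debug=True):
    """Greedily swap adjacent segments while the cost drops, up to max_swaps times."""
    order = list(order)
    swaps = 0
    while swaps < max_swaps:
        current = layout_cost(order, edges)
        best_cost, best_order = current, None
        n = len(order)
        # Cut points i < j < k; i starts at 1 so the entry block stays first.
        for i in range(1, n):
            for j in range(i + 1, n):
                for k in range(j + 1, n + 1):
                    candidate = order[:i] + order[j:k] + order[i:j] + order[k:]
                    cost = layout_cost(candidate, edges)
                    if cost < best_cost:
                        best_cost, best_order = cost, candidate
        if best_order is None:
            break  # no improving swap left
        order = best_order
        swaps += 1
        if debug and current < COST_CHECK_LIMIT:
            # Debug-style check: recompute the whole layout cost and require a strict drop.
            assert layout_cost(order, edges) < current
    return order, swaps


# Hypothetical four-block flow graph with made-up edge weights.
edges = {("A", "C"): 10, ("C", "B"): 5, ("B", "D"): 5, ("A", "B"): 1}
print(three_opt(["A", "B", "C", "D"], edges))
```

On the imprecision point: in IEEE double precision, `1e17 + 1.0 == 1e17` evaluates to true, so at that magnitude a small real improvement can vanish from the recomputed total, which is the kind of false alarm that skipping the check avoids.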
You could imagine tempering profile synthesis so that in deep loop nests we start to curtail And this compounds in a nest; if say we have But I'm not sure of the best way to do this, likely we want to keep the inner loop (not sure if in the example above we are using PGO and/or synthesis, but the long term plan is that we'll always have these).
/ba-g blocked by test timeout
In this case,
Your proposal to limit amplification sounds reasonable, at least from a layout perspective. Since the initial block layout keeps loop bodies compact, I don't think inner loop weights have to be amplified all that much to dissuade 3-opt from breaking them up.
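To make the compounding concern concrete, here is a small sketch of how a per-loop weight multiplier grows across a nest, and how tapering it at deeper levels keeps inner loops hotter than outer ones without reaching huge magnitudes. Every number and name here (`innermost_weight`, `base_scale`, `decay`, `min_scale`) is hypothetical and chosen only for illustration; the thread does not specify how profile synthesis scales loop weights or how a limit would be implemented.

```python
def innermost_weight(entry_weight, depth, base_scale=1000, decay=1, min_scale=2):
    """Weight of the innermost body of a `depth`-deep nest (illustrative model only).

    Each nesting level multiplies the weight by a scale factor. With decay=1 the
    factor stays at base_scale; with decay > 1 it shrinks at deeper levels but
    never drops below min_scale, so deeper loops still end up strictly hotter.
    """
    weight = entry_weight
    for level in range(depth):
        scale = max(min_scale, base_scale // (decay ** level))
        weight *= scale
    return weight


print(innermost_weight(1, 6))            # constant factor at every level
print(innermost_weight(1, 6, decay=10))  # tapered factor at deeper levels
```

With these made-up parameters, the constant factor yields 10^18 for a six-deep nest, while the tapered version yields 8,000,000 and still increases with every level of nesting.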
Fixes #111988.