Optimizing BOLT flags #128514
Comments
There was a talk at the LLVM Performance Workshop in March 2024; I can't find a recording of the talk online, but the slides are available at https://llvm.org/devmtg/2024-03/slides/practical-use-of-bolt.pdf. They include a number of suggestions.
We're currently using the flags at lines 2199 to 2212 in b60044b. Suggesting we should explore some of the alternatives from the slides.
There's some commentary in #124948 (comment)
My intent is to do some benchmarking for each flag.
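For concreteness, a minimal sketch of how one candidate flag set could be tried, assuming CPython's `--enable-bolt` configure option and its `BOLT_APPLY_FLAGS` configure variable; the flags shown are illustrative candidates (cdsplit comes up later in this thread), not the current defaults:

```sh
# Illustrative only: try a candidate flag set by overriding BOLT_APPLY_FLAGS,
# the configure variable CPython passes through to llvm-bolt.
./configure --enable-optimizations --enable-bolt \
    BOLT_APPLY_FLAGS="-reorder-blocks=ext-tsp -split-functions -split-strategy=cdsplit"
make -j"$(nproc)"
```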
Ask me anything if you need. I am working on BOLT now, and I also want to contribute to Python!
By the way, I wonder how you collect profiles? By instrumentation or perf?
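For context, the perf-based option would look roughly like the sketch below (a rough outline assuming an x86-64 machine with LBR support; the file names and training workload are placeholders, and the instrumentation-based answer appears later in the thread):

```sh
# Rough sketch of perf-based profile collection for BOLT (not what the
# CPython build currently does).
perf record -e cycles:u -j any,u -o perf.data -- ./python -m test --pgo
perf2bolt ./python -p perf.data -o python.fdata
llvm-bolt ./python -data=python.fdata -o python.bolt \
    -reorder-blocks=ext-tsp -split-functions
```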
I strongly recommend that …
I was going to collect benchmarks with https://github.com/python/pyperformance (i.e., not instrumentation) on my Linux machine. I think I can also post branches and ask the … I have a few commits ready.
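A sketch of what that A/B comparison might look like with pyperformance plus pyperf's comparison tool; the interpreter paths and output file names here are hypothetical:

```sh
# Hypothetical A/B comparison: run the suite against a baseline build and a
# build with the flag under test, then compare the result files.
pyperformance run --python=./python-baseline -o baseline.json
pyperformance run --python=./python-candidate -o candidate.json
python -m pyperf compare_to baseline.json candidate.json --table
```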
Profiles are key to BOLT. You are on x86, right? I remember cdsplit is not supported on AArch64. |
I have machines with both architectures. Are you suggesting an alternative approach to measuring the effect?
Yeah, maybe AArch64 can get more performance.
As an update, I set up an x86-64 bare-metal machine with LLVM 19 and am running benchmarks for the flags I described above. I'm not including LTO in the baseline; should I?
Here are my initial results: https://gist.github.com/zanieb/eee897b77597f93d0d3e0c2081dbc712
FYI, we are getting the BOLTed binary through instrumentation, not perf, when we actually build.
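For readers unfamiliar with that flow, a rough sketch of instrumentation-based BOLT profiling; this mirrors the spirit of the build rather than the literal Makefile recipe, and the paths, training workload, and final flag set are placeholders:

```sh
# Rough sketch of instrumentation-based BOLT, not the actual Makefile steps.
mkdir -p prof
llvm-bolt ./python -instrument -o ./python.instrumented \
    --instrumentation-file="$(pwd)/prof/python.fdata" \
    --instrumentation-file-append-pid
./python.instrumented -m test --pgo             # training run emits profiles
merge-fdata prof/*.fdata > python.merged.fdata  # combine per-process profiles
llvm-bolt ./python -data=python.merged.fdata -o ./python.bolt \
    -reorder-blocks=ext-tsp -split-functions    # plus the flags under test
```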
I believe we don't have to, as long as the only difference from the baseline is the flag :)
(I am adding myself as an assignee to catch up.)
A second round of benchmarks with more samples comes out a little different: https://gist.github.com/zanieb/8614bcb40b0db24dd678f2983146fb43
The effect depends on the workload.
Feature or enhancement
This is a tracking issue for discussion on determining the optimal BOLT flags for improving performance.
Tuning the flags is mentioned in #101525, but doesn't feel like a blocker for stabilization.
Linked PRs
- `-hugify` for BOLT #128849
- `-split-all-cold` for BOLT #128850