
Optimizing BOLT flags #128514

Open
Tracked by #101525
zanieb opened this issue Jan 5, 2025 · 15 comments

zanieb (Contributor) commented Jan 5, 2025

Feature or enhancement

This is a tracking issue for discussion on determining the optimal flags for BOLT to improve performance.

Tuning the flags is mentioned in #101525, but it doesn't feel like a blocker for stabilization.

Linked PRs

zanieb added the type-feature (A feature request or enhancement), performance (Performance or resource usage), and build (The build process and cross-build) labels on Jan 5, 2025
zanieb (Contributor, Author) commented Jan 5, 2025

There was a talk in March 2024 at the LLVM Performance Workshop; I can't find a copy of the talk online, but the slides are available at https://llvm.org/devmtg/2024-03/slides/practical-use-of-bolt.pdf

It includes the following suggestions:

  • Function splitting: -split-functions, -split-strategy=cdsplit
  • Function reordering: -reorder-functions=cdsort
  • Block reordering: -reorder-blocks=ext-tsp
  • Use THP pages for hot text: -hugify
  • PLT optimization: -plt
  • More aggressive ICF: -icf
  • Indirect Call Promotion: -indirect-call-promotion

We're currently using:

cpython/configure.ac, lines 2199 to 2212 at b60044b:

-reorder-blocks=ext-tsp
-reorder-functions=cdsort
-split-functions
-icf=1
-inline-all
-split-eh
-reorder-functions-use-hot-size
-peepholes=none
-jump-tables=aggressive
-inline-ap
-indirect-call-promotion=all
-dyno-stats
-use-gnu-stack
-frame-opt=hot

This suggests we should explore -split-strategy=cdsplit, -hugify, and -plt.
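As a sketch (not what configure does today), the candidate flags could be tried by hand on a relocatable build; the binary path, the profile file name, and the -plt value below are assumptions on my part:

```sh
# Sketch only: manually re-optimize an existing binary with the candidate flags.
# ./python is assumed to be linked with --emit-relocs; python.fdata is a merged BOLT profile.
llvm-bolt ./python -o ./python.bolt \
    -data=python.fdata \
    -reorder-blocks=ext-tsp \
    -reorder-functions=cdsort \
    -split-functions \
    -split-strategy=cdsplit \
    -hugify \
    -plt=all   # value assumed; the slides only mention "-plt"
```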

zanieb (Contributor, Author) commented Jan 5, 2025

There's some commentary in #124948 (comment)

python-build-standalone recently added -hugify and -split-strategy=cdsplit (astral-sh/python-build-standalone#462), though the performance benefits were not validated.

My intent is to do some benchmarking for each flag.

liusy58 commented Jan 7, 2025

Feel free to ask me anything. I'm currently working on BOLT, and I also want to contribute to Python!

liusy58 commented Jan 7, 2025

By the way, how do you collect profiles? By instrumentation or with perf?

liusy58 commented Jan 7, 2025

I strongly recommend adding --split-all-cold.

zanieb (Contributor, Author) commented Jan 7, 2025

I was going to collect benchmarks with https://github.com/python/pyperformance (i.e., not instrumentation) on my Linux machine.

I think I can also post branches and ask the faster-cpython team to run benchmarks: https://github.com/faster-cpython/benchmarking-public

I have a few commits ready.
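As a sketch of that comparison workflow, assuming two local builds at placeholder paths:

```sh
# Sketch: compare a baseline build against a BOLT-flag variant with pyperformance.
# ./python-baseline and ./python-variant are placeholder paths, not real artifacts from this issue.
python -m pip install pyperformance
pyperformance run --python=./python-baseline -o baseline.json
pyperformance run --python=./python-variant -o variant.json
python -m pyperf compare_to baseline.json variant.json --table
```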

liusy58 commented Jan 8, 2025

Profiles are key to BOLT. You are on x86, right? I remember cdsplit is not supported on AArch64.

zanieb (Contributor, Author) commented Jan 8, 2025

I have machines with both architectures.

Are you suggesting an alternative approach to measuring the effect?

liusy58 commented Jan 8, 2025

Yeah, maybe AArch64 can gain more performance.

zanieb (Contributor, Author) commented Jan 10, 2025

As an update, I set up an x86-64 bare-metal machine with LLVM 19 and am running benchmarks for the flags I described above. I'm not including LTO in the baseline; should I?

zanieb (Contributor, Author) commented Jan 11, 2025

corona10 (Member) commented:

> By the way, how do you collect profiles? By instrumentation or with perf?

FYI, we are getting the BOLTed binary through instrumentation, not perf, when we actually build.
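For context, a rough sketch of that instrumentation-based flow; the file names, the training command, and the BOLT_APPLY_FLAGS variable are illustrative stand-ins for what configure's --enable-bolt automates:

```sh
# Illustrative instrumentation-based BOLT flow (placeholder paths and workload).
llvm-bolt ./python -instrument -o ./python.instrumented \
    -instrumentation-file=/tmp/python.fdata -instrumentation-file-append-pid
./python.instrumented -m test --pgo            # training workload writes per-PID raw profiles
merge-fdata /tmp/python.fdata.* > merged.fdata # combine the raw profiles
# BOLT_APPLY_FLAGS stands for the flag list quoted from configure.ac above.
llvm-bolt ./python -o ./python.bolt -data=merged.fdata ${BOLT_APPLY_FLAGS}
```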

corona10 (Member) commented:

> As an update, I set up an x86-64 bare-metal machine with LLVM 19 and am running benchmarks for the flags I described above. I'm not including LTO in the baseline; should I?

I believe we don't have to, as long as the only difference from the baseline is the flag :)

corona10 (Member) commented:

(I am adding myself as assignee to catch up)

zanieb (Contributor, Author) commented Jan 16, 2025

A second round of benchmarks with more samples comes out a little different: https://gist.github.com/zanieb/8614bcb40b0db24dd678f2983146fb43

The effect depends on the workload.
