[WIP] gh-104523 overhaul build rules for optimized binaries #104525

indygreg · 2023-05-16T02:21:01Z

Do not merge. There are known regressions where builds are no-op when they shouldn't be.

This commit overhauls the make-based build system's rules for
building optimized binaries. Along the way it fixes a myriad of
bugs and shortcomings with the prior approach.

The old way of producing optimized binaries had various limitations:

make [all] would do work when PGO was enabled because the phony
profile-opt rule was non-empty. This prevented no-op PGO builds
from working at all. This meant workflows like make; make install
either incurred extra work or failed due to race conditions.
Same thing for BOLT, as its bolt-opt rule was also non-empty
and always ran during make [all].
BOLT could not be run multiple times without a full rebuild because
llvm-bolt can't instrument binaries that have already received
BOLT optimizations.
It was difficult to run BOLT on its own because of how various make
targets and their dependencies were structured.
I found the old way that configure and make communicated the default
targets to be confusing and hard to understand.

There are essentially 2 major changes going on in this commit:

A rework of the high-level make targets for performing a build and
how they are defined.
A rework of all the make logic related to profile-based optimization
(read: PGO and BOLT).

Build Target Rework

Before, we essentially had build_all, profile-opt, bolt-opt and
build_wasm as our 3 targets for performing a build. all would alias
to one of these, as appropriate.

And there was another definition for which simple make target to
evaluate for non-optimized builds. This was likely build_all or
all.

In the rework, we introduce 2 new high-level targets:

build-plain - Perform a build without optimizations.
build-optimized - Perform a build with optimizations.

build-plain is aliased to build_all in all configurations except
WASM, where it is build_wasm.

build-optimized by default is aliased to a target that prints an error
message when optimizations aren't enabled. If PGO or BOLT are enabled,
it is aliased to their respective target.

build-optimized is the logical successor to profile-opt.

I felt it best to delete profile-opt completely, as the new build-*
high-level targets feel more friendly to use. But if people lament its
loss, we can add a profile-opt: build-optimized to achieve almost the
same result.

Profiled-Based Optimization Rework

Most of the make logic related to profile-based optimization (read: PGO
and BOLT) has been touched in this change.

A major issue with the old way of doing things was we used phony,
always-executed make rules. This is a bad practice in make because it
undermines no-op builds.

Another issue is that the separation between the rules and what order
they ran in wasn't always clear. Both PGO and BOLT consist of the same
4 phase solution: instrument, run, analyze, and apply. However, these
steps weren't clearly expressed in the make logic. This is especially
true for BOLT, which only had 1 make rule.

Another issue with BOLT is that it was really easy to get things into
a bad state. e.g. if you applied BOLT to pythonX.Y you could not
run BOLT again unless you rebuilt pythonX.Y from source.

In the new world, we have separate profile-<tool>-<stage>-stamp
rules defining the 4 distinct instrument, run, analyze, and
apply stages for both PGO and BOLT. Each of these stages is tracked
by a stamp semaphore file so progress can be captured. This should
all be pretty straightforward.

There is some minimal complexity here to handle BOLT's optional
dependency on PGO, as BOLT either depends on build_all or
profile-pgo-apply-stamp.

As part of the refactor to BOLT we also preserve the original input
binary before BOLT is applied. This original file is restored if
BOLT runs again. This greatly simplifies repeated BOLT invocations,
as make doesn't perform needless work. However, this is all best
effort, as it is possible for some make target evaluations to still
get things in a bad state.

Other Remarks

This change effectively reverts gh-103574. The readelf based mechanism
inserted by that change was effectively working around shortcomings
in the make DAG. This change addresses those shortcomings so the
readelf integration is no longer required.

If this change perturbs any bugs, they are likely around cleaning
behavior. The cleaning rules are a bit complicated and not clearly
documented. And I'm unsure which targets CPython developers often
iterate on. It is highly possible that state cleanup of PGO and/or
BOLT files isn't as robust as it needs to be.

I explicitly deleted some calls to PGO cleanup because those calls
prevented no-op make [all] from working. It is certainly possible
something somewhere (release automation?) relied on these files being
deleted when they no longer are. We still have targets to purge profile
files and it should be trivial to add these to appropriate make rules.

What I'm trying to say is there are likely subtle workflow regressions
with this refactor. But in my mind it is 3 steps forward 1 step back.
It should be pretty straightforward to fix any regressions once people
complain.

Issue: Overhaul build system rules related to profiling #104523

Various rules were only ever invoked once and had minimal bodies. I don't see a benefit to the indirection. So this commit inlines rules to simplify the PGO logic. skip news (trivial, non-user-visible change)

This commit overhauls the make-based build system's rules for building optimized binaries. Along the way it fixes a myriad of bugs and shortcomings with the prior approach. The old way of producing optimized binaries had various limitations: * `make [all]` would do work when PGO was enabled because the phony `profile-opt` rule was non-empty. This prevented no-op PGO builds from working at all. This meant workflows like `make; make install` either incurred extra work or failed due to race conditions. * Same thing for BOLT, as its `bolt-opt` rule was also non-empty and always ran during `make [all]`. * BOLT could not be run multiple times without a full rebuild because `llvm-bolt` can't instrument binaries that have already received BOLT optimizations. * It was difficult to run BOLT on its own because of how various make targets and their dependencies were structured. * I found the old way that configure and make communicated the default targets to be confusing and hard to understand. There are essentially 2 major changes going on in this commit: 1. A rework of the high-level make targets for performing a build and how they are defined. 2. A rework of all the make logic related to profile-based optimization (read: PGO and BOLT). Build Target Rework =================== Before, we essentially had `build_all`, `profile-opt`, `bolt-opt` and `build_wasm` as our 3 targets for performing a build. `all` would alias to one of these, as appropriate. And there was another definition for which _simple_ make target to evaluate for non-optimized builds. This was likely `build_all` or `all`. In the rework, we introduce 2 new high-level targets: * `build-plain` - Perform a build without optimizations. * `build-optimized` - Perform a build with optimizations. `build-plain` is aliased to `build_all` in all configurations except WASM, where it is `build_wasm`. `build-optimized` by default is aliased to a target that prints an error message when optimizations aren't enabled. If PGO or BOLT are enabled, it is aliased to their respective target. `build-optimized` is the logical successor to `profile-opt`. I felt it best to delete `profile-opt` completely, as the new `build-*` high-level targets feel more friendly to use. But if people lament its loss, we can add a `profile-opt: build-optimized` to achieve almost the same result. Profiled-Based Optimization Rework ================================== Most of the make logic related to profile-based optimization (read: PGO and BOLT) has been touched in this change. A major issue with the old way of doing things was we used phony, always-executed make rules. This is a bad practice in make because it undermines no-op builds. Another issue is that the separation between the rules and what order they ran in wasn't always clear. Both PGO and BOLT consist of the same 4 phase solution: instrument, run, analyze, and apply. However, these steps weren't clearly expressed in the make logic. This is especially true for BOLT, which only had 1 make rule. Another issue with BOLT is that it was really easy to get things into a bad state. e.g. if you applied BOLT to `pythonX.Y` you could not run BOLT again unless you rebuilt `pythonX.Y` from source. In the new world, we have separate `profile-<tool>-<stage>-stamp` rules defining the 4 distinct `instrument`, `run`, `analyze`, and `apply` stages for both PGO and BOLT. Each of these stages is tracked by a _stamp_ semaphore file so progress can be captured. This should all be pretty straightforward. There is some minimal complexity here to handle BOLT's optional dependency on PGO, as BOLT either depends on `build_all` or `profile-pgo-apply-stamp`. As part of the refactor to BOLT we also preserve the original input binary before BOLT is applied. This original file is restored if BOLT runs again. This greatly simplifies repeated BOLT invocations, as make doesn't perform needless work. However, this is all best effort, as it is possible for some make target evaluations to still get things in a bad state. Other Remarks ============= This change effectively reverts pythongh-103574. The readelf based mechanism inserted by that change was effectively working around shortcomings in the make DAG. This change addresses those shortcomings so the readelf integration is no longer required. If this change perturbs any bugs, they are likely around cleaning behavior. The cleaning rules are a bit complicated and not clearly documented. And I'm unsure which targets CPython developers often iterate on. It is highly possible that state cleanup of PGO and/or BOLT files isn't as robust as it needs to be. I explicitly deleted some calls to PGO cleanup because those calls prevented no-op `make [all]` from working. It is certainly possible something somewhere (release automation?) relied on these files being deleted when they no longer are. We still have targets to purge profile files and it should be trivial to add these to appropriate make rules. What I'm trying to say is there are likely subtle workflow regressions with this refactor. But in my mind it is 3 steps forward 1 step back. It should be pretty straightforward to fix any regressions once people complain.

corona10 · 2023-05-16T02:22:37Z

Do not merge. There are known regressions where builds are no-op when they shouldn't be.

Well, we also need to verify performance regression test for this build(PGO+LTO/ PGO + LTO + BOLT).

corona10 · 2023-05-16T03:22:28Z

Makefile.pre.in


+profile-bolt-apply-stamp: profile-bolt-analyze-stamp
+	@LLVM_BOLT@ ./$(BUILDPYTHON) -o $(BUILDPYTHON).bolt -data=$(BUILDPYTHON).fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions=3 -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=all -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot


I think that you use an old configuration.

-split-functions=3 -> -split-functions

-peepholes=all -> -peepholes=none

Probably. I just rebased this from commits I made in January and didn't pay much attention to the merge conflicts. Will fix before I remove the draft tag from this PR.

No, I think I rebased it as intended. If you are referring to the readelf integration, I purposefully removed it. See the commit message / PR summary.

I am talking about BOLT configuration not readelf, it was changed after January

gh-101525: Disable peephole optimization process of BOLT #103187

gh-101282: Update BOLT --split-functions flag not to use deprecated u… #101557

erlend-aasland · 2023-05-16T07:56:53Z

Thanks for the thorough explanations. Would you mind putting them on the issue instead of on the PR?

indygreg added 2 commits May 15, 2023 18:48

pythongh-104523: inline minimal PGO rules

dd66e38

Various rules were only ever invoked once and had minimal bodies. I don't see a benefit to the indirection. So this commit inlines rules to simplify the PGO logic. skip news (trivial, non-user-visible change)

indygreg requested review from erlend-aasland and corona10 as code owners May 16, 2023 02:21

bedevere-bot added the awaiting review label May 16, 2023

bedevere-bot mentioned this pull request May 16, 2023

Overhaul build system rules related to profiling #104523

Open

corona10 reviewed May 16, 2023

View reviewed changes

corona10 added the DO-NOT-MERGE label May 16, 2023

indygreg marked this pull request as draft May 16, 2023 03:40

bedevere-bot removed the awaiting review label May 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] gh-104523 overhaul build rules for optimized binaries #104525

[WIP] gh-104523 overhaul build rules for optimized binaries #104525

indygreg commented May 16, 2023 •

edited by bedevere-bot

Loading

corona10 commented May 16, 2023 •

edited

Loading

corona10 May 16, 2023

corona10 May 16, 2023

indygreg May 16, 2023

indygreg May 16, 2023

corona10 May 16, 2023 •

edited

Loading

corona10 May 16, 2023

erlend-aasland commented May 16, 2023


		profile-bolt-apply-stamp: profile-bolt-analyze-stamp
		@LLVM_BOLT@ ./$(BUILDPYTHON) -o $(BUILDPYTHON).bolt -data=$(BUILDPYTHON).fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions=3 -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=all -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot

[WIP] gh-104523 overhaul build rules for optimized binaries #104525

Are you sure you want to change the base?

[WIP] gh-104523 overhaul build rules for optimized binaries #104525

Conversation

indygreg commented May 16, 2023 • edited by bedevere-bot Loading

Build Target Rework

Profiled-Based Optimization Rework

Other Remarks

corona10 commented May 16, 2023 • edited Loading

corona10 May 16, 2023

Choose a reason for hiding this comment

corona10 May 16, 2023

Choose a reason for hiding this comment

indygreg May 16, 2023

Choose a reason for hiding this comment

indygreg May 16, 2023

Choose a reason for hiding this comment

corona10 May 16, 2023 • edited Loading

Choose a reason for hiding this comment

corona10 May 16, 2023

Choose a reason for hiding this comment

erlend-aasland commented May 16, 2023

indygreg commented May 16, 2023 •

edited by bedevere-bot

Loading

corona10 commented May 16, 2023 •

edited

Loading

corona10 May 16, 2023 •

edited

Loading