
Incremental GC causes a significant slowdown for Sphinx #124567

Closed
AlexWaygood opened this issue Sep 26, 2024 · 25 comments
Labels: 3.14 (new features, bugs and security fixes), performance (Performance or resource usage), type-bug (An unexpected behavior, bug, or error)

Comments

AlexWaygood (Member) commented Sep 26, 2024

Bug report

A significant performance regression in Sphinx caused by changes in CPython 3.13

Here is a script that does the following things:

  1. Replaces the contents of all CPython documentation files except Doc/library/typing.rst with simply "foo"
  2. Creates a virtual environment
  3. Installs our doc dependencies into the environment (making sure that we use pure-Python versions for all doc dependencies rather than built wheels that might include C extensions)
  4. Times how long it takes to build the docs using that environment
  5. Restores all the modified docs files and deletes the virtual environment again
The script
import contextlib
import shutil
import subprocess
import time
import venv
from pathlib import Path

def run(args):
    try:
        subprocess.run(args, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as e:
        print(e.stdout)
        print(e.stderr)
        raise


with contextlib.chdir("Doc"):
    try:
        for path in Path(".").iterdir():
            if path.is_dir() and not str(path).startswith("."):
                for doc_path in path.rglob("*.rst"):
                    if doc_path != Path("library/typing.rst"):
                        doc_path.write_text("foo")

        venv.create(".venv", with_pip=True)

        run([
            ".venv/bin/python",
            "-m",
            "pip",
            "install",
            "-r",
            "requirements.txt",
            "--no-binary=':all:'",
        ])

        start = time.perf_counter()

        run([
            ".venv/bin/python",
            "-m",
            "sphinx",
            "-b",
            "html",
            ".",
            "build/html",
            "library/typing.rst",
        ])

        print(time.perf_counter() - start)
        shutil.rmtree(".venv")
        shutil.rmtree("build")
    finally:
        subprocess.run(["git", "restore", "."], check=True, capture_output=True)

Using a PGO-optimized build with LTO enabled, the script reports that there is a significant performance regression in Sphinx's parsing and building of library/typing.rst between v3.13.0a1 and 909c6f7:

  • On v3.13.0a1, the script reports a Sphinx build time of between 1.27s and 1.29s (I ran the script several times).
  • On ede1504, the script reports a Sphinx build time of between 1.76s and 1.82s (a regression of roughly 40%).

A similar regression is reported in this (much slower) variation of the script that builds the entire set of CPython's documentation rather than just library/typing.rst.

More comprehensive variation of the script
import contextlib
import shutil
import subprocess
import time
import venv

def run(args):
    subprocess.run(args, check=True, text=True)


with contextlib.chdir("Doc"):
    venv.create(".venv", with_pip=True)

    run([
        ".venv/bin/python",
        "-m",
        "pip",
        "install",
        "-r",
        "requirements.txt",
        "--no-binary=':all:'",
    ])

    start = time.perf_counter()

    run([
        ".venv/bin/python",
        "-m",
        "sphinx",
        "-b",
        "html",
        ".",
        "build/html",
    ])

    print(time.perf_counter() - start)
    shutil.rmtree(".venv")
    shutil.rmtree("build")

The PGO-optimized timings for building the entire set of CPython's documentation indicate a 38% performance regression.

Cause of the performance regression

This performance regression was initially discovered in #118891: our Doctest CI workflow uses a fresh build of CPython (since otherwise we wouldn't be testing the tip of the main branch), and it was observed that the CI job was taking significantly longer on the 3.13 branch than on the 3.12 branch. In the context of our CI the regression is even worse, because the Doctest workflow uses a debug build rather than a PGO-optimized build, and the regression is much more pronounced in debug builds.

Using a debug build, I used the first script posted above to bisect the performance regression to commit 1530932 (below), which appeared to cause a performance regression of around 300% in a debug build:

15309329b65a285cb7b3071f0f08ac964b61411b is the first bad commit
commit 15309329b65a285cb7b3071f0f08ac964b61411b
Author: Mark Shannon <mark@hotpy.org>
Date:   Wed Mar 20 08:54:42 2024 +0000

    GH-108362: Incremental Cycle GC (GH-116206)

 Doc/whatsnew/3.13.rst                              |  30 +
 Include/internal/pycore_gc.h                       |  41 +-
 Include/internal/pycore_object.h                   |  18 +-
 Include/internal/pycore_runtime_init.h             |   8 +-
 Lib/test/test_gc.py                                |  72 +-
 .../2024-01-07-04-22-51.gh-issue-108362.oB9Gcf.rst |  12 +
 Modules/gcmodule.c                                 |  25 +-
 Objects/object.c                                   |  21 +
 Objects/structseq.c                                |   5 +-
 Python/gc.c                                        | 806 +++++++++++++--------
 Python/gc_free_threading.c                         |  23 +-
 Python/import.c                                    |   2 +-
 Python/optimizer.c                                 |   2 +-
 Tools/gdb/libpython.py                             |   7 +-
 14 files changed, 684 insertions(+), 388 deletions(-)
 create mode 100644 Misc/NEWS.d/next/Core and Builtins/2024-01-07-04-22-51.gh-issue-108362.oB9Gcf.rst

Performance was then significantly improved by commit e28477f (below), but it's unfortunately still the case that Sphinx is far slower on Python 3.13 than on Python 3.12:

commit e28477f214276db941e715eebc8cdfb96c1207d9
Author: Mark Shannon <mark@hotpy.org>
Date:   Fri Mar 22 18:43:25 2024 +0000

    GH-117108: Change the size of the GC increment to about 1% of the total heap size. (GH-117120)

 Include/internal/pycore_gc.h                       |  3 +-
 Lib/test/test_gc.py                                | 35 +++++++++++++++-------
 .../2024-03-21-12-10-11.gh-issue-117108._6jIrB.rst |  3 ++
 Modules/gcmodule.c                                 |  2 +-
 Python/gc.c                                        | 30 +++++++++----------
 Python/gc_free_threading.c                         |  2 +-
 6 files changed, 47 insertions(+), 28 deletions(-)
 create mode 100644 Misc/NEWS.d/next/Core and Builtins/2024-03-21-12-10-11.gh-issue-117108._6jIrB.rst

See #118891 (comment) for more details on the bisection results.

Profiling by @nascheme in #118891 (comment) and #118891 (comment) also confirms that Sphinx spends a significant amount of time in the GC, so it seems very likely that the change to introduce an incremental GC in Python 3.13 is the cause of this performance regression.
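
(For reference, one way to measure how much time a process spends in cyclic GC is via gc.callbacks. This is only an illustrative sketch, not the profiling setup used in the linked comments.)

import gc
import time

gc_time = 0.0
_start = None

def _gc_callback(phase, info):
    # The interpreter calls this at the start and end of every collection.
    global gc_time, _start
    if phase == "start":
        _start = time.perf_counter()
    elif phase == "stop" and _start is not None:
        gc_time += time.perf_counter() - _start
        _start = None

gc.callbacks.append(_gc_callback)
# ... run the workload under test, then inspect gc_time ...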

Cc. @markshannon for expertise on the new incremental GC, and cc. @hugovk / @AA-Turner for Sphinx expertise.

CPython versions tested on:

3.12, 3.13, CPython main branch

Operating systems tested on:

macOS


AlexWaygood added the type-bug, performance, 3.13 (bugs and security fixes), and 3.14 (new features, bugs and security fixes) labels Sep 26, 2024
picnixz (Member) commented Sep 26, 2024

I think I wanted to investigate this in sphinx-doc/sphinx#12181, but I didn't have the motivation or the time (and we wanted to avoid keeping issues open for too long). The root cause there was a large increase in docutils' memory usage, so this might also be why we are seeing these slowdowns.

AlexWaygood (Member, Author)

I think I wanted to investigate this in sphinx-doc/sphinx#12181, but I didn't have the motivation or the time (and we wanted to avoid keeping issues open for too long).

Yes, @AA-Turner mentioned as much in #118891 (comment). This seems like the kind of issue that could cause slowdowns for other workloads as well as Sphinx, however. (But I'll defer to our performance and GC experts on whether there are obvious ways to mitigate it.)

AlexWaygood (Member, Author)

Prompted by @ned-deily, I reran my PGO-optimized timings using --with-lto=full --enable-optimizations builds rather than simply --with-lto --enable-optimizations builds. (Since I'm benchmarking on a Mac, this could have been relevant.) However, the timings on v3.13.0a1 and main using this build configuration were essentially identical to the ones I reported above.

willingc (Contributor)

@AlexWaygood Let's rerun after #124538 lands.

Privat33r-dev (Contributor)

I suggest adding a timing test for the doctests so that such performance degradations are noticed automatically in the future, preferably with a generous margin so that new additions to the doctests do not cause it to trigger.
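
For example, a coarse guard could look something like this (a sketch only; the 600-second budget is an arbitrary placeholder, and the make -C Doc doctest invocation is assumed to match the existing doctest build):

import subprocess
import time

# Hypothetical budget with a generous margin, so ordinary growth in the
# doctests does not trip the check, only large regressions.
MAX_DOCTEST_SECONDS = 600

start = time.perf_counter()
subprocess.run(["make", "-C", "Doc", "doctest"], check=True)
elapsed = time.perf_counter() - start
print(f"doctest build took {elapsed:.1f}s")
if elapsed > MAX_DOCTEST_SECONDS:
    raise SystemExit(f"doctest build exceeded the {MAX_DOCTEST_SECONDS}s budget")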

hauntsaninja (Contributor)

Could be cool to make the sphinx build a pyperformance benchmark (I think there might be a docutils pyperf benchmark, wonder if that saw a regression)

JelleZijlstra (Member)

@willingc I think that's unlikely to make a difference; that issue is related to capsules and probably doesn't interact with the regression we're seeing here.

@hauntsaninja there is a docutils benchmark, https://github.com/python/pyperformance/tree/main/pyperformance/data-files/benchmarks/bm_docutils. Looks like it builds a bunch of RST files, though not CPython's own docs. According to Michael that benchmark didn't show a regression, so it would be interesting to figure out what's different about the CPython docs relative to that benchmark.

picnixz (Member) commented Sep 26, 2024

According to Michael that benchmark didn't show a regression

There is actually a bit of a regression, at least in memory usage, as reported in faster-cpython/ideas#668 (roughly a 20% increase in memory). I don't think that alone is enough to make Sphinx this much slower, though.

AA-Turner (Member) commented Sep 26, 2024

Thanks Alex for the excellent write-up. A note for reproduction: in my local tests the slowdown could be reproduced in the parsing step alone, so you can use 'dummy' instead of 'html' in the reproduction script to save time (no HTML files are produced).
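
For example, the sphinx invocation in the first script could be swapped for something like this (a sketch; the dummy builder parses and post-transforms the sources but writes no output files):

run([
    ".venv/bin/python",
    "-m",
    "sphinx",
    "-b",
    "dummy",
    ".",
    "build/dummy",
    "library/typing.rst",
])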

library/typing.rst

Interestingly, I found that when building just that document, timings were broadly the same. However, when building everything, 3.13.0a5 was consistently 1.6x faster than 3.13.0a6 (using the optimised release builds from https://www.python.org/downloads/ for Windows, 64-bit).

Could be cool to make the sphinx build a pyperformance benchmark

When I added the Docutils benchmark, Michael Droettboom requested that the I/O (reading & writing files) be removed, as it adds too much noise. Unlike with Docutils, there isn't a good way to avoid I/O in Sphinx, unfortunately. I would be keen to add Sphinx to the benchmark suite.

Looks like it builds a bunch of RST files

It builds Docutils' own documentation; we can't use Python's, as the Python documentation uses Sphinx-only extensions.

A

mdboom (Contributor) commented Sep 26, 2024

According to Michael that benchmark didn't show a regression, so it would be interesting to figure out what's different about the CPython docs relative to that benchmark.

Here's the timing benchmark of 3.13.0rc2 vs 3.12.0: docutils is actually slightly faster (0.8%), but that's within the noise.

There is a 33% increase in memory usage, however.

markshannon self-assigned this Sep 26, 2024
Yhg1s added a commit that referenced this issue Sep 30, 2024
Revert the incremental GC in 3.13, since it's not clear that without further tuning, the benefits outweigh the costs.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Yhg1s (Member) commented Oct 1, 2024

This is still an issue that needs more investigation for 3.14, but for 3.13 the incremental GC has been reverted and that solves the problem. I'll edit the issue title accordingly (and remove the release-blocker label).

Yhg1s removed the release-blocker and 3.13 (bugs and security fixes) labels Oct 1, 2024
Yhg1s changed the title from "Incremental GC means that Sphinx is significantly slower in 3.13 than in 3.12" to "Incremental GC means that Sphinx is significantly slower in 3.14 than in 3.12" Oct 1, 2024
Privat33r-dev (Contributor)

This is still an issue that needs more investigation for 3.14, but for 3.13 the incremental GC has been reverted and that solves the problem. I'll edit the issue title accordingly (and remove the release-blocker label).

Wouldn't reverting it resolve the issue for 3.13, so the title should say "slower in 3.14 than in 3.13" rather than "than in 3.12"?

AlexWaygood changed the title from "Incremental GC means that Sphinx is significantly slower in 3.14 than in 3.12" to "Incremental GC causes a significant slowdown for Sphinx" Oct 1, 2024
AlexWaygood (Member, Author) commented Oct 1, 2024

Wouldn't reverting it resolve the issue for 3.13, so the title should say "slower in 3.14 than in 3.13" rather than "than in 3.12"?

If we're being pedantic, strictly speaking the modified title was still accurate ;) but I have modified the title again so that it doesn't reference any specific version of Python.

mdboom (Contributor) commented Oct 1, 2024

I finally have complete benchmarking results for this, and the results are mainly neutral or slightly better for reverting this change. The outlier is macOS, where I am seeing a 5% performance regression, similar to what @ned-deily reported. Note that these are our own builds on our own benchmarking machine (not the official python.org release). I'm not sure what to attribute the difference to; maybe larger page sizes on macOS?

platform change
darwin arm64 1.05x ↓
linux x86_64 1.00x ↑
linux aarch64 1.00x ↑
windows x86 1.01x ↑
windows x86_64 1.01x ↑

pablogsal (Member) commented Oct 2, 2024

I measured the benchmark from #122580 on my M1 Mac after moving it to pyperf, and I cannot see any major difference above the noise:

## RC3

~/github/python/3.13 tags/v3.13.0rc3* 38s
❯ ./python.exe benchmark.py
.....................
benchmark: Mean +- std dev: 634 us +- 14 us

~/github/python/3.13 tags/v3.13.0rc3* 16s
❯ ./python.exe benchmark.py
.....................
benchmark: Mean +- std dev: 632 us +- 5 us

## RC2

~/github/python/3.13 tags/v3.13.0rc2* 17s
❯ ./python.exe benchmark.py
.....................
benchmark: Mean +- std dev: 636 us +- 16 us

~/github/python/3.13 tags/v3.13.0rc2* 16s
❯ ./python.exe benchmark.py
.....................
benchmark: Mean +- std dev: 636 us +- 18 us
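
(For context, the pyperf harness has roughly this shape; this is a sketch only, and build_docs is a hypothetical stand-in for whatever the benchmark in #122580 actually measures.)

import pyperf

def build_docs():
    # Hypothetical stand-in for the measured workload.
    ...

runner = pyperf.Runner()
runner.bench_func("benchmark", build_docs)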

hugovk pushed a commit to hugovk/cpython that referenced this issue Nov 1, 2024

Revert the incremental GC in 3.13, since it's not clear that without further tuning, the benefits outweigh the costs.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
hugovk added a commit to hugovk/cpython that referenced this issue Nov 4, 2024
Revert the incremental GC in 3.14, since it's not clear that without further tuning, the benefits outweigh the costs.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
hugovk added a commit to hugovk/cpython that referenced this issue Nov 11, 2024
Revert the incremental GC in 3.14, since it's not clear that without further tuning, the benefits outweigh the costs.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
iritkatriel (Member) commented Nov 13, 2024

I think I may have found a micro-benchmark that shows a slowdown on main vs 3.13.
I guessed that it might be due to large unreachable cycles, so:

N = 1000000

class C:
    def __init__(self, x):
        self.next = x

def f():
    # Build a chain of N+1 objects, then link the tail back to the head so the
    # whole chain becomes one large unreachable reference cycle once f() returns.
    head = C(None)
    x = head
    for i in range(N):
        x = C(x)
    head.next = x

import timeit
import sys
print(sys.version_info[:2], timeit.timeit(f, number=100))

I ran it on a Mac, on debug builds of 3.13 and main (because the Sphinx test uses a debug build).

% repeat 4 {../python3.13/python.exe ttt.py; ./python.exe ttt.py}
(3, 13) 49.409357415977865
(3, 14) 63.12042100299732
(3, 13) 50.696041971008526
(3, 14) 57.32344063199707
(3, 13) 53.82498657301767
(3, 14) 57.68169612000929
(3, 13) 50.13815074498416
(3, 14) 55.76091634799377

The average is roughly 51s on 3.13 vs 58s on main.

markshannon (Member)

This is what was causing the slowdown: 278059b#diff-a848a0ef178aa113a092e72403da0e344f37bd141bb90a7aa65015c77bfe7385L1380
That line validates the entire increment each time the increment size is increased, so the total work is potentially quadratic. Presumably Sphinx's workload is what actually makes it quadratic.

With that line removed, the Doc/Doctest CI job now runs in 3 minutes.
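
(For intuition, here is the quadratic pattern sketched in pure Python; the names are hypothetical and this is not the actual gc.c code. Re-validating the whole increment each time it grows costs 1 + 2 + ... + n, i.e. O(n^2), for an increment of n objects.)

def validate(increment):
    # Stand-in for the debug consistency check: it walks every object
    # currently in the increment, so each call costs O(len(increment)).
    for obj in increment:
        assert obj is not None

def grow_increment(objects):
    increment = []
    for obj in objects:
        increment.append(obj)
        validate(increment)  # re-checks everything so far: O(n^2) in total
    return increment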

AlexWaygood (Member, Author)

Can this be closed now? #126502 is merged; the CI doctest job now runs in <4 minutes consistently (from 19min before!).

Do we also need to check how the Sphinx benchmark looks with a PGO-optimized release build relative to 3.13? (The CI doctest job uses a debug build.)

AA-Turner (Member)

Do we also need to check how the Sphinx benchmark looks with a PGO-optimized release build relative to 3.13? (The CI doctest job uses a debug build.)

I agree, for completeness we should confirm that the slowdown has gone from release builds (it was always much more pronounced in debug builds).

A

mdboom (Contributor) commented Nov 18, 2024

Do we also need to check how the Sphinx benchmark looks with a PGO-optimized release build relative to 3.13?

On PGO builds on our benchmarking hardware, the Sphinx benchmark is now not significantly different from 3.13.0, and about 2% faster than main prior to #126502 being merged.

AlexWaygood (Member, Author) commented Nov 18, 2024

Performance is also now unchanged for me locally relative to Python 3.13 (or maybe slightly better!) using the script I posted in #118891 (comment):

  • main (built using --enable-optimizations --with-lto=full): 37.22s
  • 3.13: 38.15s

Closing as completed. Thanks everybody who helped investigate, and thanks @markshannon for fixing! 🥳

AlexWaygood (Member, Author)

Reopening, since #126502 was reverted in #126983 due to buildbot failures.

markshannon (Member)

And closing again 🙂
