# Incremental GC causes a significant slowdown for Sphinx #124567
I think I wanted to investigate it in sphinx-doc/sphinx#12181 but then I had neither the motivation nor the time... (and we wanted to avoid keeping issues open for too long). The root cause was that docutils' memory usage increased a lot, so this might be the reason why we are also seeing those slowdowns.
Yes, @AA-Turner mentioned as much in #118891 (comment). This seems like the kind of issue that could cause slowdowns for other workloads as well as Sphinx, however. (But I'll defer to our performance and GC experts on whether there are obvious ways to mitigate it.)
Prompted by @ned-deily, I reran my PGO-optimized timings using
@AlexWaygood Let's rerun after #124538 lands.
I suggest adding a timing test for the doctests, so that such performance degradations are noticed automatically in the future. Preferably with a high margin, so that new additions to the doctests do not cause it to trigger.
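A minimal sketch of what such a guard could look like (the `make -C Doc doctest` invocation and the margin value are assumptions for illustration, not an existing CI check):

```python
# Hypothetical timing guard for the Doctest CI job: fail only on a large
# regression, so that ordinary doctest additions never trip it.
import subprocess
import time

MAX_SECONDS = 600  # placeholder margin, far above the normal run time

start = time.perf_counter()
subprocess.run(["make", "-C", "Doc", "doctest"], check=True)
elapsed = time.perf_counter() - start
assert elapsed < MAX_SECONDS, f"doctest run took {elapsed:.0f}s (limit {MAX_SECONDS}s)"
```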
It could be cool to make the Sphinx build a pyperformance benchmark (I think there might be a Docutils pyperf benchmark; I wonder whether that saw a regression).
@willingc I think that's unlikely to make a difference; that issue is related to capsules and probably doesn't interact with the regression we're seeing here. @hauntsaninja there is a docutils benchmark, https://github.com/python/pyperformance/tree/main/pyperformance/data-files/benchmarks/bm_docutils. Looks like it builds a bunch of RST files, though not CPython's own docs. According to Michael that benchmark didn't show a regression, so it would be interesting to figure out what's different about the CPython docs relative to that benchmark.
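For illustration, a hedged sketch of a pyperf benchmark in the spirit of bm_docutils (the document strings here are placeholders; the real benchmark builds a set of RST files, as noted above):

```python
# Hypothetical pyperf benchmark: render a fixed set of reST sources to
# HTML in-memory, keeping file I/O out of the measured loop.
import pyperf
from docutils.core import publish_string

SOURCES = [  # placeholder documents
    "Title\n=====\n\nSome *emphasised* text.\n",
    "Other\n=====\n\nA `link <https://example.com>`_.\n",
]

def build_all(sources):
    for src in sources:
        publish_string(src, writer_name="html5")

if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.bench_func("docutils_html", build_all, SOURCES)
```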
There is a bit of a regression actually, at least in the memory usage as reported by faster-cpython/ideas#668 (roughly a 20% increase in memory). I don't think that alone accounts for Sphinx actually getting slower, though.
Thanks Alex for the excellent write-up. A note for reproduction: in my local tests the slowdown could be reproduced just in the parsing step, so you can use
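The exact command is cut off above; a minimal sketch of one way to time just the Docutils parsing step (the file path is an assumption, and Sphinx-only directives in that file would be reported as errors by plain Docutils):

```python
# Sketch: time only reST parsing, independent of the rest of the build.
import time
from docutils.core import publish_doctree

with open("Doc/library/typing.rst", encoding="utf-8") as f:  # assumed path
    source = f.read()

start = time.perf_counter()
publish_doctree(source, settings_overrides={"report_level": 5})  # severe only
print(f"parse time: {time.perf_counter() - start:.3f}s")
```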
Interestingly, I found that when building just that document, timings were broadly the same. However, when building everything, 3.13.0a5 was consistently 1.6x faster than 3.13.0a6 (using the optimised release builds from https://www.python.org/downloads/ for Windows, 64-bit).
When I added the Docutils benchmark, Michael Droettboom requested that I/O variables (reading & writing files) be removed, as they add too much noise. Unfortunately, unlike with Docutils, there isn't a good way to avoid I/O in Sphinx. I would be keen to add Sphinx to the benchmark suite, building Docutils' documentation; we can't use Python's, as the Python documentation uses Sphinx-only extensions.
Here's the timing benchmark of 3.13.0rc2 vs 3.12.0. Docutils is actually slightly faster (0.8%), but that's within the noise. There is, however, a 33% increase in memory usage.
This is still an issue that needs more investigation for 3.14, but for 3.13 the incremental GC has been reverted and that solves the problem. I'll edit the issue title accordingly (and remove the release-blocker label).
Wouldn't reverting it resolve the issue for 3.13? Hence shouldn't the title say it's "slower in 3.14 than in 3.13", rather than "than in 3.12"?
If we're being pedantic, strictly speaking the modified title was still accurate ;) but I have modified the title again so that it doesn't reference any specific version of Python.
I finally have complete benchmarking results for this, and the results are mainly neutral or slightly better for reverting this change. The outlier is macOS, where I am seeing a 5% performance regression similar to @ned-deily's. Note that these are our own builds on our own benchmarking machine (not the official python.org release). I'm not sure what to attribute the difference to -- maybe larger page sizes on macOS?
I measured on my M1 Mac the benchmark from #122580 after moving it to pyperf, and I cannot see any major difference above the noise:

## RC3

## RC2
Revert the incremental GC in 3.14, since it's not clear that without further tuning, the benefits outweigh the costs. Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
I think I may have found a micro benchmark that shows a slowdown on main vs 3.13:

```python
N = 1000000

class C:
    def __init__(self, x):
        self.next = x

def f():
    head = C(None)
    x = head
    for i in range(N):
        x = C(x)
    head.next = x

import timeit
import sys
print(sys.version_info[:2], timeit.timeit(f, number=100))
```

I ran it on Mac, on debug builds of 3.13 and main (because the Sphinx test uses a debug build):

```
% repeat 4 {../python3.13/python.exe ttt.py; ./python.exe ttt.py}
(3, 13) 49.409357415977865
(3, 14) 63.12042100299732
(3, 13) 50.696041971008526
(3, 14) 57.32344063199707
(3, 13) 53.82498657301767
(3, 14) 57.68169612000929
(3, 13) 50.13815074498416
(3, 14) 55.76091634799377
```

Average is 51 vs 58.
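As an aside (not from the thread): one quick way to check how much of that time is collection rather than plain allocation is to rerun the same loop with the cyclic GC disabled and compare:

```python
# Sketch: the same micro-benchmark with the cyclic GC switched off, to
# separate collection cost from allocation cost.
import gc
import timeit

N = 1000000

class C:
    def __init__(self, x):
        self.next = x

def f():
    head = C(None)
    x = head
    for i in range(N):
        x = C(x)
    head.next = x

gc.disable()
try:
    print("gc disabled:", timeit.timeit(f, number=10))
finally:
    gc.enable()
```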
This is what was causing the slowdown: 278059b#diff-a848a0ef178aa113a092e72403da0e344f37bd141bb90a7aa65015c77bfe7385L1380 With that line removed,
Can this be closed now? #126502 is merged; the CI doctest job now runs in <4 minutes consistently (from 19min before!). Do we also need to check how the Sphinx benchmark looks with a PGO-optimized release build relative to 3.13? (The CI doctest job uses a debug build.)
I agree; for completeness, we should confirm that the slowdown has gone from release builds (it was always much more pronounced in debug builds).
On PGO builds on our benchmarking hardware, the Sphinx benchmark is now not significantly different from 3.13.0, and about 2% faster than main prior to #126502 being merged.
Performance is also now unchanged for me locally relative to Python 3.13 (or maybe slightly better!) using the script I posted in #118891 (comment):
Closing as completed. Thanks everybody who helped investigate, and thanks @markshannon for fixing! 🥳
And closing again 🙂
## Bug report
A significant performance regression in Sphinx caused by changes in CPython 3.13
Here is a script that does the following things: it times a Sphinx build of `Doc/library/typing.rst`, with … replaced with simply `"foo"` (the full step list and the script itself, under "The script", are collapsed).

Using a PGO-optimized build with LTO enabled, the script reports that there is a significant performance regression in Sphinx's parsing and building of `library/typing.rst` between `v3.13.0a1` and 909c6f7:

- `v3.13.0a1`: the script reports a Sphinx build time of between 1.27s and 1.29s (I ran the script several times)

A similar regression is reported in this (much slower) variation of the script that builds the entire set of CPython's documentation rather than just `library/typing.rst` ("More comprehensive variation of the script", likewise collapsed).

The PGO-optimized timings for building the entire CPython documentation are as follows:

- `v3.13.0a1`: 45.5s

This indicates a 38% performance regression for building the entire set of CPython's documentation.
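The collapsed scripts aren't reproduced above; a hedged sketch of the general shape of such a timing harness (the paths, builder, and flags are assumptions, not the author's actual script):

```python
# Hypothetical timing harness: time a quiet HTML build of CPython's docs.
import subprocess
import time

def time_sphinx_build(source_dir="Doc", build_dir="Doc/build/html"):
    start = time.perf_counter()
    subprocess.run(
        ["sphinx-build", "-b", "html", "-q", source_dir, build_dir],
        check=True,
    )
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"Sphinx build time: {time_sphinx_build():.2f}s")
```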
## Cause of the performance regression
This performance regression was initially discovered in #118891: in our own CI, we use a fresh build of CPython in our Doctest CI workflow (since otherwise, we wouldn't be testing the tip of the `main` branch), and it was observed that the CI job was taking significantly longer on the `3.13` branch than on the `3.12` branch. In the context of our CI, the performance regression is even worse, because our Doctest CI workflow uses a debug build rather than a PGO-optimized build, and the regression is even more pronounced in a debug build.

Using a debug build, I used the first script posted above to bisect the performance regression to commit 1530932 (below), which seemed to cause a performance regression of around 300% in a debug build.
Performance was then significantly improved by commit e28477f (below), but it's unfortunately still the case that Sphinx is far slower on Python 3.13 than on Python 3.12:
See #118891 (comment) for more details on the bisection results.
Profiling by @nascheme in #118891 (comment) and #118891 (comment) also confirms that Sphinx spends a significant amount of time in the GC, so it seems very likely that the change to introduce an incremental GC in Python 3.13 is the cause of this performance regression.
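For reference, the standard library exposes enough to measure time spent in the collector directly; a sketch using `gc.callbacks` (not the profiling method used in the linked comments):

```python
# Sketch: accumulate wall-clock time spent inside GC passes.
import gc
import time

_start = 0.0
total_gc_time = 0.0

def _on_gc(phase, info):
    global _start, total_gc_time
    if phase == "start":
        _start = time.perf_counter()
    else:  # phase == "stop"
        total_gc_time += time.perf_counter() - _start

gc.callbacks.append(_on_gc)
# ... run the Sphinx build (or any workload) here ...
print(f"time spent in GC: {total_gc_time:.3f}s")
```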
Cc. @markshannon for expertise on the new incremental GC, and cc. @hugovk / @AA-Turner for Sphinx expertise.
CPython versions tested on:
3.12, 3.13, CPython main branch
Operating systems tested on:
macOS
## Linked PRs