Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver #87729

vladima · 2021-03-19T20:57:58Z

BPO	43563
Nosy	@gvanrossum, @rhettinger, @markshannon, @vladima, @isidentical
PRs	bpo-43563 : Introduce dedicated opcodes for super calls #24936

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-03-19.20:57:57.938>
labels = ['interpreter-core', '3.10']
title = 'Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver'
updated_at = <Date 2021-04-27.18:23:03.781>
user = 'https://github.com/vladima'

bugs.python.org fields:

activity = <Date 2021-04-27.18:23:03.781>
actor = 'v2m'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2021-03-19.20:57:57.938>
creator = 'v2m'
dependencies = []
files = []
hgrepos = []
issue_num = 43563
keywords = ['patch']
message_count = 7.0
messages = ['389114', '389156', '389168', '389337', '389340', '389344', '392116']
nosy_count = 5.0
nosy_names = ['gvanrossum', 'rhettinger', 'Mark.Shannon', 'v2m', 'BTaskaya']
pr_nums = ['24936']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue43563'
versions = ['Python 3.10']

Linked PRs

vladima · 2021-03-19T20:57:58Z

Calling methods and lookup up attributes when receiver is super() has extra cost comparing to regular attribute lookup. It mainly comes from the need to allocate and initialize the instance of the super which for zero argument case also include peeking into frame/code object for the __class__ cell and first argument. In addition because PySuper_Type has custom implementation of tp_getattro - _PyObject_GetMethod would always return bound method.

import timeit

setup = """
class A:
    def f(self): pass
class B(A):
    def f(self): super().f()
    def g(self): A.f(self)
b = B()
"""
print(timeit.timeit("b.f()", setup=setup, number=20000000))
print(timeit.timeit("b.g()", setup=setup, number=20000000))

7.329449548968114
3.892987059080042

One option to improve it could be to make compiler/interpreter aware of super calls so they can be treated specially. Attached patch introduces two new opcodes LOAD_METHOD_SUPER and LOAD_ATTR_SUPER that are intended to be counterparts for LOAD_METHOD and LOAD_ATTR for cases when receiver is super with either zero or two arguments.

Immediate argument for both LOAD_METHOD_SUPER and LOAD_ATTR_SUPER is a pair that consist of:
0: index of method/attribute in co_names
1: Py_True if super was originally called with 0 arguments and Py_False otherwise.

Both LOAD_METHOD_SUPER and LOAD_ATTR_SUPER expect 3 elements on the stack:
TOS3: global_super
TOS2: type
TOS1: self/cls

Result of LOAD_METHOD_SUPER is the same as LOAD_METHOD.
Result of LOAD_ATTR_SUPER is the same as LOAD_ATTR

In runtime both LOAD_METHOD_SUPER and LOAD_ATTR_SUPER will check if global_super is PySuper_Type to handle situations when super is patched. If global_super is PySuper_Type then it can use dedicated routine to perform the lookup for provided __class__ and cls/self without allocating new super instance. If global_super is different from PySuper_Type then runtime will fallback to the original logic using global_super and original number of arguments that was captured in immediate.

Benchmark results with patch:
4.381768501014449
3.9492998640052974

markshannon · 2021-03-20T12:21:53Z

Why?

Do you have any evidence that the overhead of super() is significant in real programs, or that the proposed change actually speeds up anything beyond your micro-benchmark?

rhettinger · 2021-03-20T16:06:02Z

Currently, super() is decoupled from the core language. It is just a builtin that provides customized attribute lookup. This PR makes super() more tightly integrated with the core language, treating it as if it were a keyword and part of the grammar. Also note, users can currently create their own versions of super(), shadowing the builtin super().

vladima · 2021-03-22T19:26:00Z

Currently, super() is decoupled from the core language. It is just a builtin that provides customized attribute lookup. This PR makes super() more tightly integrated with the core language, treating it as if it were a keyword and part of the grammar. Also note, users can currently create their own versions of super(), shadowing the builtin super().

This is true however:

this patch does not block people from introducing custom version of super so this scenario still work. The idea was to streamline the common case
based on digging into Instagram codebase and its transitive dependencies (which is reasonably large amount of code) all spots where super() appear in sources assume super to be builtin and for a pretty common use-case its cost is noticeable in profiler.
zero-argument super() still a bit magical since it requires compiler support to create cell for __class__ and assumes certain shape of the frame object so this patch is a step forward with a better compiler support and removing runtime dependency on the frame

Do you have any evidence that the overhead of super() is significant in real programs

I do see the non-negligible cost of allocation/initialization of super object in IG profiling data.

gvanrossum · 2021-03-22T20:49:58Z

This looks like a sensible idea to me. The safeguards to ensure that customized 'super' still works seem reasonable to me. I have to admit that I sometimes refrain from using super() where I should because of the expense, so this would be welcome.

I do wonder -- is two-arg super() important enough to support it at all? Maybe the code would be somewhat simpler if the special opcodes were only generated for zero-arg super() calls.

markshannon · 2021-03-22T21:27:59Z

Numbers please.

What is "non-negligible cost of allocation/initialization" mean as a fraction of runtime?
What sort of speed up are you seeing on whole programs?

vladima · 2021-04-27T18:23:04Z

Apologies for the delay in reply: in more concrete numbers for IG codebase enabling this optimization resulted in 0.2% CPU win.

This speeds up `super()` (by around 85%, for a simple one-level `super().meth()` microbenchmark) by avoiding allocation of a new single-use `super()` object on each use.

* main: pythongh-87729: add LOAD_SUPER_ATTR instruction for faster super() (python#103497) pythongh-103791: Make contextlib.suppress also act on exceptions within an ExceptionGroup (python#103792)

* main: pythongh-104057: Fix direct invocation of test_support (pythonGH-104069) pythongh-87729: improve hit rate of LOAD_SUPER_ATTR specialization (python#104270) pythongh-101819: Fix inverted debug preprocessor check in winconsoleio.c (python#104388)

(cherry picked from commit 7fbac51) Co-authored-by: Carl Meyer <carl@oddbird.net>

…105143)

vladima mannequin added 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Mar 19, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

carljm added a commit to carljm/cpython that referenced this issue Apr 13, 2023

pythongh-87729: add instruction for faster zero-arg super()

0a0ebe2

bedevere-bot mentioned this issue Apr 13, 2023

gh-87729: add LOAD_SUPER_ATTR instruction for faster super() #103497

Merged

carljm mentioned this issue Apr 24, 2023

gh-87729: specialize LOAD_SUPER_ATTR_METHOD #103809

Merged

carljm closed this as completed in #103809 Apr 25, 2023

carljm added a commit that referenced this issue Apr 25, 2023

gh-87729: specialize LOAD_SUPER_ATTR_METHOD (#103809)

ef25feb

itamaro pushed a commit to itamaro/cpython that referenced this issue Apr 26, 2023

pythongh-87729: specialize LOAD_SUPER_ATTR_METHOD (python#103809)

bf717d4

Fidget-Spinner mentioned this issue Apr 27, 2023

Near zero-cost super().meth() calls via adaptive superinstructions #90722

Closed

carljm added a commit to carljm/cpython that referenced this issue May 7, 2023

pythongh-87729: improve hit rate of LOAD_SUPER_ATTR specialization

5b64c58

bedevere-bot mentioned this issue May 7, 2023

gh-87729: improve hit rate of LOAD_SUPER_ATTR specialization #104270

Merged

carljm added a commit that referenced this issue May 11, 2023

gh-87729: improve hit rate of LOAD_SUPER_ATTR specialization (#104270)

7726245

freakboy3742 mentioned this issue May 28, 2023

Incompatibility with Python 3.12.0b1 beeware/rubicon-objc#313

Closed

carljm added a commit to carljm/cpython that referenced this issue May 31, 2023

pythongh-87729: add LOAD_SUPER_ATTR to 3.12 What's New

65b96ed

bedevere-bot mentioned this issue May 31, 2023

gh-87729: add LOAD_SUPER_ATTR to 3.12 What's New #105125

Merged

carljm added a commit that referenced this issue May 31, 2023

gh-87729: add LOAD_SUPER_ATTR to 3.12 What's New (#105125)

7fbac51

bedevere-bot mentioned this issue May 31, 2023

[3.12] gh-87729: add LOAD_SUPER_ATTR to 3.12 What's New (GH-105125) #105143

Merged

miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 31, 2023

pythongh-87729: add LOAD_SUPER_ATTR to 3.12 What's New (pythonGH-105125)

b7c2029

(cherry picked from commit 7fbac51) Co-authored-by: Carl Meyer <carl@oddbird.net>

carljm pushed a commit that referenced this issue May 31, 2023

[3.12] gh-87729: add LOAD_SUPER_ATTR to 3.12 What's New (GH-105125) (#…

4f477c7

…105143)

erlend-aasland mentioned this issue Jun 6, 2023

3.12 backport gh 105236 #105358

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver #87729

Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver #87729

vladima mannequin commented Mar 19, 2021 •

edited by bedevere-bot

Loading

vladima mannequin commented Mar 19, 2021

markshannon commented Mar 20, 2021

rhettinger commented Mar 20, 2021

vladima mannequin commented Mar 22, 2021

gvanrossum commented Mar 22, 2021

markshannon commented Mar 22, 2021

vladima mannequin commented Apr 27, 2021

Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver #87729

Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver #87729

Comments

vladima mannequin commented Mar 19, 2021 • edited by bedevere-bot Loading

Linked PRs

vladima mannequin commented Mar 19, 2021

markshannon commented Mar 20, 2021

rhettinger commented Mar 20, 2021

vladima mannequin commented Mar 22, 2021

gvanrossum commented Mar 22, 2021

markshannon commented Mar 22, 2021

vladima mannequin commented Apr 27, 2021

vladima mannequin commented Mar 19, 2021 •

edited by bedevere-bot

Loading