Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-90350: Optimize builtin functions min() and max() #30286

Merged
merged 6 commits into from
Dec 11, 2023

Conversation

colorfulappl
Copy link
Contributor

@colorfulappl colorfulappl commented Dec 29, 2021

Builtin functions min() and max() are labeled as METH_VARARGS | METH_KEYWORDS, which use tp_call calling convention.
After changing their label to METH_FASTCALL | METH_KEYWORDS, they can be invoked by vectorcall.

This optimization simplifies parameter passing and avoids creation of temporary tuple while parsing arguments, brings about up to 200%+ speed up on microbenchmarks.

faster-cpython/ideas#199

https://bugs.python.org/issue46192

@the-knights-who-say-ni

This comment has been minimized.

@colorfulappl colorfulappl changed the title bpo-46192: Builtin functions min() and max() now use METH_FASTCALL bpo-46192: Optimize builtin functions min() and max() Dec 29, 2021
@AlexWaygood AlexWaygood added the performance Performance or resource usage label Dec 29, 2021
@erlend-aasland
Copy link
Contributor

erlend-aasland commented Dec 29, 2021

If we're touching these, it would make sense to convert them to Argument Clinic, now that *args support is implemented (9af34c9).

cpython/Python/bltinmodule.c

Lines 1818 to 1823 in 77195cd

/* AC: cannot convert yet, waiting for *args support */
static PyObject *
builtin_min(PyObject *self, PyObject *args, PyObject *kwds)
{
return min_max(args, kwds, Py_LT);
}

cpython/Python/bltinmodule.c

Lines 1835 to 1840 in 77195cd

/* AC: cannot convert yet, waiting for *args support */
static PyObject *
builtin_max(PyObject *self, PyObject *args, PyObject *kwds)
{
return min_max(args, kwds, Py_GT);
}

@colorfulappl
Copy link
Contributor Author

I wrote an "Argument Clinic" version here:
colorfulappl@29b9559

But seems it is not as fast as current "without Argument Clinic" version:
colorfulappl@a9413ab

Result of microbench:

code snippet with AC w/o AC
max(1, 2) 1.11x faster 3.25x faster
max([1, 2]) 1.14x faster 1.41x faster
max((1, ), (2, ), key=lambda x: x[0]) 1.89x faster 1.85x faster
max([(1, ), (2, )], key=lambda x: x[0]) 1.73x faster 1.52x faster
max([], default=-1) 2.39x faster 2.61x faster
max(1, 2, 3, 4, 5) 1.02x faster 2.46x faster

On the most commonly used case, "max(a, b [, ...])", there is not noticeable speed up when we use AC, especially when multiple arguments are passed (the last case).

I noticed that in 9af34c9 , the varargs on stack are packed to tuple before passed to callee, and callee obtains each argument by access the tuple's elements.
This slows down the function invocation.

@colorfulappl
Copy link
Contributor Author

One goal of fastcall is to avoid the creation of a temporary tuple to pass positional arguments (https://bugs.python.org/issue29259).

IMHO, the process of pack/unpack arguments to a tuple is unnecessary in 9af34c9 .
Perhaps it's better to pass an Object * const * pointer which points to the first argument and a nargs integer to indicate how many positional arguments are passed when we are processing varargs.

I would have a try, then may open another issue if necessary.

@erlend-aasland
Copy link
Contributor

erlend-aasland commented Dec 30, 2021

IMHO, the process of pack/unpack arguments to a tuple is unnecessary in 9af34c9 . Perhaps it's better to pass an Object * const * pointer which points to the first argument and a nargs integer to indicate how many positional arguments are passed when we are processing varargs.

I would have a try, then may open another issue if necessary.

Yes, such a change demands a separate issue / PR. It would be nice if you could add proof-of-concept benchmark results when you present the idea in the new bpo issue. I can add the relevant core devs on the nosy list, if you want.

Keep in mind that readability and maintainability weigh very heavy when considering a PR; that also goes for optimisation changes.

@erlend-aasland
Copy link
Contributor

erlend-aasland commented Dec 30, 2021

I wrote an "Argument Clinic" version here: colorfulappl@29b9559

While that change certainly looks better, it is buggy; it does not heed the key function. With that in mind, the benchmarks you posted in #30286 (comment) are obviously wrong; the benchmarks with the key keyword are not correct.

@sweeneyde
Copy link
Member

I wonder if Argument Clinic Itself could be updated to convert *args to vectorcall/fastcall if some kind of flag appears in the AC declaration

@sweeneyde
Copy link
Member

*args support was added to AC in #18609

@isidentical any thoughts on removing the tuple creation in some cases, or with some flag?

@erlend-aasland

This comment was marked as off-topic.

@colorfulappl
Copy link
Contributor Author

colorfulappl commented Dec 31, 2021

I would have a try, then may open another issue if necessary.

I opened a new issue (gh-903701),
and made a PR (#30312).

Footnotes

  1. edit by @erlend-aasland: convert bpo link to gh link

@colorfulappl
Copy link
Contributor Author

the benchmarks you posted in #30286 (comment) are obviously wrong; the benchmarks with the key keyword are not correct.

Thanks for revising my mistake. I'v seen the new results after your amend.
Actually, when the key function is empty, the "AC version" runs faster.

MaxwellDupre

This comment was marked as outdated.

@hauntsaninja hauntsaninja changed the title bpo-46192: Optimize builtin functions min() and max() gh-90350: Optimize builtin functions min() and max() Feb 26, 2023
@eendebakpt
Copy link
Contributor

@colorfulappl Is this PR still blocked by #90370?

@serhiy-storchaka serhiy-storchaka merged commit 0066ab5 into python:main Dec 11, 2023
29 checks passed
facebook-github-bot pushed a commit to facebookincubator/cinder that referenced this pull request Jan 16, 2024
Summary:
Builtin functions min() and max() now use METH_FASTCALL

upstream issue: python/cpython#90350
upstream PR: python/cpython#30286
upstream commit: python/cpython@0066ab5

Reviewed By: carljm

Differential Revision: D52650071

fbshipit-source-id: 93971e865ab9515efc9771c58582f63d15e0342d
aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
…30286)

Builtin functions min() and max() now use METH_FASTCALL
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024
…30286)

Builtin functions min() and max() now use METH_FASTCALL
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage topic-argument-clinic
Projects
None yet
Development

Successfully merging this pull request may close these issues.