Currently, iterating over a generator or awaiting a coroutine goes through several layers of C code, performing many wasteful transformations to do little more than make a jump in the bytecode.
By specializing FOR_ITER for generators and SEND for coroutines, we can remove this overhead.
However, we will need either trampolines to fix up returns or a change to the behavior of RETURN_VALUE in generators and coroutines.
Iterating over a generator
The FOR_ITER bytecode pushes the yielded value when __next__ returns a value, so that's simple enough. YIELD_VALUE already does that. The complication is that RETURN_VALUE pushes a value, but we actually need to POP the generator. So we need an additional two POPs after the return.
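The two exits described above (a yielded value versus a final return) can be observed from Python itself. The following is a minimal sketch, not the interpreter's implementation; `gen` and `drive` are hypothetical names introduced only for illustration. It mimics what a for loop does through `__next__` and shows that the return value arrives separately, carried by `StopIteration`.

```python
def gen():
    yield 1
    yield 2
    return "done"


def drive(g):
    """Consume g the way a for loop would, but keep the return value."""
    yielded = []
    while True:
        try:
            value = next(g)  # corresponds to the "__next__ returns a value" path
        except StopIteration as stop:
            # Exhaustion path: a for loop discards stop.value.
            return yielded, stop.value
        yielded.append(value)


yielded, returned = drive(gen())
print(yielded, returned)
```

A plain for loop never sees `"done"`, which is why the returned value has to be removed from the stack along with the generator.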
We can change the way return works for generators by adding a new instruction GEN_RETURN, change the way FOR_ITER works, use some combination of the two, or insert a trampoline.
Inserting a trampoline is relatively expensive, so I'd like to do this without one.
First, we can implement GEN_RETURN, which would clean up the generator and replace the caller's TOS with the returned value.
Then we change FOR_ITER to not pop the iterator on completion.
A for loop will now compile to:
FOR_ITER end
body
...
end:
POP_TOP
This costs one more POP_TOP per loop, but simplifies FOR_ITER a bit.
We can then specialize FOR_ITER for generators in a straightforward fashion, as no cleanup shim will be needed.
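To see how a for loop compiles on whichever interpreter is at hand, the standard `dis` module can list the instructions. This is only an inspection aid; the exact sequence varies between CPython versions and will not necessarily match the listing above. The function `loop` is a hypothetical example.

```python
import dis


def loop(items):
    for item in items:
        pass


# Collect the opcode names in order; get_instructions() reports
# unspecialized instruction names by default.
opnames = [ins.opname for ins in dis.get_instructions(loop)]
print(opnames)
```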
Awaiting a coroutine
SEND operates much like FOR_ITER, but the transformation is simpler, as we don't need to POP the result. await compiles exactly as before, as GEN_RETURN leaves the result on the caller's stack.
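The difference from the for-loop case is that the awaiting code uses the coroutine's return value. The sketch below drives a small coroutine chain by hand with `send()` to show this at the Python level; `suspend`, `inner`, and `outer` are hypothetical names, and no event loop is involved.

```python
import types


@types.coroutine
def suspend():
    # Yield to whoever is driving the chain and receive the sent value.
    received = yield "suspended"
    return received


async def inner():
    x = await suspend()
    return x * 2


async def outer():
    # The value returned by inner() becomes the result of this await.
    result = await inner()
    return result + 1


coro = outer()
first = coro.send(None)  # runs until suspend() yields
final = None
try:
    coro.send(20)  # resumes suspend(); the chain then runs to completion
except StopIteration as stop:
    final = stop.value
print(first, final)
```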
The new bytecodes
GEN_RETURN
Does the following:
Pops the TOS from the caller (will be the generator)
Pushes the result to the caller's stack
Pops and destroys the current frame
Resumes the caller at next_instr + gen_return_offset
FOR_ITER_GENERATOR
Does the following:
Deopts if iterator is not a generator
Deopts if the generator is not suspended
Sets the current frame's gen_return_offset to oparg
Pushes the generator's frame
Pushes None to the generator's stack
Resumes execution of the generator
SEND_COROUTINE
Does the following:
Deopts if awaitable is not a coroutine
Deopts if the coroutine is not suspended
Sets the current frame's gen_return_offset to oparg
Pops the value from the caller's stack
Pushes the coroutine's frame
Pushes the value to the coroutine's stack
Resumes execution of the coroutine
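The stack effects of the three instructions above can be checked with a toy model. This is a sketch under heavy assumptions, not CPython code: `Frame`, its fields, and the three functions are hypothetical stand-ins, a generator or coroutine object is represented directly by its frame, and "deopt" is modelled as returning False.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical stand-in for an interpreter frame."""
    name: str
    stack: list = field(default_factory=list)
    next_instr: int = 0
    gen_return_offset: int = 0
    suspended: bool = True  # only meaningful for generator/coroutine frames


def for_iter_generator(frames, oparg):
    caller = frames[-1]
    gen = caller.stack[-1]  # the iterator is on top of the caller's stack
    if not isinstance(gen, Frame) or not gen.suspended:
        return False  # deopt
    caller.gen_return_offset = oparg
    gen.suspended = False
    frames.append(gen)      # push the generator's frame
    gen.stack.append(None)  # push None to the generator's stack
    return True


def send_coroutine(frames, oparg):
    caller = frames[-1]
    coro = caller.stack[-2]  # the awaitable sits below the value to send
    if not isinstance(coro, Frame) or not coro.suspended:
        return False  # deopt
    caller.gen_return_offset = oparg
    value = caller.stack.pop()
    coro.suspended = False
    frames.append(coro)
    coro.stack.append(value)
    return True


def gen_return(frames, result):
    caller = frames[-2]
    caller.stack.pop()           # pop the generator from the caller
    caller.stack.append(result)  # push the result in its place
    frames.pop()                 # pop and destroy the current frame
    caller.next_instr += caller.gen_return_offset


# Iterating: the caller's TOS goes from the generator to the returned value.
gen = Frame("gen")
caller = Frame("caller", stack=[gen], next_instr=10)
frames = [caller]
entered = for_iter_generator(frames, oparg=4)
gen_return(frames, "result")

# Awaiting: the sent value moves to the coroutine, the result comes back.
coro = Frame("coro")
awaiter = Frame("awaiter", stack=[coro, 42])
frames2 = [awaiter]
sent = send_coroutine(frames2, oparg=2)
received = list(coro.stack)
gen_return(frames2, "awaited")
```

In both cases the caller ends with exactly one item, the result, where the generator or coroutine used to be; for a for loop that item still has to be popped.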
Having GEN_RETURN pop the stack of the caller is a bit weird and doesn't play nicely with localized optimizations, or PyEval_EvalFrame(), as the latter would have a potentially surprising side-effect.
So:
FOR_ITER end
body
...
end:
POP_TOP
will become
FOR_ITER end
body
...
end:
END_FOR
Where END_FOR is equivalent to POP_TOP; POP_TOP but allows us to handle pushing NULL to the stack.
Having a special instruction also tells the bytecode compiler to leave it alone, to enable FOR_ITER to skip it, retaining the same efficiency as the old FOR_ITER.
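Whether a given interpreter has an instruction named END_FOR can be checked through `dis.opmap`. This is a small inspection sketch; it only reports what the running interpreter exposes and says nothing about how the instruction is implemented.

```python
import dis
import sys

# dis.opmap maps opcode names to their numeric values for this interpreter.
has_end_for = "END_FOR" in dis.opmap
print(sys.version_info[:2], "has END_FOR" if has_end_for else "has no END_FOR")
```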
The proposal above assumes that python/cpython#96319 has been merged.