How are bytecode strings used? #538
Another thought: If we start having actual opargs (not caches) wider than one byte, we're going to need to decide if we want to use a fixed endianness or the platform's native endianness. It seems to me, based on the above graphs, that having a single format for both serialized and "live" forms makes the most sense. We can use a little-endian format since that's the common case, but even on big-endian platforms, both GCC and Clang still seem to understand what we're doing. So I don't imagine that the cost will be high, even in that case. |
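To make that point concrete, here is a minimal sketch (the helper names are mine, not CPython's) of how a fixed little-endian 16-bit oparg could be read and written: composing the value byte by byte keeps the source endian-agnostic, and optimizing compilers reduce it to a single 16-bit load on little-endian hardware and a load plus swap on big-endian hardware.

```c
#include <stdint.h>

/* Sketch only: hypothetical helpers for a fixed little-endian 16-bit oparg.
 * The C source never mentions the host byte order; the compiler picks the
 * cheapest instruction sequence for the target. */
static inline uint16_t
read_oparg16_le(const uint8_t *p)
{
    /* Low byte first, regardless of the host's native endianness. */
    return (uint16_t)(p[0] | (p[1] << 8));
}

static inline void
write_oparg16_le(uint8_t *p, uint16_t oparg)
{
    p[0] = (uint8_t)(oparg & 0xFF);
    p[1] = (uint8_t)(oparg >> 8);
}
```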
So the proposal is that the compiler generates (presumably in a new pass after everything else) quickened code, i.e. with super-instructions but no specialized instructions, and with cache counters set to their initial value. And the marshalled form is the same. The "reset" operation gets rid of specialized instructions and resets caches to their initial value (i.e. counter in its initial non-zero state and all other cache fields zeroed out). Maybe the "primitive" operation should just be copy-and-reset? Then that can be used to get the raw bytes for the disassembler, it can be used to copy the bytes in marshal, and with some care about aliasing it can be used to reset the code in place (but do we ever need that?). |
Sure, but it could be part of the final instruction emission that we already have.
Interesting idea! I like it.
Yeah, when finalizing static (deep-frozen) code objects. |
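As a rough illustration of the copy-and-reset primitive proposed above, here is a sketch assuming a 16-bit code-unit layout. Every identifier in it (code_unit_t, deopt_table, cache_entries, adaptive_counter_initial) is a stand-in for illustration, not CPython's actual API.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder layout and tables for this sketch -- not CPython's real names. */
typedef union {
    struct { uint8_t code; uint8_t arg; } op;
    uint16_t cache;
} code_unit_t;

extern const uint8_t deopt_table[256];    /* specialized opcode -> adaptive opcode */
extern const uint8_t cache_entries[256];  /* cache units following each opcode */
extern uint16_t adaptive_counter_initial(void);

/* Copy n code units, de-specializing instructions and resetting all caches.
 * Because cache units are only written (never read), calling this with
 * dst == src resets the code in place. */
static void
copy_and_reset(code_unit_t *dst, const code_unit_t *src, size_t n)
{
    size_t i = 0;
    while (i < n) {
        uint8_t opcode = deopt_table[src[i].op.code];
        dst[i].op.code = opcode;
        dst[i].op.arg = src[i].op.arg;
        i++;
        uint8_t caches = cache_entries[opcode];
        if (caches) {
            dst[i].cache = adaptive_counter_initial();  /* counter: initial non-zero state */
            for (uint8_t c = 1; c < caches; c++) {
                dst[i + c].cache = 0;                   /* other cache fields zeroed */
            }
            i += caches;
        }
    }
}
```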
One other point I glossed over that still needs ironing out: the ergonomics of pre-initialized counters on different architectures. I think the ideal solution is probably the same as for the closely-related "16-bit oparg" problem discussed above: just use a fixed endianness.

Another awkward thing is that we will now have these counters present in [...]. However, using a fixed endianness could help this: whatever the first byte of an initialized counter ends up being, just add a new [...]. Since we initialize the counters to 17 ([...]) |
So we should look at what the ADAPTIVE_COUNTER macros would look like.
I think I can live with this. |
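For the sake of discussion, one possible shape for those macros under a fixed little-endian counter is sketched below; the swap helper and the PY_BIG_ENDIAN test are assumptions about how it might be wired up, not the current CPython definitions.

```c
#include <stdint.h>

/* Sketch: counter accessors that always treat the in-memory counter as
 * little-endian. Only big-endian builds pay for the byte swap. */
#if PY_BIG_ENDIAN
#  define COUNTER_SWAP(x)  ((uint16_t)(((uint16_t)(x) >> 8) | ((uint16_t)(x) << 8)))
#else
#  define COUNTER_SWAP(x)  ((uint16_t)(x))
#endif

#define ADAPTIVE_COUNTER_VALUE(cache)    COUNTER_SWAP((cache)->counter)
#define ADAPTIVE_COUNTER_IS_ZERO(cache)  (ADAPTIVE_COUNTER_VALUE(cache) == 0)
#define DECREMENT_ADAPTIVE_COUNTER(cache) \
    ((cache)->counter = COUNTER_SWAP((uint16_t)(ADAPTIVE_COUNTER_VALUE(cache) - 1)))
```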
Please leave the in-memory form as is. There is no need to break 16-bit integers into two bytes. Replace the term "bytecode string" with the term "instruction sequence", and the issue of endianness goes away. A quick example, saving and restoring the format:

```c
struct instruction_IBCO {
    uint8_t opcode;
    uint8_t oparg;
    uint16_t counter;
    uint16_t cache0;
};

int read(Stream *stream, struct instruction_IBCO *inst)
{
    RETURN_IF_ERROR(read_byte(stream, &inst->opcode));
    RETURN_IF_ERROR(read_byte(stream, &inst->oparg));
    inst->counter = adaptive_counter_init();
    inst->cache0 = 0;
    return SUCCESS;
}

int write(instruction_IBCO *inst, Stream *stream)
{
    RETURN_IF_ERROR(write_byte(stream, inst->opcode));
    RETURN_IF_ERROR(write_byte(stream, inst->oparg));
    return SUCCESS;
}
```
|
Okay, but it will mean that [...]

EDIT: Disregard the following [...] |
Well, it could be ignorant of the instruction format if we just treat the instruction stream as an array of 16-bit values. It doesn't really matter if something is a counter, opcode, 8-bit oparg, or 16-bit oparg, as long as the 16-bit values are 16-bit aligned in the stream. Marshal just walks over them, swapping bytes if needed in [...].

I like that idea (it avoids needing to hack around a fixed-endian counter). The only oddity is that the actual bytes of [...]. But I think I can live with that... it's a problem we'll already have to face with wide opargs and variable-length instructions. I think the solution is just to encourage people to use [...]. |
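A sketch of what that walk could look like on the marshal side, assuming the serialized form is always little-endian; Stream, read_byte, and write_byte are placeholders standing in for marshal's real I/O layer and are assumed to return a negative value on error.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholders for marshal's real I/O layer. */
typedef struct Stream Stream;
int read_byte(Stream *stream, uint8_t *b);
int write_byte(Stream *stream, uint8_t b);

/* Serialize: always emit the low byte first, so the .pyc bytes are identical
 * on every platform. */
static int
write_code_units(Stream *stream, const uint16_t *units, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (write_byte(stream, (uint8_t)(units[i] & 0xFF)) < 0 ||
            write_byte(stream, (uint8_t)(units[i] >> 8)) < 0) {
            return -1;
        }
    }
    return 0;
}

/* Deserialize: reassemble each unit into a native 16-bit value, whatever the
 * host byte order happens to be. */
static int
read_code_units(Stream *stream, uint16_t *units, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t lo, hi;
        if (read_byte(stream, &lo) < 0 || read_byte(stream, &hi) < 0) {
            return -1;
        }
        units[i] = (uint16_t)(lo | (hi << 8));
    }
    return 0;
}
```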
Yeah, interpreting cache entries as opcodes (e.g. CACHE) just isn't tenable; it's at best a debugging hack. We'd end up with the opcode/oparg pair being the low/high byte instead of the first/second byte, but that can all be hidden in macros or platform-dependent struct definitions. However, note that Mark is after something else, which is actually much closer to the PR we decided to abandon over the holidays. Maybe you should just dust that one off and switch to using the new opcode_metadata.h (in particular the INSTR_FMT enums). |
Sounds good. I'm logging off for the day, but I'll read any more comments and pick this work up tomorrow. |
Can we use a native endianness and communicate it through [...]?

```diff
# Python 3.12a1 3516 (Add COMPARE_AND_BRANCH instruction)
+# Python 3.12a? 3517 (Add FOO_BAR for little-endian platforms)
+# Python 3.12a? 3518 (Add FOO_BAR for big-endian platforms)
```

It's worth noting that we don't need to allocate specific bit flags in [...]. |
@arhadthedev I don't like having two different .PYC formats. Especially since this would mean that the marshal format itself would become platform-specific, which it never has been before. The marshal format (which does not have a magic number itself) is occasionally used in different contexts, and it would be a regression if it was no longer platform-independent. |
I didn't think about this; thank you for the clarification. |
We don't need any of this. There is no endianness in the definition of a code unit, nor does there need to be in marshal. https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html |
Sure, if we go the route you indicated in your earlier comment we don't need explicit endianness (it's implicit in all that code). I have some concerns about building knowledge about the instruction set into marshal too though, see python/cpython#99555, esp. the reasons it was closed unmerged. We should revisit that decision. |
We've been talking recently about how we might want to change the format of our instructions in different ways (variable-length instructions, wider opargs, compression of serialized forms, etc.). I think it's useful to consider all of the different forms that bytecode takes throughout a typical Python process when discussing these ideas.
The lifecycle of a string of bytecode (opcodes, opargs, and caches) currently looks something like this:
The boxes in red are quickened forms, while the boxes in blue are unquickened forms. Quickening (`_PyCode_Quicken`) currently initializes adaptive counters and inserts superinstructions. Unquickening (`deopt_code`) removes superinstructions, converts other instructions back to their adaptive form, and zeroes out all caches (including counters).

Let's remove frozen and cached modules, for simplicity (they're basically just marshalled bytes):

Some observations:

[...] `co_code` [...] and finalization of deepfrozen code objects. This means that superinstructions and non-zero counters would be present in `co_code`, but no specialized instructions or other populated caches. If we do this, we only have one idempotent transformation that can be applied to the bytecode, and what we currently call "quickening" can be entirely encapsulated in the compiler, where it belongs (not even `marshal` or code objects need to understand it). If so, the new graph would be roughly:

At this point, there's not really any difference between static and heap code (we just need to reset static code at finalization):

[...] `bytes` object):

If marshal has a way of building code without an intermediate `bytes` object, then the compiler does too:

So, by changing these two relatively minor things, it seems that we can simplify our handling of the bytecode quite a bit.
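To make the quickening step described above concrete, here is a sketch of roughly what "initialize counters and insert superinstructions" amounts to, again assuming a 16-bit code-unit layout. code_unit_t, cache_entries, adaptive_counter_start, and superinstruction_for are illustrative placeholders, and real superinstruction formation has more constraints (line boundaries, jump targets, and so on) than shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Same placeholder code-unit layout as the earlier copy-and-reset sketch. */
typedef union {
    struct { uint8_t code; uint8_t arg; } op;
    uint16_t cache;
} code_unit_t;

extern const uint8_t cache_entries[256];                    /* cache units per opcode */
extern uint16_t adaptive_counter_start(void);               /* initial non-zero counter */
extern uint8_t superinstruction_for(uint8_t a, uint8_t b);  /* 0 if the pair doesn't fuse */

/* Quicken in place: fuse eligible instruction pairs into superinstructions
 * and set each adaptive counter, leaving the remaining cache entries zeroed. */
static void
quicken(code_unit_t *code, size_t n)
{
    size_t i = 0;
    while (i < n) {
        uint8_t opcode = code[i].op.code;
        if (i + 1 < n && cache_entries[opcode] == 0) {
            uint8_t fused = superinstruction_for(opcode, code[i + 1].op.code);
            if (fused) {
                code[i].op.code = opcode = fused;
            }
        }
        i++;
        uint8_t caches = cache_entries[opcode];
        if (caches) {
            code[i].cache = adaptive_counter_start();
            i += caches;
        }
    }
}
```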