GH-115869: Don't JIT zeroed bytes #130023

Merged: 2 commits merged into python:main on Feb 13, 2025


@brandtbucher (Member) commented Feb 12, 2025

@pitrou pointed out that the JIT's stencils are bloated with zeroed bytes. Since we request fresh pages of memory for JIT code, it's guaranteed to be zeroed anyway, so we can save space in the generated file and skip operations at runtime by eliding the writes where appropriate.
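A minimal sketch of the idea, assuming a POSIX system where anonymous `mmap` mappings start out zero-filled (as on Linux and macOS). The stencil bytes, `stencil_size`, `emit_stencil`, and `emit_into_fresh_page` are hypothetical illustrations, not CPython's JIT API: only the stored prefix is copied, and the trailing zeros come for free from the fresh mapping.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stencil, stored without its trailing zero bytes. */
static const unsigned char stencil_body[] = {
    0x48, 0x8b, 0x05, 0x00, 0x00, 0x00, 0x00, 0xf6,
};
/* Hypothetical emitted size, including four trailing zeros that are not stored. */
static const size_t stencil_size = 12;

/* Writes only the stored prefix; correct only if `code` is already zeroed. */
static void emit_stencil(unsigned char *code)
{
    memcpy(code, stencil_body, sizeof(stencil_body));
}

/* Maps a fresh anonymous region, emits the stencil, and returns 1 if the
   result matches the full stencil (prefix copied, tail still zero), 0 if
   not, or -1 if the mapping failed. */
static int emit_into_fresh_page(void)
{
    size_t len = 4096;  /* assumed page size; mmap rounds up if needed */
    unsigned char *code = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED) {
        return -1;
    }
    emit_stencil(code);
    int ok = memcmp(code, stencil_body, sizeof(stencil_body)) == 0;
    for (size_t i = sizeof(stencil_body); i < stencil_size; i++) {
        if (code[i] != 0) {
            ok = 0;  /* never written, so this would mean the page was not zeroed */
        }
    }
    munmap(code, len);
    return ok;
}
```

On Linux or macOS, `emit_into_fresh_page()` is expected to return 1: the four tail bytes are never written, yet they read back as zero.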

Here's a before-and-after for one of our most common uops, _CHECK_VALIDITY:

void
emit__CHECK_VALIDITY(
    unsigned char *code, unsigned char *data, _PyExecutorObject *executor,
    const _PyUOpInstruction *instruction, jit_state *state)
{
    // 
    // _CHECK_VALIDITY.o:     file format elf64-x86-64
    // 
    // Disassembly of section .text:
    // 
    // 0000000000000000 <_JIT_ENTRY>:
    // 0: 48 8b 05 00 00 00 00          movq    (%rip), %rax            # 0x7 <_JIT_ENTRY+0x7>
    // 0000000000000003:  R_X86_64_REX_GOTPCRELX       _JIT_EXECUTOR-0x4
    // 7: f6 40 22 01                   testb   $0x1, 0x22(%rax)
    // b: 75 06                         jne     0x13 <_JIT_ENTRY+0x13>
    // d: ff 25 00 00 00 00             jmpq    *(%rip)                 # 0x13 <_JIT_ENTRY+0x13>
    // 000000000000000f:  R_X86_64_GOTPCRELX   _JIT_JUMP_TARGET-0x4
    // 13: ff 25 00 00 00 00             jmpq    *(%rip)                 # 0x19 <_JIT_ENTRY+0x19>
    // 0000000000000015:  R_X86_64_GOTPCRELX   _JIT_CONTINUE-0x4
    const unsigned char code_body[19] = {
        0x48, 0x8b, 0x05, 0x00, 0x00, 0x00, 0x00, 0xf6,
        0x40, 0x22, 0x01, 0x75, 0x06, 0xff, 0x25, 0x00,
        0x00, 0x00, 0x00,
    };
    // 0: EXECUTOR
    // 8: JUMP_TARGET
    const unsigned char data_body[16] = {
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    };
    memcpy(data, data_body, sizeof(data_body));
    patch_64(data + 0x0, (uintptr_t)executor);
    patch_64(data + 0x8, state->instruction_starts[instruction->jump_target]);
    memcpy(code, code_body, sizeof(code_body));
    patch_x86_64_32rx(code + 0x3, (uintptr_t)data + -0x4);
    patch_x86_64_32rx(code + 0xf, (uintptr_t)data + 0x4);
}

After:

void
emit__CHECK_VALIDITY(
    unsigned char *code, unsigned char *data, _PyExecutorObject *executor,
    const _PyUOpInstruction *instruction, jit_state *state)
{
    // 
    // _CHECK_VALIDITY.o:     file format elf64-x86-64
    // 
    // Disassembly of section .text:
    // 
    // 0000000000000000 <_JIT_ENTRY>:
    // 0: 48 8b 05 00 00 00 00          movq    (%rip), %rax            # 0x7 <_JIT_ENTRY+0x7>
    // 0000000000000003:  R_X86_64_REX_GOTPCRELX       _JIT_EXECUTOR-0x4
    // 7: f6 40 22 01                   testb   $0x1, 0x22(%rax)
    // b: 75 06                         jne     0x13 <_JIT_ENTRY+0x13>
    // d: ff 25 00 00 00 00             jmpq    *(%rip)                 # 0x13 <_JIT_ENTRY+0x13>
    // 000000000000000f:  R_X86_64_GOTPCRELX   _JIT_JUMP_TARGET-0x4
    // 13: ff 25 00 00 00 00             jmpq    *(%rip)                 # 0x19 <_JIT_ENTRY+0x19>
    // 0000000000000015:  R_X86_64_GOTPCRELX   _JIT_CONTINUE-0x4
    const unsigned char code_body[19] = {
        0x48, 0x8b, 0x05, 0x00, 0x00, 0x00, 0x00, 0xf6,
        0x40, 0x22, 0x01, 0x75, 0x06, 0xff, 0x25,
    };
    // 0: EXECUTOR
    // 8: JUMP_TARGET
    patch_64(data + 0x0, (uintptr_t)executor);
    patch_64(data + 0x8, state->instruction_starts[instruction->jump_target]);
    memcpy(code, code_body, sizeof(code_body));
    patch_x86_64_32rx(code + 0x3, (uintptr_t)data + -0x4);
    patch_x86_64_32rx(code + 0xf, (uintptr_t)data + 0x4);
}
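The two versions above should leave identical bytes behind whenever the destination starts out zeroed. A minimal sketch that checks this, assuming `calloc` as a stand-in for fresh JIT pages; `emit_before`, `emit_after`, and `emitters_agree` are hypothetical names, and the `patch_*` calls are left out because they are the same in both versions. Note that `code_body[19]` with only 15 initializers is still 19 bytes long: C zero-initializes the remaining elements.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sizes matching the _CHECK_VALIDITY example above. */
#define CODE_SIZE 19
#define DATA_SIZE 16

/* "Before": every byte, zero or not, is spelled out and copied. */
static void emit_before(unsigned char *code, unsigned char *data)
{
    const unsigned char code_body[19] = {
        0x48, 0x8b, 0x05, 0x00, 0x00, 0x00, 0x00, 0xf6,
        0x40, 0x22, 0x01, 0x75, 0x06, 0xff, 0x25, 0x00,
        0x00, 0x00, 0x00,
    };
    const unsigned char data_body[16] = {0};
    memcpy(data, data_body, sizeof(data_body));
    memcpy(code, code_body, sizeof(code_body));
}

/* "After": trailing zeros come from implicit zero-initialization of the
   array, and the all-zero data memcpy is dropped entirely. */
static void emit_after(unsigned char *code, unsigned char *data)
{
    (void)data;  /* nothing to write: the fresh data region is already zero */
    const unsigned char code_body[19] = {
        0x48, 0x8b, 0x05, 0x00, 0x00, 0x00, 0x00, 0xf6,
        0x40, 0x22, 0x01, 0x75, 0x06, 0xff, 0x25,
    };
    memcpy(code, code_body, sizeof(code_body));
}

/* Runs both emitters into zeroed buffers and returns 1 if the results are
   byte-for-byte identical, 0 if they differ, or -1 if allocation failed. */
static int emitters_agree(void)
{
    unsigned char *c1 = calloc(CODE_SIZE, 1), *d1 = calloc(DATA_SIZE, 1);
    unsigned char *c2 = calloc(CODE_SIZE, 1), *d2 = calloc(DATA_SIZE, 1);
    int same = -1;
    if (c1 && d1 && c2 && d2) {
        emit_before(c1, d1);
        emit_after(c2, d2);
        same = memcmp(c1, c2, CODE_SIZE) == 0 && memcmp(d1, d2, DATA_SIZE) == 0;
    }
    free(c1);
    free(d1);
    free(c2);
    free(d2);
    return same;
}
```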

@brandtbucher added the labels skip news, interpreter-core (Objects, Python, Grammar, and Parser dirs), build (The build process and cross-build), and topic-JIT on Feb 12, 2025
@brandtbucher self-assigned this on Feb 12, 2025
@pitrou (Member) commented Feb 12, 2025

> Since we request fresh pages of memory for JIT code, it's guaranteed to be zeroed anyway, so we can save space in the generated file and skip operations at runtime by eliding the writes where appropriate.

Note that even without that property, you could simply have issued a memset instead of copying from a statically allocated area of zeros :)
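A minimal sketch of the two alternatives mentioned here, under the assumption that the destination is not known to be zeroed. `clear_with_memcpy`, `clear_with_memset`, and `clears_agree` are hypothetical names; the point is that `memset` reaches the same state without storing a table of zeros anywhere.

```c
#include <assert.h>
#include <string.h>

#define DATA_SIZE 16  /* matches data_body[16] in the _CHECK_VALIDITY example */

/* Copying from a static array of zeros: the zeros must live in the binary. */
static void clear_with_memcpy(unsigned char *data)
{
    static const unsigned char data_body[DATA_SIZE] = {0};
    memcpy(data, data_body, sizeof(data_body));
}

/* Same effect with memset: no zero table is stored. */
static void clear_with_memset(unsigned char *data)
{
    memset(data, 0, DATA_SIZE);
}

/* Returns 1 if both approaches leave identical, all-zero bytes behind. */
static int clears_agree(void)
{
    unsigned char a[DATA_SIZE], b[DATA_SIZE];
    memset(a, 0xAB, sizeof(a));  /* start from non-zero garbage */
    memset(b, 0xAB, sizeof(b));
    clear_with_memcpy(a);
    clear_with_memset(b);
    return memcmp(a, b, DATA_SIZE) == 0 && a[0] == 0 && a[DATA_SIZE - 1] == 0;
}
```

With fresh pages, as the PR notes, even the `memset` can be skipped.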

> Here's a before-and-after for one of our most common uops, _CHECK_VALIDITY:

It seems strange to have a dedicated µop doing just this :) Is there documentation for µops somewhere?

@brandtbucher (Member Author) commented Feb 12, 2025

> It seems strange to have a dedicated µop doing just this :)

Does it? The role of this uop is to quickly test a single bit of state to check that our optimizer's assumptions still hold. Those assumptions can be broken in lots of different places (a single Py_DECREF can change the world), so it helps to have a small check that can be put anywhere.
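A minimal sketch of what such a check amounts to, in C rather than machine code. In the stencil above, `testb $0x1, 0x22(%rax)` tests bit 0 of one byte inside the executor; the `ExecutorSketch` layout, `invalidate`, `check_validity`, and `after_check` below are hypothetical and do not reproduce CPython's actual `_PyExecutorObject`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical executor layout: one bit records whether the optimizer's
   assumptions still hold. */
typedef struct {
    uint8_t flags;  /* bit 0: valid */
} ExecutorSketch;

enum { EXECUTOR_VALID = 0x1 };

/* Anything that breaks an assumption (for example, code run by a decref
   mutating a type or dict the trace depends on) clears the bit. */
static void invalidate(ExecutorSketch *e)
{
    e->flags &= (uint8_t)~EXECUTOR_VALID;
}

/* The check itself: one load and one bit test, cheap enough to place after
   any operation that might run arbitrary code. */
static bool check_validity(const ExecutorSketch *e)
{
    return (e->flags & EXECUTOR_VALID) != 0;
}

typedef enum { CONTINUE_TRACE, EXIT_TO_INTERPRETER } NextStep;

/* Mirrors the stencil's control flow: fall through if the bit is set,
   otherwise jump to the exit (JUMP_TARGET in the generated code). */
static NextStep after_check(const ExecutorSketch *e)
{
    return check_validity(e) ? CONTINUE_TRACE : EXIT_TO_INTERPRETER;
}
```

Usage: an executor created with `.flags = EXECUTOR_VALID` continues its trace until `invalidate` is called, after which the next check exits.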

> Is there documentation for µops somewhere?

The general format and approach is documented in InternalDocs/jit.md. The individual uops aren't documented publicly, since they're a very unstable, low-level implementation detail of an experimental feature. If there's a real need to internally document each of the 296 uops we currently have, we can probably find the time to do it. But most of them are either simple enough to follow (like type or dictionary version checks), or are identical to a full bytecode instruction that's already documented.

@savannahostrowski (Member) left a comment


Smaller stencils 🎉

@TeamSpen210

In the stripped version, code_body is still declared with its original length. Looks like the format string wasn't updated?

@brandtbucher (Member Author)

Yeah, that's expected. There are places where we use sizeof(code_body), and those would be more disruptive to change. I felt it wasn't worth it; the real wins come from saving space in the file and removing entire memcpy calls.
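A minimal sketch of the trade-off described here, written as a hypothetical generator step in C (the real stencil generator lives in CPython's build tooling and is not reproduced). `trimmed_length` and `emit_body` are illustrative names: the declared size stays at the full length so existing `sizeof(code_body)` uses keep working, only bytes up to the last non-zero one are listed, and an all-zero body produces no declaration at all, so its `memcpy` disappears too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Index one past the last non-zero byte; 0 if every byte is zero. */
static size_t trimmed_length(const unsigned char *bytes, size_t n)
{
    while (n > 0 && bytes[n - 1] == 0) {
        n--;
    }
    return n;
}

/* Writes a C declaration for `name` into `out`, keeping the declared size
   `n` but listing only the bytes up to the last non-zero one (C zero-fills
   the rest). Returns the number of bytes listed; 0 means the body is all
   zeros, `out` is left empty, and the caller can skip the memcpy. */
static size_t emit_body(char *out, size_t cap, const char *name,
                        const unsigned char *bytes, size_t n)
{
    size_t keep = trimmed_length(bytes, n);
    if (keep == 0) {
        if (cap > 0) {
            out[0] = '\0';
        }
        return 0;
    }
    size_t used = (size_t)snprintf(out, cap, "const unsigned char %s[%zu] = {",
                                   name, n);
    for (size_t i = 0; i < keep && used < cap; i++) {
        used += (size_t)snprintf(out + used, cap - used, "%s0x%02x",
                                 i ? ", " : "", (unsigned)bytes[i]);
    }
    if (used < cap) {
        snprintf(out + used, cap - used, "};");
    }
    return keep;
}
```

For example, the bytes `{0xff, 0x25, 0x00, 0x00}` become `const unsigned char code_body[4] = {0xff, 0x25};`, while a 16-byte all-zero data body yields nothing.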

@brandtbucher merged commit 05e89c3 into python:main on Feb 13, 2025
66 checks passed

5 participants