From 83b900140e6d9b86ff4699b53ebcca98790e7538 Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Thu, 13 Jun 2024 10:29:51 +0100 Subject: [PATCH] gh-119786: move locations doc to InternalDocs. Delete lnotab_notes.txt as it is for older versions --- {Objects => InternalDocs}/locations.md | 8 +- Objects/lnotab_notes.txt | 228 ------------------------- 2 files changed, 4 insertions(+), 232 deletions(-) rename {Objects => InternalDocs}/locations.md (88%) delete mode 100644 Objects/lnotab_notes.txt diff --git a/Objects/locations.md b/InternalDocs/locations.md similarity index 88% rename from Objects/locations.md rename to InternalDocs/locations.md index 18a338a95978a9d..91a7824e2a8e4d8 100644 --- a/Objects/locations.md +++ b/InternalDocs/locations.md @@ -1,10 +1,10 @@ # Locations table -For versions up to 3.10 see ./lnotab_notes.txt +The `co_linetable` bytes object of code objects contains a compact +representation of the source code positions of instructions, which are +returned by the `co_positions()` iterator. -In version 3.11 the `co_linetable` bytes object of code objects contains a compact representation of the positions returned by the `co_positions()` iterator. - -The `co_linetable` consists of a sequence of location entries. +`co_linetable` consists of a sequence of location entries. Each entry starts with a byte with the most significant bit set, followed by zero or more bytes with most significant bit unset. Each entry contains the following information: diff --git a/Objects/lnotab_notes.txt b/Objects/lnotab_notes.txt deleted file mode 100644 index 0f3599340318f09..000000000000000 --- a/Objects/lnotab_notes.txt +++ /dev/null @@ -1,228 +0,0 @@ -Description of the internal format of the line number table in Python 3.10 -and earlier. - -(For 3.11 onwards, see Objects/locations.md) - -Conceptually, the line number table consists of a sequence of triples: - start-offset (inclusive), end-offset (exclusive), line-number. - -Note that not all byte codes have a line number so we need handle `None` for the line-number. - -However, storing the above sequence directly would be very inefficient as we would need 12 bytes per entry. - -First, note that the end of one entry is the same as the start of the next, so we can overlap entries. -Second, we don't really need arbitrary access to the sequence, so we can store deltas. - -We just need to store (end - start, line delta) pairs. The start offset of the first entry is always zero. - -Third, most deltas are small, so we can use a single byte for each value, as long we allow several entries for the same line. - -Consider the following table - Start End Line - 0 6 1 - 6 50 2 - 50 350 7 - 350 360 No line number - 360 376 8 - 376 380 208 - -Stripping the redundant ends gives: - - End-Start Line-delta - 6 +1 - 44 +1 - 300 +5 - 10 No line number - 16 +1 - 4 +200 - - -Note that the end - start value is always positive. - -Finally, in order to fit into a single byte we need to convert start deltas to the range 0 <= delta <= 254, -and line deltas to the range -127 <= delta <= 127. -A line delta of -128 is used to indicate no line number. -Also note that a delta of zero indicates that there are no bytecodes in the given range, -which means we can use an invalid line number for that range. - -Final form: - - Start delta Line delta - 6 +1 - 44 +1 - 254 +5 - 46 0 - 10 -128 (No line number, treated as a delta of zero) - 16 +1 - 0 +127 (line 135, but the range is empty as no bytecodes are at line 135) - 4 +73 - -Iterating over the table. -------------------------- - -For the `co_lines` method we want to emit the full form, omitting the (350, 360, No line number) and empty entries. - -The code is as follows: - -def co_lines(code): - line = code.co_firstlineno - end = 0 - table_iter = iter(code.internal_line_table): - for sdelta, ldelta in table_iter: - if ldelta == 0: # No change to line number, just accumulate changes to end - end += sdelta - continue - start = end - end = start + sdelta - if ldelta == -128: # No valid line number -- skip entry - continue - line += ldelta - if end == start: # Empty range, omit. - continue - yield start, end, line - - - - -The historical co_lnotab format -------------------------------- - -prior to 3.10 code objects stored a field named co_lnotab. -This was an array of unsigned bytes disguised as a Python bytes object. - -The old co_lnotab did not account for the presence of bytecodes without a line number, -nor was it well suited to tracing as a number of workarounds were required. - -The old format can still be accessed via `code.co_lnotab`, which is lazily computed from the new format. - -Below is the description of the old co_lnotab format: - - -The array is conceptually a compressed list of - (bytecode offset increment, line number increment) -pairs. The details are important and delicate, best illustrated by example: - - byte code offset source code line number - 0 1 - 6 2 - 50 7 - 350 207 - 361 208 - -Instead of storing these numbers literally, we compress the list by storing only -the difference from one row to the next. Conceptually, the stored list might -look like: - - 0, 1, 6, 1, 44, 5, 300, 200, 11, 1 - -The above doesn't really work, but it's a start. An unsigned byte (byte code -offset) can't hold negative values, or values larger than 255, a signed byte -(line number) can't hold values larger than 127 or less than -128, and the -above example contains two such values. (Note that before 3.6, line number -was also encoded by an unsigned byte.) So we make two tweaks: - - (a) there's a deep assumption that byte code offsets increase monotonically, - and - (b) if byte code offset jumps by more than 255 from one row to the next, or if - source code line number jumps by more than 127 or less than -128 from one row - to the next, more than one pair is written to the table. In case #b, - there's no way to know from looking at the table later how many were written. - That's the delicate part. A user of co_lnotab desiring to find the source - line number corresponding to a bytecode address A should do something like - this: - - lineno = addr = 0 - for addr_incr, line_incr in co_lnotab: - addr += addr_incr - if addr > A: - return lineno - if line_incr >= 0x80: - line_incr -= 0x100 - lineno += line_incr - -(In C, this is implemented by PyCode_Addr2Line().) In order for this to work, -when the addr field increments by more than 255, the line # increment in each -pair generated must be 0 until the remaining addr increment is < 256. So, in -the example above, assemble_lnotab in compile.c should not (as was actually done -until 2.2) expand 300, 200 to - 255, 255, 45, 45, -but to - 255, 0, 45, 127, 0, 73. - -The above is sufficient to reconstruct line numbers for tracebacks, but not for -line tracing. Tracing is handled by PyCode_CheckLineNumber() in codeobject.c -and maybe_call_line_trace() in ceval.c. - -*** Tracing *** - -To a first approximation, we want to call the tracing function when the line -number of the current instruction changes. Re-computing the current line for -every instruction is a little slow, though, so each time we compute the line -number we save the bytecode indices where it's valid: - - *instr_lb <= frame->f_lasti < *instr_ub - -is true so long as execution does not change lines. That is, *instr_lb holds -the first bytecode index of the current line, and *instr_ub holds the first -bytecode index of the next line. As long as the above expression is true, -maybe_call_line_trace() does not need to call PyCode_CheckLineNumber(). Note -that the same line may appear multiple times in the lnotab, either because the -bytecode jumped more than 255 indices between line number changes or because -the compiler inserted the same line twice. Even in that case, *instr_ub holds -the first index of the next line. - -However, we don't *always* want to call the line trace function when the above -test fails. - -Consider this code: - -1: def f(a): -2: while a: -3: print(1) -4: break -5: else: -6: print(2) - -which compiles to this: - - 2 0 SETUP_LOOP 26 (to 28) - >> 2 LOAD_FAST 0 (a) - 4 POP_JUMP_IF_FALSE 18 - - 3 6 LOAD_GLOBAL 0 (print) - 8 LOAD_CONST 1 (1) - 10 CALL_NO_KW 1 - 12 POP_TOP - - 4 14 BREAK_LOOP - 16 JUMP_ABSOLUTE 2 - >> 18 POP_BLOCK - - 6 20 LOAD_GLOBAL 0 (print) - 22 LOAD_CONST 2 (2) - 24 CALL_NO_KW 1 - 26 POP_TOP - >> 28 LOAD_CONST 0 (None) - 30 RETURN_VALUE - -If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 18 -and the co_lnotab will claim that execution has moved to line 4, which is wrong. -In this case, we could instead associate the POP_BLOCK with line 5, but that -would break jumps around loops without else clauses. - -We fix this by only calling the line trace function for a forward jump if the -co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current -instruction offset matches the offset given for the start of a line by the -co_lnotab. For backward jumps, however, we always call the line trace function, -which lets a debugger stop on every evaluation of a loop guard (which usually -won't be the first opcode in a line). - -Why do we set f_lineno when tracing, and only just before calling the trace -function? Well, consider the code above when 'a' is true. If stepping through -this with 'n' in pdb, you would stop at line 1 with a "call" type event, then -line events on lines 2, 3, and 4, then a "return" type event -- but because the -code for the return actually falls in the range of the "line 6" opcodes, you -would be shown line 6 during this event. This is a change from the behaviour in -2.2 and before, and I've found it confusing in practice. By setting and using -f_lineno when tracing, one can report a line number different from that -suggested by f_lasti on this one occasion where it's desirable.