gh-105820: Fix tok_mode expression buffer in file & readline tokenizer #105828

lysnikolaou · 2023-06-15T13:11:23Z

…kenizer

pablogsal · 2023-06-15T13:25:16Z

@lysnikolaou Hummm, can you quickly cover why moving this chunk solves the problem? I think I am missing some key insight here

lysnikolaou · 2023-06-15T13:37:57Z

Sure. There's two different cases to consider here:

When we're calling the tokenizer from the tokenize module, it uses the readline tokenizer, which you added in 9216e69. In that branch, update_fstring_expr wasn't being called at all from tok_underflow_readline, which resulted in the last_expr_size not being updated. That's why we were passing a negative count to PyUnicode_DecodeUTF8.

cpython on  main via C v14.0.3-clang via 🐍 pyenv 3.11.3 
❯ cat tmp/t.py
print(f'''{
3
=}''')%                                                                                                                                                                                                                                                                                                                                                                                                         

cpython on  main via C v14.0.3-clang via 🐍 pyenv 3.11.3 
❯ ./python.exe -m tokenize tmp/t.py
unexpected error: Negative size passed to PyUnicode_New
Traceback (most recent call last):
  File "/Users/lysnikolaou/repos/python/cpython/Lib/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lysnikolaou/repos/python/cpython/Lib/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 547, in <module>
    main()
  File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 489, in main
    tokens = list(tokenize(f.readline))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 447, in tokenize
    yield from _generate_tokens_from_c_tokenizer(rl_gen.__next__, encoding, extra_tokens=True)
  File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 537, in _generate_tokens_from_c_tokenizer
    for info in it:
SystemError: Negative size passed to PyUnicode_New

There was a second bug with an f-string being tokenized from a file, only when the closing brace of the f-string expression part is in the last line of the file and there's no newline at the end. In that branch we use tok->cur to calculate the size of the buffer before adding an implicit newline and then tok->start after adding the newline. That resulted in the end buffer being one character too short.

cpython on  main via C v14.0.3-clang via 🐍 pyenv 3.11.3 took 15s 
❯ cat tmp/t.py
print(f'''{
3
=}''')%                                                                                                                                                                                                                                                                                                                                                                                                         

cpython on  main via C v14.0.3-clang via 🐍 pyenv 3.11.3 
❯ ./python.exe tmp/t.py            

3
3

lysnikolaou · 2023-06-15T13:38:25Z

Do you think I should split this PR in two and open a second issue for the second case above?

pablogsal · 2023-06-15T15:37:46Z

Do you think I should split this PR in two and open a second issue for the second case above?

Nah, I think is fine, but we may want to add two NEWS entries

pablogsal

LGTM

Thanks for the explanation @lysnikolaou. Fantastic work as always! 💪

pablogsal · 2023-06-15T15:39:50Z

@lysnikolaou If you want add the second NEWS entry and then land it.

miss-islington · 2023-06-15T16:21:29Z

Thanks @lysnikolaou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

…kenizer (pythonGH-105828) (cherry picked from commit d382ad4) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>

bedevere-bot · 2023-06-15T16:21:40Z

GH-105832 is a backport of this pull request to the 3.12 branch.

…okenizer (GH-105828) (#105832) (cherry picked from commit d382ad4) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>

* main: (57 commits) pythongh-105831: Fix NEWS blurb from pythongh-105828 (python#105833) pythongh-105820: Fix tok_mode expression buffer in file & readline tokenizer (python#105828) pythongh-105751, test_ctypes: Remove disabled tests (python#105826) pythongh-105821: Use a raw f-string in test_httpservers.py (python#105822) pythongh-105751: Remove platform usage in test_ctypes (python#105819) pythongh-105751: Reenable disable test_ctypes tests (python#105818) pythongh-105751: Remove dead code in test_ctypes (python#105817) More reorganisation of the typing docs (python#105787) Improve docs for `typing.dataclass_transform` (python#105792) pythonGH-89812: Churn `pathlib.Path` test methods (python#105807) pythongh-105800: Issue SyntaxWarning in f-strings for invalid escape sequences (python#105801) pythongh-105751: Cleanup test_ctypes imports (python#105803) pythongh-105481: add HAS_JUMP flag to opcode metadata (python#105791) pythongh-105751: test_ctypes avoids the operator module (pythonGH-105797) pythongh-105751: test_ctypes: Remove @need_symbol decorator (pythonGH-105798) pythongh-104909: Implement conditional stack effects for macros (python#105748) pythongh-75905: Remove test_xmlrpc_net: skipped since 2017 (python#105796) pythongh-105481: Fix types and a bug for pseudos (python#105788) Update DSL docs for cases generator (python#105753) pythonGH-77273: Better bytecodes for f-strings (pythonGH-6132) ...

…kenizer (python#105828)

pythongh-105820: Fix tok_mode expression buffer in file & readline to…

1ebf46b

…kenizer

lysnikolaou requested a review from pablogsal as a code owner June 15, 2023 13:11

bedevere-bot added the awaiting core review label Jun 15, 2023

bedevere-bot mentioned this pull request Jun 15, 2023

patchcheck.py raises SystemError on main #105820

Closed

lysnikolaou added skip news needs backport to 3.12 bug and security fixes labels Jun 15, 2023

Normalize newline in test_fstring to fix on Windows

a83de94

lysnikolaou force-pushed the fix-expr-buffer-file-readline branch from 2efb0a4 to a83de94 Compare June 15, 2023 13:46

pablogsal approved these changes Jun 15, 2023

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting core review labels Jun 15, 2023

lysnikolaou mentioned this pull request Jun 15, 2023

F-string debug mode does not print correctly when in last line of file #105831

Closed

📜🤖 Added by blurb_it.

eaa4a28

lysnikolaou enabled auto-merge (squash) June 15, 2023 15:56

lysnikolaou merged commit d382ad4 into python:main Jun 15, 2023

bedevere-bot removed the awaiting merge label Jun 15, 2023

bedevere-bot removed the needs backport to 3.12 bug and security fixes label Jun 15, 2023

lysnikolaou added a commit to lysnikolaou/cpython that referenced this pull request Jun 15, 2023

Fix NEWS blurb from pythongh-105828

4c8af22

lysnikolaou added a commit that referenced this pull request Jun 15, 2023

gh-105831: Fix NEWS blurb from gh-105828 (#105833)

3af2dc7

lysnikolaou added a commit that referenced this pull request Jun 15, 2023

[3.12] gh-105820: Fix tok_mode expression buffer in file & readline t…

e9cf5a3

…okenizer (GH-105828) (#105832) (cherry picked from commit d382ad4) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>

gvanrossum pushed a commit to gvanrossum/cpython that referenced this pull request Jun 18, 2023

pythongh-105820: Fix tok_mode expression buffer in file & readline to…

5862c96

…kenizer (python#105828)

gvanrossum pushed a commit to gvanrossum/cpython that referenced this pull request Jun 18, 2023

pythongh-105831: Fix NEWS blurb from pythongh-105828 (python#105833)

2b54991

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-105820: Fix tok_mode expression buffer in file & readline tokenizer #105828

gh-105820: Fix tok_mode expression buffer in file & readline tokenizer #105828

lysnikolaou commented Jun 15, 2023 •

edited

Loading

pablogsal commented Jun 15, 2023

lysnikolaou commented Jun 15, 2023 •

edited

Loading

lysnikolaou commented Jun 15, 2023

pablogsal commented Jun 15, 2023

pablogsal left a comment

pablogsal commented Jun 15, 2023

miss-islington commented Jun 15, 2023

bedevere-bot commented Jun 15, 2023

gh-105820: Fix tok_mode expression buffer in file & readline tokenizer #105828

gh-105820: Fix tok_mode expression buffer in file & readline tokenizer #105828

Conversation

lysnikolaou commented Jun 15, 2023 • edited Loading

pablogsal commented Jun 15, 2023

lysnikolaou commented Jun 15, 2023 • edited Loading

lysnikolaou commented Jun 15, 2023

pablogsal commented Jun 15, 2023

pablogsal left a comment

Choose a reason for hiding this comment

pablogsal commented Jun 15, 2023

miss-islington commented Jun 15, 2023

bedevere-bot commented Jun 15, 2023

lysnikolaou commented Jun 15, 2023 •

edited

Loading

lysnikolaou commented Jun 15, 2023 •

edited

Loading