Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-105820: Fix tok_mode expression buffer in file & readline tokenizer #105828

Merged
merged 3 commits into from
Jun 15, 2023

Conversation

lysnikolaou
Copy link
Member

@lysnikolaou lysnikolaou commented Jun 15, 2023

@pablogsal
Copy link
Member

@lysnikolaou Hummm, can you quickly cover why moving this chunk solves the problem? I think I am missing some key insight here

@lysnikolaou
Copy link
Member Author

lysnikolaou commented Jun 15, 2023

Sure. There's two different cases to consider here:

  1. When we're calling the tokenizer from the tokenize module, it uses the readline tokenizer, which you added in 9216e69. In that branch, update_fstring_expr wasn't being called at all from tok_underflow_readline, which resulted in the last_expr_size not being updated. That's why we were passing a negative count to PyUnicode_DecodeUTF8.
cpython onmain via C v14.0.3-clang via 🐍 pyenv 3.11.3cat tmp/t.py
print(f'''{
3
=}''')%                                                                                                                                                                                                                                                                                                                                                                                                         

cpython onmain via C v14.0.3-clang via 🐍 pyenv 3.11.3 
❯ ./python.exe -m tokenize tmp/t.py
unexpected error: Negative size passed to PyUnicode_New
Traceback (most recent call last):
  File "/Users/lysnikolaou/repos/python/cpython/Lib/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lysnikolaou/repos/python/cpython/Lib/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 547, in <module>
    main()
  File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 489, in main
    tokens = list(tokenize(f.readline))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 447, in tokenize
    yield from _generate_tokens_from_c_tokenizer(rl_gen.__next__, encoding, extra_tokens=True)
  File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 537, in _generate_tokens_from_c_tokenizer
    for info in it:
SystemError: Negative size passed to PyUnicode_New
  1. There was a second bug with an f-string being tokenized from a file, only when the closing brace of the f-string expression part is in the last line of the file and there's no newline at the end. In that branch we use tok->cur to calculate the size of the buffer before adding an implicit newline and then tok->start after adding the newline. That resulted in the end buffer being one character too short.
cpython onmain via C v14.0.3-clang via 🐍 pyenv 3.11.3 took 15scat tmp/t.py
print(f'''{
3
=}''')%                                                                                                                                                                                                                                                                                                                                                                                                         

cpython onmain via C v14.0.3-clang via 🐍 pyenv 3.11.3 
❯ ./python.exe tmp/t.py            

3
3

@lysnikolaou
Copy link
Member Author

Do you think I should split this PR in two and open a second issue for the second case above?

@lysnikolaou lysnikolaou force-pushed the fix-expr-buffer-file-readline branch from 2efb0a4 to a83de94 Compare June 15, 2023 13:46
@pablogsal
Copy link
Member

Do you think I should split this PR in two and open a second issue for the second case above?

Nah, I think is fine, but we may want to add two NEWS entries

Copy link
Member

@pablogsal pablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for the explanation @lysnikolaou. Fantastic work as always! 💪

@pablogsal
Copy link
Member

@lysnikolaou If you want add the second NEWS entry and then land it.

@lysnikolaou lysnikolaou enabled auto-merge (squash) June 15, 2023 15:56
@lysnikolaou lysnikolaou merged commit d382ad4 into python:main Jun 15, 2023
@miss-islington
Copy link
Contributor

Thanks @lysnikolaou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 15, 2023
…kenizer (pythonGH-105828)

(cherry picked from commit d382ad4)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
@bedevere-bot
Copy link

GH-105832 is a backport of this pull request to the 3.12 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.12 bug and security fixes label Jun 15, 2023
lysnikolaou added a commit to lysnikolaou/cpython that referenced this pull request Jun 15, 2023
lysnikolaou added a commit that referenced this pull request Jun 15, 2023
…okenizer (GH-105828) (#105832)

(cherry picked from commit d382ad4)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
carljm added a commit to carljm/cpython that referenced this pull request Jun 15, 2023
* main: (57 commits)
  pythongh-105831: Fix NEWS blurb from pythongh-105828 (python#105833)
  pythongh-105820: Fix tok_mode expression buffer in file & readline tokenizer (python#105828)
  pythongh-105751, test_ctypes: Remove disabled tests (python#105826)
  pythongh-105821: Use a raw f-string in test_httpservers.py (python#105822)
  pythongh-105751: Remove platform usage in test_ctypes (python#105819)
  pythongh-105751: Reenable disable test_ctypes tests (python#105818)
  pythongh-105751: Remove dead code in test_ctypes (python#105817)
  More reorganisation of the typing docs (python#105787)
  Improve docs for `typing.dataclass_transform` (python#105792)
  pythonGH-89812: Churn `pathlib.Path` test methods (python#105807)
  pythongh-105800: Issue SyntaxWarning in f-strings for invalid escape sequences (python#105801)
  pythongh-105751: Cleanup test_ctypes imports (python#105803)
  pythongh-105481: add HAS_JUMP flag to opcode metadata (python#105791)
  pythongh-105751: test_ctypes avoids the operator module (pythonGH-105797)
  pythongh-105751: test_ctypes: Remove @need_symbol decorator (pythonGH-105798)
  pythongh-104909: Implement conditional stack effects for macros (python#105748)
  pythongh-75905: Remove test_xmlrpc_net: skipped since 2017 (python#105796)
  pythongh-105481: Fix types and a bug for pseudos (python#105788)
  Update DSL docs for cases generator (python#105753)
  pythonGH-77273: Better bytecodes for f-strings (pythonGH-6132)
  ...
gvanrossum pushed a commit to gvanrossum/cpython that referenced this pull request Jun 18, 2023
gvanrossum pushed a commit to gvanrossum/cpython that referenced this pull request Jun 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants