untokenize() does not round-trip for code containing line breaks (\ + \n) #125553

Closed
tomasr8 opened this issue Oct 15, 2024 · 0 comments
Labels
stdlib Python modules in the Lib dir topic-parser type-bug An unexpected behavior, bug, or error

tomasr8 commented Oct 15, 2024

Bug report

Bug description:

Code that contains backslash line continuations (\ followed by a newline) is not round-trip invariant:

import tokenize, io

source_code = r"""
1 + \
    2
"""

tokens = list(tokenize.generate_tokens(io.StringIO(source_code).readline))
x = tokenize.untokenize(tokens)
print(x)
# 1 +\
#     2

Notice that the space between + and \ is now missing. The current untokenize code simply inserts a backslash when it encounters two consecutive tokens on different rows:

cpython/Lib/tokenize.py

Lines 179 to 182 in 9c2bb7d

row_offset = row - self.prev_row
if row_offset:
    self.tokens.append("\\\n" * row_offset)
    self.prev_col = 0
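
To see why the space disappears, the position-based logic quoted above can be modelled in a few lines. This is a simplified sketch, not the actual `Untokenizer` class: `naive_untokenize` is a hypothetical name, and the model ignores encoding, indentation tokens, and other details of the real implementation.

```python
import io
import tokenize


def naive_untokenize(tokens):
    """Simplified model of the row/column logic quoted above (illustrative only)."""
    parts = []
    prev_row, prev_col = 1, 0
    for tok in tokens:
        if tok.type == tokenize.ENDMARKER:
            break
        row, col = tok.start
        row_offset = row - prev_row
        if row_offset:
            # The backslash is appended directly after the previous token,
            # so any whitespace that preceded it in the source is lost.
            parts.append("\\\n" * row_offset)
            prev_col = 0
        # Column padding is only computed relative to the current row.
        parts.append(" " * (col - prev_col))
        parts.append(tok.string)
        prev_row, prev_col = tok.end
        if tok.type in (tokenize.NEWLINE, tokenize.NL):
            prev_row += 1
            prev_col = 0
    return "".join(parts)


source_code = "1 + \\\n    2\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source_code).readline))
rebuilt = naive_untokenize(tokens)
print(repr(rebuilt))
```

The token positions record where `+` ends and where `2` starts, but nothing records the column of the backslash itself, which is why the gap before it cannot be recovered from positions alone.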

I think this should be fixed. The docstring of tokenize.untokenize says:

Round-trip invariant for full input:
Untokenized source will match input source exactly
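
The invariant quoted from the docstring can be checked mechanically. The helper below is a small sketch; `round_trips` is a hypothetical name, not part of the `tokenize` module. Its result for the backslash example depends on whether the interpreter in use includes the fix, so no expected output is shown for that case.

```python
import io
import tokenize


def round_trips(source):
    """Return True if tokenizing and untokenizing reproduces `source` exactly."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    return tokenize.untokenize(tokens) == source


# A line without continuations is expected to round-trip.
print(round_trips("x = 1\n"))

# The example from this report; the result depends on the Python version.
print(round_trips("1 + \\\n    2\n"))
```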

To fix this, it will probably be necessary to inspect the raw line contents and count how much whitespace there is at the end of the line.
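
One way to picture the idea above is a helper that looks at the raw physical line and returns the whitespace sitting just before the trailing backslash. This is only a sketch under stated assumptions, not the code from the linked PR: `whitespace_before_backslash` is a hypothetical name, and the helper assumes the caller already knows the line ends in a real continuation (a backslash at the end of a comment, for instance, would need to be ruled out separately).

```python
def whitespace_before_backslash(line):
    """Return the whitespace between the last token and a line-ending backslash.

    Hypothetical helper for illustration; returns "" when the line does not
    end with a backslash.
    """
    body = line.rstrip("\r\n")
    if not body.endswith("\\"):
        return ""
    body = body[:-1]  # drop the backslash itself
    # Whatever rstrip() would remove is the whitespace we need to preserve.
    return body[len(body.rstrip()):]


source_code = "1 + \\\n    2\n"
first_line = source_code.splitlines(keepends=True)[0]
ws = whitespace_before_backslash(first_line)

# Re-emitting the whitespace before the backslash restores the original text.
restored = "1 +" + ws + "\\\n" + "    2\n"
print(restored == source_code)
```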

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

tomasr8 added the type-bug and topic-parser labels Oct 15, 2024
picnixz added the stdlib label Oct 26, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jan 21, 2025
…-126010)

(cherry picked from commit 7ad793e)

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
pablogsal pushed a commit that referenced this issue Jan 21, 2025
…) (#129153)

gh-125553: Fix backslash continuation in `untokenize` (GH-126010)
(cherry picked from commit 7ad793e)

Co-authored-by: Tomas R <tomas.roun8@gmail.com>
danigm added a commit to danigm/hypothesis that referenced this issue Feb 26, 2025
In python 3.13.2 untokenize() does not round-trip for code containing
line breaks (\ + \n). This patch removes the test case for the space
removal before line break.

python/cpython#125553
hugovk pushed a commit to hugovk/cpython that referenced this issue Feb 26, 2025
…ythonGH-126010)

(cherry picked from commit 7ad793e)

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
danigm added a commit to danigm/hypothesis that referenced this issue Feb 27, 2025
In python 3.13.2 untokenize() does not round-trip for code containing
line breaks (\ + \n). This patch removes the test case for the space
removal before line break.

python/cpython#125553
pablogsal pushed a commit that referenced this issue Feb 27, 2025
…) (#130579)

(cherry picked from commit 7ad793e)

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
Zac-HD pushed a commit to HypothesisWorks/hypothesis that referenced this issue Mar 1, 2025
In python 3.13.2 untokenize() does not round-trip for code containing
line breaks (\ + \n). This patch removes the test case for the space
removal before line break.

python/cpython#125553
Zac-HD pushed a commit to HypothesisWorks/hypothesis that referenced this issue Mar 2, 2025
In python 3.13.2 untokenize() does not round-trip for code containing
line breaks (\ + \n). This patch removes the test case for the space
removal before line break.

python/cpython#125553