Tokenizer crash when redirecting input to stdin #94360

Closed
mimicria opened this issue Jun 28, 2022 · 9 comments · Fixed by #94386
Labels
3.10 (only security fixes); 3.11 (only security fixes); 3.12 (bugs and security fixes); type-crash (A hard crash of the interpreter, possibly with a core dump)

Comments

@mimicria

Hi!
We were fuzzing the latest release, 3.10.5, with AFL and found an interesting issue: a crash that may be exploitable. I checked the latest version from git, and the crash reproduces there as well.
The input file is attached, along with screenshots.

Attachments: input.tar.gz, plus three screenshots (2022-06-26_064337, 2022-06-28_050343, 2022-06-28_051209)

@mimicria mimicria added the type-crash A hard crash of the interpreter, possibly with a core dump label Jun 28, 2022
@mimicria
Author

Could you please provide a standalone reproducer (a bash script that feeds ill-formed input into Python and causes the crash)? We need something to run under a debugger without installing the full-fledged American Fuzzy Lop.

BTW, I naively tried

python s5_id_000269,sig_06,src_000898+006160,time_30133834,execs_1088822,op_splice,rep_4

on Windows and got no crash.

That is the file name of the input; the file is included in the attached archive input.tar.gz, together with the GDB commands to use, such as:

file /home/user/cpython/python
run < /home/user/fuzz/python/out/checked_crashes/'s5:id:000269,sig:06,src:000898+006160,time:30133834,execs:1088822,op:splice,rep:4'
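For anyone replaying the attached crash files without AFL, the GDB `run <` step above can be approximated from Python. This is a minimal sketch, not something from the thread: the helper name `replay_on_stdin` is hypothetical, and it assumes a POSIX system, where `subprocess` reports a child killed by a signal as a negative return code. Passing an open file object as `stdin` gives the child a regular file descriptor, as shell `<` redirection does, rather than a pipe.

```python
import signal
import subprocess
import sys


def replay_on_stdin(crash_file, interpreter=sys.executable):
    """Run `interpreter < crash_file` and describe how the child exited.

    Hypothetical helper for illustration. The open file object is passed
    as stdin, so the child reads from a regular file descriptor, as with
    shell `<` redirection, not from a pipe.
    """
    with open(crash_file, "rb") as f:
        result = subprocess.run([interpreter], stdin=f, capture_output=True)
    if result.returncode < 0:
        # POSIX: a negative return code -N means "terminated by signal N".
        return "killed by " + signal.Signals(-result.returncode).name
    return "exited with status %d" % result.returncode
```

Called with `interpreter="/home/user/cpython/python"` and the path of a crash file from the archive, an affected build would be expected to yield a "killed by SIG…" result, while a fixed build should report exit status 1 from a `SyntaxError`.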

@kumaraditya303
Contributor

Can you copy and paste the GDB backtrace rather than a screenshot?

@mimicria
Author

mimicria commented Jun 28, 2022

Can you copy and paste the GDB backtrace rather than a screenshot?

Starting program: /home/user/cpython/python < /home/user/fuzz/python/out/checked_crashes/'s5:id:000269,sig:06,src:000898+006160,time:30133834,execs:1088822,op:splice,rep:4'
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
__strchr_sse2 () at ../sysdeps/x86_64/multiarch/../strchr.S:32
32      ../sysdeps/x86_64/multiarch/../strchr.S: No such file or directory.

@arhadthedev
Member

@mimicria My bad, I've already found that it's enough to install https://github.com/jfoote/exploitable and feed the attached gdb_script into gdb.

That's why I deleted my comment, hoping that nobody had seen it.

@arhadthedev
Member

arhadthedev commented Jun 28, 2022

I currently have no access to a Linux machine, so I can't create a debug build to get a nice stack trace from gdb.

@zware
Member

zware commented Jun 28, 2022

I've reduced the reproducer to b'#coding:latin1\n#\x00\n#\x00\n\xff a\n' and confirmed the crash in 3.10, 3.11, and main after 261a452 (GH-25050; cc @pablogsal, @erlend-aasland, @serhiy-storchaka).

Interestingly, the very similar b'#coding:latin1\n#\x00\n\x00\n\xff a\n' crashes 3.10, but appears to have been fixed in 3.11 and later by @pablogsal in GH-29654, which backports cleanly to 3.10 but does nothing for the form with the third line commented out.

Also note that this only appears to reproduce with ./python < reproducer; just about every other form I've tried (subprocess.run(sys.executable, input=b'reproducer'), cat reproducer | ./python, ./python <(cat reproducer), ./python reproducer, etc.) results in a proper SyntaxError.

For a self-contained reproducer that still requires Bash:

#!/bin/bash

./python -c 'import sys; sys.stdout.buffer.write(b"#coding:latin1\n#\x00\n#\x00\n\xff a\n")' > bad_script
./python < bad_script

Hopefully this is helpful, but this is about all I can contribute here.

@zware zware added 3.11 only security fixes 3.10 only security fixes 3.12 bugs and security fixes labels Jun 28, 2022
@pablogsal pablogsal changed the title Crash found Tokenizer crash when redirecting input to stdin Jun 28, 2022
pablogsal added a commit to pablogsal/cpython that referenced this issue Jun 28, 2022
… syntax errors from stdin

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
pablogsal added a commit that referenced this issue Jul 5, 2022
…x errors from stdin (#94386)

* gh-94360: Fix a tokenizer crash when reading encoded files with syntax errors from stdin

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>

* nitty nit

Co-authored-by: Łukasz Langa <lukasz@langa.pl>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 5, 2022
… syntax errors from stdin (pythonGH-94386)

* pythongh-94360: Fix a tokenizer crash when reading encoded files with syntax errors from stdin

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>

* nitty nit

Co-authored-by: Łukasz Langa <lukasz@langa.pl>
(cherry picked from commit 36fcde6)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
miss-islington added a commit that referenced this issue Jul 5, 2022
…x errors from stdin (GH-94386)

* gh-94360: Fix a tokenizer crash when reading encoded files with syntax errors from stdin

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>

* nitty nit

Co-authored-by: Łukasz Langa <lukasz@langa.pl>
(cherry picked from commit 36fcde6)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
pablogsal added a commit to pablogsal/cpython that referenced this issue Jul 5, 2022
…es with syntax errors from stdin (pythonGH-94386)

* pythongh-94360: Fix a tokenizer crash when reading encoded files with syntax errors from stdin

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>

* nitty nit

Co-authored-by: Łukasz Langa <lukasz@langa.pl>.
(cherry picked from commit 36fcde6)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
ambv pushed a commit that referenced this issue Jul 5, 2022
…h syntax errors from stdin (GH-94386) (GH-94574)

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>

(cherry picked from commit 36fcde6)
@vincedani

Hello @mimicria, it looks like you are (or were) fuzzing this repository, and you’ve found some interesting bugs. 🥇

I would like to create a Python-based test-case reduction suite that contains fuzzer-generated outputs, and to benchmark how well automatic test-case reducers perform on Python inputs. It looks to me like you opened this issue with an already reduced input that caused the malfunction. Is it possible that you still have the output of the fuzzer, which is free of any reduction?

Thanks in advance,
Daniel

@mimicria
Author

mimicria commented Feb 20, 2023

Is it possible that you still have the output of the fuzzer, which is free of any reduction?

Hello, Daniel!
Python inputs that cause crashes are not reduced after fuzzing; they are just packaged in an archive (see the opening message of this issue).

@vincedani

Thank you @mimicria, I'll try to use that artifact.

6 participants