Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python fails to read a script whose path is /dev/fd/X #87235

Closed
cipriancraciun mannequin opened this issue Jan 29, 2021 · 10 comments
Closed

Python fails to read a script whose path is /dev/fd/X #87235

cipriancraciun mannequin opened this issue Jan 29, 2021 · 10 comments
Labels
3.8 (EOL) end of life 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-mac topic-importlib

Comments

@cipriancraciun
Copy link
Mannequin

cipriancraciun mannequin commented Jan 29, 2021

BPO 43069
Nosy @ronaldoussoren, @ned-deily, @izbyshev, @cipriancraciun

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-01-29.22:32:21.638>
labels = ['OS-mac', 'interpreter-core', '3.8', '3.9']
title = 'Python fails to read a script whose path is `/dev/fd/X`'
updated_at = <Date 2021-01-30.12:21:41.381>
user = 'https://github.com/cipriancraciun'

bugs.python.org fields:

activity = <Date 2021-01-30.12:21:41.381>
actor = 'izbyshev'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core', 'macOS']
creation = <Date 2021-01-29.22:32:21.638>
creator = 'ciprian.craciun'
dependencies = []
files = []
hgrepos = []
issue_num = 43069
keywords = []
message_count = 3.0
messages = ['385955', '385978', '385982']
nosy_count = 4.0
nosy_names = ['ronaldoussoren', 'ned.deily', 'izbyshev', 'ciprian.craciun']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue43069'
versions = ['Python 3.8', 'Python 3.9']

Linked PRs

@cipriancraciun
Copy link
Mannequin Author

cipriancraciun mannequin commented Jan 29, 2021

Sometimes (especially from wrapper scripts) one would like to call Python with a script that should be read from a file-descriptor, like python3 /dev/fd/9; most likely the backing is a file in TMPDIR (perhaps unlinked, like bash's "here documents").

However on OSX (I've tried it on 10.15.7 and 10.13.6), for some reason if that script is too "large" (more than a couple of characters) it just bails out, even if that file-descriptor is in fact backed by a file.

For example:

echo 'print("1234567890")' \>/tmp/x.py && python3 /dev/fd/9 9\</tmp/x.py
\# correctly prints 1234567890

echo 'print("12345678901234567890")' \>/tmp/x.py && python3 /dev/fd/9 9\</tmp/x.py
\# prints nothing, no error, no exit code, etc.

# alternative variant 1, with `bash` "here-documents"
python3 /dev/fd/9 9<<<'print("12345678901234567890")'
# still prints nothing.

# alternative variant 2, that uses `/dev/stdin` instead of `/dev/fd/N`
python3 /dev/stdin <<<'print("12345678901234567890")'
# still prints nothing.

# alternative variant 3, that uses `open` and `exec`
python3 -c 'exec(open("/dev/fd/9").read())' 9<<<'print("12345678901234567890")'
# correctly prints 12345678901234567890

The file /tmp/x.py is just a simple script that prints that token.

I've tried both Python 3.9.1, 3.8.2 and even 2.7.18 and 2.7.16, all with the same results. (This makes me think it's actually an OSX issue?)

On Linux this works flawlesly. Furthermore if one uses something else like cat, bash or anything else it works. Thus it is something related with Python on OSX.

Also as seen from the examples, this is not a "shell issue" or something similar; it just seems to hit a corner case when the script path is /dev/fd/... or /dev/stdin

@cipriancraciun cipriancraciun mannequin added 3.8 (EOL) end of life 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-mac labels Jan 29, 2021
@ronaldoussoren
Copy link
Contributor

I can reproduce the issue on macOS 11.1.

As you write:

  • running /dev/fd/X as a script fails silently if it refers to an smallish file
  • reading /dev/fd/X referring to the same smallish files works fine (the open('/dev/fd/9').read() scenario.

If I read the code in Modules/main.c correctly the main difference between the two scenario's is that the first scenario using the C stdio library to read the file (in pymain_run_file_obj), and the latter uses the normal Python io stack.

Reading /dev/fd/9 works fine when using either stdio or open/read in C code.

I have not yet tried to untangle the layers of C code from pymain_run_file_obj to actually reading the script, there might be something there that sheds light on what's going on here.

I'm not yet willing to claim this is an OS bug as I've failed to reproduce this outside of Python.

@izbyshev
Copy link
Mannequin

izbyshev mannequin commented Jan 30, 2021

I would suggest to start digging from the following piece of code in maybe_pyc_file() (Python/pythonrun.c):

     int ispyc = 0;
     if (ftell(fp) == 0) {
         if (fread(buf, 1, 2, fp) == 2 &&
             ((unsigned int)buf[1]<<8 | buf[0]) == halfmagic)
             ispyc = 1;
         rewind(fp);
     }

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@ronaldoussoren
Copy link
Contributor

The problem is not in pythonrun.c as the program doesn't get there. The issue is very weird, and could be a bug in the OS.

The script is read by Modules/main.c: pymain_run_file_obj. That calls _Py_fopen_obj to open the file object, and when I instrument pymain_run_file_obj I see that the file offset is wrong: when the contents of /dev/fd/9 is large enough the result of ftell(3) for the newly opened file is not 0 but at EOF. Calling rewind(3) just after opening "fixes" the issue, but that's a gross hack and not the correct fix.

git diff ../Modules/main.c                                                                                                                         (gh-83765-doc-large-shmmem)cpython
diff --git a/Modules/main.c b/Modules/main.c
index 7edfeb3365..d5a7577098 100644
--- a/Modules/main.c
+++ b/Modules/main.c
@@ -329,6 +329,8 @@ pymain_run_file_obj(PyObject *program_name, PyObject *filename,
                            program_name, filename, errno, strerror(errno));
         return 2;
     }
+    PySys_FormatStderr("opened %R at %ld\n", filename, ftell(fp));
+    rewind(fp);
 
     if (skip_source_first_line) {
         int ch;

With this patch a small file prints: opened '/dev/fd/9' at 0, while a larger file prints opened '/dev/fd/9' at 72 (file size is 72 bytes).

Tracing further: _Py_fopen_obj is defined in Python/fileutils.c, and calls fopen (on unix-y systems like macOS). Similar instrumentation in that file shows that the file offset is wrong directly after calling fopen(3).

But that's in the context of Python, a small program that contains similar code works fine. However... If I change that program to first read some binds and then reopen /dev/fd/... the second open inherits the file offset from the first file pointer!

E.g. (in pseudo C without error checking):

FILE* fp1 = fopen("/dev/fd/9", "rb");
fgets(fp1);
FILE* fp2 = fopen("/dev/fd/9", "rb");
tell(fp2); // <--- Does *not* return 0

I do not yet know how relevant this is, from what I've seen so far pymain_run_file_obj is the first time the file gets opened.

@ronaldoussoren
Copy link
Contributor

I've done some more sleuthing, and the finding in the previous message was relevant...

Turns out the file is opened a second time, but not using _Py_fopen_obj: pymain_run_python calls pymain_get_importer, which calls PyImport_GetImporter and that calls into importlib, which call zip import.zipimport which opens the file and tries to read a zip directory from it. This changes the file offset and does not reset it (and that's generally not necessary).

This also explains why small files work fine: those are to small to seek to the offset of the zip directory, and hence zipimporter fails when it tries to change the file offset.

In a quick hack I changed zipimporter._read_directory to reset the file offset and that fixes this issue:

diff --git a/Lib/zipimport.py b/Lib/zipimport.py
index 016f1b8a79..9d0e6c808d 100644
--- a/Lib/zipimport.py
+++ b/Lib/zipimport.py
@@ -347,6 +347,8 @@ def _read_directory(archive):
         raise ZipImportError(f"can't open Zip file: {archive!r}", path=archive)
 
     with fp:
+      ro_off = fp.tell()
+      try:
         try:
             fp.seek(-END_CENTRAL_DIR_SIZE, 2)
             header_position = fp.tell()
@@ -455,6 +457,8 @@ def _read_directory(archive):
             t = (path, compress, data_size, file_size, file_offset, time, date, crc)
             files[name] = t
             count += 1
+      finally:
+          fp.seek(ro_off, 0)
     _bootstrap._verbose_message('zipimport: found {} names in {!r}', count, archive)
     return files
 

Inline patch because it is not ready for a PR at this point.

@ronaldoussoren
Copy link
Contributor

Summary:

  • On macOS the file offset for multiple open file descriptors for /dev/fd/N are not independent.
  • python /dev/fd/N opens /dev/fd/N twice: first in zipimport and then to actually read the script. Zipimport changes the file offset and the code that actually runs the script sees an empty file (for a large enough file)
  • A workaround for this issue is to change zipimporter to reset the file offset when scanning the zip directory

Creating a PR for this shouldn't be too hard, but not today... This definitely needs a test.

ronaldoussoren added a commit to ronaldoussoren/cpython that referenced this issue Nov 25, 2022
…ks on macOS

On macOS all file descriptors for a particular file in /dev/fd
share the same file offset, that is ``open("/dev/fd/9", "r")`` behaves
more like ``dup(9)`` than a regular open.

This causes problems when a user tries to run "/dev/fd/9" as a script
because zipimport changes the file offset to try to read a zipfile
directory. Therefore change zipimport to reset the file offset after
trying to read the zipfile directory.
@ronaldoussoren
Copy link
Contributor

I've added a PR that fixes the issue, including a test that failed before I changed the zipimport module.

ronaldoussoren added a commit that referenced this issue Nov 27, 2022
…macOS (#99768)

On macOS all file descriptors for a particular file in /dev/fd
share the same file offset, that is ``open("/dev/fd/9", "r")`` behaves
more like ``dup(9)`` than a regular open.

This causes problems when a user tries to run "/dev/fd/9" as a script
because zipimport changes the file offset to try to read a zipfile
directory. Therefore change zipimport to reset the file offset after
trying to read the zipfile directory.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 27, 2022
…ks on macOS (pythonGH-99768)

On macOS all file descriptors for a particular file in /dev/fd
share the same file offset, that is ``open("/dev/fd/9", "r")`` behaves
more like ``dup(9)`` than a regular open.

This causes problems when a user tries to run "/dev/fd/9" as a script
because zipimport changes the file offset to try to read a zipfile
directory. Therefore change zipimport to reset the file offset after
trying to read the zipfile directory.
(cherry picked from commit d08fb25)

Co-authored-by: Ronald Oussoren <ronaldoussoren@mac.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 27, 2022
…ks on macOS (pythonGH-99768)

On macOS all file descriptors for a particular file in /dev/fd
share the same file offset, that is ``open("/dev/fd/9", "r")`` behaves
more like ``dup(9)`` than a regular open.

This causes problems when a user tries to run "/dev/fd/9" as a script
because zipimport changes the file offset to try to read a zipfile
directory. Therefore change zipimport to reset the file offset after
trying to read the zipfile directory.
(cherry picked from commit d08fb25)

Co-authored-by: Ronald Oussoren <ronaldoussoren@mac.com>
miss-islington added a commit that referenced this issue Nov 27, 2022
…macOS (GH-99768)

On macOS all file descriptors for a particular file in /dev/fd
share the same file offset, that is ``open("/dev/fd/9", "r")`` behaves
more like ``dup(9)`` than a regular open.

This causes problems when a user tries to run "/dev/fd/9" as a script
because zipimport changes the file offset to try to read a zipfile
directory. Therefore change zipimport to reset the file offset after
trying to read the zipfile directory.
(cherry picked from commit d08fb25)

Co-authored-by: Ronald Oussoren <ronaldoussoren@mac.com>
@vstinner
Copy link
Member

vstinner commented Dec 5, 2022

The newly added test fails on FreeBSD, see issue #100005. I proposed PR #100006 to fix the issue.

@hauntsaninja
Copy link
Contributor

Nice catch! Looks like the 3.10 backport is still pending

ambv added a commit that referenced this issue Mar 28, 2023
…rks on macOS (GH-99768) (#99817)

On macOS all file descriptors for a particular file in /dev/fd
share the same file offset, that is ``open("/dev/fd/9", "r")`` behaves
more like ``dup(9)`` than a regular open.

This causes problems when a user tries to run "/dev/fd/9" as a script
because zipimport changes the file offset to try to read a zipfile
directory. Therefore change zipimport to reset the file offset after
trying to read the zipfile directory.
(cherry picked from commit d08fb25)

Co-authored-by: Ronald Oussoren <ronaldoussoren@mac.com>

* Regen zipimport

---------

Co-authored-by: Ronald Oussoren <ronaldoussoren@mac.com>
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
@ambv
Copy link
Contributor

ambv commented Mar 28, 2023

Fixed in 3.10 as well.

@ambv ambv closed this as completed Mar 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.8 (EOL) end of life 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-mac topic-importlib
Projects
None yet
Development

No branches or pull requests

4 participants