-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run reports in a separate Python process. #41
Conversation
0053fd1
to
cedb387
Compare
This comment was marked as outdated.
This comment was marked as outdated.
92c0ecf
to
7f85351
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked through this a couple of times now, and am glad you've taken the time to sort this all out
from outpack.util import openable_temporary_file | ||
|
||
|
||
def run_in_sandbox(target, args=(), cwd=None, syspath=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use callr
for this sort of manoeuvre in R - does python not have something we can pull in to do this for us? This way is fine though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All I could find is the multiprocessing module, but it's a bit inconvenient to use and puts a burden on the caller to setup their script in the right way.
except BaseException as e: | ||
# This allows the traceback to be pickled and communicated out of the | ||
# the sandbox. | ||
pickling_support.install(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice job finding this
The runpy module we've been using to run reports provides very little isolation. In particular, "any side effects (such as cached imports of other modules) will remain in place after the functions have returned". This means that if a user splits their report into multiple files, with one file importing another, the latter will be cached and never reloaded by Python, even if it is modified. Similarly, if the module was already loaded before, the report runs against old code that does not match the files included in the packet. Using a separate subprocess gives us a much stronger isolation, including all global variables and not sharing any of previously loaded modules. It means a user can run a report, modify some of the imported code, run the report again and have this behave as expected. The sandbox is implemented by pickling the function and its argument into a file, starting a new Python process which unpickles the file and calls the target. Similarly, the function's return value, or any exception it may throw, it pickled to an output file, which is read by the parent. I tried to use the multiprocessing module to implement this, but it puts inconvenient requirements on the way the top-level source file is written. The module re-evaluates the file in the subprocess, meaning its contents must check for `__name__ == "__main__"` to avoid doing the work again. It also doesn't work too well with the REPL. Using our own implementation fixes these issues. The Packet class is refactored a little and its scope is reduced to just creating the packet and its metadata, without inserting it into the repository. The insertion is now handled by a separate function. This is done as a way of distinguishing what happens in the child process and what happens in the parent.
The runpy module we've been using to run reports provides very little isolation. In particular, "any side effects (such as cached imports of other modules) will remain in place after the functions have returned". This means that if a user splits their report into multiple files, with one file importing another, the latter will be cached and never reloaded by Python, even if it is modified. Similarly, if the module was already loaded before, the report runs against old code that does not match the files included in the packet.
Using a separate subprocess gives us a much stronger isolation, including all global variables and not sharing any of previously loaded modules. It means a user can run a report, modify some of the imported code, run the report again and have this behave as expected.
The Packet class is refactored a little and its scope is reduced to just creating the packet and its metadata, without inserting it into the repository. The insertion is now handled by a separate function. This is done as a way of distinguishing what happens in the child process and what happens in the parent.