Configure the kernel to have packages automatically installed #60
Comments
I feel we need to figure out some way to get these hosted with the build,
get the service worker going, use dat/ipfs, etc. before installing more
stuff magically. Spamming pypi with requests without any kind of caching
isn't very nice... They probably weren't signing up to be a static cdn for
web page assets.
|
Maybe one way could be to make that part that @psychemedia linked to in https://github.com/jtpio/jupyterlite-demo/pull/7#issuecomment-870144941 configurable at build time: To download the wheels only once and place them at the right location. But that would be specific to the pyolite kernel, and the jupyterlite toolchain should ideally be kernel agnostic. |
I haven't dug enough into At any rate, wheels will get special treatment in the toolchain anyway as they are the most reliable source of labextensions, as we can predict precisely where the assets will be inside them... until we have a |
One of the things I hacked to get the folium demo working was a prebuilt universal wheel for one of the packages served from a Github page: https://github.com/OpenComputingLab/vce-wheelhouse
It's probably a good idea to separate out the ideas of installing from pypi, from a JupyterLite wheelhouse, or from a local wheel. In building an http servable distribution, one approach might be:
If |
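The list of steps in the comment above was lost, so the following is only a sketch of one possible ordering for the three sources it names (a local wheel, a JupyterLite wheelhouse, PyPI), not the approach the commenter had in mind. The function names and the "local first" precedence are assumptions made for illustration.

```python
import re
from typing import Optional, Sequence, Tuple


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 does (case and -, _, . are ignored)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def wheel_project(filename: str) -> str:
    """Return the normalized project name encoded in a wheel filename."""
    # Wheel filenames start with "{distribution}-{version}-...".
    return normalize(filename.split("-", 1)[0])


def pick_source(
    requirement: str,
    local_wheels: Sequence[str],
    wheelhouse_wheels: Sequence[str],
) -> Tuple[str, Optional[str]]:
    """Hypothetical precedence: local wheel, then wheelhouse, then PyPI."""
    wanted = normalize(requirement)
    for label, wheels in (("local", local_wheels), ("wheelhouse", wheelhouse_wheels)):
        for filename in wheels:
            if wheel_project(filename) == wanted:
                return label, filename
    # Nothing was bundled with the site, so the request would go to PyPI at runtime.
    return "pypi", None
```

Deciding this at build time would let a site report up front which packages still depend on PyPI being reachable from the browser.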
Installs with micropip should be cached by the browser same as when downloading any other Pyodide package. Also,
so the requests are being received by Fastly, which is really not that different from mirroring those files via some other CDN (including jsDelivr, which will also end up using Fastly). Compared to the overall PyPI traffic I think this is still negligible, but we can certainly ask them this question. Generally, if you have questions or feature requests about micropip, please open an issue in the pyodide repo. |
well, I ran a test with pyolite in vscode, and also question the need to load all of those libraries for simple notebooks that don't use most of them: joyceerhl/vscode-pyolite#2 (comment) this is what's being loaded for a simple notebook with 2 imports to load json data: https://github.com/RandomFractals/vscode-data-table/blob/main/notebooks/chicago-red-light-cameras.ipynb
Why do we need to load matplotlib and other libraries pyodide has configured by default? I think they should be loaded dynamically based on imports in the notebook cell code. It's exactly what ObservableHQ and other intelligent JS notebook platforms do, since dynamic module loading is readily available in most web browsers now. Have you considered optimizing your kernel initialization to load just the basics, without all the other dataviz packages that most devs and data scientists might not be using in their pyodide notebooks? |
@RandomFractals for some libraries it's a bit trickier as they require patches to be rendered properly in output cells in JupyterLite.
Yes, see jupyterlite/jupyterlite#239 as an example. Unfortunately there were some issues with the recursion limit in upstream Pyodide, so the change had to be reverted. But hopefully will come back at some point: jupyterlite/jupyterlite#254 |
ain't nothing free... there isn't a 1-1 mapping of imported names to installable python packages, and the need for patching is very real. but we certainly need some better approaches for handling these things: there's a sketch of a path forward over on jupyterlite/jupyterlite#151 (comment) and i'll get around to it when i can, but there's also a lot of other stuff on our plates.... and host environments other than real browsers are going to be best-effort for the foreseeable. |
@bollwyvl understood. I was just sharing my first impression after trying it in vscode. I will try to find some time soon to provide good data viz examples and notebooks for a deeper dive. I think porting some of these real data notebooks could be fun to validate and troubleshoot some of the hooks you are working on: https://github.com/RandomFractals/Chicago-transportation-notebooks Other than those extraneous js libs loading nitpicks, I like what you've created so far, and def. plan on using your Py lite stack in browser and vscode. |
I added vscode Pyolite example docs here: https://github.com/RandomFractals/vscode-data-table#pyolite-notebook-example |
Does the It would be really handy to have a CLI argument |
As it says in the docs,
yep, a notebook would still have to among my issues with "magic" importing (or even installing) is... where would (inevitable) errors get shown? To this end, there are a few approaches, such as jupyterlab-scenes, which move this into a clearly "ui automation" place rather than "magic". The other is just straight perceived performance... even if i have to wait around for something after pressing Run, at least i had the option of starting it (or tweaking it) sooner rather than later.
yeah, nah, that's why it's as raw as it is... there are so many ways to be wrong for different use cases vs "give me a directory of wheels," which |
I take it the current status is that you can't "inject" wheels that can be loaded with a simple import, like the pre-built pyodide wheels? Adding wheels to It's also possible to pre-install in javascript when using pyodide directly, though with that you have to do the download into the browser on page load, rather than only if imported. When using JupyterLite for teaching or documentation, it would be really helpful to be able to show the actual command, and not add extra pyodide-only or JupyterLite-only commands that will confuse readers and not work anywhere else. Also not sure why there's a piplite and a micropip. |
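One way to keep teaching notebooks portable, sketched here as an assumption rather than anything JupyterLite ships, is to guard the browser-only install behind a platform check. Pyodide reports `sys.platform` as `"emscripten"`, and `piplite.install` is the call used elsewhere in this thread; `ensure_packages` is a hypothetical helper name.

```python
import asyncio
import sys
from typing import Iterable, List


def running_in_pyodide() -> bool:
    """Pyodide reports its platform as "emscripten"."""
    return sys.platform == "emscripten"


async def ensure_packages(packages: Iterable[str]) -> List[str]:
    """Install packages with piplite in the browser; do nothing elsewhere."""
    if not running_in_pyodide():
        return []
    import piplite  # only importable inside the JupyterLite Pyodide kernel

    installed = []
    for name in packages:
        await piplite.install(name)
        installed.append(name)
    return installed


if __name__ == "__main__":
    # A plain script has no top-level await, so drive the coroutine with asyncio.
    print(asyncio.run(ensure_packages(["ipywidgets"])))
```

In a notebook cell the call would simply be `await ensure_packages(["ipywidgets"])`. This still adds a JupyterLite-specific cell, which is exactly the cost the comment above objects to; it only makes that cell harmless on a regular Jupyter server.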
With jupyterlite/jupyterlite#655 in, we can now theoretically use IPython profiles to run Pyodide setup code in a consistent manner, based on deployer, and then user, preference. We might also be able to somehow lazily mount all of the known custom wheels into the filesystem... but they'd still have to be found/resolved with the index files to do dependencies properly. I'm pretty confident we still don't want to download and install every package before letting the user interact with the kernel.
for the specific case of the packages built and shipped by pyodide, there is a hard-coded mapping of imported names to distribution names. this does not work for the general case.
there's a lot of things that won't work here.
micropip doesn't know how to reference non-PyPI sources of full chains of wheels with dependencies. |
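To make the point about imported names versus distribution names concrete, here is a small sketch. The mapping table is hand-written and deliberately incomplete (it is not Pyodide's actual table), and `guess_distributions` is a hypothetical helper; only the standard-library `ast` module is used.

```python
import ast
from typing import Dict, List, Set, Tuple

# Hand-written, incomplete table: import name -> distribution name.
IMPORT_TO_DIST: Dict[str, str] = {
    "sklearn": "scikit-learn",
    "yaml": "pyyaml",
    "PIL": "pillow",
    "numpy": "numpy",
}


def top_level_imports(source: str) -> Set[str]:
    """Collect the top-level module names imported by a cell."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def guess_distributions(source: str) -> Tuple[List[str], List[str]]:
    """Split imports into (distributions we can name, imports we cannot resolve)."""
    known: List[str] = []
    unknown: List[str] = []
    for name in sorted(top_level_imports(source)):
        if name in IMPORT_TO_DIST:
            known.append(IMPORT_TO_DIST[name])
        else:
            unknown.append(name)
    return known, unknown
```

Anything outside the table lands in the "unknown" list, including standard-library modules and private course packages, and an import hidden behind `importlib` or a conditional is never seen at all. That is the general case the comment says a fixed table cannot cover.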
I have tried to execute
Maybe it is enough to have an option to place a script into the folder
Looks like each kernel instance uses an isolated, non-persistent file system for |
I have found a more straightforward way to configure just the list of pre-installed packages and proposed it in a pull request (mentioned above). |
Thank you @vasiljevic ! I'll have a look. Note that you can also use xeus-python which allows you to pre-install packages https://xeus-python-kernel.readthedocs.io/en/latest/configuration.html |
My organization already uses xeus-python based JupyterLite websites for exercise files (example). We used to suggest MyBinder to our users for a quick look at an exercise file set, but that practice produced half of all MyBinder traffic during a few weeks last spring, and for this school year we have decided to adopt JupyterLite early as "the quick look solution" for exercise files. xeus-python does not use Pyodide, but empack. I like the approach, but it is always possible to face some issue that is better handled in Pyodide, or vice versa. So, I need to be ready to use both. Since our exercise files are not primarily designed for the JupyterLite environment, we can't add JupyterLite-specific code. That is why package pre-installing is so important for our use case. |
I'd be curious to know which issues are better handled by Pyodide (and vice versa).
Makes total sense 👍🏽 |
Especially in the jupyterlite setting, for the primary use case of lightweight interactive computing, we need to focus on kernels starting quickly and predictably. So I'll still come down on the side of, in lite core, preferring to pursue reducing time-to-interactive editing and, crucially, user-focused error reporting rather than putting more stuff that makes the time to hand-off slower in our base kernels. Of course, I feel pre-running code is better handled on the "client" labextension side, with the equivalent of old school If this config ended up in a custom kernelspec, rather than site-wide, that might be more reasonable, as one could have multiple kernels with different names defined in a site with the same underlying implementation, but different packages... but would still want to see the code executed from the "client" side. |
For instance, in issue jupyterlite/jupyterlite#798 there is a statement: Specifically, (Pyodide) 0.21.1 fixed a bug that causes Safari to hang when doing almost anything. It seems to have already been updated, but no JupyterLite release has been made. Also, some package I would like to use may be available for Pyodide but not available for empack, or vice versa. Just take a look at the Pyodide change log page and search for "new packages:". For instance, Pyodide has supported shapely and geos since version 0.21.0. I could not find those packages on https://repo.mamba.pm/emscripten-forge . |
Indeed. Those packages should be added by emscripten-forge/recipes#131, we should probably give a final push to this PR. |
In the proposed pull request, the list of packages to be pre-installed is handled on the client labextension side; you may take a look at the code in jupyterlite/packages/pyolite-kernel-extension/src/index.ts. The package list is passed to the worker side through the worker initialization, since we need the packages to be installed before the end of worker initialization to avoid race conditions. |
We are interested in using JupyterLite at our university to minimize setup for programming and data science novices. We are currently held back because it doesn't seem possible to create a JupyterLite instance with pre-installed packages. Is it correct that students would need to (re-)install all packages each time they load JupyterLite in the browser, or is there any way to set up JupyterLite with additional packages pre-installed, or a one-time client-side setup of a "virtual environment" (maybe using the browser cache?). My understanding from the docs is that the user still needs to install packages each time even if pyolite is shipped with additional wheels. We think it would be discouraging for students to sit and wait for installation each time they want to use JupyterLite since we use many packages (scikit-learn, altair, pandas, and similar), but maybe there are other ways to cut down this wait time? |
Yep. Every kernel launch is basically building a new linux computer in RAM from first principles. This is basically the case for all WASM applications. There are already many layers of caching, but it's not really feasible to run multiple kernels in the same WASM virtual machine, and at present, there is no particular way to snapshot a full running machine. The reason we (but mostly I) continue to push against pre-installing packages are the many exciting failure modes that can occur for every line of code that gets run before the user has control of their kernel. Once jupyterlite/jupyterlite#386 is complete, folk will much more easily be able to take matters into their own hands by forking it and building their own kernels that include whatever the heck they like at startup. So at present, the current one-liner The jupyterlite xeus-python kernel actually unpacks all of the dependencies from conda packages at build time, and then bulk loads it, but at present has no dynamic installation capability. |
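The ordering constraint described above (installs must finish before initialization ends, otherwise an early execute request can race them) can be illustrated with a toy model. This is an asyncio analogy written for this thread, not the code from the pull request; `ToyKernel` and its methods are hypothetical names, and the `sleep(0)` stands in for a real install.

```python
import asyncio
from typing import List


class ToyKernel:
    """Toy stand-in for a worker-side kernel; not JupyterLite's implementation."""

    def __init__(self, preinstall: List[str]) -> None:
        self._preinstall = preinstall
        self._ready = asyncio.Event()
        self.installed: List[str] = []

    async def initialize(self) -> None:
        """Install every requested package, then mark the kernel as ready."""
        for name in self._preinstall:
            await asyncio.sleep(0)  # placeholder for a real, slow install
            self.installed.append(name)
        self._ready.set()

    async def execute(self, code: str) -> List[str]:
        """Wait for initialization, then report what the code would see installed."""
        await self._ready.wait()
        return list(self.installed)


async def demo() -> List[str]:
    kernel = ToyKernel(["ipywidgets", "folium"])
    # Fire the execute request first to mimic an eager client.
    pending = asyncio.create_task(kernel.execute("import folium"))
    await kernel.initialize()
    return await pending
```

Without the readiness gate, the early `execute` call would run while `installed` is still empty, which is the race the comment is guarding against.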
@joelostblom You can use the code snippet below to build an extension that pre-runs code:
import { JupyterFrontEnd, JupyterFrontEndPlugin } from '@jupyterlab/application';
import { INotebookTracker } from '@jupyterlab/notebook';
import { Kernel, KernelMessage } from '@jupyterlab/services';

// Python statements to run in every freshly started kernel.
const PrerunCodes = [
  'import piplite',
  'await piplite.install("ipywidgets")',
  'await piplite.install("my-own-package")'
];

const preRunPlugin: JupyterFrontEndPlugin<void> = {
  id: 'jupyterlite-prerun-extension:plugin',
  autoStart: true,
  requires: [INotebookTracker],
  optional: [],
  activate: async (app: JupyterFrontEnd, nbTracker: INotebookTracker) => {
    nbTracker.currentChanged.connect(() => {
      // Capture the panel once so later callbacks cannot see a null widget.
      const panel = nbTracker.currentWidget;
      if (!panel) {
        return;
      }
      let prevSessionStatus: Kernel.Status = 'unknown';
      panel.context.ready.then(() => {
        const sessionContext = panel.sessionContext;
        sessionContext.statusChanged.connect((sender, status) => {
          // Only react to the transition into 'starting' or 'restarting'.
          const isNewStart =
            (status === 'restarting' && prevSessionStatus !== 'restarting') ||
            (status === 'starting' && prevSessionStatus !== 'starting');
          const kernelName = sessionContext.kernelDisplayName.toLocaleLowerCase();
          if (isNewStart && (kernelName === 'pyolite' || kernelName === 'python')) {
            sessionContext.ready.then(() => {
              console.log('Session ready, execute prerun codes...');
              const content: KernelMessage.IExecuteRequestMsg['content'] = {
                code: PrerunCodes.join('\n'),
                stop_on_error: true
              };
              // The kernel may be gone by the time the session is ready.
              sessionContext.session?.kernel?.requestExecute(content, false);
            });
          }
          prevSessionStatus = status;
        });
      });
    });
  }
};

// Export the plugin so JupyterLab can load it from the extension's entry point.
export default preRunPlugin; |
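The `PrerunCodes` array above is just Python source joined with newlines. A deployer who keeps only a package list could generate that source instead of hand-writing it; the helper below is a sketch under that assumption, and `build_prerun_code` is a hypothetical name.

```python
import json
from typing import Iterable


def build_prerun_code(packages: Iterable[str]) -> str:
    """Build the Python source a pre-run extension would send to the kernel."""
    lines = ["import piplite"]
    for name in packages:
        # json.dumps yields a double-quoted literal that is also valid Python
        # for ordinary package names.
        lines.append(f"await piplite.install({json.dumps(name)})")
    return "\n".join(lines)
```

Each package gets its own `await` line, mirroring the snippet above, so with `stop_on_error` set a failed install stops at the package that caused it.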
@bollwyvl Ooh.. does |
In the Pyodide kernel. The examples in the repo have been updated to use the magic if you want to have a look. |
Thanks for the detailed info and quick reply @bollwyvl ! It is really helpful and I will try to set up an instance that uses @qqdaiyu55 Is my understanding correct that this would still require the installation code to be run each time JupyterLite is loaded on the client side? It just makes it a bit more automatic when it is in the extension? Or does this somehow allow the installation to happen only once or otherwise speed up the process? |
Yes, this code will still re-install every time you load the website, that's hard to avoid for reasons that @bollwyvl explained above. Browsers will generally cleverly cache the downloads, so they don't happen every time you (re-)load the page (you can check in your console, just make sure not to disable caching when you do). I loaded the packages you say you want to use and it really doesn't take that long to get started with them, so I think that if you can get these to pre-install or just give students the relevant |
Thanks @jobovy! That's encouraging to hear, I'm going to try this out in the coming weeks and will report back if I discover anything notable that hasn't already been mentioned here. |
With pyodide-lock 0.1.0a4, there is now enough API surface to do this in a semi-sane, less-hacky fashion. The
However, as a lock (and not an index), it will not allow for packages that might have multiple versions at runtime, depending on prior imports... and i'm not entirely sure I have some related work which would benefit significantly from this, but am not yet up to where I can take advantage of it fully. |
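The lock-versus-index distinction can be shown with a toy structure. The dictionary below is only loosely modeled on the shape of the `packages` map in a `pyodide-lock.json` file (`version`, `depends`, `imports`); the entries and version numbers are illustrative, and the helpers are hypothetical, not part of the pyodide-lock API.

```python
from typing import Dict, List, Optional

# Toy lock: exactly one entry, and therefore one version, per package name.
TOY_LOCK: Dict[str, dict] = {
    "pandas": {"version": "1.5.3", "depends": ["numpy", "python-dateutil"], "imports": ["pandas"]},
    "numpy": {"version": "1.24.2", "depends": [], "imports": ["numpy"]},
    "python-dateutil": {"version": "2.8.2", "depends": ["six"], "imports": ["dateutil"]},
    "six": {"version": "1.16.0", "depends": [], "imports": ["six"]},
}


def install_order(lock: Dict[str, dict], name: str) -> List[str]:
    """Return the package and its dependencies, dependencies first."""
    order: List[str] = []
    seen = set()

    def visit(pkg: str) -> None:
        if pkg in seen:
            return
        seen.add(pkg)
        for dep in lock[pkg]["depends"]:
            visit(dep)
        order.append(pkg)

    visit(name)
    return order


def package_for_import(lock: Dict[str, dict], import_name: str) -> Optional[str]:
    """Find which locked package provides a given import name, if any."""
    for pkg, entry in lock.items():
        if import_name in entry["imports"]:
            return pkg
    return None
```

Because the map is keyed by package name, there is no slot for a second version of the same package, which is the limitation the comment above describes.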
Problem
It is currently not possible to automatically install Python packages inside the pyolite kernel.
Proposed Solution
It would be nice to be able to configure Pyolite to include some packages that would automatically be installed with micropip.
cc. @psychemedia who suggested it here https://github.com/jtpio/jupyterlite-demo/pull/7#issuecomment-870144941
Additional context
It might be possible to have it once we use IPython for the execution (see related work https://github.com/jtpio/jupyterlite/pull/171), as it's possible to configure IPython itself to run code at the shell start.