Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support passing of index URLs to piplite (CLI, runtime configuration, and requirements files) #169

Draft
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

agriyakhetarpal
Copy link
Member

Description

Users can now specify alternate package indices to install packages with micropip through multiple interfaces: the piplite command line interface (-i/--index-url), requirements files, and site-wide configuration in jupyter-lite.json

Closes #166

Changes made

These are mostly along the lines of #166 (comment).

In the Python layer

  • Added support for -i/--index-url flags in the piplite CLI,
  • Enhanced requirements file parsing to properly handle index URL specifications, i.e., a requirements file can be passed to piplite along with an index URL (the index URL is applied to the entire list of requirements coming from the file, akin to what pip does: https://stackoverflow.com/a/2477610).

In the TypeScript/configuration layer

  • Extended the kernel schema to support pipliteInstallDefaultOptions with indexUrls configuration
  • Defined a new TypeScript interface IPipliteInstallOptions to add type safety for index URL parameters
  • Added code in the kernel extension to extract indexUrls from the jupyter-lite.json runtime configuration
  • Extended PyodideRemoteKernel's initPackageManager method to configure index URLs in the Python environment

The idea is to get the following behaviour in piplite:

  1. via the CLI
%pip install numpy -i https://pypi.myindex.com/simple
  1. via a requirements.txt file
-i https://pypi.myindex.com/simple
numpy
scipy
  1. Or, via a jupyter-lite.json
{
  "jupyter-config-data": {
    "litePluginSettings": {
      "@jupyterlite/pyodide-kernel-extension:kernel": {
        "pipliteInstallDefaultOptions": {
          "indexUrls": ["https://pypi.myindex.com/simple"]
        }
      }
    }
  }
}

Copy link
Contributor

lite-badge 👈 Try it on ReadTheDocs

@agriyakhetarpal agriyakhetarpal added the enhancement New feature or request label Feb 13, 2025
@agriyakhetarpal
Copy link
Member Author

agriyakhetarpal commented Feb 13, 2025

cc: @bollwyvl

I can write TypeScript to get by, but it is still very much Greek to me. Even though this is a draft, I would like to hear more from you about my approach and whether I'm going in the right direction with these changes when you have time. :) I think splitting the changes into multiple PRs might also be possible.

Also, you mentioned "support parsing of said feature as a line of -r requirements.txt" in the issue thread. Is this what you had in mind, or did I misunderstand?

@agriyakhetarpal
Copy link
Member Author

The fact that it's tricky to get a development version installed for me does not help either 😅 It took me a few tries but testing in CI is probably more convenient.

@agriyakhetarpal
Copy link
Member Author

jlpm lint:py:check and jlpm:lint:py are passing for me locally – I don't immediately see what the CI is complaining about.

@agriyakhetarpal
Copy link
Member Author

agriyakhetarpal commented Feb 13, 2025

lite-badge 👈 Try it on ReadTheDocs

Trying to test this with the Anaconda index does not seem to work as the CORS headers have been blocked for some reason again... I'll ping at pyodide/pyodide#4898. never mind, it seems that was just an issue where I added a trailing backslash at the end of the index URL (simple/ instead of just simple). Should we guard against that, or remove the backslash at runtime? I suspect other users can come across it as well.

@agriyakhetarpal
Copy link
Member Author

A bit more testing reveals that this works well. Here are a few points about things I noticed, and some questions:

  1. If I try to install NumPy (which should give me version 2.3.0dev0 at the time of writing), it gets us version 2.0.2, and NumPy gets loaded from the defaults channel. This is because NumPy is being fetched from

    "loadPyodideOptions": {
    "packages": ["matplotlib", "micropip", "numpy", "sqlite3", "ssl"],
    "lockFileURL": "https://cdn.jsdelivr.net/pyodide/v0.27.2/full/pyodide-lock.json?from-lite-config=1"

    at the moment – it should work well outside of this PR, as it works

  2. I am successfully able to install packages listed in a requirements.txt file. This is working with both cases:
    a. When I use %pip install -r requirements.txt -i https://myindexurl.com/simple
    b. When I use %pip install -r requirements.txtand the index URL is listed in the topmost line ofrequirements.txt`

  3. It is a bit of a pain to install packages with just --index-url. For example, %pip install sympy -i "https://pypi.anaconda.org/scientific-python-nightly-wheels/simple/" does not work, as mpmath being its dependency exists on PyPI and not with this index. I have to call %pip install mpmath before installing a nightly version of SymPy. This is going to be a bit of a bother for other packages as well, especially those that are consumers of packages hosted on limited indices such as Anaconda.org SPNW in this case. Would it make sense to also support --extra-index-url? One downside would be that it would also make the users' browsers' environments prone to dependency confusion attacks, which we would definitely not want. My use case with the functionality I'm trying to offer through this PR is that I should be able to write %pip install mypackage -i https://myindexurl.com/simple, where it would install mypackage through the passed index URL, but fetch the dependencies for mypackage from PyPI (this is possible maybe with disablePyPIFallback set to False?) or Pyodide's jsDelivr CDN (with the latter being preferred, of course) if they don't exist on the passed index URL. This is so that I won't need to write multiple pip-install magics for multiple dependency installations. In some sense, I would like --index-url to behave like --extra-index-url, which would be different from pip's behaviour, but still making sense for a platform like JupyterLite (which could be documented). Another solution for this could be to deviate from pip's behaviour of supporting just a single index URL with a requirements file, and supporting multiple URLs instead. Here is what a sample requirements.txt file with multiple URLs could look like:

    requirements.txt
    mypackage1
    mypackage2
    sympy
    mypackage3
    -r another_set_of_requirements.txt
    -i https://myindex1.com/simple # an index URL, only for the packages listed below up to another index URL if there is one
    scipy
    -i https://myindex2.com/simple # another index, only for numpy in this case
    numpy
    

    But I am not too keen on such an interface, as it is a bit of extra effort to implement and maintain. The current solution is not a big pain, it is manageable to some extent.

  4. I haven't yet tried out the jupyter-lite.json and pipliteInstallDefaultOptions pathway I added yet (which is the only reason I haven't marked this fully as ready for review yet). I shall try to do so later in the day. Feedback in the meantime is still welcome!

  5. What would be the ideal way to go about adding some tests for this? We don't want to query the indices unnecessarily as a part of a test suite in CI. In micropip, we have another test suite that is triggered with "real" (non-mocked) indices, which we run through through a manual workflow triggered by a workflow_dispatch: hook.

"description": "Default options to pass to piplite.install",
"default": {},
"properties": {
"indexUrls": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would reckon this should be index_urls, and then wherever possible, the parent object handed around as a blob of JSON-compatible kwargs.

const pythonConfig = [
'import piplite.piplite',
`piplite.piplite._PIPLITE_DISABLE_PYPI = ${disablePyPIFallback ? 'True' : 'False'}`,
`piplite.piplite._PIPLITE_URLS = ${JSON.stringify(pipliteUrls)}`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would then become piplite.piplite._PIPLITE_DEFAULT_INSTALL_ARGS

agriyakhetarpal and others added 3 commits February 14, 2025 20:49
Co-Authored-By: Nicholas Bollweg <bollwyvl@users.noreply.github.com>
Co-Authored-By: Nicholas Bollweg <bollwyvl@users.noreply.github.com>
@agriyakhetarpal
Copy link
Member Author

agriyakhetarpal commented Feb 14, 2025

The integration tests are now timing out. I'm not sure if this is due to a change I made... reverting and trying again

This is what the logs were saying from the RTD build:

Uncaught (in promise) PythonError: Traceback (most recent call last):
  File "/lib/python312.zip/_pyodide/_base.py", line 597, in eval_code_async
    await CodeRunner(
  File "/lib/python312.zip/_pyodide/_base.py", line 411, in run_async
    coroutine = eval(self.code, globals, locals)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<exec>", line 3, in <module>
NameError: name 'false' is not defined. Did you mean: 'False'?
    T https://cdn.jsdelivr.net/pyodide/v0.27.2/full/pyodide.asm.js:10
    new_error https://cdn.jsdelivr.net/pyodide/v0.27.2/full/pyodide.asm.js:10
    Xe https://cdn.jsdelivr.net/pyodide/v0.27.2/full/pyodide.asm.js:10
    callPyObjectMaybePromising https://cdn.jsdelivr.net/pyodide/v0.27.2/full/pyodide.asm.js:10
    wrapper https://cdn.jsdelivr.net/pyodide/v0.27.2/full/pyodide.asm.js:10
    onmessage https://cdn.jsdelivr.net/pyodide/v0.27.2/full/pyodide.asm.js:10
[comlink.mjs:51](webpack://jupyterlite/node_modules/comlink/dist/esm/comlink.mjs)

Edit: this was indeed due to my change. I'll take a look at it again.

@agriyakhetarpal agriyakhetarpal force-pushed the feat/index-url-for-pip-install-magics branch from e1fd8bb to e4e7a30 Compare February 14, 2025 18:35
@agriyakhetarpal
Copy link
Member Author

I seem to have broken the installation of packages from an in-line index URL inside a requirements.txt file while addressing your review, @bollwyvl. Would it make sense to split adding a --index-url flag and the rest of your prior asks in #166 into separate PRs, or should I proceed to debug within the same PR?

My idea is that if we are to support multiple index URLs and multiple requirements files, each of those files could possibly use a non-default index URL, i.e., $n$ requirements files passed to the pip-install command could have $&lt;= n$ index URLs for them (if the URLs are included within the files, that is – otherwise if an index URL is passed via the CLI, it would take priority). Handling this is becoming a bit messy, and at the same time, this is a rare scenario in real-life use cases.

@agriyakhetarpal
Copy link
Member Author

Never mind: actually, a local test to see pip's behaviour reveals that it considers only the last index URL in the requirements file, i.e., it searches for packages in the last index URL it sees. Passing --index-url in the command with multiple requirements files containing an index of their own disregards them, as only the CLI one has any effect.

This should make things a bit more simpler.

Pro tip: never combine the act of installing packages from separate indices and requirements files, and invoke a pip-install separately for each requirements file if any of them has an inline index URL. This is a mess that I hope no one has to encounter, and one we don't have to support in too convoluted a manner 🤠

@agriyakhetarpal agriyakhetarpal force-pushed the feat/index-url-for-pip-install-magics branch from 3fadb88 to d694c00 Compare February 15, 2025 02:02
@agriyakhetarpal agriyakhetarpal force-pushed the feat/index-url-for-pip-install-magics branch from d9f112e to 3fc2381 Compare February 15, 2025 02:32
@agriyakhetarpal
Copy link
Member Author

I think this PR is ready to garner some more feedback.

About the pipliteInstallDefaultOptions setting I mentioned in point 4 of #169 (comment), it isn't working yet, but I also have some second thoughts about implementing it. Having an index URL in place as a setting would constrain the user to look only through said index URL, which isn't a good solution as many dependencies might not be available. Is there a better interface in mind, or maybe I missed something?

#166 (comment) mentioned adding it as a site-level configurable, but in this case, this acts more like --index-url and not --extra-index-url (the latter, again, is susceptible to dependency confusion as I mentioned somewhere above). There is a way to add multiple indices through micropip.set_index_urls() and we can call that with piplite, but maybe micropip.install() should have an extra_index_urls argument similar to micropip.install(index_urls="<...>")?

I feel we should limit my scope with this PR, but I am happy to implement what sounds best either upstream or here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a --index-urls CLI flag for pip-install magics
2 participants