Make faiss-cpu an optional dependency in v3.0.0 #1100

Open
dlqqq opened this issue Nov 11, 2024 · 8 comments
Labels: dependency:faiss-cpu (Issues pertaining to `faiss-cpu`), enhancement (New feature or request)

@dlqqq
Member

dlqqq commented Nov 11, 2024

Problem

Listing faiss-cpu as a required dependency has caused numerous issues for users.

  • Meta does not distribute any builds of FAISS. FAISS distributions on PyPI and Conda Forge are unofficially maintained by third-party contributors.
  • On PyPI, FAISS can only be installed via faiss-cpu, so we list faiss-cpu as a required dependency.
  • However, installing faiss-cpu from Conda Forge does not install a Python package named faiss-cpu, but instead a Python package named faiss. This causes pip check to fail, since faiss-cpu is not installed. Jupyter AI still works however, since both packages provide the faiss module that we import.
  • FAISS releases on PyPI and Conda Forge may lag behind the latest releases of FAISS on GitHub, as they are maintained by third-party contributors. This will limit the rate at which we can upgrade to new releases of Python, e.g. Python 3.13.
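
The mismatch between the distribution name and the importable module can be sketched as follows. This is only an illustration, not code from Jupyter AI: the helper names are hypothetical, and the candidate distribution names (`faiss-cpu` for PyPI, `faiss` for Conda Forge) are taken from the description above.

```python
from importlib import metadata, util

# Distribution names that may provide the importable `faiss` module.
# Assumption based on this issue: `faiss-cpu` on PyPI, `faiss` on Conda Forge.
CANDIDATE_DISTRIBUTIONS = ("faiss-cpu", "faiss")


def module_available(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return util.find_spec(module_name) is not None


def installed_distribution(candidates=CANDIDATE_DISTRIBUTIONS):
    """Return (name, version) of the first installed candidate, else None."""
    for name in candidates:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    # The module check is what matters at runtime; the distribution
    # name is what `pip check` compares against declared requirements.
    print("faiss importable:", module_available("faiss"))
    print("installed distribution:", installed_distribution())
```

A check based on the module rather than the distribution name would treat both installation routes the same way, which is the behavior a required `faiss-cpu` entry cannot express.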

Proposed Solution

  • Make faiss-cpu an optional dependency.
  • Allow other vector databases to be used by /learn.
  • Ideally, users should be able to install support for other vector databases at runtime, without requiring a restart of the Jupyter Server.
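
One possible shape for this, as a rough sketch only: backends are looked up by name and imported lazily, so a missing package produces a clear error instead of breaking installation. The registry, the error class, and the function names below are hypothetical and do not exist in Jupyter AI; `faiss` and `lancedb` are used purely as example module names.

```python
import importlib
import importlib.util

# Hypothetical registry: backend name -> module that must be importable.
BACKEND_MODULES = {
    "faiss": "faiss",
    "lancedb": "lancedb",
}


class BackendUnavailableError(RuntimeError):
    """Raised when a known backend's module is not installed."""


def load_backend(name, modules=BACKEND_MODULES):
    """Import and return the module backing the named vector store."""
    if name not in modules:
        raise ValueError(f"Unknown vector store backend: {name!r}")
    try:
        # Importing at call time (not at package import time) keeps the
        # dependency optional.
        return importlib.import_module(modules[name])
    except ImportError as exc:
        raise BackendUnavailableError(
            f"Backend {name!r} requires the {modules[name]!r} module, "
            "which is not installed."
        ) from exc


def available_backends(modules=BACKEND_MODULES):
    """List the backends whose modules can currently be found."""
    # Refresh import finder caches so packages installed after startup
    # can be discovered by this process.
    importlib.invalidate_caches()
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is not None
    ]
```

Because availability is evaluated on each call, a backend installed while the server is running could in principle be picked up without a restart, although a real implementation would also have to handle configuration and index migration.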

@dlqqq dlqqq added enhancement New feature or request dependency:faiss-cpu Issues pertaining to `faiss-cpu` labels Nov 11, 2024
@dlqqq dlqqq added this to the v3.0.0 milestone Nov 11, 2024
@asadoughi

Hi! Official FAISS builds are available via the conda pytorch channel: https://github.com/facebookresearch/faiss/blob/main/INSTALL.md

@krassowski
Member

Thanks for the link @asadoughi! However, asking all conda users of jupyter-ai to enable the pytorch channel might be a high bar. Some organizations only allow conda-forge (due to licensing concerns), and having multiple channels enabled can sometimes lead to conflicts in the binaries. I think this would be much easier to reconsider if the official distributions were mirrored in the conda-forge channel.

@krassowski
Member

Also, I understand that the pytorch channel is deprecated anyway, right?

@krassowski
Member

Taking the liberty of mentioning @h-vetinari as the only listed maintainer of the conda-forge feedstock for faiss (and, it seems, also a frequent contributor to conda-forge/pytorch-cpu-feedstock).

@asadoughi do you think there is a chance for the FAISS team to collaborate with @h-vetinari on getting the conda-forge feedstock back up to speed? The pytorch channel is being deprecated, and the PyTorch team is directing users to the conda-forge channel too:

> We are directing users to utilize our official wheel packages from download.pytorch.org or PyPI, or switch to utilizing conda-forge (pytorch-cpu, pytorch-gpu) packages if they would like to continue to use conda.
> [...]
> As well, we have met with conda-forge maintainers and are looking to address any gaps that may be present in the pytorch-cpu / pytorch-gpu packages on conda-forge in order to make this move as seamless as possible for users.

@krassowski
Member

> Allow other vector databases to be used by /learn.

I'm hearing a lot of good things about https://github.com/lancedb/lancedb, I wonder how much effort it would be to allow one or the other.

@h-vetinari

Hey 👋

Faiss has fallen off the radar a bit, because it was quite a handful to maintain (...for free and with no feedback or help from anyone...). If it turns out to be useful, it's not a big deal to bring it back up to speed. There's further optimization work possible (e.g. doing AVX2 or AVX512-enabled builds), but for now I've kept it running. Just saw that the bot didn't open a PR for 1.9, that should be easy to fix. Any help on the feedstock from interested parties is more than welcome.

> I'm hearing a lot of good things about https://github.com/lancedb/lancedb, I wonder how much effort it would be to allow one or the other.

Not involved there, but it has a healthy-looking feedstock in conda-forge.

@dlqqq
Member Author

dlqqq commented Nov 14, 2024

@h-vetinari

> Faiss has fallen off the radar a bit, because it was quite a handful to maintain (...for free and with no feedback or help from anyone...).

We really appreciate the effort you've put into making faiss available on conda-forge! It's incredible that you've maintained it in your free time. The PyPI wheels are also maintained by an external contributor, and they have a few issues as well.

@h-vetinari

> Just saw that the bot didn't open a PR for 1.9, that should be easy to fix.

FWIW, faiss 1.9.0 is now available on conda-forge (there are still some open improvements around optimizing for various CPU architectures, but that's currently not a priority).
