[FEATURE] Support higher vector dimension limit for lucene #925

Closed
ankitvij7 opened this issue Jun 3, 2023 · 8 comments · Fixed by #1346

Labels
Enhancements Increases software capabilities beyond original client specifications

Comments

@ankitvij7

ankitvij7 commented Jun 3, 2023

Is your feature request related to a problem?
Lucene is the only engine that supports pre-filtering; however, it has a vector dimension limit of 1024. This limits the use of Lucene with larger models such as the OpenAI text embedding models. I see that support for pre-filtering was recently added to faiss, but not to nmslib.

What solution would you like?
Making the Lucene dimension limit configurable is being actively discussed in this draft PR. Meanwhile, Elastic overrides this limit to 2048 with this PR. Can we do something similar for OpenSearch?

@jmazanec15
Member

Hi @ankitvij7, for licensing reasons, we cannot refer to Elastic PRs, so we cannot look at the one above.

I'd prefer to just maintain consistency with Lucene.

We are adding support for pre-filtering in faiss, btw: #903

@ankitvij7
Author

ankitvij7 commented Jun 22, 2023

@jmazanec15 Thanks for getting back. Is there a plan to support pre-filtering in nmslib?

@navneet1v
Collaborator

@ankitvij7 we don't have a plan for that, because nmslib currently doesn't have that feature.

@vamshin
Member

vamshin commented Aug 17, 2023

@ankitvij7 we have pre-filter support in faiss in version 2.9. Did you get a chance to try that?

@jmazanec15
Member

Closing issue - no activity

@sam-herman
Contributor

Hey @jmazanec15, I am also interested in this one. Do you mind if I assign it to myself and take a stab at it?

@jmazanec15 jmazanec15 reopened this Sep 20, 2023
@github-project-automation github-project-automation bot moved this from Backlog to 2.10 (September 11th, 2023) in Vector Search RoadMap Sep 20, 2023
@sam-herman
Contributor

Per our discussion on Slack, it seems that this change is probably no longer needed, because Lucene recently merged a change that moves the max dimension limit to the codec:
apache/lucene#12436

This allows users to customize the limit through the codec instead of overriding it ad hoc in OpenSearch.
I'm OK with closing this one, as it seems to be a no-op until OpenSearch migrates to a Lucene version that includes the change.
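
As a rough illustration of the idea above (a dimension limit owned by the codec's vector format rather than by one global constant), here is a minimal Python sketch. It is a toy model, not Lucene or OpenSearch code: `VectorFormat`, `validate`, `accepts`, and the 2048 value used for the wider format are all hypothetical, and only the 1024 default is taken from this thread.

```python
from dataclasses import dataclass

# Default limit cited in this thread for the Lucene engine.
DEFAULT_MAX_DIMENSIONS = 1024


@dataclass(frozen=True)
class VectorFormat:
    """Hypothetical stand-in for a codec-level vector format that owns its own limit."""

    name: str
    max_dimensions: int = DEFAULT_MAX_DIMENSIONS

    def validate(self, field: str, dimension: int) -> None:
        """Raise ValueError if `dimension` is not allowed for `field` under this format."""
        if dimension <= 0:
            raise ValueError(f"{field}: dimension must be positive, got {dimension}")
        if dimension > self.max_dimensions:
            raise ValueError(
                f"{field}: dimension {dimension} exceeds the limit of "
                f"{self.max_dimensions} for format '{self.name}'"
            )


def accepts(fmt: VectorFormat, dimension: int) -> bool:
    """Return True if the format accepts a vector field of the given dimension."""
    try:
        fmt.validate("embedding", dimension)
        return True
    except ValueError:
        return False


# A format that keeps the default limit, and one that raises it (assumed value).
default_format = VectorFormat("default")
wide_format = VectorFormat("wide", max_dimensions=2048)
```

With this shape, raising the limit means choosing or configuring a different format, and the check itself stays in one place: `accepts(default_format, 1536)` is rejected while `accepts(wide_format, 1536)` passes.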

@heemin32
Collaborator

We can keep this thread open for now. Once Lucene is updated, we can incorporate the changes in the k-NN repo to support higher dimensions.

@navneet1v navneet1v moved this from 2.10 (September 11th, 2023) to Backlog (Hot) in Vector Search RoadMap Oct 5, 2023
@vamshin vamshin moved this from Backlog (Hot) to 2.12.0 in Vector Search RoadMap Oct 5, 2023
@vamshin vamshin moved this from 2.12.0 to 2.13.0 in Vector Search RoadMap Nov 17, 2023
@junqiu-lei junqiu-lei assigned junqiu-lei and unassigned sam-herman Dec 8, 2023
@junqiu-lei junqiu-lei added Enhancements Increases software capabilities beyond original client specifications and removed enhancement labels Dec 12, 2023
@github-project-automation github-project-automation bot moved this from 2.13.0 to ✅ Done in Vector Search RoadMap Dec 13, 2023
Projects
Status: Done

7 participants