Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-16675: dense vector function queries #1750

Conversation

eliaporciani
Copy link
Contributor

@eliaporciani eliaporciani commented Jul 3, 2023

https://issues.apache.org/jira/browse/SOLR-16675

Description

Add function queries for dense vector that can be used a rerank time.

Solution

Use the latest changes in LUCENE for apache/lucene#12253 I added a parser in ValueSourceParser:

  • vectorSimilarity(vectorEncoding, similarityFunction, valueSource1, valueSource2)

vectorEncoding: FLOAT32 or BYTE
similarityFunction: COSINE, DOT_PRODUCT, EUCLIDEAN
valueSources: here it is accepted or a const vector (e.g. [1,2,3...]) or a fieldName of a DenseVectorField.

Tests

Unit tests for the vector parsing in FunctionQParser:
Integration tests:

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

Copy link
Contributor

@alessandrobenedetti alessandrobenedetti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just wondering if the vector parsing can be done differently (re-using existent code).
I'll investigate and discuss with @eliaporciani , but in the meantime, any other review is welcome!

@alessandrobenedetti
Copy link
Contributor

Also basic documentation and changes is missing

@alessandrobenedetti alessandrobenedetti marked this pull request as ready for review July 6, 2023 14:37
@alessandrobenedetti
Copy link
Contributor

alessandrobenedetti commented Jul 6, 2023

This:
https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html#dist-function
does a similar thing but I don't like the syntax much.

Not that I want to modify the old one, I believe the proposed syntax for the new similarity function is better, but wanted to list the other existing function query for coherence and completeness.

I don't think it's a big deal to have the "distance" function with a syntax and the "similarity" function with a different one for vectors.
Also the code used is completely different i.e. the distance Lucene code is not used and ad hoc code in Solr is used for the distance function query.
My suspicion is that the "dist" between vectors has been there for a while, much earlier than when dense vector fields were introduced in both Lucene and Solr, and no one really updated the code there.

@uschindler
Copy link
Contributor

When updating Lucene, please also fix the startup scripts to pass vector incubator module in command line if Java version is exactly Java 20 or 21. Otherwise Lucene prints warnings and people will not get optimal performance.

So I would really make the Lucene upgrade a separate issue/PR. All documentation should be updated, all startup scripts fixed,....

@alessandrobenedetti
Copy link
Contributor

alessandrobenedetti commented Jul 6, 2023 via email

@stillalex
Copy link
Member

@uschindler the Lucene upgrade is tracked under #1749 I am looking at what needs to be added to the startup script now

@alessandrobenedetti alessandrobenedetti force-pushed the feature/SOLR-16675_dense_vector_function_queries branch from c94893d to 04b31e4 Compare July 7, 2023 14:50
@HoustonPutman HoustonPutman changed the title Feature/solr 16675 dense vector function queries SOLR-16675: dense vector function queries Jul 7, 2023
@HoustonPutman
Copy link
Contributor

@alessandrobenedetti there are some issues with the string formatting, namely you provide args but don't use "%s". You can probably just replace with a string concatenation.

@alessandrobenedetti
Copy link
Contributor

alessandrobenedetti commented Jul 7, 2023 via email

@eliaporciani eliaporciani force-pushed the feature/SOLR-16675_dense_vector_function_queries branch from 8745fb4 to 9a40aed Compare July 8, 2023 13:44
@alessandrobenedetti alessandrobenedetti force-pushed the feature/SOLR-16675_dense_vector_function_queries branch from 9a40aed to 4c8616a Compare July 9, 2023 08:12
@alessandrobenedetti alessandrobenedetti merged commit d713595 into apache:main Jul 9, 2023
alessandrobenedetti added a commit that referenced this pull request Jul 9, 2023
---------

Co-authored-by: Alessandro Benedetti <a.benedetti@sease.io>
alessandrobenedetti added a commit that referenced this pull request Jul 9, 2023
---------

Co-authored-by: Alessandro Benedetti <a.benedetti@sease.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants