[RFC] OpenSearch Remote Ranker Plugin (semantic) #4
Comments
I'd like to help with this plug-in. Is there a small project that I can start to get acquainted with the code-base? |
Hi, thank you for your interest. We are currently working on a fork https://github.com/kevinawskendra/search-relevance but will eventually merge into this repo. |
This is an interesting idea. Thanks for contributing @kevinawskendra ! It looks like the external passage ranking service is called for each passage within each top-doc. In your testing what is a realistic number of documents to rerank this way based upon the added latency from the callouts? |
Thank you Peter. We haven't run any latency tests yet, but we are targeting top 3 passages for up to 500 documents. |
@kevinawskendra It looks like metarank could be a good candidate for a remote ranker implementation according to this spec (disclaimer: I'm the maintainer). I understand that the RFC and the implementation are still at an early stage, but I already have a couple of questions about them:
|
@kevinawskendra Have the open questions been resolved? I'd say this should be configured at the query level because we do want to compare results for multiple search configurations down the road. |
Thanks Mark. Yes, we are going to support both index and query level configurations. |
@macohen we're reviewing the code and will post our questions, thx! |
@vgoloviznin, changes were just merged in to make the plugin more generic with clearer APIs. If you haven't taken a look recently, it may be good to revisit and provide feedback or PRs for what you might want to see to integrate. Thanks! |
An implementation of this (for AWS Kendra Ranking) has already been built in this repo, and "released" with a 2.4.0 tag. (It's not included in the OpenSearch distribution, but we have instructions to install the plugin standalone.) We have #36 to address the need to add more ranker implementations, and we have a forthcoming RFC to try to nail down a generic request/response processor API, kind of like ingest processor pipelines. |
What is semantic search?
Semantic search is a data searching technique that aims to not only find keywords in documents, but to determine the intent and contextual meaning of the words a person is using for search. Essentially, semantic search is search with meaning and can provide higher quality search results.
What is the OpenSearch Semantic Ranker?
The OpenSearch Semantic Ranker is a plugin that will re-rank search results at search time by calling an external service with semantic search capabilities for improved accuracy and relevance. This plugin will make it easier for OpenSearch users to quickly and easily connect with a service of their choice to improve search results in their applications.
How will the plugin work?
The plugin will modify the OpenSearch query flow and do the following:
*N will be based on requirements of the external service and customizable by the user.
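The re-ranking step can be sketched as follows. This is a minimal illustration, not the plugin's implementation: it assumes the flow is "take the top N hits, score them with the external service, reorder them", and `rerank_top_n`, `overlap_score`, and the hit layout are hypothetical names chosen for the example. The word-overlap scorer is only a local stand-in for the external service call.

```python
from typing import Callable, Dict, List

Hit = Dict[str, object]


def rerank_top_n(hits: List[Hit], query: str, n: int,
                 score_fn: Callable[[str, Hit], float]) -> List[Hit]:
    """Re-score the first n hits with score_fn and sort them by the new score.

    Hits beyond position n are appended unchanged, in their original order.
    """
    head, tail = hits[:n], hits[n:]
    # Copy each hit so the caller's original results are not mutated.
    rescored = [dict(hit, _score=score_fn(query, hit)) for hit in head]
    rescored.sort(key=lambda h: h["_score"], reverse=True)
    return rescored + tail


def overlap_score(query: str, hit: Hit) -> float:
    """Stand-in for the external service (assumption): count shared words."""
    query_terms = set(query.lower().split())
    body_terms = set(str(hit["_source"]["body"]).lower().split())
    return float(len(query_terms & body_terms))


hits = [
    {"_id": "a", "_score": 3.0, "_source": {"body": "cats"}},
    {"_id": "b", "_score": 2.0, "_source": {"body": "how to train a dog"}},
    {"_id": "c", "_score": 1.0, "_source": {"body": "train dog train dog"}},
]
reranked = rerank_top_n(hits, "train dog", n=2, score_fn=overlap_score)
print([h["_id"] for h in reranked])
```

With N set to 2, only the first two hits are re-scored, so the third hit keeps its original position and score even though the stand-in scorer would have rated it highly. This is the trade-off that choosing N controls.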
How will users use the plugin?
We are considering two options for using the plugin. The first option is to configure the plugin at the OpenSearch index level, meaning users will be able to enable/disable semantic re-ranking for each index. After the Semantic Ranker plugin is enabled on an index, all queries to that index will go through the plugin and have their results re-ranked. There will be no change to the query syntax in this option.
The second option is to configure the plugin at the query level, meaning users can enable/disable semantic re-ranking per query. This option allows more flexibility, as users will be able to choose which queries to apply semantic re-ranking to, but it will require updating the query syntax.
Example usages for both options will be provided below.
What configuration will the plugin have?
Field Configuration
Since data in a user’s OpenSearch index is mostly unstructured, the plugin will need to know which fields in the user’s OpenSearch documents map to specific fields of a “document”. Here is a breakdown of the fields that the plugin will use:
The following fields are optional and less important, but they may improve the relevance of the results if the external service supports them and the user has the right inputs for them. These fields may or may not be supported in the first version of the plugin.
In the plugin configuration, the user will provide OpenSearch field names to map to these fields.
Here is an example: let’s say a user has the following document structure in their OpenSearch index:
In this example, the user may want to configure [“article_content”] as the body field and [“article_title”] as the title field in the plugin.
As mentioned above, the configurations for body and title will be lists of OpenSearch field names in order of importance. This is because some documents have multiple body/title fields, and documents in the same index may use different body/title fields.
Using the same example as above, let’s say there is another field called article_content2. Then, the user may want to configure [“article_content”, “article_content2”] as the body fields.
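One way to read "in order of importance" is that the plugin uses the first configured field that is present and non-empty in a given document. The sketch below illustrates that reading only. The fallback rule and the helper names `first_present` and `to_ranker_document` are assumptions, not confirmed plugin behavior.

```python
from typing import Dict, List, Optional


def first_present(source: Dict[str, object],
                  field_names: List[str]) -> Optional[str]:
    """Return the value of the first listed field that holds non-empty text."""
    for name in field_names:
        value = source.get(name)
        if isinstance(value, str) and value.strip():
            return value
    return None


def to_ranker_document(source: Dict[str, object],
                       body_fields: List[str],
                       title_fields: List[str]) -> Dict[str, Optional[str]]:
    """Map an OpenSearch _source dict onto the ranker's body/title fields."""
    return {
        "body": first_present(source, body_fields),
        "title": first_present(source, title_fields),
    }


body_fields = ["article_content", "article_content2"]
title_fields = ["article_title"]

doc_a = {"article_title": "Intro", "article_content": "Main text"}
doc_b = {"article_title": "Other", "article_content2": "Fallback text"}

print(to_ranker_document(doc_a, body_fields, title_fields))
print(to_ranker_document(doc_b, body_fields, title_fields))
```

Here the second document has no article_content field, so its body is taken from article_content2, which covers the case of documents in the same index using different body fields.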
External Service Configuration
The plugin will also require configuration to connect with the external service.
Non-sensitive inputs such as endpoint and retry count will be provided in the opensearch.yml config file. For example:
Credentials to connect to the service will be stored in the OpenSearch keystore. Users will be able to provide the username/password or the access/secret keys for the service.
How will the plugin modify the query response?
The plugin will re-score and re-rank the query results from OpenSearch, but there should also be a way for users to compare results before/after applying the plugin.
Users can execute queries with/without the plugin enabled themselves and compare the results. If the plugin is configured at the index level, the user can enable/disable the plugin in the index settings and test queries. If the plugin is configured at the query level, the user can choose to enable the plugin by providing the necessary config in the query syntax.
Another option is to provide both the original “un-re-ranked” results and the re-ranked results in the query response. The advantage is that the user can compare the results more easily without executing two separate queries, but this will increase the size of the response payload. In this option, re-ranked results will go under “hits” and the original results will go under a new field in the response. Keeping the re-ranked results under “hits” allows quick and easy adoption of the plugin without forcing users to change application code to read from a new field in the response.
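The second option can be illustrated with a small sketch of the response shape. The RFC does not name the new field, so `original_hits` is a placeholder, and `attach_original_hits` is a hypothetical helper, not part of the plugin.

```python
import copy
from typing import Dict, List


def attach_original_hits(response: Dict, reranked_hits: List[Dict],
                         original_field: str = "original_hits") -> Dict:
    """Return a copy of the response with re-ranked hits under "hits".

    The original hit list is preserved under original_field (a placeholder
    name) so both orderings can be compared from a single query.
    """
    result = copy.deepcopy(response)
    result[original_field] = result["hits"]["hits"]
    result["hits"]["hits"] = list(reranked_hits)
    return result


response = {"hits": {"hits": [{"_id": "a"}, {"_id": "b"}]}}
reranked = [{"_id": "b"}, {"_id": "a"}]

combined = attach_original_hits(response, reranked)
print([h["_id"] for h in combined["hits"]["hits"]])
print([h["_id"] for h in combined["original_hits"]])
```

Existing application code that reads "hits" would pick up the re-ranked order without changes, while the extra field roughly doubles the hit payload.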
Example Usage
Note: the following are examples. Actual endpoints/syntax may change on the release of the plugin.
Option 1 (Index level configuration):
Option 2 (Query level configuration):
Open Questions