Feature/rrf score normalization v2 #1089

Closed
wants to merge 26 commits into from

martin-gaievski
Member

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Johnsonisaacn and others added 26 commits December 17, 2024 15:49
…874)

* initial commit of RRF

Signed-off-by: Isaac Johnson <isaacnj@amazon.com>

Co-authored-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
…874)

* initial commit of RRF

Signed-off-by: Isaac Johnson <isaacnj@amazon.com>

Co-authored-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
* Initial unit test implementation

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

---------
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
* Integrate explainability for hybrid query into RRF processor

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Bo Zhang <bzhangam@amazon.com>
* add impl

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add UT

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* rename pruneType; UT

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* changelog

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* ut

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add it

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* change on 2-phase

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* UT

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* it

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* rename

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* enhance: more detailed error message

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* refactor to prune and split

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* changelog

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* fix UT cov

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* address review comments

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* enlarge score diff range

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* address comments: check lowScores non null instead of flag

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

---------

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: Bo Zhang <bzhangam@amazon.com>
* Allow empty string for field in field map

Signed-off-by: Yizhe Liu <yizheliu@amazon.com>

* Allow empty string when validation

Signed-off-by: Yizhe Liu <yizheliu@amazon.com>

* Add to change log

Signed-off-by: Yizhe Liu <yizheliu@amazon.com>

* Update CHANGELOG to: Support empty string for fields in text embedding processor

Signed-off-by: Yizhe Liu <yizheliu@amazon.com>

---------

Signed-off-by: Yizhe Liu <yizheliu@amazon.com>
…nested objects (#1040)

* Fix bug where ingestion failed for input document containing list of nested objects

Signed-off-by: Yizhe Liu <yizheliu@amazon.com>

* Address comments to use better method name/implementation

Signed-off-by: Yizhe Liu <yizheliu@amazon.com>

* Address comments: modify the test case to have doc with various fields

Signed-off-by: Yizhe Liu <yizheliu@amazon.com>

---------

Signed-off-by: Yizhe Liu <yizheliu@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
…es (#1043)

* Fixed mismatch between document source and score fields when sorting is enabled in hybrid query

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
)

Signed-off-by: will-hwang <sang7239@gmail.com>
* add support for builder constructor in neural query builder

Signed-off-by: will-hwang <sang7239@gmail.com>

* create custom builder class to enforce valid neural query builder instantiation

Signed-off-by: will-hwang <sang7239@gmail.com>

* refactor code to remove duplicate

Signed-off-by: will-hwang <sang7239@gmail.com>

* include new constructor in qa packages

Signed-off-by: will-hwang <sang7239@gmail.com>

* refactor code to remove unnecessary code

Signed-off-by: will-hwang <sang7239@gmail.com>

* fix bug in neural query builder instantiation

Signed-off-by: will-hwang <sang7239@gmail.com>

---------

Signed-off-by: will-hwang <sang7239@gmail.com>
* add hybrid search with rescore IT

Signed-off-by: will-hwang <sang7239@gmail.com>

* remove rescore in hybrid search IT

Signed-off-by: will-hwang <sang7239@gmail.com>

* remove previous version checks in build file

Signed-off-by: will-hwang <sang7239@gmail.com>

* removing version checks only in rolling upgrade tests

Signed-off-by: will-hwang <sang7239@gmail.com>

* remove newly added tests in restart test

Signed-off-by: will-hwang <sang7239@gmail.com>

* Revert "remove newly added tests in restart test"

This reverts commit 0987831.

Signed-off-by: will-hwang <sang7239@gmail.com>

---------

Signed-off-by: will-hwang <sang7239@gmail.com>
…t has dot in field name (#1062)

* Fix bug where document embedding fails to be generated due to document has dot in field name

Signed-off-by: Yizhe Liu <yizheliu@amazon.com>

* Address comments

Signed-off-by: Yizhe Liu <yizheliu@amazon.com>

---------

Signed-off-by: Yizhe Liu <yizheliu@amazon.com>
…EmbeddingProcessorIT (#1074)

Signed-off-by: Yizhe Liu <yizheliu@amazon.com>
* Add reindex integration tests

Signed-off-by: Andy Qin <“qinandy@amazon.com”>
* Fix github CI by adding eclipse dependency in formatting.gradle

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Add changelog

Signed-off-by: Varun Jain <varunudr@amazon.com>

---------

Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Bo Zhang <bzhangam@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
…874)

* initial commit of RRF

Signed-off-by: Isaac Johnson <isaacnj@amazon.com>

Co-authored-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
* Initial unit test implementation

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

---------
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
* Integrate explainability for hybrid query into RRF processor

Signed-off-by: Martin Gaievski <gaievski@amazon.com>

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 87.33333% with 19 lines in your changes missing coverage. Please review.

Project coverage is 80.47%. Comparing base (b4cb267) to head (a661ca3).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rocessor/normalization/ScoreNormalizationUtil.java | 41.17% | 6 Missing and 4 partials ⚠️ |
| ...pensearch/neuralsearch/processor/RRFProcessor.java | 83.78% | 0 Missing and 6 partials ⚠️ |
| ...rmalization/MinMaxScoreNormalizationTechnique.java | 60.00% | 2 Missing ⚠️ |
| ...essor/normalization/RRFNormalizationTechnique.java | 97.72% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1089      +/-   ##
============================================
+ Coverage     80.19%   80.47%   +0.28%     
- Complexity     1139     1198      +59     
============================================
  Files            87       93       +6     
  Lines          3953     4077     +124     
  Branches        666      681      +15     
============================================
+ Hits           3170     3281     +111     
- Misses          531      536       +5     
- Partials        252      260       +8     


10 participants