Add cross encoder support #1615
Codecov Report
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1615      +/-   ##
============================================
+ Coverage     80.83%   80.98%   +0.15%
- Complexity     4215     4246      +31
============================================
  Files           404      408       +4
  Lines         16977    17122     +145
  Branches       1818     1835      +17
============================================
+ Hits          13723    13867     +144
+ Misses         2539     2534       -5
- Partials        715      721       +6
One minor question, but overall looks great!
common/src/main/java/org/opensearch/ml/common/dataset/TextSimilarityInputDataSet.java
Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
Thanks for working on this. Approved.
LGTM (with one minor question. You can answer and resolve.)
common/src/main/java/org/opensearch/ml/common/dataset/TextSimilarityInputDataSet.java
* add text similarity inputs and function name Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* add text similarity cross encoder model Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* add text similarity unit tests Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* add text similarity input unittests Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* add text similarity dataset unittests Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* add function name annotation Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* refactor API to use single query Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* omit private from class vars Co-authored-by: Navneet Verma <vermanavneet003@gmail.com> Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* change output name from logits to similarity Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* hashify isDLModel Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* add error message for non-torchscript cross encoders Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* allow onnx, actually. Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* apply spotless after rebase Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* add unittest for new mlinput toXcontent clause Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* static DLModels Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* add tests and error message tweaks Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* name test models w framework Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
* change pt->torch_script Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
---------
Signed-off-by: HenryL27 <hmlindeman@yahoo.com>
Co-authored-by: Navneet Verma <vermanavneet003@gmail.com>
(cherry picked from commit 2761d7d)
@HenryL27 can you please share details of meta config for
error response:
Description
Adds support for (huggingface) cross encoders to ml-commons. Uses a new function name (TEXT_SIMILARITY) which takes as input a list of text pairs and spits out 1-dimensional tensors representing the similarity of the items in each pair. E.g. yields
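As a rough illustration of the input/output shape described above (not the PR's actual request or response, and not the ml-commons API), here is a minimal Python sketch. It assumes the single-query form mentioned in the commit list, where each pair is the query plus one candidate text. The names `text_similarity` and `toy_score` are hypothetical, and `toy_score` is a simple token-overlap stand-in for a real cross encoder model.

```python
from typing import Callable, List, Sequence


def toy_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross encoder.

    Returns the fraction of query tokens that also appear in the document.
    A real cross encoder would run both texts through one model instead.
    """
    query_tokens = set(query.lower().split())
    doc_tokens = set(doc.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


def text_similarity(
    query: str,
    text_docs: Sequence[str],
    score_fn: Callable[[str, str], float] = toy_score,
) -> List[List[float]]:
    """Score each (query, doc) pair.

    The result holds one 1-dimensional, single-element list per pair,
    mirroring the "one similarity tensor per pair" shape described above.
    """
    return [[score_fn(query, doc)] for doc in text_docs]


results = text_similarity(
    "capital of france",
    ["paris is the capital of france", "bananas are yellow"],
)
for doc_index, similarity in enumerate(results):
    print(doc_index, similarity)
```

Keeping one single-element result per pair, rather than one flat list of scores, keeps each output aligned with its input pair, so the scores can be used directly to reorder the candidate texts.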
This was using the model cross-encoder/ms-marco-TinyBERT-L-2-v2 - the config I used to upload it looked like
Issues Resolved
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.