Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Exception when using tf Painless method suggests using the deprecated classic similarity #9958

Closed
kolchfa-aws opened this issue Sep 9, 2023 · 3 comments
Assignees
Labels
bug Something isn't working Search Search query, autocomplete ...etc

Comments

@kolchfa-aws
Copy link

Describe the bug
Without the correct setup, if a user calls the tf Painless method in the script_score query, the user gets back an UnsupportedOperationException:

{
  "error": {
    "root_cause": [
      {
        "type": "unsupported_operation_exception",
        "reason": "unsupported_operation_exception: requires a TFIDFSimilarity (such as ClassicSimilarity)"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test",
        "node": "sQkgvgQWRT2pfI4leTC8lg",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "org.opensearch.ExceptionsHelper.convertToOpenSearchException(ExceptionsHelper.java:86)",
            "org.opensearch.script.ScoreScriptUtils$TF.tf(ScoreScriptUtils.java:113)",
            "tf(params.field, params.term)",
            "                       ^---- HERE"
          ],
          "script": "tf(params.field, params.term)",
          "lang": "painless",
          "position": {
            "offset": 23,
            "start": 0,
            "end": 29
          },
          "caused_by": {
            "type": "exception",
            "reason": "java.lang.UnsupportedOperationException: requires a TFIDFSimilarity (such as ClassicSimilarity)",
            "caused_by": {
              "type": "unsupported_operation_exception",
              "reason": "unsupported_operation_exception: requires a TFIDFSimilarity (such as ClassicSimilarity)"
            }
          }
        }
      }
    ],
    "caused_by": {
      "type": "unsupported_operation_exception",
      "reason": "unsupported_operation_exception: requires a TFIDFSimilarity (such as ClassicSimilarity)"
    }
  },
  "status": 400
}

This exception instructs the user to use classic similarity. If the user then executes a query to set the similarity to classic:

PUT /test/_mapping
{
  "properties": {
    "<field_name>": {
      "type": "text",
      "similarity": "classic"
    }
  }
}

the user gets an exception saying that the classic similarity is deprecated and to use the BM25 similarity in its place.

To Reproduce
Steps to reproduce the behavior:

  1. Run a query above
  2. See the UnsupportedOperationException error

Expected behavior
Instead of classic, the exception should instruct the user to use the BM25 similarity:

unsupported_operation_exception: requires a TFIDFSimilarity (such as BM25Similarity)

Host/Environment (please complete the following information):

  • Version 2.10

Additional context
See discussion in the documentation PR opensearch-project/documentation-website#4988

@noCharger
Copy link
Contributor

noCharger commented Sep 9, 2023

"unsupported_operation_exception: requires a TFIDFSimilarity (such as ClassicSimilarity)"

This error message is generated by the TF constructor in lucene, not OpenSearch. https://github.com/apache/lucene/blob/a7202e2e6fdf119e7f1971a47b77648fd767f4c1/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/TFValueSource.java#L57-L60

The classic similarity constructor in Lucene can be found here and OpenSearch uses it in various:

  1. Term vectors https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/action/termvectors/TermVectorsFilter.java#L84
  2. More like this query https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/lucene/search/MoreLikeThisQuery.java#L152
  3. BlendedTermQuery is capable of working with ClassicSimilarity and takes into account the document frequency statistic used by ClassicSimilarity. https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java#L74

Discussion in lucene: apache/lucene#8430

"The [classic] similarity may not be used anymore. Please use the [BM25] "
+ "similarity or build a custom [scripted] similarity instead."

However, with OpenSearch, the construction of classic similarity using the index mapping API is disabled.
https://github.com/opensearch-project/OpenSearch/blame/main/server/src/main/java/org/opensearch/index/similarity/SimilarityService.java#L94-L99

I think we should remove tf value source support in OpenSearch. And I'd like to discuss with the community on the long-term direction of removing Classic Similarity. For example, in 2.x, we could remove createClassicSimilarity and Classic Similarity from builtIn of Similarity Service since it's no longer used in OpenSearch codebase. In 3.0 and future versions, we could research on whether fully remove classic similarity usage in other places (Term Vectors, MLT query, etc.).

cc: @nknize @msfroh

@msfroh
Copy link
Collaborator

msfroh commented Sep 11, 2023

Should we have added support for the tf function if it depends on TFIDFSimilarity and we don't allow users to use ClassicSimilarity (the only subclass of TFIDFSimilarity that ships with Lucene)? I know it was requested in #7558, but was there any consideration for how it would work in practice? Is there a valid way for users to make use of the tf function with the available similarities in OpenSearch?

@noCharger
Copy link
Contributor

Should we have added support for the tf function if it depends on TFIDFSimilarity and we don't allow users to use ClassicSimilarity (the only subclass of TFIDFSimilarity that ships with Lucene)? I know it was requested in #7558, but was there any consideration for how it would work in practice? Is there a valid way for users to make use of the tf function with the available similarities in OpenSearch?

Good point, unless OpenSearch support ClassicSimilarity users could not really use tf() function in practice. The bugfix should remove the support of tf().

@macohen macohen moved this from 🆕 New to 🏗 In progress in Search Project Board Sep 12, 2023
@macohen macohen moved this from 🏗 In progress to 👀 In review in Search Project Board Sep 12, 2023
@msfroh msfroh closed this as completed Sep 13, 2023
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Search Project Board Sep 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Search Search query, autocomplete ...etc
Projects
Archived in project
Development

No branches or pull requests

3 participants