Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backport 2.x] phone-search analyzer: don't emit sip/tel prefix, int'l prefix, extension & unformatted input #17017

Conversation

rursprung
Copy link
Contributor

@rursprung rursprung commented Jan 13, 2025

Description

cherry-pick of #16993 with #17016 squashed into it.

CC @reta

Related Issues

n/a

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@rursprung rursprung changed the base branch from main to 2.x January 13, 2025 18:29
@rursprung rursprung changed the title phone-search analyzer: don't emit sip/tel prefix, int'l prefix, extension & unformatted input - 2.x cherry-pick [Backport 2.x] phone-search analyzer: don't emit sip/tel prefix, int'l prefix, extension & unformatted input Jan 13, 2025
@rursprung
Copy link
Contributor Author

why is the DCO failing here? the sign-off is present in the commit-msg 🤔

@reta
Copy link
Collaborator

reta commented Jan 13, 2025

why is the DCO failing here? the sign-off is present in the commit-msg 🤔

Yeah, I can clearly see it, I think we could ignore it for now

…ension & unformatted input (opensearch-project#16993)

* `phone-search` analyzer: don't emit int'l prefix

this was an oversight in the initial implementation: if the tokenizer
emits the international calling prefix in the search analyzer then all
documents with the same international calling prefix will match.

e.g. when searching for `+1-555-123-4567` not only documents with this
number would match but also any other document with a `1` token (i.e.
any other number with this prefix).

thus the search functionality is currently broken for this analyzer,
making it useless.

the test coverage has now been extended to cover these and other
use-cases.

Signed-off-by: Ralph Ursprung <Ralph.Ursprung@avaloq.com>

* `phone-search` analyzer: don't emit extension & unformatted input

if these tokens are emitted it meant that phone numbers with other
international dialling prefixes still matched.

e.g. searching for `+1 1234` would also match a number stored as
`+2 1234`, which was wrong.

the tokens still need to be emited for the `phone` analyzer, e.g. when
the user only enters the extension / local number it should still match,
the same is with the other ngrams: these are needed for
search-as-you-type style queries where the user input needs to match
against partial phone numbers.

Signed-off-by: Ralph Ursprung <Ralph.Ursprung@avaloq.com>

* `phone-search` analyzer: don't emit sip/tel prefix

in line with the previous two commits, this is something else the search
analyzer shouldn't emit since otherwise searching for any number with
such a prefix will match _any_ document with the same prefix.

Signed-off-by: Ralph Ursprung <Ralph.Ursprung@avaloq.com>

---------

Signed-off-by: Ralph Ursprung <Ralph.Ursprung@avaloq.com>
@rursprung rursprung force-pushed the analysis-phonenumber-no-intl-prefix-in-search-2.x branch from 9f8c241 to cb2c36b Compare January 13, 2025 18:37
@rursprung
Copy link
Contributor Author

why is the DCO failing here? the sign-off is present in the commit-msg 🤔

Yeah, I can clearly see it, I think we could ignore it for now

i think i found it: since i cherry-picked the merge commit it had the weird noreply.github.com email address as the author so it didn't match the sign-off. since i anyway had to push again (spotless...) i fixed the author as well, so now it should be good

Copy link
Contributor

❕ Gradle check result for cb2c36b: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Jan 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.79%. Comparing base (149aec9) to head (cb2c36b).
Report is 4 commits behind head on 2.x.

Additional details and impacted files
@@             Coverage Diff              @@
##                2.x   #17017      +/-   ##
============================================
- Coverage     71.91%   71.79%   -0.12%     
+ Complexity    65577    65520      -57     
============================================
  Files          5318     5318              
  Lines        305777   305779       +2     
  Branches      44581    44583       +2     
============================================
- Hits         219893   219530     -363     
- Misses        67515    67943     +428     
+ Partials      18369    18306      -63     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@reta reta merged commit 5abd3e7 into opensearch-project:2.x Jan 13, 2025
38 checks passed
@rursprung rursprung deleted the analysis-phonenumber-no-intl-prefix-in-search-2.x branch January 13, 2025 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants