unexpected behavior when computing distance with an array #46

Closed
adwasser opened this issue May 6, 2021 · 2 comments

adwasser commented May 6, 2021

Computing distances with one of the inputs as an array (of anything) does not result in an error, but instead gives a nonsensical result.

For example,

using StringDistances
d = Levenshtein()
d("minimal working example", ["an array", " of strings"])

returns 23, which is the length of the first argument, and not the distance between the first argument and any of the entries in the second argument.

The result is identical (the length of the first argument, at least for Levenshtein) if the array is empty or contains non-string entries.

matthieugomez (Owner) commented May 6, 2021

This is expected. Distances are defined for any iterators: in your example, the first argument is an iterator of Chars while the second argument is an iterator of Strings. They have no elements in common, so the distance between the two is the length of the longer argument (here 23).
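To make the explanation above concrete, here is a minimal sketch of an edit distance over arbitrary iterables. The function `edit_distance` is a hypothetical stand-in written for illustration; it is not the StringDistances.jl implementation and only assumes that elements are compared with `==`. Since a `Char` never equals a `String`, every position costs one edit, and the result is the length of the longer input.

```julia
# Hypothetical helper for illustration, not the StringDistances.jl code:
# a plain dynamic-programming edit distance over any two iterables.
function edit_distance(a, b)
    xs, ys = collect(a), collect(b)
    # prev[j + 1] holds the distance between the processed prefix of xs
    # and the first j elements of ys.
    prev = collect(0:length(ys))
    for (i, x) in enumerate(xs)
        curr = similar(prev)
        curr[1] = i  # deleting the first i elements of xs
        for (j, y) in enumerate(ys)
            cost = x == y ? 0 : 1  # a Char never equals a String
            curr[j + 1] = min(prev[j + 1] + 1,  # deletion
                              curr[j] + 1,      # insertion
                              prev[j] + cost)   # substitution or match
        end
        prev = curr
    end
    return prev[end]
end

edit_distance("minimal working example", ["an array", " of strings"])  # 23
edit_distance("minimal working example", String[])                     # 23
```

The second call shows why an empty array gives the same answer: all 23 characters of the first argument have to be deleted.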

What you want is to use dot (.) broadcasting:

using StringDistances
d = Levenshtein()
d.(Ref("minimal working example"), ["an array", " of strings"])

Alternatively, I also like to use comprehensions:

using StringDistances
d = Levenshtein()
[d("minimal working example", x) for x in ["an array", " of strings"]]

I'll leave the issue open in case other people get confused by this. If so, I may throw an error when the eltypes of the two iterators differ.
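Until such a check exists, one way to get that behavior in user code is a small guard around the distance call. This is only a sketch: `checked_distance` is a hypothetical name, not part of StringDistances.jl, and it assumes the distance object `d` is callable as `d(a, b)`, as in the examples above.

```julia
# Hypothetical wrapper, not part of StringDistances.jl: refuse to compare
# iterators whose element types differ, then defer to the distance `d`.
function checked_distance(d, a, b)
    eltype(a) == eltype(b) ||
        throw(ArgumentError("element types differ: $(eltype(a)) vs $(eltype(b))"))
    return d(a, b)
end
```

With this guard, passing a string and an array of strings raises an `ArgumentError` before any distance is computed, because `eltype` of a `String` is `Char` while `eltype` of the array is `String`.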

adwasser (Author) commented May 6, 2021

Ah, that makes sense to me. Thanks for the quick response!
