Switch to Damerau-Levenshtein #122

Merged 1 commit on Dec 3, 2024


@Toflar
Contributor

Toflar commented Dec 2, 2024

This PR would enable @ausi's new implementation of the Damerau-Levenshtein algorithm.
For those not familiar with it: compared to regular Levenshtein, a transposition counts as 1 typo instead of 2. What's a transposition? Two adjacent letters swapped, e.g. "huose" instead of "house".
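
To make the difference concrete, here is a minimal Python sketch that compares the two distances on the "huose" example. It is for illustration only and is not the implementation this PR enables: the function names are made up, and it assumes the restricted "optimal string alignment" variant of Damerau-Levenshtein, which may differ from the variant actually used.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions only."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (assumed variant, see lead-in).

    Same as Levenshtein, plus swapping two adjacent characters costs 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(levenshtein("huose", "house"))   # 2
print(osa_distance("huose", "house"))  # 1
```

With a typo budget of 1, "huose" would still match "house" under the second distance but not under the first.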

This happens quite often when users type, so I'd consider Damerau-Levenshtein to be more suitable for a search engine.
However, the PHP version is quite a bit slower than levenshtein(), which is a native PHP function implemented in C.

So currently, the benchmark in bin is about 20ms slower (140ms instead of 120ms). However, the typo handling is a bit more user-friendly, so I guess it's a trade-off at the moment.

I was fiddling around with Damerau-Levenshtein automatons, which might improve this a little, but I'm not sure.

Toflar added this to the 0.9 milestone on Dec 2, 2024
Toflar added the enhancement (New feature or request) label on Dec 2, 2024
@daun
Contributor

daun commented Dec 2, 2024

Most users are okay waiting 20ms longer if the results make more sense to them once they come in 🤠 And if every millisecond counts, I guess the recommendation would be to migrate to Meilisearch on a high-end server anyway.

@ausi

ausi commented Dec 2, 2024

> So currently, the benchmark in bin is about 20ms slower (140ms instead of 120ms).

With Toflar/state-set-index#9, the benchmark only got slightly slower on average, from 112ms to 116ms on my machine.

@Toflar
Contributor Author

Toflar commented Dec 3, 2024

I'm merging this even though Toflar/state-set-index#9 is not released yet, but I will make sure it is once Loupe 0.9 is published.

Toflar merged commit e822618 into develop on Dec 3, 2024
18 checks passed
Toflar deleted the feature/damerau-levenshtein branch on December 3, 2024 08:41
Labels
enhancement New feature or request
3 participants