Searching tasks finds non-matches #552

Closed
hg2581 opened this issue Dec 6, 2017 · 9 comments

hg2581 commented Dec 6, 2017

How are searches done? Is it a combination of exact and fuzzy matching of the search term or is there possibly a bug in the search algorithm since non-matching tasks are also found?

Owner

dmfs commented Dec 6, 2017

Searching is entirely fuzzy. We use a combination of 3-grams and 4-grams to get weighted search results. The more search terms you enter, or the longer they are, the more precise the results should be.

We went for this fuzzy approach to be more tolerant about typos and different forms of a word.
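To illustrate why n-grams tolerate typos, here is a minimal Python sketch. It is not the project's actual code: the function name `ngrams` and the choice to split on whitespace and lowercase the input are assumptions made for the example.

```python
def ngrams(text, sizes=(3, 4)):
    """Return the distinct character 3-grams and 4-grams of every word in text."""
    grams = set()
    for word in text.lower().split():
        for n in sizes:
            # Words shorter than n yield an empty range and contribute nothing.
            grams.update(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


# A misspelled query still shares most of its n-grams with the correct word.
shared = ngrams("meting") & ngrams("meeting")
print(sorted(shared))
```

Here the misspelled "meting" shares five of its seven n-grams with "meeting", so a task containing "meeting" would still be found.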

Initially we were planning to give the 3-grams and 4-grams a weight based on their number of occurrences (rare n-grams would get a higher weight if they match), but we haven't seen a need for that so far. It would also increase implementation complexity and runtime overhead significantly.

If you get too many false positives we certainly should get back to that.
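The rarity-based weighting mentioned above was never implemented, so the following Python sketch is only one way it could look. The log-based weight formula and all names (`rarity_weights`, `weighted_score`) are assumptions for illustration, not anything taken from the project.

```python
import math
from collections import Counter


def ngrams(text, sizes=(3, 4)):
    """Distinct character 3-grams and 4-grams of every word in text."""
    grams = set()
    for word in text.lower().split():
        for n in sizes:
            grams.update(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def rarity_weights(titles):
    """Weight each n-gram by how few task titles contain it (assumed formula)."""
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(ngrams(title))  # a set, so each title counts once
    total = len(titles)
    return {g: math.log(1 + total / df) for g, df in doc_freq.items()}


def weighted_score(query, title, weights):
    """Share of the query's n-gram weight that is matched by the title."""
    query_grams = ngrams(query)
    if not query_grams:
        return 0.0
    # N-grams unknown to the corpus are treated like the rarest known one.
    default = max(weights.values(), default=1.0)
    matched = query_grams & ngrams(title)
    total_weight = sum(weights.get(g, default) for g in query_grams)
    return sum(weights.get(g, default) for g in matched) / total_weight


titles = ["buy milk", "buy bread", "buy stamps"]
weights = rarity_weights(titles)
print(weighted_score("buy milk", "buy bread", weights))
```

In this toy corpus "buy" occurs in every title and therefore gets a lower weight than "mil", so a match on "buy" alone contributes less than it would with unweighted counting. The extra pass over all titles hints at the runtime overhead mentioned above.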

@hg2581
Author

hg2581 commented Dec 6, 2017

How about first listing exact matches, then fuzzy matches?

Owner

dmfs commented Dec 6, 2017

That's certainly possible. We would have to pre-process the results in that case because we could not express this with a single SQL query like we do now.
Maybe we can also improve the fuzzy search ranking, for example by taking consecutive n-gram matches into account.
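As a rough idea of what such pre-processing could look like, here is a small Python sketch that reorders already-scored results. It assumes that "exact match" means a case-insensitive substring match and that results arrive as `(title, score)` pairs; both are assumptions for the example, as is the name `rank_exact_first`.

```python
def rank_exact_first(query, scored_tasks):
    """Put titles containing the query verbatim first, then sort by score.

    scored_tasks is a list of (title, fuzzy_score) pairs, e.g. as returned
    by a fuzzy search query.
    """
    needle = query.casefold()
    return sorted(
        scored_tasks,
        # False sorts before True, so exact hits come first; within each
        # group a higher fuzzy score ranks earlier.
        key=lambda item: (needle not in item[0].casefold(), -item[1]),
    )


results = [
    ("Send calendar invite", 0.9),
    ("Call Bernd", 0.5),
    ("Recall notice", 0.4),
]
for title, score in rank_exact_first("call", results):
    print(title, score)
```

With the query "call", the two titles that contain "call" are listed before "Send calendar invite", even though the latter has the highest fuzzy score.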

@hg2581
Author

hg2581 commented Dec 6, 2017

I think I would prefer exact matches at the top of the list, followed by fuzzy matches. The assumption is that I likely know what I am searching for and don't want the exact hits drowned out by fuzzy matches (I have ca. 30 calendars and over 1100 items in my task lists).

dmfs added a commit that referenced this issue Dec 7, 2017
Improve free text search by:
 * fixing n-gram generation when a space is added in front of a word
 * giving 4-grams a higher weight than multiple matches of the same 3-gram by not counting duplicate n-grams (as a result the score cannot exceed 1 anymore)
 * lowering the min-score to 0.33, which means at least 1 out of 3 n-grams must match for a task to be considered a result

The changes are supposed to favor longer matches over many shorter matches.
Owner

dmfs commented Dec 7, 2017

I've just made a few improvements in #556. The most important change is that it favours 4-gram matches over 3-grams matching multiple times. So it should give exact matches a higher score than multiple partial matches.
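One plausible reading of the scoring described in the commit message, sketched in Python. This is not the code from #556: the score formula (matched distinct n-grams divided by the query's distinct n-grams) and the names `score` and `search` are assumptions; only the 0.33 threshold is taken from the commit message.

```python
MIN_SCORE = 0.33  # threshold stated in the commit message


def ngrams(text, sizes=(3, 4)):
    """Distinct character 3-grams and 4-grams of every word in text."""
    grams = set()
    for word in text.lower().split():
        for n in sizes:
            grams.update(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def score(query, title):
    """Fraction of the query's distinct n-grams found in the title.

    Because sets are used, a 3-gram that occurs several times in the title
    counts only once, so the score can never exceed 1.
    """
    query_grams = ngrams(query)
    if not query_grams:
        return 0.0
    return len(query_grams & ngrams(title)) / len(query_grams)


def search(query, titles):
    """Return (title, score) pairs at or above MIN_SCORE, best match first."""
    scored = [(title, score(query, title)) for title in titles]
    hits = [pair for pair in scored if pair[1] >= MIN_SCORE]
    return sorted(hits, key=lambda pair: -pair[1])


for title, s in search("milk", ["mild silk", "Buy milk", "Call Bob"]):
    print(title, round(s, 2))
```

In this sketch "Buy milk" matches both 3-grams and the 4-gram of the query, while "mild silk" matches only the two 3-grams, so the exact match ranks first and "Call Bob" falls below the threshold.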

dmfs added the provider label Dec 7, 2017
Author

hg2581 commented Dec 8, 2017

Great, looking forward to testing it.

dmfs closed this as completed Dec 12, 2017
Owner

dmfs commented Dec 12, 2017

@hg2581 this will be released this week in 1.1.13. Please let us know if you still see a need for further improvement with this version. Some concrete text examples would be helpful in that case.

Author

hg2581 commented Jan 4, 2018

It seems to work well, I'll get back if I find any problems. Thank you!

Owner

dmfs commented Jan 4, 2018

Thanks for the feedback. Good to hear it works better now.
