Searching tasks finds non-matches #552

hg2581 · 2017-12-06T10:51:03Z

How are searches done? Is it a combination of exact and fuzzy matching of the search term, or is there possibly a bug in the search algorithm, since non-matching tasks are also found?

dmfs · 2017-12-06T11:11:45Z

Searching is entirely fuzzy. We use a combination of 3-grams and 4-grams to get weighted search results. The more (or the longer) search terms you enter, the more precise the results should be.

We went for this fuzzy approach to be more tolerant of typos and different forms of a word.
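To illustrate why an n-gram search tolerates typos, here is a minimal sketch. It assumes each word is prefixed with a space before being split into 3-grams and 4-grams, and that a task's score is simply the fraction of the query's n-grams that also occur in the task. `ngrams` and `fuzzy_score` are hypothetical names; the app itself does this in SQL, and its exact weighting may differ.

```python
from __future__ import annotations


def ngrams(text: str, n: int) -> set[str]:
    """Distinct character n-grams of each word in `text`.

    Assumption: every word gets a leading space, so the start of a word
    produces its own n-gram (e.g. " sh" for "shop").
    """
    grams: set[str] = set()
    for word in text.lower().split():
        padded = " " + word
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return grams


def fuzzy_score(query: str, task: str) -> float:
    """Fraction of the query's 3-grams and 4-grams that also occur in the task."""
    wanted = ngrams(query, 3) | ngrams(query, 4)
    if not wanted:
        return 0.0
    present = ngrams(task, 3) | ngrams(task, 4)
    return len(wanted & present) / len(wanted)


# A typo still scores well, while an unrelated task scores zero.
print(fuzzy_score("shoping", "Shopping list"))  # 8 of 11 n-grams match
print(fuzzy_score("shoping", "Pay rent"))
```

The same property that makes "shoping" find "Shopping list" also lets short or common n-grams produce partial matches in unrelated tasks, which is the kind of false positive described in this issue.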

Initially we were planning to give the 3-grams and 4-grams a weight based on their number of occurrences (rare n-grams would get a higher weight if they match), but we haven't seen any need for that so far. Also, it would increase implementation complexity and runtime overhead significantly.

If you get too many false positives, we should certainly get back to that.
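The occurrence-based weighting mentioned above was not implemented, but a sketch makes the idea concrete. This uses a smoothed inverse document frequency (a standard technique swapped in here for illustration, not something the thread specifies): an n-gram that appears in few tasks gets a larger weight, so matching it contributes more to the score. All function names are hypothetical, and the leading-space-per-word convention is an assumption.

```python
from __future__ import annotations

import math


def ngrams(text: str, n: int = 3) -> set[str]:
    """Distinct n-grams per word, with an assumed leading space per word."""
    grams: set[str] = set()
    for word in text.lower().split():
        padded = " " + word
        grams.update(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def idf_weights(tasks: list[str], n: int = 3) -> tuple[dict[str, float], float]:
    """Smoothed inverse document frequency for every n-gram seen in `tasks`.

    Returns the per-n-gram weights plus the weight for an n-gram that occurs
    in no task at all (the rarest possible case).
    """
    doc_freq: dict[str, int] = {}
    for task in tasks:
        for gram in ngrams(task, n):
            doc_freq[gram] = doc_freq.get(gram, 0) + 1
    total = len(tasks)
    weights = {g: math.log((1 + total) / (1 + df)) + 1 for g, df in doc_freq.items()}
    return weights, math.log(1 + total) + 1


def weighted_score(query: str, task: str, weights: dict[str, float],
                   unseen: float, n: int = 3) -> float:
    """Like a plain match fraction, but a match on a rare n-gram counts more."""
    wanted = ngrams(query, n)
    if not wanted:
        return 0.0
    present = ngrams(task, n)
    total = sum(weights.get(g, unseen) for g in wanted)
    matched = sum(weights.get(g, unseen) for g in wanted & present)
    return matched / total


tasks = ["buy milk", "buy bread", "buy eggs", "call plumber"]
weights, unseen = idf_weights(tasks)
print(weights[" bu"] < weights[" ca"])  # " bu" occurs in 3 tasks, " ca" in 1
```

The extra cost is visible even in this toy version: document frequencies have to be computed over all tasks and kept up to date whenever a task changes, which matches the concern about complexity and runtime overhead.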

hg2581 · 2017-12-06T11:31:36Z

How about first listing exact matches, then fuzzy matches?

dmfs · 2017-12-06T13:12:25Z

That's certainly possible. We would have to pre-process the results in that case, because we could not express this with a single SQL query like we do now.
Maybe we can also improve the fuzzy search ranking, for example by taking consecutive n-gram matches into account.
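A minimal sketch of that extra processing step, assuming the SQL query already returns (title, fuzzy score) pairs. `rank_results` is a hypothetical helper, "exact match" is taken to mean the search term appears verbatim in the title (ignoring case), and the scores in the example are invented for illustration.

```python
from __future__ import annotations


def rank_results(query: str, results: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Order (title, fuzzy_score) pairs: verbatim matches first, then by score.

    A result counts as an exact match if it contains the search term as a
    substring, ignoring case. Within each group, higher scores come first.
    """
    needle = query.casefold()
    # False sorts before True, so exact matches (needle found) end up on top.
    return sorted(results, key=lambda r: (needle not in r[0].casefold(), -r[1]))


results = [("Shipping label", 0.6), ("Shopping list", 0.5), ("Buy shop supplies", 0.4)]
for title, score in rank_results("shopping", results):
    print(f"{score:.1f}  {title}")
```

Here "Shopping list" moves to the top even though "Shipping label" has a higher fuzzy score. Because the check runs on the returned rows, it sits outside the single SQL query.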

hg2581 · 2017-12-06T19:59:31Z

I think I would prefer exact matches at the top of the list, then fuzzy matches, the assumption being that I likely know what I am searching for and don't want the exact hits drowned out by fuzzy matches (I have about 30 calendars and over 1,100 items in my task lists).

Improve free text search by:
* fixing n-gram generation when a space is added in front of a word
* giving 4-grams a higher weight than multiple matches of the same 3-gram by not counting duplicate n-grams (as a result, the score cannot be > 1 anymore)
* lowering the min-score to 0.33, which means at least 1 out of 3 n-grams must match for a task to be considered a result

The changes are supposed to favor longer matches over many shorter matches.
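The effect of not counting duplicate n-grams and of the 0.33 threshold can be sketched as follows. This assumes a score of matched query n-grams divided by distinct query n-grams, and it assumes the earlier variant counted every occurrence in the task; neither detail is confirmed by the commit message, and all function names are hypothetical.

```python
from __future__ import annotations

MIN_SCORE = 0.33  # threshold from the commit message


def all_ngrams(text: str) -> list[str]:
    """Every 3-gram and 4-gram of every word, duplicates kept.

    Assumption: each word is prefixed with a single space before splitting.
    """
    grams: list[str] = []
    for word in text.lower().split():
        padded = " " + word
        for n in (3, 4):
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def score_counting_duplicates(query: str, task: str) -> float:
    """Hypothetical 'before' variant: each occurrence in the task is counted."""
    wanted = set(all_ngrams(query))
    if not wanted:
        return 0.0
    return sum(1 for g in all_ngrams(task) if g in wanted) / len(wanted)


def score_distinct(query: str, task: str) -> float:
    """Each distinct query n-gram counts at most once, so the score is <= 1."""
    wanted = set(all_ngrams(query))
    if not wanted:
        return 0.0
    return len(wanted & set(all_ngrams(task))) / len(wanted)


def search(query: str, tasks: list[str]) -> list[tuple[str, float]]:
    """Keep tasks scoring at least MIN_SCORE, best matches first."""
    scored = [(task, score_distinct(query, task)) for task in tasks]
    hits = [(task, s) for task, s in scored if s >= MIN_SCORE]
    return sorted(hits, key=lambda r: r[1], reverse=True)


print(score_counting_duplicates("tea", "tea tea tea"))  # 3.0: repetition inflates the score
print(score_distinct("tea", "tea tea tea"))             # 1.0
print(search("tea", ["tea tea tea", "steam iron", "pay rent"]))
```

The query "tea" has three distinct n-grams (" te", "tea", " tea"). "steam iron" shares only "tea", giving 1/3, which just clears the 0.33 threshold, while "pay rent" shares none and is dropped.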

dmfs · 2017-12-07T09:01:33Z

I've just made a few improvements in #556. The most important change is that it favours 4-gram matches over 3-grams matching multiple times. So it should give exact matches a higher score than multiple partial matches.

hg2581 · 2017-12-08T10:49:39Z

Great, looking forward to testing it.

dmfs · 2017-12-12T12:10:26Z

@hg2581 this will be released this week in 1.1.13. Please let us know if you still see a need for further improvement with this version. Some concrete text examples would be helpful in that case.

hg2581 · 2018-01-04T02:26:40Z

It seems to work well, I'll get back if I find any problems. Thank you!

dmfs · 2018-01-04T08:29:33Z

Thanks for the feedback. Good to hear it works better now.

dmfs added the provider label Dec 7, 2017

dmfs closed this as completed Dec 12, 2017