Searching tasks finds non-matches #552
Comments
Searching is entirely fuzzy. We use a combination of 3-grams and 4-grams to get weighted search results. The more search terms you enter, or the longer they are, the more precise the results should be. We chose this fuzzy approach to be more tolerant of typos and different forms of a word. Initially we planned to weight the 3-grams and 4-grams by their number of occurrences (rare n-grams would get a higher weight when they match), but we haven't seen a need for that so far. It would also increase implementation complexity and runtime overhead significantly. If you get too many false positives, we should certainly revisit that.
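To make the typo tolerance concrete, here is a minimal Python sketch of character n-gram generation. It is not the project's actual code: the function names are made up for illustration, and it assumes n-grams are taken per word, lowercased, without any padding.

```python
def ngrams(text, n):
    """Return the set of character n-grams of every word in text (lowercased).

    Assumption: n-grams never span word boundaries and words are not padded.
    """
    grams = set()
    for word in text.lower().split():
        for i in range(len(word) - n + 1):
            grams.add(word[i:i + n])
    return grams


def search_grams(text):
    """Combine 3-grams and 4-grams, as described in the comment above."""
    return ngrams(text, 3) | ngrams(text, 4)


query_grams = search_grams("meeting")
typo_grams = search_grams("meetnig")  # transposed letters

# The misspelled word still shares several n-grams with the correct one,
# which is why a fuzzy search can find it.
shared = query_grams & typo_grams
print(sorted(shared))
```

The same overlap is also what produces false positives: any task text that happens to share a few n-grams with the query can end up in the results.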
How about first listing exact matches, then fuzzy matches?
That's certainly possible. We would have to post-process the results in that case, because we could not express this with a single SQL query like we do now.
I think I would prefer exact matches at the top of the list, then fuzzy matches. The assumption is that I likely know what I am searching for and don't want the exact hits drowned out by fuzzy matches (I have about 30 calendars and over 1100 items in my task lists).
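The "exact matches first" idea could be done as a processing step on top of the fuzzy results. The sketch below is only an illustration under assumptions: it treats "exact" as a case-insensitive substring match on the task title, and the `(title, score)` tuples stand in for whatever the real query returns.

```python
def rank_exact_first(query, scored_tasks):
    """Order results so exact (substring) hits come before fuzzy-only hits.

    scored_tasks is a list of (title, fuzzy_score) tuples. Within each group,
    results are ordered by descending fuzzy score.
    """
    needle = query.lower()
    return sorted(
        scored_tasks,
        # False sorts before True, so titles containing the query come first.
        key=lambda item: (needle not in item[0].lower(), -item[1]),
    )


results = [("Buy milk tea", 0.5), ("Call Milan", 0.9), ("milk", 0.7)]
ranked = rank_exact_first("milk", results)
print([title for title, _ in ranked])
```

Here "Call Milan" has the highest fuzzy score in the made-up data but is still listed last, because it does not contain the query verbatim.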
Improve free text search by:
* Fixing n-gram generation when a space is added in front of a word
* give 4-grams a higher weight than multiple matches of the same 3-gram by not counting duplicate n-grams (as a result the score can not be >1 anymore)
* lower the min-score to 0.33 which means at least 1 out of 3 n-grams must match in order for a task to be considered a result

The changes are supposed to favor longer matches over many shorter matches.
I've just made a few improvements in #556. The most important change is that it favours 4-gram matches over 3-grams matching multiple times. So it should give exact matches a higher score than multiple partial matches.
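A rough Python sketch of how scoring with de-duplicated n-grams and a minimum score of 0.33 might behave. The formula (matched query n-grams divided by all distinct query n-grams) is an assumption inferred from the description in this thread, not the actual implementation in #556, and all names are hypothetical.

```python
MIN_SCORE = 0.33  # threshold mentioned in the change description


def unique_ngrams(text, sizes=(3, 4)):
    """Distinct 3-grams and 4-grams of every word in text (lowercased)."""
    grams = set()
    for word in text.lower().split():
        for n in sizes:
            for i in range(len(word) - n + 1):
                grams.add(word[i:i + n])
    return grams


def score(query, task_text):
    """Fraction of the query's distinct n-grams found in the task text.

    Because sets hold each n-gram once, repeated matches of the same
    n-gram do not add up, and the score cannot exceed 1.
    """
    query_grams = unique_ngrams(query)
    if not query_grams:
        return 0.0
    return len(query_grams & unique_ngrams(task_text)) / len(query_grams)


def is_result(query, task_text):
    return score(query, task_text) >= MIN_SCORE


repeated = score("milk", "milk milk milk")  # repetition does not push it above 1
partial = score("milk", "Call Milan")       # only "mil" is shared: 1 of 3 n-grams
nothing = score("milk", "Pay rent")
```

With this assumed formula, "Call Milan" shares exactly one of the three n-grams of "milk", so it sits right at the "1 out of 3" boundary and still counts as a result, while the exact match scores 1. That keeps some fuzzy hits in the list but ranks the exact hit above them.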
Great, looking forward to testing it.
@hg2581 this will be released this week in 1.1.13. Please let us know if you still see a need for further improvement with this version. Some concrete text examples would be helpful in that case.
It seems to work well, I'll get back if I find any problems. Thank you!
Thanks for the feedback. Good to hear it works better now.
How are searches done? Is it a combination of exact and fuzzy matching of the search term or is there possibly a bug in the search algorithm since non-matching tasks are also found?