Highlight stop words if they appear in the query #132
Signed-off-by: Philipp Daun <post@philippdaun.net>
@Toflar What are your thoughts on this one? It only becomes an issue when using stop words, but it's quite noticeable on certain queries.
Oh sorry, I totally missed this! Nice work! This makes total sense to me, I just wonder how MeiliSearch handles this. Did you maybe research that, and would you want to put your notes here for future reference?
@Toflar Meilisearch also highlights stopwords. They have the below note in their docs. I don't think it makes sense to highlight all stop words as you'll have the highlights littered with
Thanks a lot for yet another awesome contribution! Thinking about providing some default stop word list. Something like we have for
just thinking out loud.
@Toflar A default stop word list sounds great for efficiency. Probably as an opt-in setting, though: you can easily get into trouble with multi-language setups. I can imagine somebody indexing French documents and not finding documents about tea (
Which is why I think stop word lists should be language dependent ;)
Slight adjustment to how the highlighter handles stop words.
Currently, stop words are never highlighted. This change highlights stop words if they are part of the query and occur next to other words. The goal is to improve the match between what people searched for and what gets highlighted. I found this most useful in the movie example, where there are lots of "The" and "An".
Examples
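To make the rule concrete, here is a minimal, self-contained Python sketch of the behaviour described above. It is not the code from this PR: the function name `highlight`, its parameters, and the tags are hypothetical, and "next to other words" is assumed to mean "directly adjacent to a matched non-stop-word token".

```python
import re


def highlight(text, query, stop_words, start_tag="<em>", end_tag="</em>"):
    """Illustrative only: wrap matched tokens in tags.

    Assumption: a stop word is highlighted only if it appears in the query
    AND sits directly next to a matched non-stop-word token.
    """
    query_terms = {t.lower() for t in re.findall(r"\w+", query)}

    # Split into word and non-word pieces so original spacing is preserved.
    pieces = re.findall(r"\w+|\W+", text)
    word_idx = [i for i, p in enumerate(pieces) if re.fullmatch(r"\w+", p)]
    words = [pieces[i].lower() for i in word_idx]

    # Pass 1: regular (non-stop-word) query terms always match.
    matched = [w in query_terms and w not in stop_words for w in words]

    # Pass 2: stop words from the query match only next to a pass-1 match.
    result = list(matched)
    for k, w in enumerate(words):
        if w in stop_words and w in query_terms:
            left = k > 0 and matched[k - 1]
            right = k + 1 < len(words) and matched[k + 1]
            result[k] = left or right

    for k, i in enumerate(word_idx):
        if result[k]:
            pieces[i] = f"{start_tag}{pieces[i]}{end_tag}"
    return "".join(pieces)


print(highlight("The Matrix is the one", "the matrix", {"the", "is"}))
# <em>The</em> <em>Matrix</em> is the one
```

In this sketch the leading "The" is highlighted because it is in the query and touches the matched "Matrix", while the later "the" stays plain because neither neighbour is a match. That keeps the output from being littered with isolated stop-word highlights.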
Prior art
Meilisearch also highlights stop words with the following note in their docs: