Add stop words to Meili search index #1307

Closed
wants to merge 1 commit into from


LukasKalbertodt
Member

Filtering out very common words that carry basically no information improves indexing performance, shrinks the index and, most importantly, helps with the problem that searching for common words yields tons of matches in subtitles and similar fields. It doesn't completely solve that problem, though. Unfortunately, stop words also make some things worse: especially in phrase search, the highlighting is broken and might confuse users. Phrase search itself still kind of works, but from reading the docs, I think that with the stop words "the" and "a", searching for "foo the bar" will also find documents containing the text "foo a bar".
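To make the phrase-search concern concrete, here is a deliberately simplified model. It assumes (an assumption, not taken from the Meilisearch docs or source) that stop words are dropped from both documents and queries while word positions are kept. Under that model, the gap left by "the" and the gap left by "a" look identical. `STOP_WORDS` and `phrase_key` are hypothetical names for this illustration only.

```python
# Simplified model of phrase matching with stop words.
# ASSUMPTION: stop words are removed but word positions are preserved.
# This is NOT Meilisearch's actual implementation; it only illustrates the concern.
from typing import FrozenSet, Tuple

# Hypothetical stop word list for this example.
STOP_WORDS: FrozenSet[str] = frozenset({"the", "a"})


def phrase_key(text: str, stop_words: FrozenSet[str] = STOP_WORDS) -> Tuple[Tuple[str, int], ...]:
    """Return the (word, position) pairs that survive stop word removal.

    The gap where a stop word used to be is still visible through the
    positions, but *which* stop word filled that gap is lost.
    """
    words = text.lower().split()
    return tuple((word, pos) for pos, word in enumerate(words) if word not in stop_words)


if __name__ == "__main__":
    print(phrase_key("foo the bar"))  # (('foo', 0), ('bar', 2))
    print(phrase_key("foo a bar") == phrase_key("foo the bar"))  # True
    print(phrase_key("foo bar") == phrase_key("foo the bar"))  # False
```

In this model "foo the bar" and "foo a bar" become indistinguishable, while "foo bar" (no gap) still differs.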

So it's not really clear yet whether we want this at all. Maybe Meili needs to improve first. Or maybe we never send the stop words to Meili and only use them to filter some things manually?
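The two options above could look roughly like this. The first function builds (but does not send) a request to Meilisearch's stop-words settings route, `PUT /indexes/{index_uid}/settings/stop-words`, which takes a JSON array of strings. The second is only one possible reading of "filter some things manually": stripping stop words in our own code without configuring them in Meili at all. The base URL, index uid and both function names are hypothetical.

```python
# Two ways to use a stop word list. Function names, URL and index uid are hypothetical.
import json
import urllib.request
from typing import List, Optional, Set


def stop_words_request(
    base_url: str,
    index_uid: str,
    stop_words: List[str],
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Option 1: build a request that configures stop words in Meilisearch itself."""
    url = f"{base_url.rstrip('/')}/indexes/{index_uid}/settings/stop-words"
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps(stop_words).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


def strip_stop_words(text: str, stop_words: Set[str]) -> str:
    """Option 2: filter stop words ourselves and never tell Meili about them."""
    return " ".join(word for word in text.split() if word.lower() not in stop_words)


if __name__ == "__main__":
    req = stop_words_request("http://localhost:7700", "my-index", ["the", "a"])
    print(req.get_method(), req.full_url)
    # PUT http://localhost:7700/indexes/my-index/settings/stop-words
    print(strip_stop_words("Foo the bar", {"the", "a"}))  # Foo bar
```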

@github-actions github-actions bot temporarily deployed to test-deployment-pr1307 December 18, 2024 14:58 Destroyed
@github-actions github-actions bot added the status:conflicts This PR has conflicts that need to be resolved label Jan 23, 2025

This pull request has conflicts ☹
Please resolve those so we can review the pull request.
Thanks.

@LukasKalbertodt
Member Author

Replaced by #1319

@LukasKalbertodt LukasKalbertodt deleted the stop-words branch January 23, 2025 14:12