Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search in bs4_book often missing results #1431

Closed
jtbayly opened this issue May 24, 2023 · 6 comments · Fixed by #1319
Closed

Search in bs4_book often missing results #1431

jtbayly opened this issue May 24, 2023 · 6 comments · Fixed by #1319
Labels
bug an unexpected problem or unintended behavior

Comments

@jtbayly
Copy link
Contributor

jtbayly commented May 24, 2023

Until this change is committed, search is simply broken in bs4book.

Take this book as an example (it's not running the latest version of gitbook, but this hasn't changed):
https://bylaws.evangelpresbytery.com

Do a search for "budget". Search will indicate that there is only one place in the book where the word is found: Appendix 1.

In point of fact, the word "budget" is found in several places in Section 4 and Section 6.

Now change ignoreFieldNorm: true as in the patch linked above, and try your search again. Now you get all three locations as search results.

I'm sure this is biting people when they search for stuff in books and don't realize they aren't seeing all the results. But they don't know what they don't know, so people aren't reporting it.

@jtbayly
Copy link
Contributor Author

jtbayly commented May 24, 2023

For the record, this particular search failure bit me in a meeting last night. I literally trusted the search results for this word and stated that the only place where "budget" was mentioned was in Appendix 1. Yeah, I looked like an idiot. Yeah, I thought this got fixed a year ago.

@cderv cderv added the bug an unexpected problem or unintended behavior label May 25, 2023
@cderv
Copy link
Collaborator

cderv commented May 25, 2023

Sorry for the trouble @jtbayly

We did not spend much time on bookdown last year will all the development on Quarto. I understand this is a problem, especially when using the search for exact matching and all result match.

Current opiniated design of bs4_book() was to filter out the result. So not all result will be returned. Fuse.js a scoring system (https://fusejs.io/concepts/scoring-theory.html) and 0 is the best score. We keep only result that are below 0 and 0.75.

In your example your two other match have 0.83 score. So not shown. It seems Field norm as a high impact.

I don't believe #1319 the right fix for a new default. I understand it fixes your use case, but I believe it could have side effect for other books.

I'll enquire with more knowledgable fuse.js people to understand better the scoring.

However, I think best approach would be to allow customisation for bs4_book() of the fuse.js config by user who desire a specific behavior.

@cderv cderv moved this from Backlog to Next / Ready for Dev in R Markdown Team Projects May 25, 2023
@jtbayly
Copy link
Contributor Author

jtbayly commented May 25, 2023

Allowing customization would be great, but I'm struggling to see why a book using Fuse.js would not want to ignore field length.

Chapters can vary in their length dramatically. Leaving this as-is means the longer a chapter is the lower every single search result will score from that chapter. If you have a long chapter, you are always going to have trouble finding things in it, even if they are exact matches and should obviously be in the search results. It seems to me that this situation is precisely why Fuse.js offers the ability to ignore field length.

If somebody can show me an example of setting "ignoreFieldNorm: true" harming search in a book, or even explain theoretically why they don't want to ignore field length, I'd be grateful. Because if this becomes customizable, I'm going to to change it for our books, and I don't want to do it without understanding why bookdown defaults the other way. It feels like I must be missing a really big downside if people are willing to put up with exact matches of a single word not being found by "search" in a book.

@cderv cderv linked a pull request May 25, 2023 that will close this issue
@github-project-automation github-project-automation bot moved this from Next / Ready for Dev to Done in R Markdown Team Projects May 25, 2023
@cderv
Copy link
Collaborator

cderv commented May 25, 2023

I have discussed today with other, and also considering your good arguments, I believe we can make this change safely.

I merged #1319 now.

Thank a lot @jtbayly for your patience, perseverance and explanation. Sorry again for the delay on this one!

Just a reminder: the default search in bs4_book() is still not outputing all exact match because we are filtering on score. This is to avoid to much results in fuzzy searching - when searching from several terms and not just one, it makes sense.

Happy to tweak again if needed. I am still planning to allow changing configuration but that is for another time.

@jtbayly
Copy link
Contributor Author

jtbayly commented May 25, 2023

Glad to hear it. And I do think scoring results makes sense, and wherever the cutoff is it's always going to be a tradeoff.

Copy link

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue by following the issue guide (https://yihui.org/issue/), and link to this old issue if necessary.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants