search on https://doc.rust-lang.org/ not doing well #103357
Comments
cc @rust-lang/rustdoc This is a rustdoc search issue. In general, for the regex thing there's not much to be done, since there's no regex in the stdlib (use docs.rs/regex). I suspect the unrelated results come up due to Levenshtein distance searching (and the same goes for pin). Basically, rustdoc search attempts to catch typos, and that's what you're experiencing.
One problem here is that our acceptable Levenshtein distance is too high: we allow results with up to 3 edits (character deletion, insertion, or substitution). For a query like regex, that means we allow deleting up to 3 characters to match an unrelated name like "ge". Reducing the max Levenshtein distance to 2 would go a long way towards improving these no-match cases. I think it's exceedingly rare for something to be a real match at a distance of 3. Also, we should do a better job of explaining to the user what the scope of the search is: the standard library, a single crate, or a group of crates. Right now, when there are no results, we show the same output regardless of whether we're on doc.rust-lang.org, docs.rs, or something else.
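To make the numbers in this comment concrete, here is a minimal sketch of a textbook Levenshtein distance in Rust. It is not rustdoc's actual implementation (the real search runs in JavaScript); the function name `levenshtein` and the constant `MAX_DISTANCE` are made up for illustration, and the fixed cutoff of 3 is taken from the description above.

```rust
/// Textbook dynamic-programming Levenshtein distance (illustrative only,
/// not the code rustdoc ships).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i-1 chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // cur[0] = i: turning i chars into the empty string takes i deletions.
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let substitution = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            let deletion = prev[j] + 1;
            let insertion = cur[j - 1] + 1;
            cur[j] = substitution.min(deletion).min(insertion);
        }
        prev = cur;
    }
    prev[b.len()]
}

fn main() {
    // Assumption: a fixed cutoff of 3 edits, as described in the comment above.
    const MAX_DISTANCE: usize = 3;
    for (query, name) in [("regex", "ge"), ("print", "pin"), ("print", "println")] {
        let d = levenshtein(query, name);
        println!("{} vs {}: distance {}, accepted: {}", query, name, d, d <= MAX_DISTANCE);
    }
}
```

With a cutoff of 3, "ge" is accepted for the query "regex" because three deletions are enough; with a cutoff of 2 it would be rejected, while "pin" (distance 2 from "print") would still pass.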
I think the acceptable Levenshtein distance really depends on the query length. For example, a single-character query https://doc.rust-lang.org/nightly/std/?search=x returns a bunch of results that don't have an "x" in them at all. For queries of length 1, the obviously correct max distance is 0. Anything else would completely ignore what you wrote. Also, https://doc.rust-lang.org/nightly/std/?search=fn contains "as" in the list of results, which contains neither an "f" nor an "n". Conversely, if your "a" key is broken, https://doc.rust-lang.org/nightly/std/?search=std%3A%3Ahshmp%3A%3Ahshmp currently matches nothing. If your query is 17 characters long, it can probably tolerate a distance higher than 3, because there's a lot more redundancy in what you typed.
Thanks for all the responses so far! Related to the source of the results: definitely as a newbie, but also as an advanced user, I'd probably prefer a dedicated search engine that searches both std and crates (and maybe also the reference and the books). It might be a good idea to give preference to results from std over the crates and over the books. It would probably also be good to let the user filter the search results by source.
That's what the `strsim::levenshtein(other_identifier, identifier) <= std::cmp::max(identifier.len(), 3) / 3`
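The right-hand side of the comparison quoted above can be read as a length-dependent cutoff. The sketch below evaluates only that expression for the query lengths discussed in this thread; the helper name `resolver_style_max_distance` is made up for illustration, and the distance computation itself (`strsim::levenshtein` in the quote) is left out.

```rust
use std::cmp;

/// Largest accepted edit distance for an identifier of `len` bytes,
/// following the quoted expression: max(len, 3) / 3 (integer division).
/// The function name is hypothetical.
fn resolver_style_max_distance(len: usize) -> usize {
    cmp::max(len, 3) / 3
}

fn main() {
    // Queries mentioned earlier in the thread.
    for ident in ["x", "fn", "regex", "std::hshmp::hshmp"] {
        println!(
            "{:?} (len {}): max distance {}",
            ident,
            ident.len(),
            resolver_style_max_distance(ident.len())
        );
    }
}
```

Because of the `max(_, 3)`, this variant never goes below 1, so one- and two-character identifiers still tolerate a single edit; a 17-character identifier tolerates 5.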
The heuristic is pretty close to the name resolver. Fixes rust-lang#103357
…-2023, r=GuillaumeGomez
rustdoc: compute maximum Levenshtein distance based on the query
Preview: https://notriddle.com/notriddle-rustdoc-demos/search-lev-distance-2023/std/index.html?search=regex
The heuristic is pretty close to the name resolver, maxLevDistance = `Math.floor(queryLen / 3)`.
Fixes rust-lang#103357
Fixes rust-lang#82131
Similar to rust-lang#103710, but following the suggestion in rust-lang#103710 (comment) to use `floor` instead of `ceil`, and unblocked now that rust-lang#105796 made it so that setting the max lev distance to `0` doesn't cause substring matches to be removed.
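As a rough illustration of the `floor(queryLen / 3)` rule, the following Rust sketch applies it to the queries from this issue. It is a simplification under stated assumptions, not the rustdoc code: the real search is written in JavaScript and does more (path handling, ranking, and so on). The names `max_lev_distance` and `keeps` are hypothetical, and the rule "keep a name if it contains the query, otherwise compare the edit distance to the cutoff" is only an approximation of the substring behavior mentioned above.

```rust
/// Textbook Levenshtein distance (illustrative only).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let substitution = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            cur[j] = substitution.min(prev[j] + 1).min(cur[j - 1] + 1);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Hypothetical helper: the cutoff from the PR description, floor(queryLen / 3).
fn max_lev_distance(query: &str) -> usize {
    query.chars().count() / 3
}

/// Hypothetical filter: substring matches always survive; everything else
/// must be within the length-dependent cutoff.
fn keeps(query: &str, name: &str) -> bool {
    name.contains(query) || levenshtein(query, name) <= max_lev_distance(query)
}

fn main() {
    for (query, name) in [
        ("regex", "ge"),
        ("print", "println"),
        ("print", "pin"),
        ("print", "hint"),
        ("fn", "as"),
        ("x", "max"),
    ] {
        println!(
            "{:?} vs {:?}: cutoff {}, kept: {}",
            query,
            name,
            max_lev_distance(query),
            keeps(query, name)
        );
    }
}
```

Under these assumptions, one- and two-character queries get a cutoff of 0 and only match by substring, five-character queries such as "regex" and "print" tolerate a single edit, and a 17-character query tolerates 5.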
I could not figure out a better place to report this issue with https://doc.rust-lang.org/ (which might be an issue of its own: the docs could link to the source code).
The problem is that the search on https://doc.rust-lang.org/ does not seem to function well:
e.g. https://doc.rust-lang.org/std/index.html?search=regex returns lots of hits, none of them related to regex.
OTOH https://doc.rust-lang.org/std/index.html?search=println works quite well.
https://doc.rust-lang.org/std/index.html?search=print starts out quite well, but then also includes things that end with `::pin` and `::hint`, which seem to be unrelated.