-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rustdoc search is excruciatingly slow on very large crates #131156
Comments
We should probably do automatic benchmarking of rustdoc search similar to how its done with rustc, it's very easy to accidentally cause huge perf regressions by adding something significant to an inner loop. i've also often wondered if it would be worth it to start using wasm in rustdoc search... |
There is benchmarking for rustdoc search, but it's only run when we make changes to it (and its also mostly handled by @notriddle ). |
https://gitlab.com/notriddle/rustdoc-js-profile is where the current effort is. |
And here's where we might find some guidance for a faster name matching tactic https://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html https://fossies.org/linux/lucene/gradle/generation/moman/createLevAutomata.py |
…=GuillaumeGomez rustdoc: use a trie for name-based search Potentially rust-lang#131156 — need to try reproducing the problem with `windows` Preview and profiler results ---------------------------- Here's some quick profiling in Firefox done on the rust compiler docs: - Before: https://share.firefox.dev/3UPm3M8 - After: https://share.firefox.dev/40LXvYb Here's the results for the node.js profiler: - https://notriddle.com/rustdoc-html-demo-15/trie-perf/index.html Here's a copy that you can use to try it out. Compare it with [the nightly]. Try typing `typecheckercontext` one character at a time, slowly. - https://notriddle.com/rustdoc-html-demo-15/compiler-doc-trie/index.html [the nightly]: https://doc.rust-lang.org/nightly/nightly-rustc/ The fuzzy match algo is based on [Fast String Correction with Levenshtein-Automata] and the corresponding implementation code in [moman] and [Lucene]; the bit-packing representation comes from Lucene, but the actual matcher is more based on `fsc.py`. As suggested in the paper, a trie is used to represent the FSA dictionary. The same trie is used for prefix matching. Substring matching is done with a side table of three-character[^1] windows that point into the trie. [Fast String Correction with Levenshtein-Automata]: https://github.com/tpn/pdfs/blob/master/Fast%20String%20Correction%20with%20Levenshtein-Automata%20(2002)%20(10.1.1.16.652).pdf [Lucene]: https://fossies.org/linux/lucene/lucene/core/src/java/org/apache/lucene/util/automaton/Lev1TParametricDescription.java [moman]: https://gitlab.com/notriddle/moman-rustdoc User-visible changes -------------------- I don't expect anybody to notice anything, but it does cause two changes: - Substring matches, in the middle of a name, only apply if there's three or more characters in the search query. - Levenshtein distance limit now maxes out at two. In the old version, the limit was w/3, so you could get looser matches for queries with 9 or more characters[^1] in them. - It uses more RAM. - It's faster (assuming you don't swap thrash). [^1]: technically utf-16 code units
Rollup merge of rust-lang#133005 - notriddle:notriddle/trie-search, r=GuillaumeGomez rustdoc: use a trie for name-based search Potentially rust-lang#131156 — need to try reproducing the problem with `windows` Preview and profiler results ---------------------------- Here's some quick profiling in Firefox done on the rust compiler docs: - Before: https://share.firefox.dev/3UPm3M8 - After: https://share.firefox.dev/40LXvYb Here's the results for the node.js profiler: - https://notriddle.com/rustdoc-html-demo-15/trie-perf/index.html Here's a copy that you can use to try it out. Compare it with [the nightly]. Try typing `typecheckercontext` one character at a time, slowly. - https://notriddle.com/rustdoc-html-demo-15/compiler-doc-trie/index.html [the nightly]: https://doc.rust-lang.org/nightly/nightly-rustc/ The fuzzy match algo is based on [Fast String Correction with Levenshtein-Automata] and the corresponding implementation code in [moman] and [Lucene]; the bit-packing representation comes from Lucene, but the actual matcher is more based on `fsc.py`. As suggested in the paper, a trie is used to represent the FSA dictionary. The same trie is used for prefix matching. Substring matching is done with a side table of three-character[^1] windows that point into the trie. [Fast String Correction with Levenshtein-Automata]: https://github.com/tpn/pdfs/blob/master/Fast%20String%20Correction%20with%20Levenshtein-Automata%20(2002)%20(10.1.1.16.652).pdf [Lucene]: https://fossies.org/linux/lucene/lucene/core/src/java/org/apache/lucene/util/automaton/Lev1TParametricDescription.java [moman]: https://gitlab.com/notriddle/moman-rustdoc User-visible changes -------------------- I don't expect anybody to notice anything, but it does cause two changes: - Substring matches, in the middle of a name, only apply if there's three or more characters in the search query. - Levenshtein distance limit now maxes out at two. In the old version, the limit was w/3, so you could get looser matches for queries with 9 or more characters[^1] in them. - It uses more RAM. - It's faster (assuming you don't swap thrash). [^1]: technically utf-16 code units
Steps to reproduce:
Expected behavior:
rustdoc quickly shows matching results while typing and when done typing.
Actual behavior:
Web page slows down to a crawl after every delayed keystroke. This is not just due to the search index download, which takes me 1.4 s and can probably not be improved much, but also due to matching the search query against the search index once the download is finished.
Here is Firefox Profiler output: https://share.firefox.dev/4eOgrcE. Of interest are the yellow areas in the graph (the salmon ones are mostly idle GC). Some observations:
convertNameToId
, which is documented to be used only for the In Parameters and In Return Types tabs. If the user is not going to click on those tabs then perhaps these calls are not necessary.The text was updated successfully, but these errors were encountered: