Prior to this commit, SDoc's search algorithm was implemented by [`searcher.js`][]. `searcher.js` builds a regular expression for each token in the query. For example, the query "foo bar" generates the regular expressions `/([f])([^f]*?)([o])([^o]*?)([o])([^o]*?)/i` and `/([b])([^b]*?)([a])([^a]*?)([r])([^r]*?)/i`. These regular expressions fuzzy match missing letters, but fail for any other kind of typo, such as added letters or swapped letters. They can also produce surprising results. For example, the query "ActiveRecord::Base" returns `ActiveRecord::AttributeAssignment` as the top result due to matching "activerecord" attri"b"ute"a"s"s"ignm"e"nt, and there are six(!) other results that appear before `ActiveRecord::Base`.

This commit implements a new search algorithm based on character-level bigrams. For example, the query "foo bar" will look for results that match "fo", "oo", "o ", " b", "ba", and "ar". Shorthand bigrams for CamelCase names are also included in the search index. For example, entries containing "ActiveRecord" are also associated with the bigram "ar". Bigrams are weighted such that some contribute more to the match score, and results are ordered by match score.
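To make the contrast concrete, here is a minimal sketch of both approaches. It is not the code from `searcher.js` or from this commit: the function names (`fuzzyRegexp`, `bigrams`, `camelCaseShorthand`, `overlapScore`) are hypothetical, the shorthand rule (bigrams over the capital initials) is an assumption, and the score is a plain unweighted count, whereas the commit weights bigrams.

```javascript
// Illustrative sketch only; all names here are hypothetical.

// Escape characters that are special inside a regex character class.
function escapeForClass(ch) {
  return ch.replace(/[\\\]^-]/g, "\\$&");
}

// Old approach: one fuzzy regex per query token. Each character becomes
// "([c])([^c]*?)", so "foo" yields /([f])([^f]*?)([o])([^o]*?)([o])([^o]*?)/i.
function fuzzyRegexp(token) {
  const source = Array.from(token)
    .map((ch) => {
      const c = escapeForClass(ch);
      return `([${c}])([^${c}]*?)`;
    })
    .join("");
  return new RegExp(source, "i");
}

// New approach: lowercase character-level bigrams, spaces included.
function bigrams(text) {
  const lower = text.toLowerCase();
  const result = [];
  for (let i = 0; i + 1 < lower.length; i++) {
    result.push(lower.slice(i, i + 2));
  }
  return result;
}

// Assumption: shorthand bigrams are taken over the capital initials of a
// CamelCase name, so "ActiveRecord" has initials "AR" and shorthand "ar".
function camelCaseShorthand(name) {
  const initials = (name.match(/[A-Z]/g) || []).join("");
  return bigrams(initials);
}

// Simplified, unweighted score: how many query bigrams the entry contains.
function overlapScore(query, entryName) {
  const indexed = new Set([
    ...bigrams(entryName),
    ...camelCaseShorthand(entryName),
  ]);
  return bigrams(query).filter((b) => indexed.has(b)).length;
}

// The old regex matches a surprising entry, but not a substituted letter.
console.log(fuzzyRegexp("ActiveRecord::Base").test("ActiveRecord::AttributeAssignment")); // true
console.log(fuzzyRegexp("existance").test("Pathname#existence")); // false

// Bigrams still overlap when a letter is wrong.
console.log(bigrams("foo bar")); // [ 'fo', 'oo', 'o ', ' b', 'ba', 'ar' ]
console.log(overlapScore("existance", "Pathname#existence") > 0); // true
```

The regex approach needs every query character to appear in order, so a single substituted letter rules an entry out. Counting shared bigrams degrades gradually instead, which is why a misspelled query can still find its target.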
Here are some example queries and their top results with rails/rails@7c65a4b both before and after this commit:

* "ActiveRecord::Base"
  * top result before: `ActiveRecord::AttributeAssignment`
  * top result after: `ActiveRecord::Base`
* "ar base"
  * top result before: `Rails::Generators::Testing::Behavior::ClassMethods#arguments`
  * top result after: `ActiveRecord::Base`
* "hasmany"
  * top result before: `ActiveRecord::Associations::ClassMethods#has_and_belongs_to_many`
  * top result after: `ActiveRecord::Associations::ClassMethods#has_many`
* "adress"
  * top result before: `ActiveSupport::HashWithIndifferentAccess`
  * top result after: `Mail::Address`
* "existance"
  * top result before: no results
  * top result after: `Pathname#existence`
* "foriegn"
  * top result before: no results
  * top result after: `String#foreign_key`

This commit also redesigns the presentation of search results. Prior to this commit, result names were cut off at ~43 characters, and result descriptions were cut off at ~53 characters. Result descriptions also included headings, further reducing the relevant visible text. For example, the visible description for `ActionCable::Connection::Base`, which has the heading "Action Cable Connection Base", was "Action Cable Connection Base For every WebSocket".

Result descriptions also included code blocks, which were then mangled by `Searchdoc.Panel`'s `stripHTML` function. For example, the description for `ActiveModel::API::new` was

```html
<p>Initializes a new model with the given <code>params</code>.

<pre><code>class Person
  include ActiveModel::API
  attr_accessor ...
</code></pre>
```

which was transformed to

```
Initializes a new model with the given params.

<codeclass Person
  include ActiveModel::API
  attr_accessor ...
</pre
```

With this commit, search results now always display the full name. Result descriptions are also fully displayed, including non-link HTML, and now consist of (up to) the first 130 characters of the leading paragraph of the RDoc comment.
For example, the description of `ActiveModel::API::new` becomes "Initializes a new model with the given <code>params</code>."

[`searcher.js`]: https://github.com/ruby/rdoc/blob/v6.5.0/lib/rdoc/generator/template/json_index/js/searcher.js