Better search results #1067

Closed
shimizukawa opened this issue Jan 3, 2015 · 9 comments
Labels
type:enhancement enhance or introduce a new feature type:proposal a feature suggestion

Comments

@shimizukawa
Member

This proposal is motivated by the desire to have better search results in the Python Docs. See the mail thread.

While a Google search will always yield better results, I think that there is room for improvement without increasing the complexity of the Sphinx codebase. However, what constitutes a good result might depend on the project using Sphinx. Therefore, the proposed solution has a JavaScript snippet that can be inserted in searchtool.js in a similar way to the language-related code (stemmer, stop words).

1. Modify the indexer to include the words in the section titles in a separate list. Remove these words from the regular list.
   This modification increases the size of the index by between 2% and 10%, depending on the project docs (I tested Python, Sphinx and Flask). The time to generate the index does not change significantly, at least when the docs are generated from scratch. See patch 01.

2. Modify the search tool to create a single result set (instead of the current 4: regular, important, unimportantResults, objectResults). Each result has an associated score. Sort by score before presenting the results. This modification does not seem to change the search time significantly, but I would be happy if somebody could provide better stats. See patch 02.

3. Create a pluggable JavaScript scoring mechanism that can be easily changed by the projects using Sphinx (e.g. in the theme or in conf.py) (ToDo)
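
Steps 1 and 2 can be sketched roughly as follows. This is a minimal Python illustration, not the code from the patches: the real search tool runs in JavaScript, the function names and the two weights are hypothetical, and "remove these words from the regular list" is interpreted here per document, which is an assumption.

```python
from collections import defaultdict

# Hypothetical weights, chosen only for illustration.
TITLE_SCORE = 15
BODY_SCORE = 5


def build_index(docs):
    """Build two word -> {document names} maps: one for titles, one for body text.

    `docs` maps a document name to a (title, body) pair. Words that occur in
    a document's title are left out of that document's body entries (step 1).
    """
    title_index = defaultdict(set)
    body_index = defaultdict(set)
    for name, (title, body) in docs.items():
        title_words = set(title.lower().split())
        for word in title_words:
            title_index[word].add(name)
        for word in set(body.lower().split()) - title_words:
            body_index[word].add(name)
    return title_index, body_index


def search(query, title_index, body_index):
    """Return a single list of (document name, score) pairs, best first (step 2)."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for name in title_index.get(word, ()):
            scores[name] += TITLE_SCORE
        for name in body_index.get(word, ()):
            scores[name] += BODY_SCORE
    # Sort by descending score, then by name so that ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {
    "library/json": ("json encoder and decoder", "parse json text into objects"),
    "library/pickle": ("pickle object serialization", "unlike json the format is binary"),
}
title_index, body_index = build_index(docs)
results = search("json", title_index, body_index)
```

With these sample documents, the page that has "json" in its title ranks ahead of the page that only mentions it in the body.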

This plan seems promising, but I would appreciate some feedback before moving on.

Hernan


@shimizukawa shimizukawa added type:enhancement enhance or introduce a new feature other type:proposal a feature suggestion labels Jan 3, 2015
@shimizukawa
Member Author

From dilettant on 2013-01-02 12:25:45+00:00

Hi Hernan,

thanks a lot for transforming the mail thread to improve the local search facility into real code suggestions.

Just one early comment w.r.t. the above linked patch 01-* (the indexer) at hunk:

:::patch
@@ -144,7 +146,8 @@
             raise SkipNode
         if node.__class__ is Text:
             self.found_words.extend(self.lang.split(node.astext()))
-
+        if node.__class__ is title:
+            self.found_title_words.extend(self.lang.split(node.astext()))

I would not have expected the logic as written, but rather either:

A) elif node.__class__ is title: ... if a node's class should be allowed to be Text xor title xor none of these, or

B) if node.__class__ is title: ... (but as part of the if node.__class__ is Text: block), if and only if class title is a subclass of Text.

Or am I totally wrong here?

All the best,
Stefan.

@shimizukawa
Member Author

From Hernan Grecco on 2013-01-02 13:04:45+00:00

Since the code tests for the exact class (instead of using isinstance), a node is either one or the other, never both. But I agree that using elif is better.

if node.__class__ is Text:
    self.found_words.extend(self.lang.split(node.astext()))
elif node.__class__ is title:
    self.found_title_words.extend(self.lang.split(node.astext()))

Thanks for the feedback.
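
To illustrate the distinction discussed above, here is a minimal sketch of how an exact class test differs from isinstance. The classes are stand-ins defined for this example only, not the real docutils node classes, and EmphasizedText is a hypothetical subclass.

```python
class Node:
    """Stand-in base class; not the real docutils Node."""


class Text(Node):
    """Stand-in for a text node."""


class title(Node):
    """Stand-in for a title node (lowercase name, as in the patch)."""


class EmphasizedText(Text):
    """Hypothetical subclass, used only to show the difference."""


def classify_exact(node):
    # Identity test on the class: instances of subclasses do not match.
    if node.__class__ is Text:
        return "text"
    elif node.__class__ is title:
        return "title"
    return "other"


def classify_isinstance(node):
    # isinstance also matches instances of subclasses.
    if isinstance(node, Text):
        return "text"
    elif isinstance(node, title):
        return "title"
    return "other"
```

Because a node has exactly one class, the two branches of classify_exact can never both apply, so elif only makes the intent explicit and skips a redundant comparison.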

@shimizukawa
Member Author

From Hernan Grecco on 2013-01-03 03:49:07+00:00

Patches to change the scoring of the search results.

@shimizukawa
Member Author

From Hernan Grecco on 2013-01-03 03:58:11+00:00

I have implemented the last patch, adding support for pluggable scoring mechanism.

A simple scorer for the Python Docs could look like this. The output is current, proposed, comparison.

There is a lot of room for tweaking the values but I think that the results are promising.
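
The idea of a pluggable scorer can be sketched as a set of default weights that a project may override, plus an optional hook that adjusts each result. The sketch below is in Python for consistency with the indexer code in this thread; the actual scorer lives in JavaScript, and every name and number here (DEFAULT_SCORER, make_scorer, rank, demote_whatsnew, the weights) is hypothetical.

```python
# Default weights; the numbers are illustrative, not taken from Sphinx.
DEFAULT_SCORER = {
    "title": 15,    # query word found in a section title
    "term": 5,      # query word found in body text
    "score": None,  # optional hook: callable((name, score)) -> adjusted score
}


def make_scorer(overrides=None):
    """Return a copy of the default scorer with project-specific overrides."""
    scorer = dict(DEFAULT_SCORER)
    scorer.update(overrides or {})
    return scorer


def rank(matches, scorer):
    """Rank documents given (document name, kind) matches, kind being 'title' or 'term'."""
    totals = {}
    for name, kind in matches:
        totals[name] = totals.get(name, 0) + scorer[kind]
    results = list(totals.items())
    hook = scorer["score"]
    if hook is not None:
        # Let the project adjust each score before sorting.
        results = [(name, hook((name, score))) for name, score in results]
    return sorted(results, key=lambda item: (-item[1], item[0]))


def demote_whatsnew(result):
    """Example hook: push 'whatsnew' pages down by 10 points."""
    name, score = result
    return score - 10 if name.startswith("whatsnew/") else score


matches = [("whatsnew/3.3", "title"), ("library/os", "term"), ("library/os", "term")]
default_ranking = rank(matches, make_scorer())
custom_ranking = rank(matches, make_scorer({"score": demote_whatsnew}))
```

With the default weights the title match wins; with the hook installed the same matches are reordered, which is the kind of per-project tuning the proposal aims for.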

@shimizukawa
Member Author

From Georg Brandl on 2013-01-03 07:45:54+00:00

Thanks for this effort, Hernan. I will have a look shortly; in the meantime, could you resubmit in the form of a pull request? There it is possible to comment on the patches inline, and easier to update the changes for revisioning.

@shimizukawa
Member Author

From Hernan Grecco on 2013-01-03 10:55:26+00:00

Hi Georg, I am happy to help. I have just seen that you have merged. That was really fast! I was preparing a reorganization of the commits into more logical parts (a few things become clearer once you finish them).

In case you are still interested, they are in my fork of the repo.
https://bitbucket.org/hgrecco/sphinx

It is still the same code. The only difference is that the scorer I built for Python Docs is now the default.

@shimizukawa
Member Author

From Georg Brandl on 2013-01-03 20:45:13+00:00

Hi Hernan, I've merged the first two patches and fixed a few issues while doing that. I'm currently merging patch 3, so I'll have a look at your repo for the new scorer.

@shimizukawa
Member Author

From Hernan Grecco on 2013-01-03 22:39:40+00:00

I think that the Scorer in my branch should be the default one, as it performs much better than the previous one. In addition, using this scorer means that a commit in the cpython tree will not be necessary. Note that any issues you have found (like the one fixed by c1e2c90) will also be present in my branch. Let me know if I can help.

@shimizukawa
Member Author

From Georg Brandl on 2013-01-04 10:17:59+00:00

Closes #1067: implement pluggable search scorer and tweak scoring to give good results. Patch by Hernan Grecco.

→ <<cset 1832284>>

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2021