HTML Search results aren't reader friendly #1618
From Andrea Cassioli on 2014-12-17 10:08:54+00:00: I totally agree; using the reST text as output is misleading and not nice at all!
+1 for this feature
The search results would be much better if one:
Is there any downside in doing this by default?
I'm pretty sure lots of machinery in Sphinx assumes there's only ever one build happening per runtime, so doing
Thanks for the info. Regarding building the .txt files, a small customization of the Makefile can do the trick. Still, it would be nice if one could produce more "search-friendly" .txt files. What do you think? Currently, we run a custom script to remove the remaining markup.
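A custom script of that kind might look like the minimal sketch below. It is neither the script mentioned here (which isn't shown) nor the code from #4857; it is an illustrative regular-expression pass, with the hypothetical name `strip_rst_markup`, that only handles a few common reST constructs (section adornments, directive lines, roles, hyperlinks, inline literals, bold and italics) and ignores tables, substitutions and everything else.

```python
import re

# Hypothetical, deliberately small pattern set. Real reST has many more
# constructs (tables, substitutions, nested inline markup) that this ignores.
DIRECTIVE = re.compile(r"^[ \t]*\.\. [\w:-]+::.*$", re.MULTILINE)  # ".. note::" lines
ADORNMENT = re.compile(r"^[=\-~^#*+]{3,}[ \t]*$", re.MULTILINE)    # "=====" under titles
ROLE = re.compile(r":[\w:+-]+:`([^`<]*?)(?:\s*<[^>]*>)?`")         # :ref:`text <target>`
LINK = re.compile(r"`([^`<]+?)\s*<[^>]+>`_{1,2}")                  # `text <url>`_
LITERAL = re.compile(r"``([^`]+)``")                               # ``code``
STRONG = re.compile(r"\*\*([^*]+)\*\*")                             # **bold**
EMPHASIS = re.compile(r"\*([^*]+)\*")                               # *italic*


def strip_rst_markup(text: str) -> str:
    """Approximate plain text for a reST document (illustrative only)."""
    text = DIRECTIVE.sub("", text)
    text = ADORNMENT.sub("", text)
    # Inline markup: keep the visible text, drop the syntax around it.
    for pattern in (ROLE, LINK, LITERAL, STRONG, EMPHASIS):
        text = pattern.sub(r"\1", text)
    # Collapse the blank lines left behind by removed lines.
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Regex stripping like this is cheap and works on the existing `_sources` files, but it can only approximate what a real reST parser does; that trade-off is what the later comments weigh against extracting text from the generated HTML.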
In case this issue is relevant for somebody else: I wrote a (so far very basic) extension that fixes it by building the search result snippets without markup. The extension should also provide a fix or workaround for issue #2369. Of course, I welcome feedback and suggestions for improvement.
There's one fairly simple way to fix this without adding build steps or output files. Currently, the search displays result snippets by requesting the corresponding source files from the server or local file system and extracting the text from them. This could be adjusted so that it requests the HTML files instead of the source files; extracting the text from the HTML at runtime in the client/browser is then fairly simple. What do you think, @tk0miya? I'll add a PR (which probably needs some refinement/discussion) later. This would make the pretty search results extension obsolete, which is a good thing in my opinion: the messed-up search results are a hard bug in the eyes of users, and it shouldn't be necessary to install an extension to fix a bug :-)
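To illustrate the idea, here is a Python analogue of what the browser would do: keep only the text inside the page's main content element and drop permalink anchors. It is a sketch, not the code from the PR. It assumes the theme marks the content area with `role="main"` and permalink anchors with the `headerlink` class (as Sphinx's basic theme does), and that every non-void element is explicitly closed; the names `MainTextExtractor` and `html_to_text` are hypothetical.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Collect the text inside the element with role="main" (hypothetical helper)."""

    # Elements whose text should never appear in a snippet.
    SKIP_TAGS = {"script", "style"}
    # Void elements have no end tag, so they must not change the nesting depth.
    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.depth = 0       # nesting depth inside the main element (0 = outside)
        self.skip_depth = 0  # > 0 while inside a skipped subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
            classes = (attrs.get("class") or "").split()
            # Once skipping, every nested element extends the skipped subtree.
            if self.skip_depth or tag in self.SKIP_TAGS or "headerlink" in classes:
                self.skip_depth += 1
        elif attrs.get("role") == "main":
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.depth and not self.skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the main content's text with whitespace collapsed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    # Join like DOM textContent does, then collapse runs of whitespace.
    return " ".join("".join(parser.chunks).split())
```

Concatenating the text chunks without separators mirrors the DOM's `textContent`; in the browser, the equivalent is roughly parsing the fetched page with `DOMParser` and reading `textContent` from the main element.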
request results as HTML instead of source files
retrieve preview snippet text from HTML
Any updates on this issue?
It would be useful to know the current state of this issue. It's very confusing for users, as the main Sphinx docs themselves don't seem to have this issue (e.g. see http://www.sphinx-doc.org/en/master/search.html?q=sphinx&check_keywords=yes&area=default).
@tk0miya Could we have your opinions on this? There are two alternatives to the approach I use in my PR (requesting the HTML):
Can you live with any of these options?
I prefer the first. Certainly, it increases build time, but it can remove markup perfectly and can also support translation.
Okay, then I propose the following:
I just remembered that @timhoffm had sent such a script to us in #4857. It might be a good workaround for this problem. Could you check it, please? @shimizukawa What do you think about the workaround?
Worth noting that both sphinx-doc.org and readthedocs.org seem to have fixed this problem already, so there are existing solutions that are being used in anger. They're presumably happy with their solutions, so would understanding them help inform which route is most sensible?
@tstibbs AFAIK, sphinx-doc.org is hosted by ReadTheDocs, and ReadTheDocs provides a custom search back end (using Haystack and Elasticsearch). But for the average self-hosted Sphinx project, a search back end is presumably too much work, so fixing the front-end-only search in one of the ways I described is necessary. @tk0miya I will take a look at #4857 ASAP and compare it to the tests I wrote for my sphinx-pretty-searchresults extension.
As the original reporter of the issue, I would ask you not to forget that a search back end isn't always possible. In some cases, we need the documentation and the search to work offline, i.e. with the HTML opened directly in a browser on the same machine, without any web server (and with no Internet access).
I took a look at #4857 (regexp parsing) and compared it to #4022 (using HTML snippets). IMHO, the regexp approach requires quite a lot of additional work. The only disadvantage of the HTML approach is that it loads significantly more data, but I personally still think it's feasible (and probably better than implementing a reStructuredText parser in JavaScript). We could make this configurable (opt-out). Any other opinions on this?
Good to see that this topic gets attention. I just did the minimum amount of work in #4857 (regexp parsing) to get something readable. The HTML search clearly has an advantage because it operates on the target document. Can you quantify how much "significantly more data" is? Depending on how large the difference is and how important we consider it, making this configurable would be a good way forward. Regexp parsing is a drop-in improvement on the current plain-reST search with no disadvantage. If you need even better results and are willing to accept the data overhead, use the HTML search. Which one should be the default may depend on the size of the data overhead.
I compared the data load for a set of Sphinx documentation pages:
The differences in ratio can be explained by the following factors:
Considering this information, I suggest we use the HTML approach without any configuration option, to keep things simple. It's just text content; IMHO there is no need to optimize for data load.
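To reproduce this kind of comparison for your own project, a sketch like the one below sums the sizes of the copied sources and of the matching HTML pages in a build directory. The function name `data_load` and the `.rst.txt` default suffix are assumptions: the file names under `_sources` depend on the Sphinx version, `source_suffix` and `html_sourcelink_suffix`. It measures uncompressed bytes on disk, so the transfer cost with gzip enabled on the server will differ.

```python
from pathlib import Path


def data_load(build_dir, source_suffix=".rst.txt"):
    """Compare bytes of _sources copies vs. the matching HTML pages (sketch).

    Returns (source_bytes, html_bytes, html_to_source_ratio). Only documents
    that exist in both forms are counted.
    """
    build_dir = Path(build_dir)
    sources_root = build_dir / "_sources"
    source_bytes = html_bytes = 0
    for src in sources_root.rglob("*" + source_suffix):
        # "_sources/sub/page.rst.txt" -> docname "sub/page" (assumed naming)
        docname = src.relative_to(sources_root).as_posix()[: -len(source_suffix)]
        page = build_dir / (docname + ".html")
        if page.is_file():
            source_bytes += src.stat().st_size
            html_bytes += page.stat().st_size
    ratio = html_bytes / source_bytes if source_bytes else float("nan")
    return source_bytes, html_bytes, ratio
```

For example, `data_load("_build/html")` after a normal HTML build; a ratio well above 1 mostly reflects the navigation, sidebar and other page chrome that each fetched HTML page carries in addition to its text.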
Thanks for digging into the numbers. Considering all aspects, I'm fine with an HTML-only approach.
@tk0miya Do you agree? Then we could move forward with my PR.
+1 to the HTML-only approach. I think it is sufficiently useful as a workaround until Sphinx core has a build output function for search display.
I resolved the conflicts in the original PR (which is close to a year old). Could someone review it?
+1 for the HTML approach. It can support other source formats (Markdown and others).
Setting `html_copy_source` no longer affects search results
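For example, a project that does not want its reST sources published could disable copying them in `conf.py`. `html_copy_source` is a standard Sphinx option; per the note above, turning it off should no longer degrade search snippets, assuming a Sphinx version that includes the HTML-based search change.

```python
# conf.py (fragment)
# Don't copy the reST sources into _build/html/_sources. With snippets taken
# from the generated HTML pages, the search should no longer need them.
html_copy_source = False
```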
+1 for the HTML approach, since a source-based solution is almost useless for translations: we currently get the search context for a translated page as the source in the canonical language, which was very disappointing. The patch available at https://github.com/sphinx-doc/sphinx/pull/4022/files worked very well for us:
…friendly #1618 make search results reader friendly
@TimKam Congrats! Thank you for your work!
I am using version 1.8.3 and I am still facing this issue. I tried the code on the master branch and it is fixed there. When is version 2.0 planned for public release?
2.0 is planned for mid-to-late March (no definite date yet); see #5950. If you want the fix earlier, I recommend not switching to master, but instead adjusting your template to include the updated version of
The HTML built-in search is very useful, especially for offline help, but the results content isn't reader friendly.
For example, I get content like this:
which isn't understandable by the average user.
This happens because searchtools.js uses the files in _sources, which are copies of the reST sources.
But I think the search only needs text files, not the real source files.
By replacing the contents of _sources with Sphinx's plain-text output, and when setting
, I get a better result:
It is a lot better, even though the asterisks marking bold text are still visible.
It would be great if this rendering to text were automated as part of the HTML build.
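The manual workaround described above could be scripted roughly as follows, run after both `sphinx-build -b html` and `sphinx-build -b text`. This is a hypothetical helper, not a Sphinx feature: it assumes the text build writes `<docname>.txt` and that the HTML build copied each source to `_sources/<docname>.rst.txt`; older Sphinx versions and other `html_sourcelink_suffix` settings use different names, so adjust `suffix` accordingly.

```python
import shutil
from pathlib import Path


def replace_sources_with_text(text_dir, html_dir, suffix=".rst.txt"):
    """Overwrite copied reST sources with text-builder output (hypothetical helper).

    Assumes the text build wrote <docname>.txt and the HTML build copied each
    source to _sources/<docname><suffix>. Returns the docnames replaced.
    """
    text_dir = Path(text_dir)
    sources = Path(html_dir) / "_sources"
    replaced = []
    for txt in text_dir.rglob("*.txt"):
        docname = txt.relative_to(text_dir).as_posix()[: -len(".txt")]
        target = sources / (docname + suffix)
        # Only overwrite files the HTML build actually produced.
        if target.is_file():
            shutil.copyfile(txt, target)
            replaced.append(docname)
    return sorted(replaced)
```

For example, `replace_sources_with_text("_build/text", "_build/html")`. As the report notes, the text builder still emits some markup (such as the asterisks around bold text), so this only partly cleans up the snippets.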