Add generator for a search index #1853
The generator is working so far, but the implementation isn't finished yet (see TODO). Are there any fundamental objections to the generator? QUESTIONS:
WISHLIST:
/cc @rdwatters @bep @spf13 |
@digitalcraftsman This is pretty fantastic. Thanks for working on what I see as a really powerful feature (and one I have kind of been nagging about). Question for you: rather than just the ability to create an index that can be built/not built with a flag, how difficult would it be to extend Hugo's abilities to write to any .json file using the same templating logic? Would a feature like this (Jekyll has the ability to write JSON files, which comes in pretty handy) slow down builds to the point of not being worth it? Do you think writing to any-file.json rather than site-index.json would provide the most flexibility (i.e., w/r/t using ajax, etc.), or is the primary objective to allow for client-side search a la something like Tipue or lunr.js? Also, sorry for the delayed response to your questions (in the order you presented them above):
Thanks again, brother. I think Hugo is easily the best SSG around. Cheers! |
I think that would be a powerful addition to the current set of template functions. I'm not sure how much it would slow down the generation of pages. A possible implementation could use a global shared object, similar to Scratch, but with a few more parameters (filename, destination). Inside a template a coder could specify, if necessary with if-else statements, what should be included in which JSON file.
This is a good question. Why not take the best of both worlds? Just setting a config variable to true is the most user-friendly way, in my opinion. Your approach would allow much more flexibility, but the user/theme creator may need to include logic in many different places. Imagine a user has different layouts for different content types. They would need to add the logic in each template file of a content type. Shortcodes would be a handy way to avoid redundant code. But let's wait and see what the others think about this.
This project has grown a lot in its rather short lifetime 😄. I'm curious what we will see in the v1.0 release. |
@digitalcraftsman Good points all round. I guess that I am ultimately bringing up two separate feature requests, and you are absolutely right that it could be the best of both worlds. As far as the ability to write to json files in general, you're right that this would have to be a separate process in that forcing devs to write templating to account for all content/section areas in a single site-index.json would be more than a little tedious. I like where you are going with the todo for Oh, and thanks again:smiley: Oh, and @bep I just drastically edited this comment after you already replied to it. Sorry about that. |
There is an open issue somewhere about rendering custom content-types, like JSON, ical ... whatever. We should do that. |
If you, @bep, agree with @rdwatters and me we should consider this as two different issues. See #1128 (for ical, xcal). But there's no issue about writing content to a JSON file. Should I create a new issue? |
These are two different issues. The PR is effectively a sitemap in JSON which will enable lots of nice integrations. A second issue is for Hugo to support rendering into multiple different formats. |
Title string `json:"title"`
Content string `json:"content"`
Permalink string `json:"permalink"`
Tags interface{} `json:"tags"`
We shouldn't assume these taxonomies are being used. I think this is a very limiting approach.
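To make the concern concrete, here is a minimal sketch of an entry type that does not hard-code `tags`. The `Taxonomies` field and the `encodeIndex` helper are hypothetical names chosen for illustration and are not part of this PR; the other fields mirror the diff above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IndexEntry mirrors the fields from the diff. Taxonomies is a hypothetical
// replacement for the fixed Tags field: it holds whatever taxonomies a site
// actually defines and is omitted from the JSON when empty.
type IndexEntry struct {
	Title      string              `json:"title"`
	Content    string              `json:"content"`
	Permalink  string              `json:"permalink"`
	Taxonomies map[string][]string `json:"taxonomies,omitempty"`
}

// encodeIndex serializes the entries as a JSON array.
func encodeIndex(entries []IndexEntry) (string, error) {
	b, err := json.Marshal(entries)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeIndex([]IndexEntry{
		{Title: "Hello", Content: "First post", Permalink: "/hello/"},
		{
			Title:      "Go",
			Content:    "About Go",
			Permalink:  "/go/",
			Taxonomies: map[string][]string{"tags": {"golang"}},
		},
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

A site without any taxonomies then gets entries with only `title`, `content`, and `permalink`.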
@digitalcraftsman Spitballing on this, but is there utility in implementing a stopwords list when creating the index? Here's a decent default list. If the intention is client-side search, it looks like it's the same stopwords used by Tipue and similar to the stopword filter for lunr.js. That said, if search results were designed to surface, say, the "description" key in front matter, SERPs would look weird if every definite and indefinite article were omitted from the page. Then again, maybe eliminating stopwords from the index before it's sent could make filesize smaller and potentially reduce demand on the client. It goes without saying that internationalization efforts being worked on outside this thread would have a different list. |
If you want to remove stopwords in Go, check out https://github.com/bbalet/stopwords. It's multi-lingual and has already been discussed on the forums for adding a related posts feature (https://github.com/bbalet/gorelated). |
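For a rough idea of what such a filter does, here is a standard-library-only sketch. It does not use the bbalet/stopwords package; the tiny word list and the `removeStopwords` name are assumptions made up for this example, and punctuation is not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// stopwords is a deliberately tiny, illustrative English list.
// It is not the default list linked in the comment above.
var stopwords = map[string]bool{
	"a": true, "an": true, "the": true, "and": true,
	"of": true, "to": true, "is": true,
}

// removeStopwords lowercases the text, splits it on whitespace, and drops
// every word found in the stopwords set. Punctuation stays attached to words.
func removeStopwords(text string) string {
	var kept []string
	for _, w := range strings.Fields(strings.ToLower(text)) {
		if !stopwords[w] {
			kept = append(kept, w)
		}
	}
	return strings.Join(kept, " ")
}

func main() {
	fmt.Println(removeStopwords("The quick fox and the lazy dog"))
}
```

A multilingual setup would need one such set per language, which is where a dedicated library becomes attractive.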
The use of stopwords is within the role of the tokeniser/indexer. This PR is badly named, as it doesn't create a search index, it exports the content in a format suitable for indexing. |
After revisiting some discussion here and in the forum I agree. As this PR is currently a WIP, it should just output a JSON file that is intended for searching the content (with lunr.js or similar tools). Using a stopword filter would consequently be the next step for optimizations. As I discussed with @rdwatters before, we should create a separate |
Since we have a However, I saw that @bep needed to create a new content file just to set the URL properly. Wouldn't it be better to add a {{ $contentList | jsonify | saveas "/index.json" }} The path would be relative to While keeping an eye on the localization support, it would be very easy to create a content index for just a single language. Depending on the current locale, scripts like lunr.js could fetch the content index for that locale and use it as an index. It doesn't make sense to include Spanish content in the results for a Chinese user. But the setup is completely flexible due to the filter options. /cc @bep @moorereason |
Yes, the extra content file is not good, we need better support for custom file types (json, ical etc.), but the answer isn't |
I would call it from inside a template, like in the example above:
The function itself would have a signature like func saveas(path string, data interface{}) error {} |
49b4f8e to 93e41a1
I revisited this issue and implemented the feature with a template as @spf13 suggested. That gives users the maximal flexibility. Kudos to @bep for implementing the I would appreciate a review. According to the contribution guidelines the commit message should mention the modified package as a prefix. Since I modified multiple packages, which should I use? @rdwatters you asked for an option to exclude certain pages. Have a look at the docs 😉 Furthermore, @rdwatters and @moorereason suggested the usage of a stop word filter. Should this be realized with a template function (in a separate pull request)? Last but not least I would like to keep an eye on the localization support (#1744). Having search results in multiple languages doesn't make sense in my opinion. Should we offer an option to generate a content index per locale? |
First, my handle is Second, for the commit message prefixes, you want to use the primary affected package (I use that phrase in my updated but yet-to-be-merged contributing guide). In this case, commit bb688f7 would use In your subsequent commits, I'd use I get the feeling I'm going to need to update the contributing guide to give a fuller explanation and rationale for the subject prefix. |
I'm sorry for misspelling your handle. The commit messages have been updated with their corresponding package as a prefix. At first, though, I wasn't sure whether the commits should be squashed or not. |
I implemented the search feature in the material-docs theme and it works like a charm. But the usability of the default template could be improved. Currently, we only link the pages that match the search query. It would be much better if we could also link the headers of sections that contain (parts of) the query. MkDocs uses the headers as dividers for the content and adds each of them as a new search result. |
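The header-based splitting described above could look roughly like this. It is a sketch only: `Section` and `splitByHeadings` are hypothetical names, only ATX-style Markdown headings (`#`, `##`, ...) are recognized, and `#` lines inside fenced code blocks would be misread as headings.

```go
package main

import (
	"fmt"
	"strings"
)

// Section is a hypothetical search record: one heading plus the text below it.
type Section struct {
	Heading string
	Body    string
}

// splitByHeadings cuts Markdown source at ATX headings ("# ", "## ", ...).
// Text before the first heading is kept in a section with an empty Heading.
func splitByHeadings(markdown string) []Section {
	var sections []Section
	var cur Section
	var body []string

	// flush stores the section collected so far, unless it is entirely empty.
	flush := func() {
		cur.Body = strings.TrimSpace(strings.Join(body, "\n"))
		if cur.Heading != "" || cur.Body != "" {
			sections = append(sections, cur)
		}
		body = nil
	}

	for _, line := range strings.Split(markdown, "\n") {
		trimmed := strings.TrimLeft(line, "#")
		// A heading is one or more '#' followed by a space.
		if len(trimmed) < len(line) && strings.HasPrefix(trimmed, " ") {
			flush()
			cur = Section{Heading: strings.TrimSpace(trimmed)}
			continue
		}
		body = append(body, line)
	}
	flush()
	return sections
}

func main() {
	doc := "intro\n# Install\nRun the installer.\n## Options\nUse -v for verbose."
	for _, s := range splitByHeadings(doc) {
		fmt.Printf("%q: %q\n", s.Heading, s.Body)
	}
}
```

Each section could then become its own entry in the index, with the heading's anchor appended to the page permalink.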
@@ -784,6 +791,8 @@ func (s *Site) initializeSiteInfo() {
GoogleAnalytics: viper.GetString("GoogleAnalytics"),
RSSLink: s.permalinkStr(viper.GetString("RSSUri")),
BuildDrafts: viper.GetBool("BuildDrafts"),
DisableSearchJSON: viper.GetBool("DisableSearchJSON"),
SearchIndexLink: viper.GetString("baseURL") + viper.GetString("searchuri"),
@bep Are there any helper functions that can prepend the baseurl for the SearchIndexLink?
Looks like s.permalinkStr() is used above for the RSSLink.
That's what I've done before. I printed the URL in a template and got http://localhost:1313/search/index.json/ instead of http://localhost:1313/search.json
That sounds like a bug.
@bep do you know if this behavior is intended or how it can be avoided?
As to helper, see what is used by absURL template func.
As to the discussion of stop-words:
This PR should be about getting the data in a parseable format, aka JSON. |
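On the base URL question in the review thread above: plain string concatenation of `baseURL` and `searchuri` only works when exactly one of them supplies the slash. A small sketch of a slash-safe join follows; `joinURL` is a hypothetical helper for illustration, not the function used by Hugo's `absURL`.

```go
package main

import (
	"fmt"
	"strings"
)

// joinURL concatenates a base URL and a relative URI with exactly one slash
// between them, regardless of how either side is written in the config.
func joinURL(baseURL, uri string) string {
	return strings.TrimRight(baseURL, "/") + "/" + strings.TrimLeft(uri, "/")
}

func main() {
	fmt.Println(joinURL("http://localhost:1313/", "/search.json"))
	fmt.Println(joinURL("http://localhost:1313", "search.json"))
}
```

Unlike a permalink helper, this leaves the file name untouched, so `search.json` is not rewritten into a pretty URL such as `search/index.json/`.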
Is this going to make it into 0.17? |
I'm closing this pull request in favor of #2828. Custom output types would be much more flexible. Users could create content in a format they want by using templates and by specifying the output type (e.g. JSON). My approach would be too specific and de facto deprecated once you can achieve the same with custom output types. Nonetheless, the long discussion about this topic highlighted some points that should be considered in the future when someone creates a search template |
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
The generator creates an index of all content
files and their metadata.
See #1635 #144