How to strip html tags before generating searchIndex.json? #12

abhijeetvramgir · 2016-09-21T06:50:53Z

This is my lunr snippet from the build file:

.use(lunr({
    preprocess: function(content) {
        // Replace all occurrences of __title__ with the current file's title metadata.
        return content.replace(/__title__/g, this.title);
    }
}))

How do I strip HTML tags?

janthonyeconomist · 2017-07-12T16:18:05Z

I'm doing this to a) strip HTML, b) transliterate, and c) strip punctuation:

preprocess: function(content) {
    // Transliterate: swap each character found in the map, keep all others.
    const tr = (str) => {
        const map = {"а":"a" /* truncated for diff */ };
        let new_str = "";
        for (const char of str) {
            new_str += map[char] || char;
        }
        return new_str;
    };
    return tr(
        content.replace(/<[^>]+>/g, ' ') // Strip HTML tags
    ) // tr(...) transliterates foreign characters
        .replace(/[^\w]/g, ' '); // Strip punctuation: any character outside [A-Za-z0-9_]
}

That seems to remove the HTML and punctuation from the contents; however, I think some punctuation is still getting through to the index in other fields. Is that right?
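For anyone who only needs the stripping steps, here is a minimal sketch of the same idea as a standalone function. The name stripForIndex is hypothetical and not part of any plugin. It assumes a regex is good enough for your markup: it is not a real HTML parser, so it will not decode entities such as &amp; or handle a literal > inside an attribute value. Note also that \w is ASCII-only in JavaScript, so non-ASCII letters are removed unless you transliterate them first, as in the snippet above.

```javascript
// Hypothetical helper (not part of lunr or any plugin): strips tags and
// punctuation from a string before it is handed to the indexer.
function stripForIndex(content) {
  return content
    .replace(/<[^>]+>/g, ' ')   // replace each HTML tag with a space
    .replace(/[^\w\s]/g, ' ')   // replace punctuation with a space (ASCII-only \w)
    .replace(/\s+/g, ' ')       // collapse runs of whitespace into one space
    .trim();                    // drop leading and trailing whitespace
}

console.log(stripForIndex('<p>Hello, <b>world</b>!</p>')); // prints "Hello world"
```

It could then be called from the preprocess option, for example: return stripForIndex(content);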
