Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the escape() function to improve performance 10-20% #975

Merged
merged 1 commit into from
Sep 11, 2018
Merged

Refactor the escape() function to improve performance 10-20% #975

merged 1 commit into from
Sep 11, 2018

Conversation

KostyaTretyak
Copy link
Contributor

No description provided.

This was referenced Dec 21, 2017
@joshbruce joshbruce added this to the 0.5.0 - Architecture and extensibility milestone Dec 25, 2017
@Feder1co5oave
Copy link
Contributor

I know this sounds kinda silly, but can we stick to the present coding style? This almost looks like a different language.

@KostyaTretyak
Copy link
Contributor Author

KostyaTretyak commented Dec 27, 2017

Okey, I changed the style code, it provided me with VS Code through auto formatting. Also, I replaced const to var. Hope it's OK.

lib/marked.js Outdated
"'": '''
};

var escapeTestNoEncode = /(?:[<>"']|&(?!#?\w+;))/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to use grouping to wrap the whole thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I fixed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Y'all are awesome! Thank you.

lib/marked.js Outdated
@@ -1084,13 +1084,33 @@ Parser.prototype.tok = function() {
* Helpers
*/

var escapeTest = /[&<>"']/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be declared inside escape() IMO

Copy link
Contributor Author

@KostyaTretyak KostyaTretyak Jan 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, marked do not have to recreate the same RegExp instance every call escape(). This reduces performance and increases memory usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I made them static.

@Feder1co5oave
Copy link
Contributor

Actually, I've found there's no advantage in testing before replacing, you still have to scan the whole thing at least once, either by testing or replacing, so the first phase is pretty useless.

# with current changes:
$ node test --bench
marked completed in 8388ms.
marked (gfm) completed in 9380ms.
marked (pedantic) completed in 8315ms.
Could not bench robotskirt.
Could not bench showdown.
Could not bench markdown.js.

# without testing first:
$ node test --bench
marked completed in 8394ms.
marked (gfm) completed in 9286ms.
marked (pedantic) completed in 8045ms.
Could not bench robotskirt.
Could not bench showdown.
Could not bench markdown.js.

And you can spare some line of code:

function escape(html, encode) {
  if (encode) {
    return html.replace(escape.replace, function (ch) {
      return escape.replacements[ch];
    });
  } else {
    return html.replace(escape.replaceNoEncode, function (ch) {
      return escape.replacements[ch];
    });
  }
}

escape.replace = /[&<>"']/g;
escape.replaceNoEncode = /[<>"']|&(?!#?\w+;)/g;
escape.replacements = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
};

@KostyaTretyak
Copy link
Contributor Author

My first escape() without regexp.test():

function escape(html, encode) {
  if (encode) {
    return html.replace(escape.escapeReplace, function (ch) { return escape.replacements[ch] });
  }
  else {
    return html.replace(escape.escapeReplaceNoEncode, function (ch) { return escape.replacements[ch] });
  }

  return html;
}

I run this code:

node test -t

Three times:

marked completed in 4146ms
marked completed in 4133ms
marked completed in 4125ms

My second escape() with regexp.test():

function escape(html, encode) {
  if (encode) {
    if (escape.escapeTest.test(html)) {
      return html.replace(escape.escapeReplace, function (ch) { return escape.replacements[ch] });
    }
  }
  else {
    if (escape.escapeTestNoEncode.test(html)) {
      return html.replace(escape.escapeReplaceNoEncode, function (ch) { return escape.replacements[ch] });
    }
  }

  return html;
}

Run three times:

marked completed in 3885ms
marked completed in 3902ms
marked completed in 3893ms

@UziTech
Copy link
Member

UziTech commented Jan 7, 2018

Looks like this would make marked faster than markdown-it in our benchmarks

marked completed in 2200ms.
marked (gfm) completed in 2225ms.
marked (pedantic) completed in 1917ms.
showdown (reuse converter) completed in 21693ms.
showdown (new converter) completed in 22802ms.
markdown-it completed in 3275ms.
markdown.js completed in 11650ms.

@KostyaTretyak
Copy link
Contributor Author

KostyaTretyak commented Jan 7, 2018

In my benchmarks, remarkable is faster and more economical than remarked-it, but both of them can compete with marked when it is necessary to parse large files - more than 2 MB.

@joshbruce
Copy link
Member

Yeah. @worker8's independent benchmark sample has remarkable at the top as well.

@KostyaTretyak: Just to make sure. They can compete with marked with large files >2mb - versus the can not?

I think if we do what in #746 - we will be able to see areas for optimization easier. Right now we kinda have the large class thing happening.

@KostyaTretyak
Copy link
Contributor Author

In markdown.js and showdown, a very noticeable regress when files get larger than 300 KB.

@joshbruce
Copy link
Member

Interesting. Of course, if they're (or we're) targeting web developers - most folks aren't going to need to go above that. Maybe marked is the "large file" parser. :)

Thinking of something like LeanPub - parse an entire book in markdown.

@KostyaTretyak
Copy link
Contributor Author

Maybe marked is the "large file" parser. :)

No, it is a favorite when files are smaller than 2MB. If the files are bigger, then remarkable and markdown-it more faster.

Not for the sake of advertising, just for you to see it clearly. Do the following:

git clone https://github.com/KostyaTretyak/marked-ts.git
cd marked-ts
npm install
npm run compile

And then you can:

npm run bench -- -l 1000

Where -l 1000 - bench 1000 KB file with markdown tests. Here two dash from the front are necessary.

@joshbruce
Copy link
Member

Thanks! That's an interesting trick...might interesting for us to add to the CLI...if I'm understanding correctly:

I can secify how large of a file. Kind of like lipsum https://lipsum.lipsum.com - generate Markdown of X size to run the bench against.

@joshbruce joshbruce removed this from the 0.5.0 - Architecture and extensibility milestone Apr 4, 2018
Copy link
Member

@styfle styfle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@styfle
Copy link
Member

styfle commented Sep 11, 2018

Is there a way to force push to invoke travis unit tests?

@UziTech
Copy link
Member

UziTech commented Sep 11, 2018

@KostyaTretyak if you can rebase this PR we should be able to merge it.

git fetch upstream && git rebase upstream/master && git push -f

@UziTech
Copy link
Member

UziTech commented Sep 11, 2018

I rebased and tested locally, and everything worked fine.

@styfle styfle changed the title Refactoring the old escape() function improved performance on 30-40% Refactor the escape() function to improve performance 30-40% Sep 11, 2018
@styfle
Copy link
Member

styfle commented Sep 11, 2018

Nice! I ran benchmarks locally and this is the before and after:

Before

$ node test --bench
marked completed in 9146ms.
marked (gfm) completed in 11382ms.
marked (pedantic) completed in 8912ms.
commonmark completed in 10832ms.
markdown-it completed in 9832ms.
markdown.js completed in 29112ms.

After

$ node test --bench
marked completed in 8033ms.
marked (gfm) completed in 9946ms.
marked (pedantic) completed in 7905ms.
commonmark completed in 10897ms.
markdown-it completed in 9473ms.
markdown.js completed in 28330ms.

@KostyaTretyak
Copy link
Contributor Author

@UziTech, done:

git fetch upstream && git rebase upstream/master && git push -f

@styfle
Copy link
Member

styfle commented Sep 11, 2018

@KostyaTretyak Thanks! Can you fix this lint error 😄

Copy link
Member

@styfle styfle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! 🎉

@styfle styfle removed the proposal label Sep 11, 2018
Copy link
Member

@UziTech UziTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @KostyaTretyak

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants