Expanded whitespace test case to catch more inefficiencies and updated code to optimize. #5

42shadow42 · 2022-06-26T18:35:47Z

No description provided.

index.js

wooorm · 2022-06-26T18:49:59Z

index.js

+    const regex = /[ \t]/
+    let i = value.length
+    while (regex.test(value.charAt(--i)));
+    return value.slice(0, i + 1)
  }

  return value


At this point, it seems like a bad idea to use regexes for this. At least, it’s hard to read. Can you check if something like this works?

let start = 0
let end = value.length

// Skip leading tabs (9) and spaces (32).
while (start < end) {
  const code = value.charCodeAt(start)
  if (code === 9 || code === 32) {
    start++
  } else {
    break
  }
}

// Skip trailing tabs and spaces; `end` is exclusive, so look at `end - 1`.
while (end > start) {
  const code = value.charCodeAt(end - 1)
  if (code === 9 || code === 32) {
    end--
  } else {
    break
  }
}

return value.slice(start, end)

(formatted to match project)
(it's perhaps good to check some lines that include only whitespace, and lines that include nothing at all)

I prefer the regex approach where readability is concerned. It means people don't have to look up the char codes, and I feel it clarifies the intent over the above code.

I understand your opinion, but I disagree. Can you change it?
Regexes are generally slow. Character codes, especially with parsing projects such as all of unified, are common and searchable.

I propose the following as the function. Please still consider it pseudocode; however, I did test it and it seems to work.

/**
 * @param {string} value
 * @param {boolean} start
 * @param {boolean} end
 * @returns {string}
 */
function trimLine(value, start, end) {
  let startIndex = 0
  let endIndex = value.length

  if (!start) {
    let code = value.charCodeAt(startIndex)
    while (code === 9 || code === 32) {
      startIndex++
      code = value.charCodeAt(startIndex)
    }
  }

  if (!end) {
    let code = value.charCodeAt(endIndex - 1)
    while (code === 9 || code === 32) {
      endIndex--
      code = value.charCodeAt(endIndex - 1)
    }
  }

  return endIndex > startIndex ? value.slice(startIndex, endIndex) : ''
}

This:

Implements faster trimming for !start as well, meaning that regexes are no longer needed

Uses charCodeAt, which is faster than charAt and no longer needs small strings. I understand that these codes might be new to you, and hence you do not prefer them, but I consider them common enough in parsing, in the 100s of projects I am maintaining, that I strongly prefer them.

Changes value only once, without reassigning it, or even not at all for empty lines. Reassigning a parameter is slow, because JavaScript “links” arguments[0] and value together. Not slicing at all for empty lines is likely also fast in edge cases of large blank lines.

All reasons why this should be faster.
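
One way to gain confidence in the proposal above is to compare it with a plain regex reference on the edge cases mentioned earlier (empty lines, whitespace-only lines). This is only a sketch: it assumes both flags are false (trim both sides), and `trimWithRegex` and the `samples` list are hypothetical names introduced for the comparison.

```javascript
// Index-based trimming as proposed above (a flag set to true leaves that side alone).
function trimLine(value, start, end) {
  let startIndex = 0
  let endIndex = value.length

  if (!start) {
    let code = value.charCodeAt(startIndex)
    while (code === 9 || code === 32) {
      startIndex++
      code = value.charCodeAt(startIndex)
    }
  }

  if (!end) {
    let code = value.charCodeAt(endIndex - 1)
    while (code === 9 || code === 32) {
      endIndex--
      code = value.charCodeAt(endIndex - 1)
    }
  }

  return endIndex > startIndex ? value.slice(startIndex, endIndex) : ''
}

// Hypothetical regex reference, used only for comparison.
function trimWithRegex(value) {
  return value.replace(/^[ \t]+/, '').replace(/[ \t]+$/, '')
}

// Empty, whitespace-only, and mixed lines.
const samples = ['', ' ', '\t \t', 'a', '  a', 'a\t', ' \ta b\t ', 'a  b']
const mismatches = samples.filter(
  (sample) => trimLine(sample, false, false) !== trimWithRegex(sample)
)

console.log(mismatches.length === 0 ? 'all samples agree' : mismatches)
```

Reading past either end of the string yields NaN from charCodeAt, which is why the loops stop without an explicit bounds check.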

I suggest using constants for the character codes to make it more readable:

const FOO = 9
//...
while (code === FOO || code === BAR) {

And I agree with not using regex, for readability, for performance, and to avoid the risk of ReDoS.
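
As a rough sketch of what that suggestion could look like, the snippet below names the two character codes and uses them in a trailing-whitespace scan. The names `tab`, `space`, and `stripTrailing` are placeholders chosen for illustration, not identifiers taken from the project.

```javascript
// Character codes for the only two characters treated as trimmable whitespace.
const tab = 9 // '\t'
const space = 32 // ' '

// Hypothetical helper: drop trailing tabs and spaces without a regex.
function stripTrailing(value) {
  let end = value.length

  while (end > 0) {
    const code = value.charCodeAt(end - 1)
    if (code !== tab && code !== space) break
    end--
  }

  return value.slice(0, end)
}

console.log(JSON.stringify(stripTrailing('alpha \t ')))
```

With the names in place, the loop condition reads as intent rather than as bare numbers, while the comparison itself stays a cheap numeric check.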

Just saw this, I think it resolves my concern about readability without compromising performance. I'll make the constants now.

42shadow42 · 2022-06-26T18:59:01Z

42shadow42 · 2022-06-30T13:28:47Z

I'll make both these changes this weekend. I think the code sample you provided is not equivalent, but I can find something similar I'm sure.

wooorm · 2022-07-01T18:10:12Z

Thank you <3
At least a slice is missing, but yeah: perhaps some more. Let me know if I can help you!

42shadow42 · 2022-07-01T21:13:20Z

I'm not sure what you mean, can you clarify and/or show me what you mean?

wooorm · 2022-07-02T12:30:04Z

My above code was pseudo code
You said some stuff was missing
I quickly looked and thought at least a slice was indeed missing
Now that I look again, there is a slice that I thought was missing. But there might be other things missing!

Let me know if you need help figuring this out

wooorm · 2022-07-02T12:42:47Z

wooorm · 2022-07-02T12:46:37Z

test.js

+  t.test('internalized whitespace ', (t) => {
+    const timeoutId = setTimeout(() => {
+      t.fail('did not pass in 30ms')
+    }, 30)


Why is this 30 instead of 10?
Should all be 30, and mean “fast enough”?

No, this test is slower because it's the edge case that originally caused problems. It might pass in 10ms with the fully non-regex version though I'd have to try it.

Actually on second test, it appears it does pass in 10 ms, I must have accidentally tested 1ms or something.

On a third revision it appears they all need 20-30ms, they are intermittently failing with 10. Good call.

Also, we’re not testing x ms. We’re testing “fast enough”, compared to alternatives that take a second or more.
Timeouts mainly, but timing tests in general, are really hard to get right.
Older Node versions or Windows, for example, will make everything slightly slower. Because these numbers are really small, they will lead to randomly breaking CIs.
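
To illustrate the “fast enough” idea, here is a rough sketch that measures elapsed time against a deliberately generous budget instead of a tight one. The one-second budget, the input size, and the stand-in function `trimBoth` are all assumptions made for illustration; the tests in this PR use a `setTimeout` that calls `t.fail` instead.

```javascript
// Hypothetical stand-in for the function under test: index-based trimming of tabs and spaces.
function trimBoth(value) {
  let start = 0
  let end = value.length

  while (start < end) {
    const code = value.charCodeAt(start)
    if (code !== 9 && code !== 32) break
    start++
  }

  while (end > start) {
    const code = value.charCodeAt(end - 1)
    if (code !== 9 && code !== 32) break
    end--
  }

  return value.slice(start, end)
}

// Long runs of whitespace before, inside, and after the content.
const size = 100000
const input = ' '.repeat(size) + 'a' + ' '.repeat(size) + 'b' + ' '.repeat(size)

// Generous budget: the goal is to catch second-scale regressions, not to benchmark.
const budget = 1000

const before = Date.now()
const result = trimBoth(input)
const elapsed = Date.now() - before

const fastEnough = elapsed < budget
console.log(fastEnough ? 'fast enough' : 'too slow: ' + elapsed + 'ms')
```

A wide margin like this tolerates slow CI machines while still failing when the run time grows to a second or more.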

42shadow42 · 2022-07-02T23:30:06Z

Pushed what is hopefully the last revision. You are right, your function works. However, the linters requested a change from charCodeAt to codePointAt. I think the numbers remain the same for space and tab, so I left them as they were.

On another note, I think I finally understand how the function works. If I understand correctly, my confusion was caused by the start and end boolean values being inverted. I changed start and end so that when start is true it trims whitespace from the beginning of the string, and when end is true it trims whitespace from the end of the string. I feel this makes the function clearer. Let me know if you disagree.
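
On the codePointAt point: a small check, shown as a sketch, confirms that both methods return the same numbers for tab and space and differ only for characters stored as surrogate pairs. The variable names are illustrative only.

```javascript
// For tab and space, both methods return the same number.
const tabCodes = ['\t'.charCodeAt(0), '\t'.codePointAt(0)]
const spaceCodes = [' '.charCodeAt(0), ' '.codePointAt(0)]

// They differ for characters stored as surrogate pairs, such as emoji:
// charCodeAt returns the first code unit, codePointAt the full code point.
const emoji = '\u{1F600}'
const emojiCodes = [emoji.charCodeAt(0), emoji.codePointAt(0)]

// Past the end of the string, charCodeAt gives NaN and codePointAt gives undefined.
// Neither equals 9 or 32, so a scanning loop stops either way.
const pastEnd = [''.charCodeAt(0), ''.codePointAt(0)]

console.log(tabCodes, spaceCodes, emojiCodes, pastEnd)
```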

wooorm

The difference between start and end as they previously were, was that start meant at the start of the whole string, and end meant at the end of the whole string.
With your change, they mean different things: start means to trim the start of a line (which must not happen at the start of the string), and end means to trim the end of a line (which must not happen at the end of the string). Either is fine with me.
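
To make the two readings concrete, the sketch below follows the newer meaning (a flag set to true trims that side) and applies it across a multi-line string, so the very start and very end of the whole string are left alone. `trimLineSides` and `trimLinesSketch` are hypothetical names, and joining with `\n` is a simplification; this is not presented as the package's actual implementation.

```javascript
const tab = 9
const space = 32

// Hypothetical per-line helper: `start` and `end` say which sides to trim.
function trimLineSides(value, start, end) {
  let startIndex = 0
  let endIndex = value.length

  if (start) {
    while (startIndex < endIndex) {
      const code = value.codePointAt(startIndex)
      if (code !== tab && code !== space) break
      startIndex++
    }
  }

  if (end) {
    while (endIndex > startIndex) {
      const code = value.codePointAt(endIndex - 1)
      if (code !== tab && code !== space) break
      endIndex--
    }
  }

  return value.slice(startIndex, endIndex)
}

// Trim around line breaks only: the first line keeps its leading whitespace
// and the last line keeps its trailing whitespace.
function trimLinesSketch(value) {
  const lines = value.split(/\r?\n|\r/)
  return lines
    .map((line, index) =>
      trimLineSides(line, index > 0, index < lines.length - 1)
    )
    .join('\n')
}

console.log(JSON.stringify(trimLinesSketch(' a \n b \n c ')))
```

Under the earlier meaning, the caller would pass the opposite booleans (true for "this is the start of the whole string, leave it alone"); the observable result is the same.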

wooorm · 2022-07-03T12:55:02Z

Thanks for your continued work, released in 3.0.1!

Expanded whitespace test case to catch more inefficiences

6d1d16d

42shadow42 changed the title ~~Expanded whitespace test case to catch more inefficiences and updated "~~ Expanded whitespace test case to catch more inefficiencies and updated code to optimize. Jun 26, 2022

42shadow42 mentioned this pull request Jun 26, 2022

Add improved performance on tons of whitespace syntax-tree/mdast-util-to-hast#62

Merged

5 tasks

Added more time, the CI environment appears to be less efficient

3dc039b

wooorm reviewed Jun 26, 2022

42shadow42 commented Jun 26, 2022

Inlined regex

6ec5151

42shadow42 commented Jun 26, 2022

Removed extraneous space

04db121

42shadow42 requested a review from wooorm June 29, 2022 11:10

Implemented leading trim without regex, separated test cases to be le…

f1feb12

…ss nebulous

wooorm reviewed Jul 2, 2022

Reduce use of regex, use char codes over string characters

6c17733

Use constants to clarify character codes.

66fe885

wooorm approved these changes Jul 3, 2022

wooorm merged commit ca1b0bf into wooorm:main Jul 3, 2022
