Improve performance of applyMaskImageData #14766
calixteman
commented
Apr 9, 2022
- write uint32 values instead of uint8 ones to avoid the check before clamping;
- unroll the loop that writes data to the buffer;
- but keep a loop for the last element of each line: it likely doesn't hurt much since it runs only once per line;
- I tested on a MacBook with an Apple chip: on Firefox Nightly the new code is almost 3.5x faster than before (~1.8x with Chrome).
I measured the performance improvement with the PDF in https://bugzilla.mozilla.org/show_bug.cgi?id=878397.
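For readers following along, the approach in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: the function name `expandMask`, its parameter list, and the choice to return a fresh `Uint32Array` are assumptions made for the example. Only `src`, `dest`, `srcPos`, `destPos`, `srcLength`, `zeroMapping` and `oneMapping` mirror names that appear in the review. Each pixel is written as one 32-bit value, so there are no per-channel clamped uint8 stores.

```javascript
// Hypothetical helper: expands a 1-bit-per-pixel mask into one uint32 per pixel.
// `zeroMapping` and `oneMapping` are the 32-bit values written for 0 and 1 bits.
function expandMask(src, width, height, zeroMapping, oneMapping) {
  const dest = new Uint32Array(width * height);
  const srcLength = src.length; // computed once, outside the loops
  const fullBytes = width >> 3; // source bytes per row that carry 8 pixels
  const tailBits = width & 7; // leftover pixels in the last byte of a row
  let srcPos = 0;
  let destPos = 0;

  for (let row = 0; row < height; row++) {
    // Unrolled part: eight stores per source byte.
    for (const max = srcPos + fullBytes; srcPos < max; srcPos++) {
      const elem = srcPos < srcLength ? src[srcPos] : 255;
      dest[destPos++] = elem & 0b10000000 ? oneMapping : zeroMapping;
      dest[destPos++] = elem & 0b1000000 ? oneMapping : zeroMapping;
      dest[destPos++] = elem & 0b100000 ? oneMapping : zeroMapping;
      dest[destPos++] = elem & 0b10000 ? oneMapping : zeroMapping;
      dest[destPos++] = elem & 0b1000 ? oneMapping : zeroMapping;
      dest[destPos++] = elem & 0b100 ? oneMapping : zeroMapping;
      dest[destPos++] = elem & 0b10 ? oneMapping : zeroMapping;
      dest[destPos++] = elem & 0b1 ? oneMapping : zeroMapping;
    }
    if (tailBits === 0) {
      continue;
    }
    // Small loop for the partially used last byte; runs once per row.
    const elem = srcPos < srcLength ? src[srcPos] : 255;
    srcPos++;
    for (let j = 0; j < tailBits; j++) {
      dest[destPos++] = elem & (1 << (7 - j)) ? oneMapping : zeroMapping;
    }
  }
  return dest;
}
```

With a width of 10, each row consumes one full byte in the unrolled loop and two bits of a second byte in the tail loop.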
Looks good to me, thank you!
src/shared/image_utils.js
Outdated
mask >>= 1;
for (let i = 0; i < height; i++) {
for (const max = srcPos + widthInSource; srcPos < max; srcPos++) {
const elem = srcPos < src.length ? src[srcPos] : 255;
Previously we computed this once at the start of the function, which seems more efficient overall. Hence, please re-introduce the srcLength variable and use that here and below.
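As a rough illustration of the request above, the length can be read once into a local and reused for every bounds check. The function `readRow` below is hypothetical and exists only for this example; the fallback value 255 for reads past the end of `src` is taken from the quoted diff line.

```javascript
// Hypothetical helper: copies `widthInSource` bytes starting at `srcPos`,
// substituting 255 for any position past the end of `src`.
function readRow(src, srcPos, widthInSource) {
  const srcLength = src.length; // computed once instead of on every iteration
  const row = new Uint8Array(widthInSource);
  for (let i = 0; i < widthInSource; i++, srcPos++) {
    row[i] = srcPos < srcLength ? src[srcPos] : 255;
  }
  return row;
}
```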
src/shared/image_utils.js
Outdated
dest[destPos] = elem & 0b10000000 ? oneMapping : zeroMapping;
dest[destPos + 1] = elem & 0b1000000 ? oneMapping : zeroMapping;
dest[destPos + 2] = elem & 0b100000 ? oneMapping : zeroMapping;
dest[destPos + 3] = elem & 0b10000 ? oneMapping : zeroMapping;
dest[destPos + 4] = elem & 0b1000 ? oneMapping : zeroMapping;
dest[destPos + 5] = elem & 0b100 ? oneMapping : zeroMapping;
dest[destPos + 6] = elem & 0b10 ? oneMapping : zeroMapping;
dest[destPos + 7] = elem & 0b1 ? oneMapping : zeroMapping;
destPos += 8;
Nit: Similar to this code in canvas.js, can we use the dest[destPos++] = ... format here as well since that (visually) aligns the code a tiny bit nicer?
My main goal is good performance: a destPos++ implies that the new value is stored back into destPos each time, which is unnecessary. With this code we have only one store.
I just tried the destPos++ form and it doesn't make any significant difference, so let's go for it.
I guess the counter can probably live in a register while it's in the loop scope and then the new value is stored only at the end of the main loop.
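To make the two forms discussed in this thread concrete, here is a small sketch that writes one source byte both ways. The function names are hypothetical and the code only shows that the two styles produce the same output and the same final position; it makes no claim about their relative speed.

```javascript
// Hypothetical: fixed offsets, with a single update of the position at the end.
function expandByteOffsets(dest, destPos, elem, oneMapping, zeroMapping) {
  dest[destPos] = elem & 0b10000000 ? oneMapping : zeroMapping;
  dest[destPos + 1] = elem & 0b1000000 ? oneMapping : zeroMapping;
  dest[destPos + 2] = elem & 0b100000 ? oneMapping : zeroMapping;
  dest[destPos + 3] = elem & 0b10000 ? oneMapping : zeroMapping;
  dest[destPos + 4] = elem & 0b1000 ? oneMapping : zeroMapping;
  dest[destPos + 5] = elem & 0b100 ? oneMapping : zeroMapping;
  dest[destPos + 6] = elem & 0b10 ? oneMapping : zeroMapping;
  dest[destPos + 7] = elem & 0b1 ? oneMapping : zeroMapping;
  return destPos + 8;
}

// Hypothetical: post-increment on every store.
function expandByteIncrement(dest, destPos, elem, oneMapping, zeroMapping) {
  dest[destPos++] = elem & 0b10000000 ? oneMapping : zeroMapping;
  dest[destPos++] = elem & 0b1000000 ? oneMapping : zeroMapping;
  dest[destPos++] = elem & 0b100000 ? oneMapping : zeroMapping;
  dest[destPos++] = elem & 0b10000 ? oneMapping : zeroMapping;
  dest[destPos++] = elem & 0b1000 ? oneMapping : zeroMapping;
  dest[destPos++] = elem & 0b100 ? oneMapping : zeroMapping;
  dest[destPos++] = elem & 0b10 ? oneMapping : zeroMapping;
  dest[destPos++] = elem & 0b1 ? oneMapping : zeroMapping;
  return destPos;
}
```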
/botio test
From: Bot.io (Linux m4)
Received
Command cmd_test from @calixteman received. Current queue size: 0
Live output at: http://54.241.84.105:8877/b60892538e2232d/output.txt
From: Bot.io (Windows)
Received
Command cmd_test from @calixteman received. Current queue size: 0
Live output at: http://54.193.163.58:8877/939e6a85ec26ae6/output.txt
From: Bot.io (Linux m4)
Failed
Full output at http://54.241.84.105:8877/b60892538e2232d/output.txt
Total script time: 24.78 mins
Image differences available at: http://54.241.84.105:8877/b60892538e2232d/reftest-analyzer.html#web=eq.log
From: Bot.io (Windows)
Failed
Full output at http://54.193.163.58:8877/939e6a85ec26ae6/output.txt
Total script time: 26.42 mins
Image differences available at: http://54.193.163.58:8877/939e6a85ec26ae6/reftest-analyzer.html#web=eq.log