Optimize parseNal #53

Merged
merged 1 commit into from
Jan 22, 2024


longnguyen2004
Collaborator

@longnguyen2004 longnguyen2004 commented Jan 22, 2024

Profiling the library showed that parseNal is very inefficient. It was performing an O(n^2) search for the NAL start code prefix and doing a lot of allocations inside the for loop. In such a hot code path, that leads to poor performance.

I rewrote the function to use indexOf, which gets rid of the O(n^2) loop along with the excessive allocations from the subarray() calls.
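
For readers unfamiliar with the approach, here is a minimal sketch of an indexOf-based splitter. It is not the code from this PR: `splitNalUnits` and `START_CODE` are hypothetical names, and it assumes the input is a complete Annex B byte stream held in a single Node.js Buffer.

```javascript
// Illustrative sketch only; splitNalUnits is a hypothetical helper,
// not the library's actual parseNal implementation.

// The 3-byte start code. The 4-byte form (00 00 00 01) ends with the
// same three bytes, so searching for this pattern finds both forms.
const START_CODE = Buffer.from([0x00, 0x00, 0x01]);

function splitNalUnits(buffer) {
  const nalUnits = [];
  // Buffer.indexOf scans in native code, one pass per start code.
  let start = buffer.indexOf(START_CODE);
  while (start !== -1) {
    const payloadStart = start + START_CODE.length;
    const next = buffer.indexOf(START_CODE, payloadStart);
    let payloadEnd = next === -1 ? buffer.length : next;
    // Zero bytes directly before the next start code belong to that
    // start code (4-byte form) or are padding, not NAL payload.
    if (next !== -1) {
      while (payloadEnd > payloadStart && buffer[payloadEnd - 1] === 0x00) {
        payloadEnd--;
      }
    }
    // subarray returns a view on the same memory: one per NAL unit, no copy.
    nalUnits.push(buffer.subarray(payloadStart, payloadEnd));
    start = next;
  }
  return nalUnits;
}

// Usage: three NAL units separated by mixed 3- and 4-byte start codes.
const sample = Buffer.from([
  0, 0, 0, 1, 0x67, 0x42,
  0, 0, 1, 0x68, 0xce,
  0, 0, 0, 1, 0x65, 0x88,
]);
console.log(splitNalUnits(sample).length);
```

Each byte is visited a bounded number of times, so the work is roughly linear in the buffer length, and the only per-unit allocation is the view object returned by subarray().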

Before
Statistical profiling result from isolate-0x5c006f0-186514-v8.log, (47617 ticks, 29409 unaccounted, 0 excluded).

 [Shared libraries]:
   ticks  total  nonlib   name
    558    1.2%          /usr/lib/x86_64-linux-gnu/libc.so.6
     31    0.1%          /home/hp/.local/bin/node
     27    0.1%          /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
      9    0.0%          [vdso]

 [JavaScript]:
   ticks  total  nonlib   name
   4265    9.0%    9.1%  JS: *parseNal /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/dist/media/H264NalSplitter.js:60:13
    342    0.7%    0.7%  JS: *wasm-function[48]
    132    0.3%    0.3%  JS: *wasm-function[112]
    105    0.2%    0.2%  JS: *wasm-function[62]
     78    0.2%    0.2%  JS: *sendFrame /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/dist/client/packet/VideoPacketizerH264.js:61:14
     72    0.2%    0.2%  JS: *Qr /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/node_modules/.pnpm/[email protected]/node_modules/libsodium-wrappers/dist/modules/libsodium-wrappers.js:1:79348
     70    0.1%    0.1%  JS: *concat node:buffer:576:32
     44    0.1%    0.1%  JS: *Socket.send node:dgram:576:33
     35    0.1%    0.1%  JS: *doSend node:dgram:677:16

--- lots of lines omitted for brevity ---

[Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 1.0% are not shown.

   ticks parent  name
  29409   61.8%  UNKNOWN
  19959   67.9%    JS: *parseNal /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/dist/media/H264NalSplitter.js:60:13
  18250   91.4%      JS: *_transform /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/dist/media/H264NalSplitter.js:103:15
  17508   95.9%        JS: ^Transform._write node:internal/streams/transform:170:38
  17031   97.3%          JS: *ondata node:internal/streams/readable:783:18
  15735   92.4%            JS: *Readable.read node:internal/streams/readable:421:35

After
Statistical profiling result from isolate-0x6f926f0-187287-v8.log, (56686 ticks, 10312 unaccounted, 0 excluded).

 [Shared libraries]:
   ticks  total  nonlib   name
    889    1.6%          /usr/lib/x86_64-linux-gnu/libc.so.6
     33    0.1%          /home/hp/.local/bin/node
     22    0.0%          /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
     19    0.0%          [vdso]

 [JavaScript]:
   ticks  total  nonlib   name
    927    1.6%    1.7%  JS: *wasm-function[48]
    381    0.7%    0.7%  JS: *wasm-function[112]
    249    0.4%    0.4%  JS: *wasm-function[62]
    183    0.3%    0.3%  JS: *sendFrame /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/dist/client/packet/VideoPacketizerH264.js:61:14
    167    0.3%    0.3%  JS: *concat node:buffer:576:32
    166    0.3%    0.3%  JS: *Qr /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/node_modules/.pnpm/[email protected]/node_modules/libsodium-wrappers/dist/modules/libsodium-wrappers.js:1:79348
    106    0.2%    0.2%  JS: *Socket.send node:dgram:576:33

--- lots of lines omitted for brevity ---

[Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 1.0% are not shown.

   ticks parent  name
  38846   68.5%  epoll_pwait@@GLIBC_2.6

  10312   18.2%  UNKNOWN
   1519   14.7%    JS: *doSend node:dgram:677:16
   1519  100.0%      JS: *afterDns node:dgram:662:20
   1519  100.0%        JS: *processTicksAndRejections node:internal/process/task_queues:67:35
     27    1.8%          JS: ^runNextTicks node:internal/process/task_queues:58:22
     27  100.0%            JS: *processTimers node:internal/timers:499:25
    754    7.3%    JS: *lookup node:dns:140:16
    754  100.0%      JS: *Socket.send node:dgram:576:33

cc @aiko-chan-ai

@longnguyen2004
Collaborator Author

rbsp() also looks like a prime candidate for optimization, but it's not as big a problem as parseNal().

@dank074
Collaborator

dank074 commented Jan 22, 2024

Good work

@dank074 dank074 merged commit ab2d20b into Discord-RE:master Jan 22, 2024
@longnguyen2004 longnguyen2004 deleted the optimize-parseNal branch January 22, 2024 18:05