
Writing JS strings to resources, Deno.encode and Deno.encodeInto #15895

Open
billywhizz opened this issue Sep 13, 2022 · 11 comments
Assignees
Labels
perf performance related

Comments

@billywhizz
Contributor

billywhizz commented Sep 13, 2022

Issue

I was doing some investigation into network perf using Deno FFI, and stdio perf in general, and came up against a performance roadblock that is relevant anywhere we write JS strings to ops resources using send or write syscalls and the like.

If you look at the benchmark script here you can see the following scenarios benchmarked. All scenarios write to /dev/null using an FFI fast call.

Benchmarks

  • static_buffer: writing a pre-existing buffer with the string pre-encoded - nothing should be faster than this afaik
  • encode_dynamic_string: writing a dynamically generated string encoded into a new Uint8Array using Deno.core.encode
  • encode_static_string: writing a static string encoded into a new Uint8Array using Deno.core.encode
  • encode_into_dynamic_string: writing a dynamically generated string encoded into an existing Uint8Array using Deno.core.ops.op_encoding_encode_into
  • encode_into_static_string: writing a static string encoded into an existing Uint8Array using Deno.core.ops.op_encoding_encode_into

Results

results

Flamegraphs

Flamegraphs comparing the three static string scenarios are here. They clearly show the overhead involved in both Deno.core.encode and Deno.core.ops.op_encoding_encode_into compared to the baseline of writing a pre-encoded buffer, and compared to Deno.writeSync, which seems to be the fastest option currently exposed to userland.

const tests = {
  static_buffer_writesync: () => Deno.writeSync(rid, payload),
  static_buffer_ffi: () => fs.write(fd, payload, payload.length),
  encode_dynamic_string: () => {
    const payload = encode(`${TEXT}${byteLength(str)}${CRLF2}${str}`)
    fs.write(fd, payload, payload.length)
  },
  encode_static_string: () => {
    const payload = encode(resp)
    fs.write(fd, payload, payload.length)
  },
  encode_into_dynamic_string: () => {
    const { written } = ops.op_encoding_encode_into(`${TEXT}${byteLength(str)}${CRLF2}${str}`, buf)
    fs.write(fd, buf, written)
  },
  encode_into_static_string: () => {
    const { written } = ops.op_encoding_encode_into(resp, buf)
    fs.write(fd, buf, written)
  }
}

static_buffer_writesync

static_buffer_ffi

encode_static_string

encode_into_static_string

Discussion

From the benchmarks we can see:

  • calling Deno.core.encode on the static string in question causes a ~58% drop in throughput and takes 2.3x longer
  • calling Deno.core.ops.op_encoding_encode_into causes a ~64% drop in throughput and takes 3.27x longer
  • JS string concatenation adds ~15% overhead

From the flamegraphs we can see:

  • with a static buffer, the program spends 88% of time in __libc_write
  • with Deno.core.encode the program spends 35% in __libc_write
  • with Deno.core.ops.op_encoding_encode_into the program spends 28% in __libc_write

I think this raises a few issues:

  • Deno.core.encode is expensive because it does a lot of work - it allocates a BackingStore, creates an ArrayBuffer, wraps it in a Uint8Array and returns that to JS. This is per spec, so there is not much we can do about it other than avoid it where we don't need it.
  • Deno.core.ops.op_encoding_encode_into should be faster than Deno.core.encode as it is working with a pre-existing buffer and only needs to do a memcpy from the string into the buffer. The overhead here is likely down to it being a native op and being wrapped in serde_v8.
  • We need a method for writing a JS string directly to an ops resource rid. I came across the same issue with just-js and had to do something like the following, which, as far as I can remember, is significantly quicker than first encoding into a buffer and then writing the buffer.

  String::Utf8Value str(isolate, args[1]);
  args.GetReturnValue().Set(
    Integer::New(isolate, write(fd, *str, str.length())));

Planned Actions

  • add an [op-v8] api call Deno.core.encode_into (u8, String) into ops_builtin_v8.rs which does encode_into without serde_v8.
  • add a Deno.core.write_utf8 (rid, string) api call to ops_builtin_v8.rs which will take a copy of the utf8 bytes inside the v8 string passed to it and write them to the resource identified by rid, as in the C++ snippet above.

The disadvantage of the second approach is that, afaik, it will not be optimised as a fast call, because the fast call API has no string support yet. But it would be worth knowing whether it is faster than the other methods above even without the fast call, so I propose doing it as a draft PR and seeing if it improves perf. It should be pretty easy to implement.

@billywhizz billywhizz added the perf performance related label Sep 13, 2022
@billywhizz billywhizz self-assigned this Sep 13, 2022
@aapoalas
Collaborator

Don't you worry: Fast API can actually work with strings. If you give the parameter type as kValue then it'll accept any type (presumably) and it gets passed as the V8 wrapper Value.

@billywhizz
Contributor Author

nice. has that landed already @aapoalas? i'll give it a go if so.

@aapoalas
Collaborator

Yeah, it's already supported by Fast API internally. I can't quite remember if the support for that in ops was already added.

You'll need to use the options bag parameter as well to fall back to the slow path when the expected string parameter turns out to be not-a-string.

@littledivy
Member

For the record, here's the link to the conversation on V8 supporting strings in Fast API calls.
https://discord.com/channels/684898665143206084/778060818046386176/1008754326933618728

@billywhizz
Contributor Author

I've updated the OP with a new test using Deno.writeSync, which is the fastest "official" way I can find to write to a resource. I've also uploaded nicer flamegraphs with full Deno/Rust stacktraces.

@littledivy
Member

After applying the encodeInto optimization in #15922

encodeInto_ptr

@littledivy
Member

littledivy commented Sep 16, 2022

The Builtins_LoadIC is concerning. I think it's related to webidl?

@billywhizz
Contributor Author

I've implemented a minimal runtime so I can test these low-level ops changes without building all of Deno. It only takes ~20 seconds to do a rebuild with this.

I implemented @littledivy's change from this PR for encode_into, as well as a similar change using the same logic current Deno uses.

Divy's PR gets ~3x the throughput of the current implementation on a small http response payload. I will benchmark again over the weekend using this in the benchmarks I did for text writing above.
Screenshot from 2022-09-17 03-04-51

Here are the flamegraphs

current encode_into

deno_encode_into

my change based on divy's PR

runjs_encode_into

divy's PR change

runjs_encode_into_fast

@billywhizz
Contributor Author

btw - I just noticed the fast implementation is still using GetBackingStore. I'll have a look at that tomorrow. I might not be picking up the latest rusty_v8.
Screenshot from 2022-09-17 03-22-08

@littledivy
Member

@billywhizz you need to use deno_ops and deno_core from my PR branch. The GetBackingStore changes are unreleased.

@littledivy
Member

great results though!
