Writing JS strings to resources, Deno.encode and Deno.encodeInto #15895
Comments
Don't you worry: Fast API can actually work with strings. If you give the parameter type as
Nice. Has that landed already @aapoalas? I'll give it a go if so.
Yeah, it's already supported by Fast API internally. I can't quite remember if support for that in ops has already been added. You'll need to use the options bag parameter as well, to fall back to the slow path when the expected string parameter turns out to be not-a-string.
For the record, here's the link to the conversation on V8 supporting strings in Fast API calls.
I've updated the OP with a new test using Deno.writeSync, which is the fastest "official" way I can find to write to a resource. Also uploaded nicer flamegraphs with full Deno/Rust stack traces.
After applying the encodeInto optimization in #15922
I've implemented a little minimal runtime so I can test these low-level ops changes without building all of Deno; a rebuild only takes ~20 seconds with it. I implemented @littledivy's change from this PR for encode_into, as well as a similar change with the same logic current Deno uses. Divy's PR is ~3x the current implementation in throughput on a small HTTP response payload. I will benchmark again over the weekend, using this runtime in the text-writing benchmarks I did above. Here are the flamegraphs:
- current encode_into
- my change based on divy's PR
- divy's PR change
@billywhizz you need to use
Great results though!
Issue
I was doing some investigation into network perf using Deno FFI, and stdio perf in general, and came up against a performance roadblock which is relevant anywhere we write JS strings to ops resources using send or write syscalls and the like.
If you look at the benchmark script here you can see the following scenarios benchmarked. All scenarios write to /dev/null using FFI fast call.
Benchmarks
- `Deno.core.encode`
- `Deno.core.encode`
- `Deno.core.ops.op_encoding_encode_into`
- `Deno.core.ops.op_encoding_encode_into`
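The two encoding strategies under test can be sketched with the standard `TextEncoder` equivalents, which `Deno.core.encode` and `op_encoding_encode_into` broadly mirror (the payload and buffer size here are illustrative, not taken from the benchmark script):

```javascript
// Sketch of the two encoding strategies benchmarked above, using the
// standard TextEncoder APIs (Deno.core.encode behaves like encode();
// op_encoding_encode_into behaves like encodeInto()).
const payload = "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n";
const encoder = new TextEncoder();

// Strategy 1: encode() - allocates a fresh Uint8Array (new BackingStore
// + ArrayBuffer) on every call.
const fresh = encoder.encode(payload);

// Strategy 2: encodeInto() - copies UTF-8 bytes into a caller-owned,
// reusable buffer and reports how much was consumed and produced.
const scratch = new Uint8Array(1024);
const { read, written } = encoder.encodeInto(payload, scratch);

console.log(fresh.byteLength, read, written); // ASCII payload: all three equal
```

For a pure-ASCII payload like this one, `read` (UTF-16 code units consumed) and `written` (UTF-8 bytes produced) are the same number.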
Results
Flamegraphs
Flamegraphs to compare the three static string scenarios are here; they clearly show the overhead involved in both `Deno.core.encode` and `Deno.core.ops.op_encoding_encode_into` compared to the baseline of writing a pre-encoded buffer with `Deno.writeSync`, which seems to be the fastest option currently exposed to userland.
- static_buffer_writesync
- static_buffer_ffi
- encode_static_string
- encode_into_static_string
Discussion
From the benchmarks we can see:
- `Deno.core.encode` on the static string in question causes a ~58% drop in throughput and takes 2.3x longer
- `Deno.core.ops.op_encoding_encode_into` causes a ~64% drop in throughput and takes 3.27x longer

From the flamegraphs we can see:
- the baseline spends the largest share of its time in `__libc_write`
- when using `Deno.core.encode` the program spends 35% in `__libc_write`
- when using `Deno.core.ops.op_encoding_encode_into` the program spends 28% in `__libc_write`
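The per-call allocation difference behind these numbers can be seen directly with the standard `TextEncoder` APIs (a sketch: `Deno.core.encode` does the equivalent of `encode()` here):

```javascript
const encoder = new TextEncoder();

// encode() returns a brand-new Uint8Array over a brand-new ArrayBuffer
// on every call, even for identical input.
const a = encoder.encode("hi");
const b = encoder.encode("hi");
console.log(a.buffer === b.buffer); // false: fresh allocation each time

// encodeInto() writes into a caller-owned buffer, so a hot write path
// can reuse one allocation across all calls.
const scratch = new Uint8Array(16);
const first = encoder.encodeInto("hi", scratch);
const second = encoder.encodeInto("hi", scratch);
console.log(first.written, second.written); // 2 2 - same scratch buffer both times
```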
There are a few issues I think this raises:
- `Deno.core.encode` is expensive because it does a lot of work: it allocates a BackingStore, creates an ArrayBuffer, wraps it in a Uint8Array and returns that to JS. This is per spec, so not much we can do about it other than avoiding it where we don't need it.
- `Deno.core.ops.op_encoding_encode_into` should be faster than `Deno.core.encode`, as it works with a pre-existing buffer and only needs to do a memcpy from the string into the buffer. The overhead here is likely down to it being a native op wrapped in serde_v8.
Planned Actions
- Add a `Deno.core.encode_into (u8, String)` op to ops_builtin_v8.rs which does encode_into without serde_v8.
- Add a `Deno.core.write_utf8 (rid, string)` API call to ops_builtin_v8.rs which will take a copy of the UTF-8 bytes inside the v8 string passed to it and write them to the resource identified by rid, as in the C++ snippet above.

The disadvantage of the second approach is that, afaik, it will not be optimised as a fast call, because the fast call API has no string support yet. But it would be worth knowing whether it is faster than the other methods above even without the fast call, so I propose doing it as a draft PR and seeing if it improves perf. It should be pretty easy to implement.
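The semantics of the proposed `write_utf8` — copy the string's UTF-8 bytes and push them straight at the resource, with no intermediate JS-visible buffer — can be sketched in plain JS. The resource table and names below are hypothetical stand-ins, not the actual API; in real Deno the copy would happen in Rust, directly from the v8 string:

```javascript
// Hypothetical stand-in for Deno's resource table; illustrative only.
const resources = new Map();
let nextRid = 0;

function openSink() {
  const rid = nextRid++;
  resources.set(rid, { chunks: [], bytesWritten: 0 });
  return rid;
}

// Sketch of the proposed Deno.core.write_utf8(rid, string): one UTF-8
// encode and one copy, with nothing handed back to the caller except
// the byte count.
function writeUtf8(rid, s) {
  const sink = resources.get(rid);
  if (!sink) throw new Error(`bad rid: ${rid}`);
  const bytes = new TextEncoder().encode(s);
  sink.chunks.push(bytes);
  sink.bytesWritten += bytes.byteLength;
  return bytes.byteLength;
}

const rid = openSink();
console.log(writeUtf8(rid, "hello resource")); // 14
console.log(resources.get(rid).bytesWritten); // 14
```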