Callback interface speedup #1497
Conversation
I had a few other ideas that I wanted to try, but didn't end up going with them.

- The first one was removing the […]
- The second speedup would be to keep around a […]
- The last speedup would be to buffer the calls in a thread-local buffer and flush the buffer at the end of the scaffolding function. I think this would be fairly simple and probably would be a huge speedup. However, there's no guarantee that the code is running inside a scaffolding function. It could be running in a spawned thread or a tokio loop or something. In that case we would never end up actually making the scaffolding calls. There are ways to work around this, but again it started getting to be too complex. Maybe this could be interesting for a future project.
This is very nice! Please wait for @badboy though :)
CHANGELOG.md
@@ -14,6 +14,9 @@

[All changes in [[UnreleasedVersion]]](https://github.com/mozilla/uniffi-rs/compare/v0.23.0...HEAD).

### ⚠️ Breaking Changes ⚠️
- Implemented a new callback-interface ABI that signifacantly improves performance on Python and Kotlin.
typo: signifacantly->significantly
Also wonder if we should call out that it's not breaking in a "break your code" sense, but most certainly breaking if you happen to maintain external bindings. On that note, there doesn't seem to be much guidance or even comments which might help these people update to use this scheme. I don't really want to block you on that though, so maybe you could open an issue which asks for such guidance to be given "soon", and link to that here?
I expanded this comment with more details. I don't think users need to do anything other than upgrade to the new version for this one.
This is looking good! Couple of smaller things inline, but the overall implementation seems sound.
Do you happen to have the benchmark numbers of the before and after? Would be nice to have them posted here as a reference.
- Added the `benchmarks` fixture. This runs a set of benchmark tests and uses `criterion` to analyze the results. Added some benchmarks for calling functions and callback interfaces.
- Added the `run_script()` function, which is a more generic version of `run_test()`.
- uniffi_testing: Don't add `--test` when running `cargo build` to find the dylibs. It's not needed for the fixture/example tests and it breaks the benchmark tests.
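For reference, here's a minimal sketch of what one of these criterion benchmarks could look like. The names (`callback_benchmarks`, `call_callback_interface_method`, the `"callbacks/basic"` label) are hypothetical stand-ins, not the actual benchmarks added in this PR:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

// Stand-in for the cross-language call the real benchmarks exercise
// (e.g. invoking a callback interface method through the scaffolding).
fn call_callback_interface_method() {
    // ...
}

fn callback_benchmarks(c: &mut Criterion) {
    // criterion runs the closure repeatedly and reports timing statistics.
    c.bench_function("callbacks/basic", |b| b.iter(call_callback_interface_method));
}

criterion_group!(benches, callback_benchmarks);
criterion_main!(benches);
```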
The main goal here is to make the logic live in regular, non-templated code. This makes it easier to understand what's going on and hopefully easier to change. One issue with this approach is that we need 3 functions depending on the method return/throws types. I think it's worth it for clearer code, and I also hope to eventually merge these into 1 function. We could use the same move as in PR #1469: define a new `FfiConverter` method like `callback_interface_method_return()` that handles the specifics, give it a default implementation, and specialize it for `Result<>` and `()`.
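As a rough sketch of that follow-up idea (all names here are hypothetical; in uniffi the hook would presumably hang off `FfiConverter`, and the buffer handling is heavily simplified): a trait method with a provided default, overridden for `()` and `Result<>`:

```rust
pub trait CallbackReturn: Sized {
    /// Serialize the value itself; a stand-in for uniffi's real lowering machinery.
    fn to_bytes(&self) -> Vec<u8>;

    /// Default behaviour: write the lowered value into the caller-provided out buffer.
    fn callback_interface_method_return(self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_bytes());
    }
}

impl CallbackReturn for u64 {
    fn to_bytes(&self) -> Vec<u8> {
        self.to_be_bytes().to_vec()
    }
}

impl CallbackReturn for () {
    fn to_bytes(&self) -> Vec<u8> {
        Vec::new()
    }

    /// Void returns never touch the buffer Rust already initialized.
    fn callback_interface_method_return(self, _out: &mut Vec<u8>) {}
}

impl<T: CallbackReturn, E: std::fmt::Display> CallbackReturn for Result<T, E> {
    fn to_bytes(&self) -> Vec<u8> {
        match self {
            Ok(v) => v.to_bytes(),
            Err(e) => e.to_string().into_bytes(),
        }
    }

    /// Prefix a status byte so the foreign side can tell Ok from Err.
    fn callback_interface_method_return(self, out: &mut Vec<u8>) {
        out.push(if self.is_ok() { 0 } else { 1 });
        out.extend_from_slice(&self.to_bytes());
    }
}
```

With something like this, a single scaffolding function could call `callback_interface_method_return()` and let the return type pick the right behaviour.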
The goal here is to lower the overhead of callback interface calls, specifically targeting things like logging calls which can be made many times in a short timespan. Since this is an ABI change, it's not backwards compatible and therefore requires a breaking version bump before it can be the default.

The initial ABI change was fairly minor: the argument RustBuffer is now passed by reference instead of by value. This means the foreign code doesn't need to use an FFI call to destroy it. For Kotlin, and maybe other languages, it also seems to be slightly faster to use a reference because fewer bits need to be pushed to/popped from the stack. This brought some speedups to Python and Kotlin, and a minor speedup to Swift that only passes the noise threshold in the no-args-void-return case:

callbacks/python-callbacks-basic
    time:   [37.277 µs 37.299 µs 37.320 µs]
    change: [-10.692% -10.532% -10.379%] (p = 0.00 < 0.05)
    Performance has improved.

callbacks/kotlin-callbacks-basic
    time:   [15.533 µs 15.801 µs 16.069 µs]
    change: [-35.654% -31.481% -27.245%] (p = 0.00 < 0.05)
    Performance has improved.

callbacks/swift-callbacks-no-args-void-return
    time:   [454.96 ns 455.64 ns 456.71 ns]
    change: [-35.872% -35.792% -35.710%] (p = 0.00 < 0.05)
    Performance has improved.
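To make the shape of the change concrete, here is a rough illustration of the old vs. new callback signature. These are not uniffi's actual definitions; the struct layout and function types are simplified stand-ins:

```rust
#[repr(C)]
pub struct RustBuffer {
    // Simplified stand-in for uniffi's real RustBuffer.
    capacity: u64,
    len: u64,
    data: *mut u8,
}

// Before (illustrative): the argument buffer crosses the FFI boundary by value,
// so the foreign side owns it and must make a second FFI call to free it.
type CallbackByValue =
    unsafe extern "C" fn(handle: u64, method: u32, args: RustBuffer, out: *mut RustBuffer) -> i32;

// After (illustrative): the argument buffer is passed by reference; Rust keeps
// ownership and frees it when the call returns, so no extra FFI round-trip is needed.
type CallbackByRef =
    unsafe extern "C" fn(handle: u64, method: u32, args: *const RustBuffer, out: *mut RustBuffer) -> i32;
```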
If a callback interface method has a void return, there's no need to overwrite the value in the output buffer: Rust has already initialized an empty buffer. Also, there's no need to construct a RustBuffer in the first place. Refactored the callback interface code to make this possible without too much spaghettification. This gives us a decent speedup on Python/Kotlin:

Benchmarking callbacks/python-callbacks-void-return
Benchmarking callbacks/python-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/python-callbacks-void-return: Collecting 100 samples in estimated 10.050 s (813k iterations)
Benchmarking callbacks/python-callbacks-void-return: Analyzing
callbacks/python-callbacks-void-return
    time:   [12.184 µs 12.219 µs 12.266 µs]
    change: [-26.774% -26.526% -26.286%] (p = 0.00 < 0.05)
    Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
    2 (2.00%) high mild
    7 (7.00%) high severe

Benchmarking callbacks/kotlin-callbacks-void-return
Benchmarking callbacks/kotlin-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/kotlin-callbacks-void-return: Collecting 100 samples in estimated 10.007 s (1.9M iterations)
Benchmarking callbacks/kotlin-callbacks-void-return: Analyzing
callbacks/kotlin-callbacks-void-return
    time:   [5.1736 µs 5.2086 µs 5.2430 µs]
    change: [-32.627% -26.167% -20.263%] (p = 0.00 < 0.05)
    Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
    3 (3.00%) low severe
    6 (6.00%) low mild
    1 (1.00%) high mild
    3 (3.00%) high severe
This changes things so we send the arguments as data/len components rather than a RustBuffer struct directly. For some reason this is faster on Python/Kotlin, although it's not clear why. Here are some possibilities:

- ctypes and JNA take a performance hit when dealing with structs vs primitive values.
- RustBuffer has a third component, capacity, that we don't need. By not sending this component, we reduce the stack size.

In any case, this really does speed up Python/Kotlin. Swift doesn't seem to be affected.

Benchmarking callbacks/python-callbacks-void-return
Benchmarking callbacks/python-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/python-callbacks-void-return: Collecting 100 samples in estimated 10.050 s (813k iterations)
Benchmarking callbacks/python-callbacks-void-return: Analyzing
callbacks/python-callbacks-void-return
    time:   [12.184 µs 12.219 µs 12.266 µs]
    change: [-26.774% -26.526% -26.286%] (p = 0.00 < 0.05)
    Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
    2 (2.00%) high mild
    7 (7.00%) high severe

Benchmarking callbacks/kotlin-callbacks-void-return
Benchmarking callbacks/kotlin-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/kotlin-callbacks-void-return: Collecting 100 samples in estimated 10.007 s (1.9M iterations)
Benchmarking callbacks/kotlin-callbacks-void-return: Analyzing
callbacks/kotlin-callbacks-void-return
    time:   [5.1736 µs 5.2086 µs 5.2430 µs]
    change: [-32.627% -26.167% -20.263%] (p = 0.00 < 0.05)
    Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
    3 (3.00%) low severe
    6 (6.00%) low mild
    1 (1.00%) high mild
    3 (3.00%) high severe
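As an illustration only (not the actual generated scaffolding), the difference between the two shapes is roughly:

```rust
#[repr(C)]
pub struct RustBuffer {
    capacity: u64, // only meaningful on the Rust side
    len: u64,
    data: *mut u8,
}

// Struct form (illustrative): ctypes/JNA have to marshal a three-field struct,
// including the capacity component that the foreign side never uses.
type ForeignCallbackStructArgs =
    unsafe extern "C" fn(handle: u64, method: u32, args: RustBuffer) -> i32;

// Data/len form (illustrative): two primitive parameters, nothing else to marshal.
type ForeignCallbackDataLenArgs =
    unsafe extern "C" fn(handle: u64, method: u32, args_data: *const u8, args_len: i32) -> i32;
```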
Force-pushed from b24be40 to bf77b38.
Thanks for the review @badboy! I think I addressed all the things you brought up with the latest commit.
Yes, let's do that. Here are the results of running the benchmarks on my workstation. The benchmarks were always fairly noisy for Python/Kotlin; I would guess a margin of error of about 5%. But the overall effect seems clear:
Most of these are doc fixes. The one behavior change is that the callback return values were updated. If we're breaking the ABI, we might as well make these a bit more logical.
Force-pushed from bf77b38 to 47b494a.
}
}

// TODO: Refactor the callback code so that we don't need to have 3 different functions here
👍
Attempt number 2 on this one. It's pretty much the same as the first attempt, but without the feature flag, which simplifies things quite a bit.
Previous attempt: #1481