Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Callback interface speedup #1497

Merged
merged 10 commits into from
Mar 24, 2023
Merged

Conversation

bendk
Copy link
Contributor

@bendk bendk commented Mar 22, 2023

Attempt number 2 on this one. It's pretty much the same as the first attempt, but without the feature flag which simplifies things quite a bit.


Previous attempt: #1481

@bendk bendk requested review from badboy and mhammond March 22, 2023 23:42
@bendk bendk requested a review from a team as a code owner March 22, 2023 23:42
@bendk
Copy link
Contributor Author

bendk commented Mar 22, 2023

I had a few other ideas that I wanted to try, but didn't end up going with them.

The first one was removing the handle and method arguments and instead packing those into the RustBuffer. The hypothesis was that JNA has poor performance, so the less arguments we use the better. However, this didn't turn out to be true. My guess here is that primitive arguments have good performance, while structs and other compound arguments are slow. The same was true of python.

The second speedup would be to keep around a RustBuffer for the purpose of argument packing and never deallocate it. I still think this could be a nice improvement, but it would be a significant amount of work.

The last speedup would be to buffer the calls in a threadlocal buffer and flush the buffer at the end of the scaffolding function. I think this would be fairly simple and probably would be a huge speedup. However, there's no guarantee that the code is running inside a scaffolding function. It could be running in a spawned thread or a tokio loop or something. In that case we would never end up actually making the scaffolding calls. There are ways to work around this, but again it started getting to be too complex. Maybe this could be interesting for a future project.

Copy link
Member

@mhammond mhammond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice! Please wait for @badboy though :)

CHANGELOG.md Outdated
@@ -14,6 +14,9 @@

[All changes in [[UnreleasedVersion]]](https://github.com/mozilla/uniffi-rs/compare/v0.23.0...HEAD).

### ⚠️ Breaking Changes ⚠️
- Implemented a new callback-interface ABI that signifacantly improves performance on Python and Kotlin.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: signifacantly->significantly

Also wonder if we should call out that it's not breaking in a "break your code" sense, but most certainly breaking if you happen to maintain external bindings. On that note, there doesn't seem to be much guidance or even comments which might help these people update to use this scheme. I don't really want to block you on that though, so maybe you could open an issue which asks for such guidance to be given "soon", and link to that here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expanded this comment with more details. I don't think users need to do anything other than upgrade to the new version for this one.

fixtures/benchmarks/src/benchmarks.udl Outdated Show resolved Hide resolved
uniffi_core/src/ffi/foreigncallbacks.rs Outdated Show resolved Hide resolved
Copy link
Member

@badboy badboy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good! Couple of smaller things inline, but the overall implementation seems sound.

Do you happen to have the benchmark numbers of the before and after? Would be nice to have them posted here as a reference.

fixtures/benchmarks/README.md Outdated Show resolved Hide resolved
fixtures/benchmarks/README.md Outdated Show resolved Hide resolved
fixtures/benchmarks/benches/bindings/run_benchmarks.swift Outdated Show resolved Hide resolved
uniffi_core/src/ffi/foreigncallbacks.rs Show resolved Hide resolved
uniffi_core/src/ffi/foreigncallbacks.rs Outdated Show resolved Hide resolved
uniffi_core/src/ffi/foreigncallbacks.rs Outdated Show resolved Hide resolved
uniffi_core/src/ffi/foreigncallbacks.rs Outdated Show resolved Hide resolved
uniffi_core/src/ffi/foreigncallbacks.rs Show resolved Hide resolved
bendk added 9 commits March 23, 2023 16:02
- Added the `benchmarks` fixture.  This runs a set of benchmark tests
  and uses `criterion` to analyze the results.  Added some benchmarks
  for calling functions and callback interfaces.
- Added `run_script()` function, which is a more generic version of
  `run_test()`.
- uniffi_testing: Don't add `--test` when running `cargo build` to find
  the dylibs.  It's not needed for the fixture/example tests and it
  breaks the benchmark tests.
The main goal here is to make the logic live in regular, non-templated,
code.  This makes it easier to understand what's going on and hopefully
easier to change.

One issue with this approach is that we need 3 functions depending on
the method return/throws types.  I think it's worth it to have clearer
code and I also hope to merge these into 1 function.  We could use the
same move as in PR#1469: define a new `FfiConverter` method like
`callback_interface_method_return()` that handles the specifics then
have a default implementation, with a specialized implementation for
`Result<>` and `()`.
The goal here is to lower the overhead of callback interface calls,
specifically targeting things like logging calls which can be called
many times in a short timespan.

Since this is an ABI change, it's not backwards compatible and therefore
requires a breaking version bump before it can be the default.

The initial ABI change was fairly minor: the argument RustBuffer is now
passed by reference instead of by value.  This means the foreign code
doesn't need to use an FFI call to destroy it.  For Kotlin, and maybe
other languages, it also seems to be slightly faster to use a reference
because that's less bits that need to be pushed to/popped from the
stack.

This brought some speedups to python and kotlin, and a minor speedup to
Swift that only passes the noise thresholdin the no-args-void-return
case:

callbacks/python-callbacks-basic
                        time:   [37.277 µs 37.299 µs 37.320 µs]
                        change: [-10.692% -10.532% -10.379%] (p = 0.00 < 0.05)
                        Performance has improved.

callbacks/kotlin-callbacks-basic
                        time:   [15.533 µs 15.801 µs 16.069 µs]
                        change: [-35.654% -31.481% -27.245%] (p = 0.00 < 0.05)
                        Performance has improved.

callbacks/swift-callbacks-no-args-void-return
                        time:   [454.96 ns 455.64 ns 456.71 ns]
                        change: [-35.872% -35.792% -35.710%] (p = 0.00 < 0.05)
                        Performance has improved.
If a callback interface method has a void return, there's no need to
overwrite the value in the output buffer.  Rust has already initialized
an empty buffer.  Also, there's no need to construct a RustBuffer in the
first place.

Refactored the callback interface code to make this possible without too
much spaghettification.

This gives us a decent speedup on Python/Kotlin:

Benchmarking callbacks/python-callbacks-void-return
Benchmarking callbacks/python-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/python-callbacks-void-return: Collecting 100 samples in estimated 10.050 s (813k iterations)
Benchmarking callbacks/python-callbacks-void-return: Analyzing
callbacks/python-callbacks-void-return
                        time:   [12.184 µs 12.219 µs 12.266 µs]
                        change: [-26.774% -26.526% -26.286%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe

Benchmarking callbacks/kotlin-callbacks-void-return
Benchmarking callbacks/kotlin-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/kotlin-callbacks-void-return: Collecting 100 samples in estimated 10.007 s (1.9M iterations)
Benchmarking callbacks/kotlin-callbacks-void-return: Analyzing
callbacks/kotlin-callbacks-void-return
                        time:   [5.1736 µs 5.2086 µs 5.2430 µs]
                        change: [-32.627% -26.167% -20.263%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) low severe
  6 (6.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe
This changes things so we send the arguments as a data/len component
rather than a RustBuffer struct directly. For some reason this is faster
on Python/Kotlin, although it's not clear why.  Here's some
possibilities:

 - cytpes and JNA take a performance hit when dealing with structs vs
   primitive values.
 - RustBuffer has a third component: capacity, that we don't need.  By
   not sending this component, it reduces the stack size.

In any case, this really does speed up Python/Kotlin.  Swift doesn't
seem to be affected.

Benchmarking callbacks/python-callbacks-void-return
Benchmarking callbacks/python-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/python-callbacks-void-return: Collecting 100 samples in estimated 10.050 s (813k iterations)
Benchmarking callbacks/python-callbacks-void-return: Analyzing
callbacks/python-callbacks-void-return
                        time:   [12.184 µs 12.219 µs 12.266 µs]
                        change: [-26.774% -26.526% -26.286%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe

Benchmarking callbacks/kotlin-callbacks-void-return
Benchmarking callbacks/kotlin-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/kotlin-callbacks-void-return: Collecting 100 samples in estimated 10.007 s (1.9M iterations)
Benchmarking callbacks/kotlin-callbacks-void-return: Analyzing
callbacks/kotlin-callbacks-void-return
                        time:   [5.1736 µs 5.2086 µs 5.2430 µs]
                        change: [-32.627% -26.167% -20.263%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) low severe
  6 (6.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe
@bendk bendk force-pushed the callback-interface-speedup branch from b24be40 to bf77b38 Compare March 23, 2023 20:16
@bendk
Copy link
Contributor Author

bendk commented Mar 23, 2023

Thanks for the review @badboy I think I address all the things you brought up with the latest commit.

Do you happen to have the benchmark numbers of the before and after? Would be nice to have them posted here as a reference.

Yes let's do that. Here's the results of running the benchmarks on my workstation. The benchmarks were always fairly noisy for Python/Kotlin, I would guess a margin of error of about 5%. But the overall effect seems clear:

Benchmarking callbacks/python-callbacks-basic
Benchmarking callbacks/python-callbacks-basic: Warming up for 3.0000 s
Benchmarking callbacks/python-callbacks-basic: Collecting 100 samples in estimated 10.073 s (278k iterations)
Benchmarking callbacks/python-callbacks-basic: Analyzing
callbacks/python-callbacks-basic
                        time:   [36.224 µs 36.242 µs 36.259 µs]
                        change: [-20.429% -19.597% -18.867%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high severe
Benchmarking callbacks/python-callbacks-void-return
Benchmarking callbacks/python-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/python-callbacks-void-return: Collecting 100 samples in estimated 10.037 s (1.1M iterations)
Benchmarking callbacks/python-callbacks-void-return: Analyzing
callbacks/python-callbacks-void-return
                        time:   [8.8727 µs 8.8992 µs 8.9484 µs]
                        change: [-57.711% -57.572% -57.369%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  5 (5.00%) high severe
Benchmarking callbacks/python-callbacks-no-args-void-return
Benchmarking callbacks/python-callbacks-no-args-void-return: Warming up for 3.0000 s
Benchmarking callbacks/python-callbacks-no-args-void-return: Collecting 100 samples in estimated 10.010 s (4.3M iterations)
Benchmarking callbacks/python-callbacks-no-args-void-return: Analyzing
callbacks/python-callbacks-no-args-void-return
                        time:   [2.3429 µs 2.3476 µs 2.3525 µs]
                        change: [-57.244% -57.095% -56.963%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking callbacks/kotlin-callbacks-basic
Benchmarking callbacks/kotlin-callbacks-basic: Warming up for 3.0000 s
Benchmarking callbacks/kotlin-callbacks-basic: Collecting 100 samples in estimated 10.025 s (742k iterations)
Benchmarking callbacks/kotlin-callbacks-basic: Analyzing
callbacks/kotlin-callbacks-basic
                        time:   [13.192 µs 13.430 µs 13.659 µs]
                        change: [-42.337% -39.112% -35.106%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
Benchmarking callbacks/kotlin-callbacks-void-return
Benchmarking callbacks/kotlin-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/kotlin-callbacks-void-return: Collecting 100 samples in estimated 10.005 s (2.9M iterations)
Benchmarking callbacks/kotlin-callbacks-void-return: Analyzing
callbacks/kotlin-callbacks-void-return
                        time:   [3.2238 µs 3.3074 µs 3.3915 µs]
                        change: [-79.606% -78.129% -76.674%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
Benchmarking callbacks/kotlin-callbacks-no-args-void-return
Benchmarking callbacks/kotlin-callbacks-no-args-void-return: Warming up for 3.0000 s
Benchmarking callbacks/kotlin-callbacks-no-args-void-return: Collecting 100 samples in estimated 10.004 s (3.9M iterations)
Benchmarking callbacks/kotlin-callbacks-no-args-void-return: Analyzing
callbacks/kotlin-callbacks-no-args-void-return
                        time:   [2.5148 µs 2.5967 µs 2.6783 µs]
                        change: [-81.435% -80.294% -79.059%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
callbacks/swift-callbacks-basic
                        time:   [8.2316 µs 8.2596 µs 8.2898 µs]
                        change: [-5.7728% -5.2739% -4.8039%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
callbacks/swift-callbacks-void-return
                        time:   [6.1453 µs 6.1978 µs 6.2637 µs]
                        change: [-4.5813% -4.1072% -3.6176%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe
callbacks/swift-callbacks-no-args-void-return
                        time:   [419.52 ns 420.48 ns 421.65 ns]
                        change: [-41.218% -40.224% -38.874%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe

Most of these are doc fixes.  The one behavior change is that the
callback return values were updated.  If we're breaking the ABI we might
as well make these be a bit more logical.
@bendk bendk force-pushed the callback-interface-speedup branch from bf77b38 to 47b494a Compare March 24, 2023 00:04
}
}

// TODO: Refactor the callback code so that we don't need to have 3 different functions here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants