Callback interface speedup #1497
Conversation
I had a few other ideas that I wanted to try, but didn't end up going with them.

- The first one was removing the […]
- The second speedup would be to keep around a […]
- The last speedup would be to buffer the calls in a thread-local buffer and flush the buffer at the end of the scaffolding function. I think this would be fairly simple and probably would be a huge speedup. However, there's no guarantee that the code is running inside a scaffolding function. It could be running in a spawned thread or a tokio loop or something. In that case we would never end up actually making the scaffolding calls. There are ways to work around this, but again it started getting to be too complex. Maybe this could be interesting for a future project.
This is very nice! Please wait for @badboy though :)
CHANGELOG.md
@@ -14,6 +14,9 @@

[All changes in [[UnreleasedVersion]]](https://github.com/mozilla/uniffi-rs/compare/v0.23.0...HEAD).

### ⚠️ Breaking Changes ⚠️
- Implemented a new callback-interface ABI that signifacantly improves performance on Python and Kotlin.
typo: signifacantly->significantly
Also wonder if we should call out that it's not breaking in a "break your code" sense, but most certainly breaking if you happen to maintain external bindings. On that note, there doesn't seem to be much guidance or even comments which might help these people update to use this scheme. I don't really want to block you on that though, so maybe you could open an issue which asks for such guidance to be given "soon", and link to that here?
I expanded this comment with more details. I don't think users need to do anything other than upgrade to the new version for this one.
This is looking good! Couple of smaller things inline, but the overall implementation seems sound.
Do you happen to have the benchmark numbers of the before and after? Would be nice to have them posted here as a reference.
- Added the `benchmarks` fixture. This runs a set of benchmark tests and uses `criterion` to analyze the results. Added some benchmarks for calling functions and callback interfaces.
- Added the `run_script()` function, which is a more generic version of `run_test()`.
- uniffi_testing: Don't add `--test` when running `cargo build` to find the dylibs. It's not needed for the fixture/example tests and it breaks the benchmark tests.
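For reference, here's a minimal sketch of what one of these criterion benchmarks could look like. The names (`callback_benchmarks`, `call_callback_interface_method`, the `"callbacks/basic"` label) are hypothetical stand-ins, not the actual benchmarks added in this PR:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

// Stand-in for the cross-language call the real benchmarks exercise
// (e.g. invoking a callback interface method through the scaffolding).
fn call_callback_interface_method() {
    // ...
}

fn callback_benchmarks(c: &mut Criterion) {
    // criterion runs the closure repeatedly and reports timing statistics.
    c.bench_function("callbacks/basic", |b| b.iter(call_callback_interface_method));
}

criterion_group!(benches, callback_benchmarks);
criterion_main!(benches);
```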
The main goal here is to make the logic live in regular, non-templated code. This makes it easier to understand what's going on and hopefully easier to change. One issue with this approach is that we need 3 functions depending on the method return/throws types. I think it's worth it for clearer code, and I also hope to eventually merge these into 1 function. We could use the same move as in PR #1469: define a new `FfiConverter` method like `callback_interface_method_return()` that handles the specifics, give it a default implementation, and specialize it for `Result<>` and `()`.
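As a rough sketch of that follow-up idea (all names here are hypothetical; in uniffi the hook would presumably hang off `FfiConverter`, and the buffer handling is heavily simplified): a trait method with a provided default, overridden for `()` and `Result<>`:

```rust
pub trait CallbackReturn: Sized {
    /// Serialize the value itself; a stand-in for uniffi's real lowering machinery.
    fn to_bytes(&self) -> Vec<u8>;

    /// Default behaviour: write the lowered value into the caller-provided out buffer.
    fn callback_interface_method_return(self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_bytes());
    }
}

impl CallbackReturn for u64 {
    fn to_bytes(&self) -> Vec<u8> {
        self.to_be_bytes().to_vec()
    }
}

impl CallbackReturn for () {
    fn to_bytes(&self) -> Vec<u8> {
        Vec::new()
    }

    /// Void returns never touch the buffer Rust already initialized.
    fn callback_interface_method_return(self, _out: &mut Vec<u8>) {}
}

impl<T: CallbackReturn, E: std::fmt::Display> CallbackReturn for Result<T, E> {
    fn to_bytes(&self) -> Vec<u8> {
        match self {
            Ok(v) => v.to_bytes(),
            Err(e) => e.to_string().into_bytes(),
        }
    }

    /// Prefix a status byte so the foreign side can tell Ok from Err.
    fn callback_interface_method_return(self, out: &mut Vec<u8>) {
        out.push(if self.is_ok() { 0 } else { 1 });
        out.extend_from_slice(&self.to_bytes());
    }
}
```

With something like this, a single scaffolding function could call `callback_interface_method_return()` and let the return type pick the right behaviour.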
The goal here is to lower the overhead of callback interface calls, specifically targeting things like logging calls which can be made many times in a short timespan. Since this is an ABI change, it's not backwards compatible and therefore requires a breaking version bump before it can be the default.

The initial ABI change was fairly minor: the argument RustBuffer is now passed by reference instead of by value. This means the foreign code doesn't need to use an FFI call to destroy it. For Kotlin, and maybe other languages, it also seems to be slightly faster to use a reference because fewer bits need to be pushed to/popped from the stack. This brought some speedups to Python and Kotlin, and a minor speedup to Swift that only passes the noise threshold in the no-args-void-return case:

callbacks/python-callbacks-basic
    time:   [37.277 µs 37.299 µs 37.320 µs]
    change: [-10.692% -10.532% -10.379%] (p = 0.00 < 0.05)
    Performance has improved.

callbacks/kotlin-callbacks-basic
    time:   [15.533 µs 15.801 µs 16.069 µs]
    change: [-35.654% -31.481% -27.245%] (p = 0.00 < 0.05)
    Performance has improved.

callbacks/swift-callbacks-no-args-void-return
    time:   [454.96 ns 455.64 ns 456.71 ns]
    change: [-35.872% -35.792% -35.710%] (p = 0.00 < 0.05)
    Performance has improved.
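To make the shape of the change concrete, here is a rough illustration of the old vs. new callback signature. These are not uniffi's actual definitions; the struct layout and function types are simplified stand-ins:

```rust
#[repr(C)]
pub struct RustBuffer {
    // Simplified stand-in for uniffi's real RustBuffer.
    capacity: u64,
    len: u64,
    data: *mut u8,
}

// Before (illustrative): the argument buffer crosses the FFI boundary by value,
// so the foreign side owns it and must make a second FFI call to free it.
type CallbackByValue =
    unsafe extern "C" fn(handle: u64, method: u32, args: RustBuffer, out: *mut RustBuffer) -> i32;

// After (illustrative): the argument buffer is passed by reference; Rust keeps
// ownership and frees it when the call returns, so no extra FFI round-trip is needed.
type CallbackByRef =
    unsafe extern "C" fn(handle: u64, method: u32, args: *const RustBuffer, out: *mut RustBuffer) -> i32;
```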
If a callback interface method has a void return, there's no need to overwrite the value in the output buffer: Rust has already initialized an empty buffer. Also, there's no need to construct a RustBuffer in the first place. Refactored the callback interface code to make this possible without too much spaghettification. This gives us a decent speedup on Python/Kotlin:

Benchmarking callbacks/python-callbacks-void-return
Benchmarking callbacks/python-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/python-callbacks-void-return: Collecting 100 samples in estimated 10.050 s (813k iterations)
Benchmarking callbacks/python-callbacks-void-return: Analyzing
callbacks/python-callbacks-void-return
    time:   [12.184 µs 12.219 µs 12.266 µs]
    change: [-26.774% -26.526% -26.286%] (p = 0.00 < 0.05)
    Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
    2 (2.00%) high mild
    7 (7.00%) high severe

Benchmarking callbacks/kotlin-callbacks-void-return
Benchmarking callbacks/kotlin-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/kotlin-callbacks-void-return: Collecting 100 samples in estimated 10.007 s (1.9M iterations)
Benchmarking callbacks/kotlin-callbacks-void-return: Analyzing
callbacks/kotlin-callbacks-void-return
    time:   [5.1736 µs 5.2086 µs 5.2430 µs]
    change: [-32.627% -26.167% -20.263%] (p = 0.00 < 0.05)
    Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
    3 (3.00%) low severe
    6 (6.00%) low mild
    1 (1.00%) high mild
    3 (3.00%) high severe
This changes things so we send the arguments as data/len components rather than a RustBuffer struct directly. For some reason this is faster on Python/Kotlin, although it's not clear why. Here are some possibilities:

- ctypes and JNA take a performance hit when dealing with structs vs primitive values.
- RustBuffer has a third component, capacity, that we don't need. By not sending this component, we reduce the stack size.

In any case, this really does speed up Python/Kotlin. Swift doesn't seem to be affected.

Benchmarking callbacks/python-callbacks-void-return
Benchmarking callbacks/python-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/python-callbacks-void-return: Collecting 100 samples in estimated 10.050 s (813k iterations)
Benchmarking callbacks/python-callbacks-void-return: Analyzing
callbacks/python-callbacks-void-return
    time:   [12.184 µs 12.219 µs 12.266 µs]
    change: [-26.774% -26.526% -26.286%] (p = 0.00 < 0.05)
    Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
    2 (2.00%) high mild
    7 (7.00%) high severe

Benchmarking callbacks/kotlin-callbacks-void-return
Benchmarking callbacks/kotlin-callbacks-void-return: Warming up for 3.0000 s
Benchmarking callbacks/kotlin-callbacks-void-return: Collecting 100 samples in estimated 10.007 s (1.9M iterations)
Benchmarking callbacks/kotlin-callbacks-void-return: Analyzing
callbacks/kotlin-callbacks-void-return
    time:   [5.1736 µs 5.2086 µs 5.2430 µs]
    change: [-32.627% -26.167% -20.263%] (p = 0.00 < 0.05)
    Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
    3 (3.00%) low severe
    6 (6.00%) low mild
    1 (1.00%) high mild
    3 (3.00%) high severe
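As an illustration only (not the actual generated scaffolding), the difference between the two shapes is roughly:

```rust
#[repr(C)]
pub struct RustBuffer {
    capacity: u64, // only meaningful on the Rust side
    len: u64,
    data: *mut u8,
}

// Struct form (illustrative): ctypes/JNA have to marshal a three-field struct,
// including the capacity component that the foreign side never uses.
type ForeignCallbackStructArgs =
    unsafe extern "C" fn(handle: u64, method: u32, args: RustBuffer) -> i32;

// Data/len form (illustrative): two primitive parameters, nothing else to marshal.
type ForeignCallbackDataLenArgs =
    unsafe extern "C" fn(handle: u64, method: u32, args_data: *const u8, args_len: i32) -> i32;
```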
Force-pushed from b24be40 to bf77b38.
Thanks for the review @badboy! I think I addressed all the things you brought up with the latest commit.
Yes, let's do that. Here are the results of running the benchmarks on my workstation. The benchmarks were always fairly noisy for Python/Kotlin; I would guess a margin of error of about 5%. But the overall effect seems clear:
Most of these are doc fixes. The one behavior change is that the callback return values were updated. If we're breaking the ABI, we might as well make these a bit more logical.
Force-pushed from bf77b38 to 47b494a.
}
}

// TODO: Refactor the callback code so that we don't need to have 3 different functions here
👍
Attempt number 2 on this one. It's pretty much the same as the first attempt, but without the feature flag, which simplifies things quite a bit.
Previous attempt: #1481