Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit decision to not support destructors for interface objects on Kotlin. #1394

Closed
bendk opened this issue Nov 2, 2022 · 10 comments
Closed
Labels
help wanted Extra attention is needed

Comments

@bendk
Copy link
Contributor

bendk commented Nov 2, 2022

Early on, we made a decision to use explicit destructors for Kotlin (#8). I think we should revisit that decision and consider if it's still what we want.

The main biggest issue with the destructors is that they are quite a footgun. If any consumer forgets to call destroy(), then we're leaking memory. As a concrete example, what kind of care are we taking to ensure that app-services places connections are properly destroyed? I've never once thought about it while working on this code. I did some quick grepping and AFAICT, the android-components code never calls destroy(), but luckily it caches the connections which ends up preventing the leaks

This gets more footgunny when you deal with nested objects. If you have a record that has on interface object field, then you also need to call destroy(). The same would go for Enums, except we have code that panics if you actually try define an Enum with a variant that contains an interface object (#1372). I wonder if the reason for that panic is because we didn't want to deal with this issue. Errors would be even more problematic, since errors might only get caught by a top-level catch-all function (OTOH, I think preventing errors from containing objects is reasonable). Lastly, if you had a function that returned a sequence/map of interface objects, then the caller would need to iterate and call destroy() on each element.

The secondary issue is that it adds complexity to our code and our decision making. Tracking which objects could maybe contain an interface object is not trivial. Also, each time we consider lifting a restriction on what can contain an interface object, it's harder because we need to consider what to do about destroy() on Kotlin.

I suspect the limitations from 2020 might not apply anymore. Can we use the Cleaner object on Android now? Or at least have a non-buggy finalize() method? I don't know, but maybe someone with knowledge can chime in here.

I personally think we should move to a more flexible system and let the user choose. In uniffi.toml they could choose if they wanted the current system, a Cleaner-based system, or a finalize() based system. I would also consider allowing one of the latter systems + an explicit destroy() method as an option.

What do others think?

┆Issue is synchronized with this Jira Task
┆Issue Number: UNIFFI-210
┆Link To Issue: https://mozilla-hub.atlassian.net/browse/UNIFFI-210

@mhammond
Copy link
Member

mhammond commented Nov 3, 2022

Thanks for bringing this up - I don't have a strong opinion yet, but think we do need to tread carefully.

I found this article quite informative - I assume it's generally accurate and best I can tell, it's relevant to us given the Kotlin we care about is running on a JVM.

The main biggest issue with the destructors is that they are quite a footgun. If any consumer forgets to call destroy(), then we're leaking memory.

As that article points out, relying on finalizers to reclaim memory might just offer a false sense of security in some cases - and in particular, might run fine in our dev environments, but still fail on actual devices. There's also a scary anecdote that "it is about 430 times slower to create and destroy objects with finalizers."

Another footgun with finalizers is that they are apparently not chained by default - so if we add finalizers, it will be a suble breaking change, because if anyone subclasses our objects and uses a finalizer, they will now need to do a super.finalize() - and even for new users of uniffi, it remains a memory-leak footgun if they neglect to make that call.

Can we use the Cleaner object on Android now?

It seems that Cleaner runs on its own thread. Another anecdote from that article is that this assumes that thread has a priority such that it can actually make progress.

I personally think we should move to a more flexible system and let the user choose.

How does this interact with the other complications you mention (eg, with one enum variant having an object)? ie, if we could make things work correctly in that case when the user chooses the "explicit destroy()" option, then we could just stick with that by default, right?

One thing that article blesses is destructors for "objects with native peers. A native peer is a native object to which a normal object delegates via native methods" - which seems to accurately describe most of our objects - but it still cautions "If the native peer holds resources that must be terminated promptly, the class should have an explicit termination method".

In other words, for your example of the database connection, relying on a finalizer to close the file just makes a less obvious footgun than neglecting to call an explicit destroy method - a code audit can determine if destroy is called, but can never tell you exactly when a finalizer will run in practice. So for these kinds of objects, if you need a destroy method to ensure deterministic behaviour, you don't really need a finalizer.

IOW, I suspect using finalizers to, eg, ensure "cheap" rust resources are dropped might be OK (assuming the 430% quoted above is either hyperbole or not a problem in practice for our consumers) but probably isn't what we should rely on in all cases.

However, I'm far from qualified to have any opinions about this - we probably need to rope @csadilek and/or the Fenix team in here. However, for that to be fruitful we probably need more concrete examples (eg, the database connection is an obvious one, but less obvious are things like the enums and/or other short-lived objects (eg, where places returns a large array of objects) - I know there's a concrete use-case which triggered you to raise this, but can't recall exactly what it is)

@bendk
Copy link
Contributor Author

bendk commented Nov 3, 2022

Thanks for the feedback, it really makes me think that there's no one-size-fits-all solution for this, so it should be configurable. I think you're right that the default should be the current, explicit destroy() system.

If things were configurable, then we would still need to deal with the code complications of tracking which types could contain nested interface objects and adding a destroy method for them. In fact, the code would get more complicated since depending on the configuration, you'd either be adding destroy(), adding finalize(), or registering it with the Cleaner. However, hopefully the decision making would get simpler, since then we could have a policy that if it's technically feasible to let some type contain an interface object then we should allow it.

That's a good point about interface objects that hold other resources. Maybe one other piece of configuration would be supporting an attribute that specifies an object type should be explicitly destroyed. Then even if you choose the finalize or cleaner system, you would still get an explicit destroy() method. Then we could either remove the automatic destroy() or log a warning when it's used instead of the explicit destroy(). This could also be done with user-written Rust code and no special support from UniFFI. You could store the resource in an Option, define a close() method, and have all the other methods return an error if called when the option isn't set. However, this seems like it requires too much work for the user for something so fundamental.

Lastly, I think that section about native peers and critical resources is misleading. It seems to argue that if a native object needs an explicit destroy if and only if it holds a file handle, network socket, or something like that. But those resources are often not scarce. For example, I'd guess that Fenix runs out of memory much more often than it hits a file handle limit, so why is it okay to rely on the GC to reclaim memory, but not file handles? I think the advice only makes sense if you interpret "critical resource" as a significant amount of anything that's scarce. For Fenix, my naive guess is that would include a large memory allocation and a network connection, not a file handle, but that's just a guess. Maybe this is a long-winded way of saying I agree, we need to talk with the Fenix team to figure out some concrete examples that matter to them.

@csadilek
Copy link

csadilek commented Nov 3, 2022

Thanks for the ping, I mostly wanted to quickly caution and leave a -1 for relying on finalizers and destructors, or any kind of non-deterministic clean-up tasks. The article and summary Mark posted above already did that though :).

I need to understand your motivation better, and maybe it's best to talk this through, but my suggestion here is to treat this similar to how any resource would be handled if it was all Java/Kotlin, see "use" and (Auto)Closeable. Essentially using a defined scope to determine when the resource can be released and working with your explicit destroy() mechanism/system.

@arg0d
Copy link
Contributor

arg0d commented Mar 6, 2023

I think there are a lot of dimensions to this problem:

  • GC finalizer is problematic, because it only runs once memory exceeds a threshold. Given a tight loop that allocates handles and relies on GC for cleanup, but doesn't allocate any memory, the GC would not run, and the system would quickly run out of handles.
  • Releasing handles in a finalizer makes sense for long living applications. This is because the majority of handle leaks aren't urgent (like in a busy loop). Less urgent leaks would only cause problems given a sufficiently long runtime. Even a non-deterministic GC is enough to release them, before they start causing major issues. To get the best of both worlds, it seems natural to both expose a destroy() method for use in busy-loops, and to use a finalizer to release handles for less urgent use cases. For example, Go is doing just that. https://github.com/golang/go/blob/master/src/os/file_unix.go#L206
  • Not all languages are equally well equiped for managing resources. C/C++ has better resource management tools, at the cost of expensive memory bugs. GC languages have restricted resource mangement tools, with the promise of no memorgy bugs, but at the expense of bloated runtimes and resource leaks. I definitely don't think that we can "solve" state management issues in Kotlin. The best we can do is employ the tools offered by each language, provide a natural and recognizable API within the language's ecosystem, and hope that the consumers are qualified to work with such an API.
  • Managing objects is complicated and costly. While we want to believe that shared objects between languages is more convenient than alternatives, the reality says otherwise. It seems like the best option is to avoid objects between languages entirely. When designing the API, keep objects to a minimum. For example, instead of exposing a short lived, database connection object, only expose a method to query the database from a pre-existing connection that is managed by the library. To manage the connection, expose well-defined methods, such as Db.Initialize(userId), or Db.SwitchUser(userId). I'm not saying that its always possible to avoid objects, and objects should definitely be used when there are no alternatives. What I'm trying to say is don't fall into the "everything is an object" mentality, when other equally good or even better alternatives are available.

@mhammond
Copy link
Member

mhammond commented Mar 9, 2023

I think there are a lot of dimensions to this problem:

Yeah, I agree with everything you said there. Given UniFFI is a tool rather than an app (ie, a means to an end rather than the end itself), I think some of those points (eg, "When designing the API") are outside the scope of UniFFI - we need to do all we can to make it easier, but how consumers of UniFFI design their API is up to them and their specific use-case.

So I think the UniFFI team has come to the position that:

  • We should not rely on GC as a finalizer whereever possible (which is roughly the status quo, at least for Kotlin)
  • We might as well lean on the GC when it makes sense still - eg, there's no reason (IIUC) that we can't also try and make the finalizer free as many resources as possible, even if we document consumers should not rely on it.

IOW, we are considering looking into adding finalizer support to make a "best effort" attempt at freeing what it can while leaving the auto-closeable pattern in place and as the documented preferred usage pattern.

@bendk
Copy link
Contributor Author

bendk commented Mar 9, 2023

I agree with that proposed behavior change: adding finalizer support as a fallback for not explicitly freeing.

I'm still not sure that we should say that explicit freeing is always better. I think that it's perfectly reasonable to rely on the GC to free objects that only contain heap allocations. I'll even take the heretical view that it can be reasonable to do that if the object contains things like file handles -- sometimes file handles aren't a scarce resource.

So I think the UniFFI position should be that we support both methods and consumers should decide how they want to free the objects. We can link to some best-practices articles, and say that that auto-closeable is the typical choice for most projects, but in the end it's up to the consumer to pick a strategy for each of their objects.

I also think some configuration could be good. Users could pick if they want to log a warning when the finalizer runs or not. Maybe they could disable the finalizer or the auto-closeable code altogether, although I'm not sure how useful that is.

@ganfra
Copy link

ganfra commented Jul 17, 2023

Any news about that? We would love to not have to manage the finalizers everywhere from ourself.

@mhammond mhammond added the help wanted Extra attention is needed label Jul 17, 2023
@mhammond
Copy link
Member

This isn't something we plan on doing in the short term, but would welcome a PR.

bendk added a commit to bendk/uniffi-rs that referenced this issue Oct 6, 2023
This adds support for the foreign bindings side to implement Rust traits
and for both sides to pass `Arc<dyn [Trait]>` objects back and forth.

I thought this would be a good time to redo the callback interface code.
The trait interface code uses a foreign-supplied vtable rather than a
single callback and method indexes like the old code used.

Renamed `CALL_PANIC` to `CALL_UNEXPECTED_ERROR` in the foreign bindings
templates.  This matches the name in the Rust code and makes more sense
for foreign trait implementations.

-------------------- TODO -----------------------------

This currently requires "wrapping" the object every time it's passed
across the FFI.  If the foreign code receives a trait object, then
passes it back to Rust.  Rust now has a handle to the foreign impl and
that foreign impl just calls back into Rust.  I think mozilla#1730 could help
solve this.

I think there should be better tests for reference counts, but I'm
waiting until we address the previous issue to implement them.

Python requires a hack to return RustBuffer from callbacks
(RubiconCtypesPatch.py) because of an old ctypes bug
(https://bugs.python.org/issue5710).  The options I can see are:
  - Go back to the callback interface style, where we input a RustBuffer
    and write the output to it.
  - Pass the return type as an out pointer.  This requires some special
    handling for void returns though.
  - Implement mozilla#1779, which changes RustBuffer to a simple `*mut u8` and
    then Python could return it.  This is my preference, I don't like
    the idea of making the ABI worse just because of Python.

TODO: If I make the Python tests fail, I see `RuntimeWarning: memory
leak in callback function` in the console.  I believe this is because of
the rubicon hack, but I'm not sure.  We should double check this before
merging.

Kotlin is not implemented yet.  I think when we do implement it, we're
going to want to face mozilla#1394 and make a decision there.  I don't think
it's reasonable to have consumers have to call a `close()` method on
these interfaces.  I also don't know how we would deal with this in
Rust.  ISTM that the same logic that says Kotlin objects need to be
manually closed dictates that if you use a Kotlin object in Rust, there
should be a `close()` method or something analogous to a to
`AutoCloseable`.

The Ruby coverall tests are now failing because they don't implement
trait interfaces.  We need to figure out a way for them to pass the test
suite.
bendk added a commit to bendk/uniffi-rs that referenced this issue Oct 6, 2023
This adds support for the foreign bindings side to implement Rust traits
and for both sides to pass `Arc<dyn [Trait]>` objects back and forth.

I thought this would be a good time to redo the callback interface code.
The trait interface code uses a foreign-supplied vtable rather than a
single callback and method indexes like the old code used.

Renamed `CALL_PANIC` to `CALL_UNEXPECTED_ERROR` in the foreign bindings
templates.  This matches the name in the Rust code and makes more sense
for foreign trait implementations.

-------------------- TODO -----------------------------

This currently requires "wrapping" the object every time it's passed
across the FFI.  If the foreign code receives a trait object, then
passes it back to Rust.  Rust now has a handle to the foreign impl and
that foreign impl just calls back into Rust.  I think mozilla#1730 could help
solve this.

I think there should be better tests for reference counts, but I'm
waiting until we address the previous issue to implement them.

Python requires a hack to return RustBuffer from callbacks
(RubiconCtypesPatch.py) because of an old ctypes bug
(https://bugs.python.org/issue5710).  The options I can see are:
  - Go back to the callback interface style, where we input a RustBuffer
    and write the output to it.
  - Pass the return type as an out pointer.  This requires some special
    handling for void returns though.
  - Implement mozilla#1779, which changes RustBuffer to a simple `*mut u8` and
    then Python could return it.  This is my preference, I don't like
    the idea of making the ABI worse just because of Python.

TODO: If I make the Python tests fail, I see `RuntimeWarning: memory
leak in callback function` in the console.  I believe this is because of
the rubicon hack, but I'm not sure.  We should double check this before
merging.

Kotlin is not implemented yet.  I think when we do implement it, we're
going to want to face mozilla#1394 and make a decision there.  I don't think
it's reasonable to have consumers have to call a `close()` method on
these interfaces.  I also don't know how we would deal with this in
Rust.  ISTM that the same logic that says Kotlin objects need to be
manually closed dictates that if you use a Kotlin object in Rust, there
should be a `close()` method or something analogous to a to
`AutoCloseable`.

The Ruby coverall tests are now failing because they don't implement
trait interfaces.  We need to figure out a way for them to pass the test
suite.

Remove all the debugging printouts
bendk added a commit to bendk/uniffi-rs that referenced this issue Oct 6, 2023
This adds support for the foreign bindings side to implement Rust traits
and for both sides to pass `Arc<dyn [Trait]>` objects back and forth.

I thought this would be a good time to redo the callback interface code.
The trait interface code uses a foreign-supplied vtable rather than a
single callback and method indexes like the old code used.

Renamed `CALL_PANIC` to `CALL_UNEXPECTED_ERROR` in the foreign bindings
templates.  This matches the name in the Rust code and makes more sense
for foreign trait implementations.

-------------------- TODO -----------------------------

This currently requires "wrapping" the object every time it's passed
across the FFI.  If the foreign code receives a trait object, then
passes it back to Rust.  Rust now has a handle to the foreign impl and
that foreign impl just calls back into Rust.  I think mozilla#1730 could help
solve this.

I think there should be better tests for reference counts, but I'm
waiting until we address the previous issue to implement them.

Python requires a hack to return RustBuffer from callbacks
(RubiconCtypesPatch.py) because of an old ctypes bug
(https://bugs.python.org/issue5710).  The options I can see are:
  - Go back to the callback interface style, where we input a RustBuffer
    and write the output to it.
  - Pass the return type as an out pointer.  This requires some special
    handling for void returns though.
  - Implement mozilla#1779, which changes RustBuffer to a simple `*mut u8` and
    then Python could return it.  This is my preference, I don't like
    the idea of making the ABI worse just because of Python.

TODO: If I make the Python tests fail, I see `RuntimeWarning: memory
leak in callback function` in the console.  I believe this is because of
the rubicon hack, but I'm not sure.  We should double check this before
merging.

Kotlin is not implemented yet.  I think when we do implement it, we're
going to want to face mozilla#1394 and make a decision there.  I don't think
it's reasonable to have consumers have to call a `close()` method on
these interfaces.  I also don't know how we would deal with this in
Rust.  ISTM that the same logic that says Kotlin objects need to be
manually closed dictates that if you use a Kotlin object in Rust, there
should be a `close()` method or something analogous to a to
`AutoCloseable`.

The Ruby coverall tests are now failing because they don't implement
trait interfaces.  We need to figure out a way for them to pass the test
suite.

Remove all the debugging printouts
Document this
bendk added a commit to bendk/uniffi-rs that referenced this issue Oct 9, 2023
This adds support for the foreign bindings side to implement Rust traits
and for both sides to pass `Arc<dyn [Trait]>` objects back and forth.

I thought this would be a good time to redo the callback interface code.
The trait interface code uses a foreign-supplied vtable rather than a
single callback and method indexes like the old code used.

Unfortunately, we can't directly return the return value on Python
because of an old ctypes bug (https://bugs.python.org/issue5710).
Instead, input an out param for the return type.  The other main
possibility would be to change `RustBuffer` to be a simple `*mut u8`
(mozilla#1779), which would then be returnable by Python.  However, it seems
bad to restrict ourselves from ever returning a struct in the future.
Eventually, we want to stop using `RustBuffer` for all complex data
types and that probably means using a struct instead in some cases.

Renamed `CALL_PANIC` to `CALL_UNEXPECTED_ERROR` in the foreign bindings
templates.  This matches the name in the Rust code and makes more sense
for foreign trait implementations.

-------------------- TODO -----------------------------

This currently requires "wrapping" the object every time it's passed
across the FFI.  If the foreign code receives a trait object, then
passes it back to Rust.  Rust now has a handle to the foreign impl and
that foreign impl just calls back into Rust.  I think mozilla#1730 could help
solve this.

I think there should be better tests for reference counts, but I'm
waiting until we address the previous issue to implement them.

Kotlin is not implemented yet.  I think when we do implement it, we're
going to want to face mozilla#1394 and make a decision there.  I don't think
it's reasonable to have consumers have to call a `close()` method on
these interfaces.  I also don't know how we would deal with this in
Rust.  ISTM that the same logic that says Kotlin objects need to be
manually closed dictates that if you use a Kotlin object in Rust, there
should be a `close()` method or something analogous to a to
`AutoCloseable`.

The Ruby coverall tests are now failing because they don't implement
trait interfaces.  We need to figure out a way for them to pass the test
suite.

Remove all the debugging printouts
Document this
@bendk
Copy link
Contributor Author

bendk commented Oct 10, 2023

#1787 has a proposal to handle this.

@jplatte
Copy link
Collaborator

jplatte commented Dec 7, 2023

#1787 was merged, should this be closed?

@bendk bendk closed this as completed Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

6 participants