Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cancelable may leak resources on unsuccessful cancelation #3474

Open
armanbilge opened this issue Mar 1, 2023 · 13 comments
Open

cancelable may leak resources on unsuccessful cancelation #3474

armanbilge opened this issue Mar 1, 2023 · 13 comments

Comments

@armanbilge
Copy link
Member

armanbilge commented Mar 1, 2023

Spinning this out of a Discord observation so that it doesn't get lost. It relates to the new cancelable combinator added in #3460.

This issue is important, because it has led me to believe we cannot currently implement the non-leaking race proposed in #3456.

For motivation, consider this case:

Resource.makeFull { poll =>
  poll(IO.blocking(server.accept()).cancelable(server.shutdown()))
}(socket => IO(socket.close()))

There, we have a blocking call to accept a socket on a server. We can cancel it by shutting down the server. However, if the blocking call succeeds (perhaps in a race condition with shutdown()), and we do get an open socket, we really want to be sure that we close it.

To demonstrate the leak I have added a test in 84e7ef1.

"cancelable should not leak" in ticked { implicit ticker =>
val test = IO.deferred[Unit] flatMap { latch =>
latch.get.uncancelable.cancelable(latch.complete(()).void)
}
test.start.flatMap(f => f.cancel *> f.join).flatMap(_.embedError) must completeAs(())
}

In this case, "cancelation" is simply causing the Deferred to complete normally. So that means the get is completing with (). If that's a resource, we don't want to lose that, so the outcome should be successful, not canceled.

It's a bit weird, but conceptually this is equivalent to a race condition where you request cancellation, but the effect in question succeeds anyway. In that case, it didn't really cancel, and if you pretend like it did, you can leak a resource.

Now, what worries me is that I don't see how we can fix this. And if we can't fix this, I don't see how we can fix #3456.

def cancelable[A](fa: F[A], fin: F[Unit]): F[A] =
uncancelable { poll =>
start(fa) flatMap { fiber =>
poll(fiber.join).onCancel(fin *> fiber.cancel).flatMap(_.embed(poll(canceled *> never)))
}
}

So, the problem is that if cancelation occurs while this fiber is in poll(fiber.join), then it is assumed that it canceled! That means that cancelable can no longer have a successful outcome.

The problem is that until fiber completes post fiber.cancel, and we join it and find out how it completed, then we can't really say for sure whether it actually canceled or not. And if it didn't cancel, and it actually returned a successful result, we have no way to get this out of cancelable anymore. So, leak.

What we are currently lacking is a way to request cancelation, but be prepared for the fact that it might not happen. This is a common pattern: Java's Future#cancel returns a boolean indicating precisely this information. So does io_uring's cancelation mechanism.

In fact, I ran into this same problem when working on fs2-io_uring, and was forced to add a bespoke API to safely lift an async syscall into a resource. For example, a syscall opening/accepting a socket, file, or any other resource.

Not surprisingly, our Async#fromCompletableFuture suffers the exact same problem.

val await = G.onCancel(
poll(get),
// if cannot cancel, fallback to get
G.ifM(lift(delay(cf.cancel(false))))(G.unit, G.void(get))
)

cf.cancel() may return false, indicating that cancelation was not possible. So then we fallback to waiting for its completion via get ... but what if this is a CompletableFuture that is asynchronously waiting for an accepted socket? Leak. So it would not be safe to use this API in a Resource.makeFull(...)(...).

Sorry for wall of problems, no solutions 😕

@durban
Copy link
Contributor

durban commented Mar 3, 2023

1.

I realize there is a bigger picture here, but I tried to make your first example (with the server.accept()) non-leaky. This is what I came up with. Does this solve (this specific) problem?

  def myResource(server: ServerSocketChannel): Resource[IO, SocketChannel] = {
    Resource.makeFull[IO, SocketChannel] { poll =>
      IO.ref[SocketChannel](null).flatMap { ref =>
        poll(
          IO.blocking { server.accept() }.guaranteeCase{
            case Succeeded(sock) => sock.flatMap(ref.set)
            case _ => IO.unit
          }.cancelable(IO { server.close() })
        ).guaranteeCase {
          case Succeeded(_) =>
            IO.unit
          case _ =>
            ref.get.flatMap {
              case null => IO.unit
              case sock => IO { sock.close() }
            }
        }
      }
    } { sock =>
      IO { sock.close() }
    }
  }

2.

I think the test in 84e7ef1 (with the Deferred) is contradictory (I assume it fails right now): you want the fiber executing test to both (1) complete successfully, even if cancelled, and (2) be cancellable. I think for (1), it must be uncancelable (so it contradicts (2)). There is a novella in the MonadCancel scaladoc about cancellation, and it has things like "cancellation is effective" and it can only be suppressed. That's why I think this is contradictory.

However, I don't think you actually need this. What you actually need, is the "if a resource was allocated, it will be closed" guarantee. And i think the socket example shows, that there might be a way to guarantee this without needing the behavior in 84e7ef1.

@armanbilge
Copy link
Member Author

armanbilge commented Mar 3, 2023

Thanks for reading :)


However, I don't think you actually need this.

Ah, well, this is sort of the crux :) You are right, to restore safety in any specific situation, we don't need any of this. You demonstrated how we can workaround it, and my bespoke Uring#bracket API also demonstrates a perfectly safe workaround.

However, this is fundamentally a problem of composition. As they are currently working, APIs like cancelable, race, fromCompletableFuture, (and Uring#call) cannot safely be composed with their finalizer using APIs like Resource.makeFull(...)(...) or even in a custom uncancelable { poll => block. And IMHO this is a big gaping hole.


and it has things like "cancellation is effective" and it can only be suppressed. That's why I think this is contradictory.

Hm, I don't think it's contradictory. In these situations, cancelation would still be effective, we just need additional suppression. Maybe one way to think about it is, if cancelation was attempted, but the effect did not or could not cancel, then that block must be treated as if it was uncancelable i.e. cancelation is suppressed, and the result is passed along.

@durban
Copy link
Contributor

durban commented Mar 5, 2023

I think I'm starting to understand the problem. But I have more questions :-)

  1. The composability thing: I feel I vaguely understand it, but it's still a little unclear. For example, if we look at the server.accept() example: what are, specifically the 2 things, which are non-leaky by themselves, and become leaky when they're composed?
  2. Thinking about how a solution would look like: in the case when server.accept() returns a socket with a concurrent cancel, do you want that socket to be used, or is it sufficient if that socket will be closed immediately after opening?
  3. Isn't this whole thing just about cancelable "acquire"s? (Probably not just, but...) If you have a cancelable acquire, you must be very careful to clean up after yourself when cancelled... which is not surprising!
    3b. What the socket example shows is that cancelable (by itself) is not enough to make a cancelable acquire from a non-cancelable acquire, right? What if cancelable had this signature (although, this would possibly make it too specific):
def cancelable(cancel: IO[Unit], release: A => IO[Unit]): IO[A]

@armanbilge
Copy link
Member Author

what are, sepcifically the 2 things, which are non-leaky by themselves, and become leaky when they're composed?

Hm, let me try to summarize it like this.

  1. The 2 important things are (1) the cancelable acquiring effect and (2) the releasing effect.

  2. The canonical way to compose a cancelable acquiring effect with its finalizer is Resource.makeFull (but bracketFull and raw uncancelable { poll => are relevant as well).

  3. As demonstrated above, acquisitions based on APIs such as cancelable, fromCompletableFuture, race, and io_uring, cannot safely be composed with their finalizer in a Resource.makeFull.

  4. As a workaround, we could add cancelableWithFinalizer, fromCompletableFutureWithFinalizer, raceWithFinalizers1 type of things, for each of the various effects that need special treatment.

  5. An explosion of APIs highly-specific to the effects involved, suggests that the existing APIs don't compose with each other, which suggests that we have a composition problem.


in the case when server.accept() returns a socket with a concurrent cancel, do you want that socket to be used, or is it sufficient if that socket will be closed immediately after opening?

I don't want to prescribe either behavior :) e.g. consider this snippet:

val cancelableAccept: IO[Socket] = ...
IO.uncancelable { poll =>
  poll(cancelableAccept).flatMap { socket =>
    val doUncancelableStuff = ...
    doUncancelableStuff.guarantee(IO(socket.close())
  }
}

If a user writes code like that, then if the socket is accepted, since the rest of the block is uncancelable, they should be able to do whatever they wanted with it.

Although, I agree that in practice, most likely the socket will be closed immediately in this situation.


What the socket example shows is that cancelable (by itself) is not enough to make a cancelable acquire from a non-cancelable acquire, right?

Yes :) essentially, that cancelable may leak (acquired) resources.

What if cancelable had this signature (although, this would possibly make it too specific):

Right, this is possible way to restore safety, similar to my Uring#bracket.

And yes, I do think it makes things pretty specific. Now, any action that must happen if the resource is acquired has to be duplicated in two places: the cancelation handler and the subsequently sequenced effects. This snowballs into a composition problem, because it becomes difficult to sequence uncancelable effects that must be run if the resource is acquired, without passing that all the way down to the original acquiring effect.


If you have a cancelable acquire, you must be very careful to clean up after yourself when cancelled... which is not surprising!

Not entirely sure that I agree :)

Instead I would say: if you have a cancelable acquire, you must be very careful to mask cancelation as soon as you have acquired the resource, so that it is guaranteed that it will be passed along to any subsequently sequenced effects.

Basically, I don't think that cancelable effects should get into the business of knowing how to cleanup the value that they are returning. They should absolutely cleanup any resources created during the acquisition process (e.g. registered callbacks) since those are internal details.

If we disagree, and say that it is reasonable for a cancelable effect to know how to cleanup the value it is returning, then I would say that this cancelable effect should actually be a Resource, because this is the canonical way to pair an acquisition effect with its finalizer. i.e. your fixed API should look like this:

def cancelable(cancel: IO[Unit], release: A => IO[Unit]): Resource[IO, A]

So then, we are basically introducing a bunch of additional Resource constructors (e.g. also for race, fromCompletableFuture, io_uring), since apparently Resource.makeFull(...)(...) and stuff are not sufficient to represent what we need here. And to me that screams that we have a composition problem.

Footnotes

  1. as an exercise: try to design and use an API that lets you do a 3-way race raceWithFinalizers(raceWithFinalizers(..., ...), ...) which finalizes everything appropriately!

@djspiewak
Copy link
Member

So ultimately this distills down to the behavior of Fiber. Something like IO.sleep(1.second).uncancelable.start.flatMap(f => f.join.onCancel(f.cancel)).start.flatMap(_.cancel) is very illustrative. What ends up happening here is the inner-inner fiber completes successfully, but the inner fiber is canceled and thus the result value (in this case, ()) is leaked. We can theoretically detect and recover from this situation in the inner fiber, but not by using the normal mechanisms. Cancelation is already "committed" in a sense, which means that the conventional poll(fa).flatMap(a => ...) approach will not work (the flatMap isn't going to happen).

In a sense, this is just fundamental to how fibers work. You have to be careful to not leak results if you care about them. cancelable does reify this quirk in a very first-class way though, so I get why it's a concern.

cancelable[A](fa: F[A], fin: F[Unit])(onCase: Outcome[F, E, A] => F[Unit]): F[A] is probably the way to close this hole, which is unfortunate.

@armanbilge
Copy link
Member Author

I agree that this is fundamental, but I lean more towards fundamental hole in our model 😅 do you propose to patch fromCompletableFuture, race, io_uring, and any other affected APIs in the same way, by adding an onCase handler? I just don't think this composes well. So forgive me for feeling dissatisfied :)

If I may, here's a naïve sketch of how we might fix it. The rough idea is that:

  1. cancelable should be its own "primitive", that pairs an effect fa: F[A] with some external cancel mechanism fin: F[Unit].

  2. The assumption is that fa will always complete somehow (e.g. externally canceling it will cause it to complete in error). In that case, fa can actually be treated as uncancelable, so that the fiber is masked while it is executing.

  3. Meanwhile, its external cancelation fin: F[Unit] should be stored somewhere in the fiber, such that invoking fiber.cancel would actually invoke fin (in addition to setting canceled = true). And then because the fiber is actually masked, the canceler would simply join and await the final outcome.

The goal of all this is to reuse the existing mechanism for uncancelability, which ensures that the final outcome will be passed along, regardless of whether cancelation was requested or not.

Note that under this scheme, cancelable could become a method on MonadCancel, which seems natural :)

From a backwards-compatibility standpoint, this change would be a bit on shaky footing. I think a reasonable default implementation could be for cancelable(ioa, fin) == ioa.

@djspiewak
Copy link
Member

That proposal won't fix it though, because once a fiber is canceled, you can't get access to results anymore. Note in my example from previously. When the inner fiber is canceled, it can only access the results of the inner inner fiber by directly joining that fiber. We can't sequence the results through flatMap anymore… because we were canceled!

@armanbilge
Copy link
Member Author

armanbilge commented Mar 8, 2023

That proposal won't fix it though, because once a fiber is canceled, you can't get access to results anymore

Hm, not sure I understand :)

IO.uncancelable { _ =>
  gate.get *> IO.pure(42)
}

And the finalizer is gate.complete(()).

To be clear, are you saying that the 42 is lost forever if that uncancelable fiber is canceled?

Edit: to be very very clear: I'm proposing that cancelable be implemented without fibers. It is a primitive, that runs an action uncancelably, and exposes some finalizer as the cancel method of the fiber its running on. So like my example above would be cancelable(gate.get *> IO.pure(42), gate.complete(()).

@armanbilge
Copy link
Member Author

That proposal won't fix it though

Here you go :)

@durban
Copy link
Contributor

durban commented Apr 3, 2023

@armanbilge

essentially, that cancelable may leak (acquired) resources

I don't think that's entirely fair to say. It's like saying that flatMap may leak resources: sure, but that's why you put it inside an uncancelable. But I realize there is an actual use case here, which has problems...

Instead I would say: if you have a cancelable acquire, you must be very careful to mask cancelation as soon as you have acquired the resource, so that it is guaranteed that it will be passed along to any subsequently sequenced effects.

Yes, this is a very clear explanation, and this seems to be the essence of the issue (as I understand it). (And note, that this also does not guarantee, that there will be no leak: it just makes it possible for the user to handle it safely; but if they just flatMap it, it can still leak... that's just how it is.)

I've also looked at #3491, and it's not clear to me, how it would help with race. (Or it may not want to do that... is it just for cancelable?) I think I understand how it helps cancelable; I'm just not sure this can be generalized to race and others... e.g., if I race(acquire1, acquire2), when any of the acquires actually acquires, it would have to not just make itself uncancelable, but also the parent fiber atomically. (Which may be possible... somehow... but I'm not sure.)

@armanbilge
Copy link
Member Author

(And note, that this also does not guarantee, that there will be no leak: it just makes it possible for the user to handle it safely; but if they just flatMap it, it can still leak... that's just how it is.)

💯 yes, very well put. The key thing is to make it possible to be handled safely e.g. by putting it inside of a Resource.makeFull(...)(...).


I'm just not sure this can be generalized to race and others...

It doesn't need to be further generalized :) it's exactly what we need to fix race. I just pushed 1b20cb7 and f0a4c16 demonstrating what this looks like for racePair.

@djspiewak
Copy link
Member

Coming back to this…

As a baseline, I'm very leery adding something like cancelable as a primitive. Having an arbitrary inversion in the mask algebra is really difficult to reason about, because suddenly the lattice is no longer monotone. I'm not at all confident that we could think our way through all the different corner cases on this one, certainly not in any reasonable amount of time. It's a thing we could ponder for the far-future (CE4?), but not the immediate one.

So that leaves us with a couple options:

  • We could remove cancelable again, though in a sense some of this hole fundamentally remains, since it's pretty easy to demonstrate the leak just by nesting fibers, and thus we have similar subtleties in other fiber-related combinators like racePair and (probably) memoize
  • Another option is to add the onCase parameter I suggested above. This fixes cancelable (at the cost of some composability), though arguably it does nothing for the other scenarios where this could arise. This might be okay though, since if you're working with fibers directly, it's not at all hard to avoid this problem by having an out-of-band mechanism for threading results. For this reason, I suspect Fs2's implementation is mostly immune to this issue, since it almost never uses join.
  • We could add a more tightly scoped primitive, onCancelMasked (subject to bikeshed), which unlike onCancel would indicate when the current fiber is canceled regardless of whether it is masked. Really, this is the crux of the difficulty here, since it's impossible to observe cancelation without making it effective (and vice versa), so we have no way of allowing cancelation to be "requested" and allowed/denied based on some more complex semantic (such as whether or not the socket was actually opened). onCancelMasked would, in theory, allow us to do that, since it would allow us to implement cancelable without fibers as a derived combinator. Same issues with bincompat though.

The final possible option is simply this: we could ignore this whole mess. As @durban pointed out, using cancelable on a resource acquisition is really dangerous and you can't necessarily expect that it's going to Just Work™. If you really must make a an acquisition cancelable, you should be prepared to take extra steps to ensure safety. This is true with things like bracketFull/makeFull (where you often need to put finalizers within your acquisition logic to detect whether you were canceled early). We can simply say that this is also true about cancelable. For example, fixing your socket logic above:

Resource.makeFull { poll =>
  IO.deferred[Outcome[IO, Throwable, Socket]] flatMap { d =>
    val polled = poll { 
      IO.blocking(server.accept()).guaranteeCase(d.complete(_).void).cancelable(server.shutdown())
    }

    polled onCancel {
      d.get flatMap {
        case Outcome.Completed(ios) => ios.flatMap(s => IO(s.close()))
        case _ => IO.unit
      }
    }
  }
}(socket => IO(socket.close()))

Put another way, this would be accepting (and obviously documenting) that cancelable does not have the same atomicity that poll has, and thus users need to be especially careful with how they treat results when those results involve resources that must be released. I think this is not an entirely unintuitive limitation, since cancelable is effectively empowering the user to build their own cancelation protocol, and so they're effectively very much on their own when it comes to making sure this protocol is safe.

@armanbilge
Copy link
Member Author

armanbilge commented Apr 15, 2023

As a baseline, I'm very leery adding something like cancelable as a primitive.

Yep, I totally get this :) but here's the thing: my proposal is no longer just about cancelable. Edit: sorry, re-reading I see you recognized and acknowledged that above, I read too hastily 😇

  1. If you look at my PR in POC: plugging the cancelable leak #3491, I actually demonstrate existing leaks in racePair and fromCompletableFuture and patch them using the improved cancelable.

  2. Furthermore without this fix, then making the changes proposed in Introduce "inclusive" race as new default semantic #3456 is actually worthless, because as soon as you have a race(race(..., ...), ...) then you will be leaking all over again.

  3. Finally, this model of cancelation matches how the system I/O primitives work, and since doing I/O better is a big point of Cats Effect, I think this is rather important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants