Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added TestControl, with first-class support for time mocking #2276

Merged
merged 31 commits into from
Sep 20, 2021

Conversation

djspiewak
Copy link
Member

Fixes #2220
Fixes #1678

Time mocking is back and better than ever! Yay!!! Thoughts very much welcome. Folks who have expressed an interest in this include @SystemFw, @ChristopherDavenport, @benhutchison, and many others. Now is the time to give feedback! I tried to be comprehensive in the TestControl scaladoc.

Note that this also includes a binary-compatible (but source- and semantically-incompatible) reworking of TestContext. The previous API conflated too many concerns and was very error-prone. Time advancement and schedule ticking is now fully separated. Also, as a bonus, I found (and fixed!) a bug in tickAll which was causing all of our test suites to take dramatically longer than necessary, and now they run much faster both in CI and on my laptop. Drinks all around.

The primary concept here is that TestControl represents a relatively low level mechanism for controlling the runtime. Higher level mechanisms should be built on top of this and integrated within specific downstream test frameworks (such as munit). Note that, just as before, there's still no substitute for running your program on the real runtime, and there are several programs which will deadlock on TestContext but not on the real thing. However, for anything involving sleep, this kind of thing is basically indispensable.

In theory this should be a lot more convenient than what we had to do in CE2 (creating artificial Clock instances), since the runtime is automatically threaded through at all points, meaning that you can just use the normal Clock and Temporal instances and everything will just work out of the box. In fact, time mocking even works on the bare IO functions (e.g. IO.sleep, IO.monotonic, and IO.realTime).

As a final goodie, I added support for configuring the source of randomness used in TestContext, similar to how ScalaCheck supports optionally setting the seed as a base64-encoded value. I would imagine that downstream test framework support will print out the seed in any error messages, as well as provide an API for optionally configuring that seed and passing it to the TestControl they use internally to run your examples.

As I said, now is the time for feedback! This is a pretty meaningful API addition and I'd like to make sure we get it right.

@djspiewak
Copy link
Member Author

Published as 3.3-7-579f468 so that we can experiment in downstream libraries like https://github.com/typelevel/munit-cats-effect and https://github.com/typelevel/cats-effect-testing.

Copy link
Member

@vasilmkd vasilmkd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very excited. Marking this as request changes but the changes are pretty minor.

Regarding the binary compatibility, because of the change in behavior, at least to me, the whole situation seems to give a false sense of security that this could be a drop in replacement.

Is there a way to make the private[testkit] implementations delegate to the new implementations in a way that is semantically correct in the new model? For example, do the tick() then advanceAndTick() trick or something.

If the answer is no, maybe it would indeed be better to not guarantee binary compatibility at all and just advise people to upgrade, recompile, fix source changes and change to the new semantics at the same time. Of course, you're much more experienced than me in this regard, so I won't fight any final decision.

@djspiewak
Copy link
Member Author

djspiewak commented Aug 30, 2021

Noodled a bit on some Specs2 support built on top of this API: typelevel/cats-effect-testing#193

"execute a simple test with mock time" in {
  var first = false
  var second = false
  var third = false
  val wait = IO.sleep(1.hour)

  val program = for {
    _ <- IO { first = true }
    _ <- wait
    _ <- IO { second = true }
    _ <- wait
    _ <- IO { third = true }
  } yield 42

  program must execute { (control, result) =>
    result() must beNone

    first must beFalse
    second must beFalse
    third must beFalse

    control.tick()

    first must beTrue
    second must beFalse
    third must beFalse

    control.advanceAndTick(1.hour)

    first must beTrue
    second must beTrue
    third must beFalse

    control.advanceAndTick(1.hour)

    first must beTrue
    second must beTrue
    third must beTrue

    result() must beSome(beRight(42))
  }
}

@djspiewak djspiewak added this to the v3.3.0 milestone Aug 30, 2021
* where the scheduler ticks see different task lists despite
* identical configuration.
*/
def apply(seed: String): TestContext =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inclusion of a seed is a very good idea 👍

@djspiewak djspiewak marked this pull request as ready for review August 31, 2021 12:15
* rely on `realTime` and `monotonic`, either directly on `IO` or via the
* typeclass abstractions.
*
* WARNING: ''Never'' use this runtime on programs which use the [[IO#evalOn]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth adding a sentence about behaviour of blocking

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. To more directly answer, it behaves like JavaScript: the blocking pool and the compute pool are both TestContext.

@djspiewak
Copy link
Member Author

Bumping this a bit, @SystemFw. Any follow-ups needed here? Should we look at the munit integration first?

@SystemFw
Copy link
Contributor

SystemFw commented Sep 8, 2021

Yeah munit integration would be good, and tbh I think it'd be great to have a snapshot to try it out. I'm happy with it though, I hadn't realised that wasn't clear

@djspiewak
Copy link
Member Author

Actually published a snapshot quite a while ago. 😃 That's what the Specs2 integration is based on. I'll dig into munit.

@djspiewak
Copy link
Member Author

New snapshot published as 3.3-115-fab668b

@djspiewak
Copy link
Member Author

Odd failure:

[info] IO monad should
2869
[error]   ! evaluate fibers correctly in presence of a parasitic execution context
2870
[error]    java.util.concurrent.TimeoutException: null (Runners.scala:109)
2871
[error] cats.effect.Runners.$anonfun$timeout$1(Runners.scala:109)
2872
[info] Total for specification ParasiticECSpec

@vasilmkd
Copy link
Member

vasilmkd commented Sep 8, 2021

Might not be an odd failure. The test uses ticker to evaluate equality between generated IO values.

Copy link
Member

@vasilmkd vasilmkd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am amazed by this PR and the scaladoc is very informative. Thank you for your effort. 🙏🏻

One question still remains, do people think we need to build a higher level ADT for making the different outcomes more explicit. I'm personally worried about Left(CancelationException) not being great from an ergonomics point of view.

Comment on lines +31 to +36
val test = {
implicit val ticker = Ticker()

val test = IO(implicitly[Arbitrary[IO[Int]]].arbitrary.sample.get).flatMap { io =>
IO.delay(io.eqv(io))
IO(implicitly[Arbitrary[IO[Int]]].arbitrary.sample.get).flatMap { io =>
IO.delay(io.eqv(io))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this change can be reverted.

@Baccata
Copy link
Contributor

Baccata commented Sep 11, 2021

One question still remains, do people think we need to build a higher level ADT for making the different outcomes more explicit. I'm personally worried about Left(CancelationException) not being great from an ergonomics point of view.

I agree. Even Option[Outcome[Id, Throwable, A]] would be more consistent with the rest of CE

@vasilmkd
Copy link
Member

I don't hate that. I'm not sure if Daniel considers it a solution if we only do a last-mile wrapping of the Option[Either[Throwable, A]] into a Option[Outcome[Id, Throwable, A]].

@Baccata
Copy link
Contributor

Baccata commented Sep 11, 2021

As long as cancellation is not surfaced as a cancellation exception, I'm happy 😉

@vasilmkd
Copy link
Member

Most definitely, I'm really not comfortable with that.

@djspiewak
Copy link
Member Author

Okay okay. 😛 I'll try the Option if Outcome approach…

@SystemFw
Copy link
Contributor

Well, remember that the UX here is writing assertions, and I think Outcome might actually be the worst UX, given that writing assertions that involve pattern matching is generally a bit annoying. In some ways the most immediate return type is actually IO[Option[A]] (and CancelationException plus other exceptions getting thrown), which integrates with every test framework and favours the most common case

P.S. I'm integrating this into upperbound and the experience is otherwise great, especially the fact that you can just call TestControl.executeFully with no other changes to munit-cats-effect nor a specific test setup beyond basic IO suites

@djspiewak
Copy link
Member Author

See Discord for a long diatribe on why we replaced executeFully with executeEmbed. :-)

@djspiewak
Copy link
Member Author

Published as 3.3-161-09d830c

@SystemFw
Copy link
Contributor

Ok, so I've tried executeEmbed

TL;DR: it's much better, but we should remove the Option from there as well.

Note that I'm only talking executeEmbed here, not stepping through. Here are my arguments (some of them might be repeated from Discord):

  • Most tests will only care about the actual results, which makes dealing with an Option pointless boilerplate.
  • If you are calling executeEmbed, you are running the IO to completion, and nontermination is an exceptional outcome, and should therefore be represented as an exception. Note that this isn't the case with results inside execute, since you are legitimately sampling there.
  • It's ok for the resulting Exception to be defined inside TestControl, e.g. as TestControl.NonTerminationException. This may appear bespoke, but being able to materialise nontermination (currently as an Option) is bespoke to TestControl, so a bespoke exception actually seems very appropriate.
  • Option is bad UX on failed tests . This might actually be the strongest argument. With Option, a non terminating program will fail as either The result was None or NoSuchElementException. Compare to a proper exception: TestControl.NonTerminationException: the program you have run through TestControl never completed, which might indicate deadlock etc, see these docs etc.

Hence, having executeEmbed: F[A] makes successful tests easier to write, and failed tests easier to understand, and should be preferred to the current API.

@djspiewak
Copy link
Member Author

If you are calling executeEmbed, you are running the IO to completion, and nontermination is an exceptional outcome, and should therefore be represented as an exception. Note that this isn't the case with results inside execute, since you are legitimately sampling there.

I was thinking about it over the past couple days and came round to a similar opinion. I think this is probably the most compelling argument. Also I agree with the counterpoint on "bespokeness".

Option is bad UX on failed tests . This might actually be the strongest argument. With Option, a non terminating program will fail as either The result was None or NoSuchElementException. Compare to a proper exception: TestControl.NonTerminationException: the program you have run through TestControl never completed, which might indicate deadlock etc, see these docs etc.

I think it gets even worse than that if someone writes their assertions within the IO, but then that IO deadlocks before a failed assertion gets hit, and then they discard the Option (e.g. by just returning the results of executeEmbed from test). In that scenario, there's no failure, but the test is definitely failing. Arguably that's a test where someone failed to account for every possible failure scenario (in this case, nontermination), but the UX should guide them towards checking for such.

So overall I think I'm in alignment. I'll make the change.

@djspiewak
Copy link
Member Author

Changes released in 3.3-162-2022ef9

@djspiewak
Copy link
Member Author

Final thoughts @SystemFw or others?

@SystemFw
Copy link
Contributor

Really excited this is pretty much done :)
I'm happy with the api, are we set on names? I was thinking about run and stepThrough, but I wouldn't be mad if we kept the current names at all

@djspiewak
Copy link
Member Author

I actually really like the names. Honestly, run strikes me as a bit too innocuous. 🙂 Given the weirdness it's performing, I want people to be aware that they're flirting with dragons. Also the documentation pretty consistently refers to "execute" as the verb when running an IO (despite the fact that the production prefix is unsafeRun), so it fits pretty well.

The "Embed" suffix also works really well when comparing with the Fiber api, which is not an incorrect comparison. It also nudges people a bit to consider the stepping API first, which is good IMO even if it isn't the most commonly optimal function, simply because it invites people to understand a bit better what's going on under the surface.

I'm bikeshedding. 😃 Names matter but probably not as much as I'm implicitly asserting here. I just rather like the current ones.

@SystemFw
Copy link
Contributor

I just rather like the current ones.

That's ok, let's go ahead with the current ones :)

@djspiewak djspiewak dismissed vasilmkd’s stale review September 20, 2021 20:38

I think this is a little stale now, and all comments have been addressed

@djspiewak djspiewak merged commit a2f2fb7 into typelevel:series/3.x Sep 20, 2021
@bastewart
Copy link
Contributor

bastewart commented Oct 22, 2021

Useless comment: just wanted to say I really like this change (i've "borrowed" it temporarily, pending a release of 3.3)! It's a huge improvement over the hand-rolling required in the past to use TestContext. Very unrelated as well but I think it's a big validation of the design of CE3 that TestControl just exposes a function IO ~> IO and end-users don't have to do anything like overwriting the ContextShift and Timer.

@yanns
Copy link
Contributor

yanns commented Nov 28, 2021

This is a really great addition!!! 🥇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants