Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to tokio v0.2 and std::future::Future #1142

Closed
LucioFranco opened this issue Nov 4, 2019 · 20 comments · Fixed by #1922
Closed

Upgrade to tokio v0.2 and std::future::Future #1142

LucioFranco opened this issue Nov 4, 2019 · 20 comments · Fixed by #1922
Labels
domain: networking Anything related to Vector's networking domain: topology Anything related to Vector's topology code type: enhancement A value-adding code change that enhances its existing functionality. type: task Generic non-code related tasks type: tech debt A code change that does not add user value.

Comments

@LucioFranco
Copy link
Contributor

LucioFranco commented Nov 4, 2019

Background

Current runtime and futures version

Originally, when we first started working on vector only futures 0.1 and tokio 0.1 was released. There was no language level support for futures nor async/await. In that time since we first started working on vector, we have seen the addition of std::future::Future, std::task::{Context, Poll}, and the async_await feature (which lands on stable Nov 7th).

Async/await itself is a huge shift in how we write futures and in my experience has reduced the amount of code by up to 40-60%. The biggest benefit is that it allows us to write more "rusty" code because we can now borrow accross yield points which was not possible before without a lot of work.

Tokio 0.2

As some of you may have seen tokio has recently implemented a new scheduler along with its upgrade from futures 0.1 to std::future::Future. This new scheduler has a couple of improvements that impact us directly. Notably less aggressive work-stealing and better optimizations around sendings on channels. You can read more in this blog post.

The biggest issue with moving to tokio 0.2 will be the fact that we have a lot of code that uses tokio-reactor 0.1. All tokio 0.2 runtimes do not provide access to this reactor and thus our code will not work if we just swap out runtimes. To fix this issue the tokio team has written a tokio-compat runtime which uses the tokio 0.2 executor/scheduler, tokio-net (which is the new reactor for 0.2), tokio-timer 0.2 and a background tokio-reactor 0.1/tokio-timer 0.1. This allows us to execute futures that are written in both futures 0.1 and std::future::Future as well as futures that are from tokio 0.1 and tokio 0.2.

Performance

During my benchmarking this weekend I was able to see up to 6x performance increases and on average at least a 100% increase in performance in just upgrading the executor to tokio 0.2.

Benchmarks comparing tokio v0.1.22 and tokio-compat rev f64aee/tokio rev e699d4653

Note: tests with a change +/-5 were ignored.

group                                                     tokio01                                 tokio02
-----                                                     -------                                 -------
batch/no compression 10mb with 2mb batches                1.11      8.3±0.24ms  1154.7 MB/sec     1.00      7.4±0.14ms  1284.5 MB/sec
elasticsearch_indexes/dynamic                             1.12    606.0±0.99ns        ? B/sec     1.00    542.1±3.66ns        ? B/sec
elasticsearch_indexes/static                              1.23    110.2±0.29ns        ? B/sec     1.00     89.8±0.22ns        ? B/sec
http/http_gzip                                            2.01    231.9±6.54ms    41.1 MB/sec     1.00    115.6±3.89ms    82.5 MB/sec
http/http_no_compression                                  2.00   227.6±10.62ms    41.9 MB/sec     1.00    114.0±3.08ms    83.7 MB/sec
interconnected                                            1.12    274.5±6.67ms    69.5 MB/sec     1.00    244.9±3.68ms    77.9 MB/sec
partitioned_batch/no compression 10mb with 2mb batches    1.08     12.5±0.33ms   760.4 MB/sec     1.00     11.6±0.33ms   819.5 MB/sec
pipe                                                      3.19    361.0±1.70ms    26.4 MB/sec     1.00    113.3±2.55ms    84.2 MB/sec
pipe_with_huge_lines                                      1.16   1028.3±4.74ms   185.5 MB/sec     1.00   883.8±14.71ms   215.8 MB/sec
pipe_with_many_writers                                    6.09   717.5±18.69ms    13.3 MB/sec     1.00    117.9±1.14ms    80.9 MB/sec
pipe_with_tiny_lines                                      3.43    305.7±6.16ms   319.5 KB/sec     1.00     89.1±0.65ms  1096.0 KB/sec
transforms                                                3.09    556.9±3.29ms    18.8 MB/sec     1.00    180.0±2.99ms    58.3 MB/sec
unflatten array                                           1.06  1636.0±136.63ns        ? B/sec    1.00   1540.8±9.39ns        ? B/sec

A c5.4xlarge was used. Code can be found here https://github.com/timberio/vector/tree/lucio/compat-bench

Some of these performance numbers are extremely good and slightly confirm some of the theories we've had about certain performance issues that we've seen with vector.

Proposal

I suggest that we start moving to the tokio-compat runtime which will allow us to support tokio 0.1 and tokio 0.2 items. This will also enable us to spawn std::future::Future and future 0.1 futures at the same time. This should then allow us to incrementally move to the new futures version and async/await.

The actual code change can be done in about 10 lines of code since we already refactored the runtimes out in #1098. That said there are a few blockers:

  • The file sink will need a rewrite since it uses tokio 0.1 blocking annotations which are not currently supported in tokio-compat. This is probably the biggest change that we will need but the sink is very new and I don't think its in a very mature state like some of our others. I have a couple ideas how to refactor this sink easily but it should not be much work since the sink is very simple.
  • We have two places where tokio_threadpool::blocking is used which will not work in tokio-compat. We can just upgrade these to the new tokio::executor::thread_pool::blocking fn that was added recently.
  • The last blocking issue is refactoring our Topologly::stop to use our own shutdown mechanism. tokio 0.2 will not provide the same shutdown api that tokio 0.1 did. Our current shutdown code and Topology::stop implementation has already show some issues and this would be good to combine with Implement a proper topology stop strategy #1091.

Beyond these few blockers, I was able to run almost all our tests and most of our benchmarks this weekend with no changes done to them.

Before we move to this new runtime I would also like to ensure that we run it under a week of validation using @lukesteensen's reliability work and run it against our test harness that @binarylogic wrote.

You can look here to see how easy it was to move to the new runtime d56a945

This content would also be a great technical blog post talking about how the new executor improved performance and other novel rust futures-based things.

cc @a-rodin @Jeffail @lukesteensen @binarylogic

@LucioFranco LucioFranco added type: enhancement A value-adding code change that enhances its existing functionality. domain: topology Anything related to Vector's topology code type: tech debt A code change that does not add user value. type: task Generic non-code related tasks domain: networking Anything related to Vector's networking labels Nov 4, 2019
@LucioFranco
Copy link
Contributor Author

cc @hawkw I know you've been working on some of these blocking issues but there might be more info here that you find interesting and could be helpful.

@lukesteensen
Copy link
Member

Sounds great! I think the compat runtime is definitely the way to go. If we can slip it in with minimal changes, we can start to run some tests against that diff alone before tackling any conversions.

You can look here to see how easy it was to move to the new runtime master...lucio/compat-benchdiff-8a2d636766ae1fa1c9124070ce519445R3-R71

It looks like this link is broken.

@MOZGIII
Copy link
Contributor

MOZGIII commented Feb 25, 2020

I've started working on this.

So far everything's going great! tokio-compat reduces the footprint required for initial migration, pretty cool stuff.

  • We have two places where tokio_threadpool::blocking is used which will not work in tokio-compat. We can just upgrade these to the new tokio::executor::thread_pool::blocking fn that was added recently.

Looks like in the last version of tokio that's gone. We can use tokio::task::spawn_blocking, but there's a complication: instead of storing the state internally, as tokio_threadpool::blocking used to do, that function now simply returns a future that we have to keep somewhere and poll. This would require us to covert the Sink and Stream impls that rely on tokio_threadpool::blocking into a state-machine (if they're not yet converted) to keep the future. Even though it's more work, I like it much better than the implicit approach of tokio_threadpool::blocking, so I'll just apply the described update. If someone has a more elegant solution in mind - please share!

@MOZGIII
Copy link
Contributor

MOZGIII commented Feb 25, 2020

While working on the update, I realized we also need to update src/sinks/util/* and src/sinks/util/* to offer futures 0.3 variants of our trait exts.

@MOZGIII MOZGIII mentioned this issue Feb 25, 2020
7 tasks
@MOZGIII
Copy link
Contributor

MOZGIII commented Feb 25, 2020

In the interest of determining the scope, I propose we add an outline of a complete transition to futures 0.3 and tokio 0.2, and determine which parts of the we should address in the near future (i.e. before/after 1.0). The proposal written by @LucioFranco is a good starting point, but I think we need a kind of roadmap to be able to plan ahead.

@binarylogic
Copy link
Contributor

Agree. There will be some uphill work, which you’ve already done some of. But we need to outline, at least, the first few small changes that are isolated and will result in focused PRs.

@lukesteensen
Copy link
Member

The strategy I've been thinking about is roughly the following:

  1. Get on tokio-compat so we can run 0.3 futures "natively"
  2. Migrate code that's blocked on upgrading important dependencies (primarily hyper) to 0.3 futures
  3. Migrate everything else incrementally, prioritized first by blocked dependency upgrades, then components that can be significantly simplified with async/await, until there are no 0.1 futures left and we're totally on tokio 0.2

Items (1) and (2) seem like the only ones that should be done before 1.0. The last item will happen over time and is less urgent.

@MOZGIII
Copy link
Contributor

MOZGIII commented Feb 26, 2020

@lukesteensen I feel like (2) and (3) are the same. I fell like most of the dependencies we'll be able to upgrade pretty easily with futures::compat, but overall the code will still be primarily operating on futures 0.1.

So, this is my suggestion for the plan:

  1. Adopt tokio-compat, do thorough testing for regressions. Do not rewrite anything.

  2. Eliminate tokio 0.1 dependencies that were included in tokio 0.2. Things like tokio-threadpool, tokio-signal and tokio-tls. Also, update timers usage with 0.2 timers. Rewrite the parts minimally, using futures::compat where applicable.

  3. Upgrade tokio dependent crates:

    • tracing
    • trust-dns
    • hyper + h2 and friends
    • hotmic
    • etc...

    At this point we want to eliminate tokio 0.1 as part of this migration process. Otherwise we'll be forced to pull cross-dependencies for tokio 0.1 and 0.2, and this well be a mess - I think we should try avoiding that.
    Rewrite things with future::compat and try switching to using native futures where it makes sense.

  4. Update the lib/codec crate and it's usage. Rewrite the relevant code as necessary, still use futures::compat.

  5. At this point, we have the dependencies updated but our code still uses futures 0.1 and futures::compat mostly. This is when we start with incrementally eliminating futures::compat and switching to futures 0.3. Before we start massively going over the sources/transforms/sinks we should prepare the architectural parts that hold everything together to be compatible with our things being implemented with futures 0.3 and 0.1 - cause that's what we'll have during the transition process. This means, things like fn main, boot/shutdown process and etc.
    Then we can finally rewrite our own components to futures 0.3 and async. While doing so, try to do minimal changes to the implementation logic, unless we decide otherwise on a per-case basis. This is so that the process doesn't take forever and we don't do a full rewrite.

  6. Here we don't have any more code using futures 0.1 at all. Our code is futures 0.3, but it doesn't reach the benefits from having the ability to use async at full potential. Redesign the code where we see fit. This step is optional.

We do this by doing small PRs - smaller than each step - to avoid stalling code. Migrations like this are extremely painful if done entirely in a feature-branch, but if they're merged incrementally - they're actually easy. Just a bit more "turbulence" than usual.

Steps don't have to be sequential, we can actually interleave them. They're more like milestones.

Steps 1 and partially 2 are already in progress - I'm doing some tests to ensure there will be no regressions.

What do you think?

@lukesteensen
Copy link
Member

Yeah, (2) is definitely a subset of (3), I just know that it's the higher priority subset that we want to make sure to tackle sooner rather than later.

I like what you've laid out here! I definitely appreciate the focus on small, incremental steps. Other than making sure we prioritize the hyper stack upgrade, I think you've cover it very well.

@MOZGIII
Copy link
Contributor

MOZGIII commented Feb 26, 2020

As a side topic: I think it'd be a good idea to have a stable list of reviewers for this task.

Let's keep it in this message. Please edit this message and add yourself if you're interested - I'll make sure to add you as a reviewer at all the relevant PRs.

@LucioFranco
Copy link
Contributor Author

Yeah, def link me on migration PRs please.

@lukesteensen
Copy link
Member

I'll leave the in-depth reviews to @LucioFranco, but would definitely like to keep an eye on these.

@LucioFranco
Copy link
Contributor Author

Yeah, this plan looks mostly good. The big thing is our transition of http/hyper/bytes/tower that will cause plenty of issues. This will need to be done quick as well or else we might hit tons of merge conflicts with other work.

@Hoverbear Hoverbear linked a pull request Mar 10, 2020 that will close this issue
7 tasks
@MOZGIII
Copy link
Contributor

MOZGIII commented Mar 18, 2020

Reopening this as it's more of a tracking issue for global progress.

@MOZGIII
Copy link
Contributor

MOZGIII commented Mar 24, 2020

Progress update: we've completed steps 1 and 2 from #1142 (comment). I'm now working on hyper upgrade (#2117) and listing the tokio 0.1 dependent crates for step 3 (#2118).

@MOZGIII
Copy link
Contributor

MOZGIII commented Mar 24, 2020

A list of crates for step 3 is done! See it here: #2118.

@ktff
Copy link
Contributor

ktff commented Apr 7, 2020

Is there anything remaining to be done here before doing #2108?

@LucioFranco
Copy link
Contributor Author

@ktff not that I know of, my plan in general was to start with sinks, so touching the sources now should be fine.

@binarylogic binarylogic assigned ktff and unassigned LucioFranco May 26, 2020
@binarylogic binarylogic removed this from the Tech-Debt Payment #1: Move to Tokio 0.2/Futures 0.3 milestone Jul 20, 2020
@binarylogic
Copy link
Contributor

There's a lot going on here. I'm curious if this isssue is still relevent given the rwork in #2945. @MOZGIII @fanatid can either of you chime and let me know if we can close this? Thanks.

@fanatid
Copy link
Contributor

fanatid commented Aug 10, 2020

While #2945 do not make update to tokio:0.2, I think we can close this issue in favor of #2945 and new issues about removing tokio-compat / eliminating futures:0.1 / etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: networking Anything related to Vector's networking domain: topology Anything related to Vector's topology code type: enhancement A value-adding code change that enhances its existing functionality. type: task Generic non-code related tasks type: tech debt A code change that does not add user value.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants