perf: Tokio compat #1922
Conversation
I've got a test that failed once and didn't fail the second time I ran it.
This is going to be fun.
Force-pushed from 7f6637a to 647b5d1
@MOZGIII the max size test does fail occasionally, but we should investigate that. As for the statsd test, we should enable tracing in CI via
Ok! For the purposes of this PR I'm ignoring it.
Do you want this only for this branch, or everywhere?
So, the plan for this PR is the following:
After that, we should be ready to merge!
#1935 is a candidate for merging in after we fix the file sink.
I think it's fine to add the trace init anywhere we have issues; it's also fine to leave it in, since it can be useful.
Force-pushed from a8515e6 to 45a2148
Force-pushed from 45a2148 to b7addb5
Now
Successful run: https://circleci.com/gh/timberio/vector/86075
Force-pushed from a5889e0 to 9f01906
I got lucky and reproduced the fds error locally.
I looked at the bench, and it looks like the bench itself is causing the error, not
I got to a point where I see some consistency: bench on
This might have to do with CI noisiness?
Locally I have observed the same before - sometimes it worked, sometimes it failed. Can't reproduce it locally anymore. 😞
It is the reason the process crashes; however, it's not necessarily the root cause. Something else might be eating up all the fds.
I ran the process under the debugger properly. It definitely looks like it's the pipes that are leaking. Looking at the output of the bench command, this makes a lot of sense.
This doesn't necessarily mean the pipes are special - maybe TCP sockets leak too, but at least with pipes we could observe it. I'll try to test whether it's specifically the pipes that are causing these issues.
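As an aside, a quick way to confirm a descriptor leak like this on Linux is to watch the process's own fd table. This is a hypothetical diagnostic sketch (not code from this PR or from vector): it counts entries under `/proc/self/fd`, deliberately leaks a few descriptors, and checks that the count went up. It assumes a Linux `/proc` filesystem.

```rust
use std::fs;

/// Count this process's open file descriptors via /proc (Linux-only).
/// Each entry in /proc/self/fd is one open descriptor.
fn open_fd_count() -> usize {
    fs::read_dir("/proc/self/fd").map(|dir| dir.count()).unwrap_or(0)
}

fn main() {
    let before = open_fd_count();

    // Simulate a leak: open some files and deliberately never close them.
    let handles: Vec<fs::File> = (0..10)
        .map(|_| fs::File::open("/proc/self/status").expect("open"))
        .collect();
    std::mem::forget(handles);

    let after = open_fd_count();
    println!("fds before={} after={}", before, after);
    assert!(after >= before + 10, "leaked fds should show up in the count");
}
```

Polling a counter like this before and after each bench iteration makes a pipe (or socket) leak visible without a debugger.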
@LucioFranco I have this view: would it be useful for us to team up and have a call to look at the stacks? You know what to expect from
I managed to isolate the issue! 🎉 https://github.com/timberio/tokio-fds-issue The code in that repo demonstrates exactly the same behavior:
Created tokio-rs/tokio-compat#27
@MOZGIII sounds good, feel free to set this PR to ready for review and I'll give it a review :)
Very happy about that. Thanks for pushing through. Looking forward to finishing off this migration. |
I think we should double-check the data that we have. I also think we should change the approach to these benchmarking tests: switching to a bare-metal server should produce more consistent results. @Hoverbear mentioned PingCAP have had issues with benching in the cloud. I can second that - I've also seen inconsistencies when benching on different on-premise datacenter nodes - so I agree that using a single bare-metal server (or a couple) for benches is the better way.
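To make the "work with actual data" point concrete, one simple way to quantify run-to-run noise is the coefficient of variation (stddev / mean) across repeated bench runs. The sketch below is purely illustrative - the sample numbers are made up, not measurements from vector's test harness:

```rust
/// Coefficient of variation: population standard deviation divided by mean.
/// A higher value means noisier, less reproducible benchmark results.
fn coefficient_of_variation(samples: &[f64]) -> f64 {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    variance.sqrt() / mean
}

fn main() {
    // Hypothetical throughput samples (MB/s) for the same bench:
    // a shared cloud runner vs. a dedicated bare-metal box.
    let cloud = [102.0, 131.0, 95.0, 148.0, 110.0];
    let bare_metal = [100.0, 101.0, 99.0, 100.5, 100.0];

    println!("cloud CV:      {:.3}", coefficient_of_variation(&cloud));
    println!("bare-metal CV: {:.3}", coefficient_of_variation(&bare_metal));
    // A markedly higher CV on the cloud runner is the kind of evidence
    // that would justify moving benches to dedicated hardware.
}
```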
Great PR! Please pay attention to the following items before merging: Files matching
Files matching
This is an automatically generated QA checklist based on modified files.
We can discuss that separately. I personally think that is overkill for our current needs. I haven't seen large inconsistencies in the results. If you can demonstrate the problem then that can serve as the basis for the discussion. Let's work with actual data and examples. The reason the test harness didn't contain the results is that the partitions were not discovered. This requires an
This is quite a bit slower. I'm going to try and test a result off of this branch.
I agree!
Oh wow, clearly |
Is this the correct version?
No, that would be a run from vectordotdev/vector-test-harness-github-actions-test-repo#3
After rebase:
code looks 👍, just need to rebase against master and we can merge!
I'm about to merge this! Yesterday the test harness was broken, but it's now fixed and I've gathered some results with it too for completeness.
Note how the base performance is much worse than
This PR follows the proposal for #1142
It switches our runtime to tokio 0.2 with very few changes; the value is that we can run benchmarks and performance tests to ensure there's no degradation from upgrading to the new tokio reactor.
Addresses #1695 and #1696.
Current state (updated as we go):

- `test_max_size` and `test_max_size_resume` tests failing (it was happening from before, perf: Tokio compat #1922 (comment))
- `test_source_panic` (we concluded it was failing before, perf: Tokio compat #1922 (comment))
- `sinks::statsd::test::test_send_to_statsd` failing (see tokio-compat: sinks::statsd::test::test_send_to_statsd fails tracking issue #2016 and chore(statsd sink): Fix race condition in test #2026)

There's a `tokio-compat-debug` branch that I use to dump the trashed code version altered with extensive debugging. I'm only using it to run the code against the CI, since my local setup doesn't reproduce the issues. That branch isn't supposed to be merged; rather, we'll just take the end results from it, if there are any.