Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Reuse KsqlClient instance for inter node requests #5742

Merged
merged 5 commits into from
Jul 6, 2020

Conversation

AlanConfluent
Copy link
Member

Description

Fixes: #5600

Previously a new KsqlClient instance was being created for each inter node request (e.g. pull query forwarding). There is a KsqlClient constructor which takes a sharedClient instance but this was never being used in the production code path.

This PR does the following:

Each server now maintains a KsqlClient instance which is used for all inter node requests
The server's Vert.x instance is passed to the KsqlClient instance.

I'm completing this existing PR #5624

Testing done

Describe the testing strategy. Unit and integration tests are expected for any behavior changes.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@AlanConfluent AlanConfluent requested a review from a team as a code owner June 30, 2020 23:14
@vpapavas
Copy link
Member

vpapavas commented Jul 1, 2020

How has the fix been tested? @purplefox in his PR said we should test with a long running pull query soak test. Moreover, can we add a unit test that issues a few pull and push queries and make sure that only a single ksql client is created?

Copy link
Contributor

@purplefox purplefox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I also think we should do some kind of load test to make sure this fix works under load.
The original problem that this should fix manifests under a lot of concurrency (thousands of concurrent pull query requests) where some are forwarded to other nodes. A simple wrk test and dial up the concurrency should do: First reproduce the issue with ksqlDB built without this PR, then verify it's gone with this PR.

@AlanConfluent
Copy link
Member Author

How has the fix been tested? @purplefox in his PR said we should test with a long running pull query soak test. Moreover, can we add a unit test that issues a few pull and push queries and make sure that only a single ksql client is created?

I'm not sure entirely how we could very easily verify that only one client is created in a unit test. That might get a little complex. From my human static analysis, I can see there's just one place in the code now that creates a KsqlClient. I think the more interesting and useful thing to do is verify that using one client has the effect we want, to remove a bottleneck.

LGTM, I also think we should do some kind of load test to make sure this fix works under load.
The original problem that this should fix manifests under a lot of concurrency (thousands of concurrent pull query requests) where some are forwarded to other nodes. A simple wrk test and dial up the concurrency should do: First reproduce the issue with ksqlDB built without this PR, then verify it's gone with this PR.

I'll do this manually to begin with. It might also be good to have a wrk soak test that turns up a lot of concurrency @vpapavas . We have that in our plans anyway. It can also verify and ensure this doesn't get regressed.

Copy link
Member

@vpapavas vpapavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AlanConfluent
Copy link
Member Author

I ran many local tests, including using cpulimit to try to hit more of these thread blocking issues, since they're often hard to repro on a recent Mac.

I have verified that with the change, I see no WARNING: You're already on a Vert.x context, are you sure you want to create a new Vertx instance?.

I still see some WARNING: Thread Thread[vert.x-eventloop-thread-0,5,main]=Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 12331 ms, time limit is 2000 ms, but I believe fewer of them.

I'm going to merge this change.

@AlanConfluent AlanConfluent merged commit cd7f540 into confluentinc:master Jul 6, 2020
AlanConfluent added a commit to AlanConfluent/ksql that referenced this pull request Jul 17, 2020
AlanConfluent added a commit that referenced this pull request Jul 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Thread blocks and multiple vert.x context issues when using ksqldb in multi-node cluster
3 participants