Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to change the worker thread number in-flight #1855

Merged
merged 16 commits into from
Oct 30, 2023

Conversation

git-hulk
Copy link
Member

@git-hulk git-hulk commented Oct 24, 2023

This closes #1802

Scaling up is as simple as starting new threads and adding them to the worker pool.
But scaling down is a bit complex since we need to migrate existing connections from
the to-be-removed workers to survived workers. This process is NOT so easy for the sake
of the blocking commands that will store its context(includes fd and owner pointer) in
different global variables and we need to find out and remove them. As far as I know,
it contains the below commands:

  • All LIST blocking command
  • Subscribe/PSubscribe
  • XREAD

To make it simple, we would like to close those connections if the last command is
a blocking command. And we can improve this if any users complain about this.

@git-hulk git-hulk force-pushed the feature/dynamic-workers branch from a6d8ee6 to c30ce43 Compare October 24, 2023 13:27
@guojidan
Copy link
Contributor

nice, long connect will not disconnect when change workers, I use redis-benchmark continuous write

@git-hulk
Copy link
Member Author

nice, long connect will not disconnect when change workers, I use redis-benchmark continuous write

Thank you! I'm working on test cases.

@mapleFU
Copy link
Member

mapleFU commented Oct 26, 2023

Should we prevent from setting the thread-cnt to a dangerous value( .e.g 0)?

@git-hulk
Copy link
Member Author

Should we prevent from setting the thread-cnt to a dangerous value( .e.g 0)?

@mapleFU Yes, IntField will check the value range before setting.

{"workers", true, new IntField(&workers, 8, 1, 256)},

Comment on lines +364 to +369
{"workers",
[](Server *srv, const std::string &k, const std::string &v) -> Status {
if (!srv) return Status::OK();
srv->AdjustWorkerThreads();
return Status::OK();
}},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only those lines are newly added, rest is formatted by clang-format

@@ -417,8 +419,6 @@ void Server::SubscribeChannel(const std::string &channel, redis::Connection *con
std::lock_guard<std::mutex> guard(pubsub_channels_mu_);

auto conn_ctx = ConnContext(conn->Owner(), conn->GetFD());
conn_ctxs_[conn_ctx] = true;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No where is using conn_ctxs_, so delete it.

@git-hulk git-hulk force-pushed the feature/dynamic-workers branch from aa99ecc to d99e14e Compare October 27, 2023 14:12
@git-hulk git-hulk marked this pull request as ready for review October 27, 2023 14:12
@git-hulk git-hulk marked this pull request as draft October 27, 2023 15:30
@git-hulk git-hulk marked this pull request as ready for review October 28, 2023 07:46
Copy link
Contributor

@torwig torwig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, but the rest LGTM. @git-hulk Good job!

tests/gocase/unit/config/config_test.go Outdated Show resolved Hide resolved
src/server/worker.cc Show resolved Hide resolved
@git-hulk git-hulk force-pushed the feature/dynamic-workers branch from 8297bc4 to cb917e6 Compare October 29, 2023 00:56
@git-hulk git-hulk requested a review from torwig October 29, 2023 07:42
torwig
torwig previously approved these changes Oct 29, 2023
Copy link
Contributor

@torwig torwig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

src/server/server.cc Show resolved Hide resolved
src/server/server.cc Show resolved Hide resolved
src/server/server.cc Show resolved Hide resolved
src/server/server.cc Show resolved Hide resolved
src/server/server.cc Show resolved Hide resolved
src/server/server.cc Show resolved Hide resolved
@mapleFU
Copy link
Member

mapleFU commented Oct 29, 2023

I'm a bit confused about the ctor ordering here:

  1. Server::~Server calls cleanupExitedWorkerThreads()
  2. cron calls cleanupExitedWorkerThreads()

However, when Join, would the exited worker thread be cleaned?

@git-hulk
Copy link
Member Author

I'm a bit confused about the ctor ordering here:

  1. Server::~Server calls cleanupExitedWorkerThreads()
  2. cron calls cleanupExitedWorkerThreads()

However, when Join, would the exited worker thread be cleaned?

The cron thread periodic call cleanupExitedWorkerThreads() but it can't promise it will call cleanupExitedWorkerThreads() before exiting the cron thread. So for the server destructor, it MUST check again to prevent exceptions.

src/server/server.cc Outdated Show resolved Hide resolved
@mapleFU
Copy link
Member

mapleFU commented Oct 29, 2023

Got it, so finally resource is handled by dtor.

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

@git-hulk
Copy link
Member Author

Thanks all, merging..

@git-hulk git-hulk merged commit b29ad21 into apache:unstable Oct 30, 2023
git-hulk added a commit that referenced this pull request Nov 17, 2023
#1855 introduced the data race even only migrating the non-blocking connections. For example:

T1: C0(connection) is running the command Config::Set on W0(worker)
and C1 is running another command on W1. C0 got the exclusive lock for executing commands
and C1 will wait for the lock inside ExecuteCommands.

T2: C0 migrates C1 to W0 after reducing the number of worker threads, but the ExecuteCommands function will
continue executing after migrating. So both W0 and W1 will access the C1 at the same time.

To avoid this, I simply don't migrate the connection if it has any running command and
delay to shutdown workers, so that old connections have a chance to finish the running command.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the support of changing worker threads in-flight
5 participants