This repository has been archived by the owner on Jun 23, 2022. It is now read-only.
perf: optimizing write latency using independent IO queues instead of libaio #633
Merged
hycdong previously approved these changes · Oct 12, 2020
levy5307 reviewed · Oct 15, 2020
levy5307 reviewed · Oct 15, 2020
levy5307 reviewed · Oct 19, 2020
neverchanje reviewed · Oct 19, 2020
neverchanje previously approved these changes · Oct 19, 2020
levy5307 reviewed · Oct 19, 2020
levy5307 previously approved these changes · Oct 19, 2020
foreverneverer force-pushed the newaio-with-tracer branch from 622e7a7 to 5cf7373 · October 19, 2020 10:40
neverchanje approved these changes · Oct 19, 2020
levy5307 approved these changes · Oct 19, 2020
This PR is based on #569; thanks to @neverchanje for offering an initial implementation of the new async IO.
The original async IO is based on Linux AIO, which can become a bottleneck, especially during Learning and Compaction, and slow down the write path. #568 changed the callback task pool of libaio to optimize latency during Compaction, but we still occasionally observe latency spikes during Compaction, and the cluster generally becomes unavailable during Learning (the replica migration process that typically happens when scaling nodes in or out).
To avoid different AIO tasks interfering with each other in a single shared libaio queue, this PR removes the libaio module and introduces a new async-IO implementation based on the rdsn task queue. It handles different IO tasks in separate task queues, isolating low- and high-priority tasks from each other.
In the latest design, the Learning IO task is assigned to the `THREAD_POOL_DEFAULT` queue, the private-log IO task to the `THREAD_POOL_REPLICATION_LONG` queue, and the shared-log IO task to the `THREAD_POOL_SLOG` queue. With the different IO queues isolated from each other, every IO task can be executed efficiently.

In addition, the Learning request logic may run in `THREAD_POOL_REPLICATION` after `end_get_file_size` (`end_get_file_size` itself is assigned to `THREAD_POOL_REPLICATION`, because for an RPC ack the thread pool is the same as the current pool; see `rpc_request_task`). This can block the write operation, especially when the NFS-RateLimiter is configured with a low learning rate. Since the change is small, I fix it in this PR as well: `LPC_NFS_COPY_FILE` is now assigned to `THREAD_POOL_DEFAULT`.
Experiments
Here I tested Learning and Compaction cases to measure the effect of the optimization. The Pegasus configuration is 2 meta-servers and 5 replica-servers.
Case 1: Compaction, 30 threads * 3 clients, load, 1KB value length
#568 has already shown the effect in most cases. This test reports the averaged result of multiple runs: I executed the workload three times using YCSB.
We can see that both implementations achieve almost the same results below P9999. However, at P9999 and MAX, `NewAIO` provides better results.

Case 2: Learning while adding a node, 15 threads * 3 clients, write:read = 3:1, 1KB value length, data = 16GB * 32 partitions
Case 3: Learning while offlining a node, 15 threads * 3 clients, write:read = 3:1, 1KB value length, data = 16GB * 32 partitions
From the above results, we can confirm that the new async-IO implementation almost entirely avoids the latency spikes while adding a node, and greatly reduces the impact while offlining a node.
Note that repeated tests do not yield exactly the same latency values, but the conclusions are consistent across multiple runs.