
perf: optimize write latency using independent IO queues in place of libaio #633

Merged 74 commits on Oct 19, 2020

Conversation

@foreverneverer (Contributor) commented Sep 27, 2020

This PR is based on #569; thanks to @neverchanje for offering an initial implementation of the new async-IO.

The original async-IO is based on Linux AIO, which can become a bottleneck, especially in Learning and Compaction scenarios, and slow down the write path. #568 changed the callback task pool of LibAIO to optimize latency during Compaction, but latency spikes still occasionally occur during Compaction, and Learning (the replica-migration process that typically happens when scaling nodes in or out) generally makes the cluster unavailable.

To avoid interference between different AIO tasks sharing the same LibAIO queue, this PR removes the LibAIO module and introduces a new async-IO implementation based on the rdsn task queue. It handles different IO tasks in separate task queues, isolating low- and high-priority tasks from each other.

In the latest design, Learning IO tasks are assigned to the THREAD_POOL_DEFAULT queue, private-log IO tasks to the THREAD_POOL_REPLICATION_LONG queue, and shared-log IO tasks to the THREAD_POOL_SLOG queue. With the different IO queues isolated from each other, every IO task can be executed efficiently.
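To make the isolation concrete, here is a minimal sketch of the idea, not the PR's exact code: each IO category gets its own rdsn task code bound to a dedicated thread pool, and the blocking read/write runs inside the task body instead of going through a single shared libaio context. The task-code names and the enqueue_aio helper below are illustrative; the tasking::enqueue call pattern mirrors the NFS fix shown later in this description.

```cpp
#include <functional>
#include <utility>

// Illustrative only: bind each IO category to its own thread pool via rdsn
// task codes (these three code names are hypothetical, not the PR's exact ones).
DEFINE_TASK_CODE_AIO(LPC_AIO_LEARN_FILE, TASK_PRIORITY_COMMON, THREAD_POOL_DEFAULT)
DEFINE_TASK_CODE_AIO(LPC_AIO_PRIVATE_LOG, TASK_PRIORITY_COMMON, THREAD_POOL_REPLICATION_LONG)
DEFINE_TASK_CODE_AIO(LPC_AIO_SHARED_LOG, TASK_PRIORITY_COMMON, THREAD_POOL_SLOG)

// Hypothetical helper: run the blocking pread/pwrite inside the task body on
// whichever pool the task code maps to, so a slow Learning copy can no longer
// delay shared-log or private-log writes.
void enqueue_aio(dsn::task_code code, std::function<void()> io_work)
{
    tasking::enqueue(code, nullptr, std::move(io_work), 0);
}
```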

In addition, the Learning request logic may be scheduled on THREAD_POOL_REPLICATION after end_get_file_size (end_get_file_size runs in THREAD_POOL_REPLICATION because, for an RPC ack, the thread pool is the same as the current pool; see rpc_request_task). This can block write operations, especially when the NFS-RateLimiter is configured with a low learning rate. Since the change is small, I fix it in this PR:

```diff
 nfs_client_impl::end_get_file_size()
 {
     ...
-    continue_copy();
+    tasking::enqueue(LPC_NFS_COPY_FILE, nullptr, [this]() { continue_copy(); }, 0);
 }
```

And LPC_NFS_COPY_FILE is assigned to THREAD_POOL_DEFAULT.
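For reference, a task code's pool binding in rdsn is declared with a task-code macro; the declaration for LPC_NFS_COPY_FILE would look roughly like the sketch below (the actual declaration lives in the rdsn sources, so treat this as illustrative).

```cpp
// Sketch: declare LPC_NFS_COPY_FILE so that tasks enqueued with it run on
// THREAD_POOL_DEFAULT rather than on THREAD_POOL_REPLICATION.
DEFINE_TASK_CODE(LPC_NFS_COPY_FILE, TASK_PRIORITY_COMMON, THREAD_POOL_DEFAULT)
```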

Experiments

Here I tested cases during Learning and Compaction to see the effect of the optimization. The Pegasus cluster consists of 2 meta servers and 5 replica servers.

Case 1: Compaction, 30 threads * 3 clients, load workload, 1KB value length

#568 has already shown the effect for most cases. This test reports the average of multiple runs: I executed the YCSB workload three times.

| IO / latency | min | average | p95 | p99 | p999 | p9999 | max |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LibAIO | 340 | 1578 | 3157 | 6423 | 11119 | 325897 | 1291624 |
| NewAIO | 335 | 1558 | 3139 | 6343 | 10175 | 18079 | 177663 |

Below p9999, the two implementations achieve almost the same results. At p9999 and max, however, NewAIO is clearly better.

Case 2: Learning while adding a node, 15 threads * 3 clients, write:read = 3:1, 1KB value length, data = 16GB * 32 partitions
(Figures: LibAIOAddNodeLatency, NewAIOAddNodeLatency)

Case 3: Learning while offlining a node, 15 threads * 3 clients, write:read = 3:1, 1KB value length, data = 16GB * 32 partitions
(Figures: LibAIOOfflineLatency, NewAIO500MBP999)

From the above results, we can confirm that the new async-IO implementation almost eliminates the latency spikes while adding a node and greatly reduces the impact while offlining a node.

Note that repeated tests do not reproduce exactly the same latency values, but the conclusions are consistent across multiple runs.

@foreverneverer marked this pull request as ready for review October 12, 2020 06:57
hycdong previously approved these changes Oct 12, 2020
neverchanje previously approved these changes Oct 19, 2020
levy5307 previously approved these changes Oct 19, 2020
Labels
type/performance