Refactor MPPTunnel class to encapsulate different tunnel modes #5286

yibin87 · 2022-07-05T05:21:09Z

What problem does this PR solve?

Issue Number: close #5095

Problem Summary:

What is changed and how it works?

Separate the 'consumer' logic from MPPTunnel into a new class, TunnelSender, which has three possible modes.
The local receiver and EstablishCall now own a TunnelSender instead of an MPPTunnel.
Refactor MPPTunnel's status flags into a single enum, and distinguish `finished` from `consumerFinish`.
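A minimal sketch of the ownership change described above. The mode names (sync gRPC, async gRPC, local), the `connect()` factory, and every class body here are illustrative assumptions, not the code in this PR: MPPTunnel creates a mode-specific TunnelSender and shares ownership of it, so the consumer side holds the sender directly.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Assumed mode names; the PR description only says there are three modes.
enum class TunnelSenderMode
{
    SYNC_GRPC,
    ASYNC_GRPC,
    LOCAL
};

// Holds the consumer-side state that this PR moves out of MPPTunnel.
class TunnelSender
{
public:
    TunnelSender(TunnelSenderMode mode_, std::string tunnel_id_)
        : mode(mode_)
        , tunnel_id(std::move(tunnel_id_))
    {}
    virtual ~TunnelSender() = default;

    TunnelSenderMode getMode() const { return mode; }
    const std::string & getTunnelId() const { return tunnel_id; }

private:
    TunnelSenderMode mode;
    std::string tunnel_id;
};

// One subclass per mode; real implementations would own a writer or a local queue.
class SyncTunnelSender : public TunnelSender
{
public:
    using TunnelSender::TunnelSender;
};
class AsyncTunnelSender : public TunnelSender
{
public:
    using TunnelSender::TunnelSender;
};
class LocalTunnelSender : public TunnelSender
{
public:
    using TunnelSender::TunnelSender;
};

class MPPTunnel
{
public:
    MPPTunnel(std::string tunnel_id_, TunnelSenderMode mode_)
        : tunnel_id(std::move(tunnel_id_))
        , mode(mode_)
    {}

    // Creates the sender for this tunnel's mode and shares ownership with the
    // caller (for example EstablishCall or a local receiver).
    std::shared_ptr<TunnelSender> connect()
    {
        switch (mode)
        {
        case TunnelSenderMode::SYNC_GRPC:
            tunnel_sender = std::make_shared<SyncTunnelSender>(mode, tunnel_id);
            break;
        case TunnelSenderMode::ASYNC_GRPC:
            tunnel_sender = std::make_shared<AsyncTunnelSender>(mode, tunnel_id);
            break;
        case TunnelSenderMode::LOCAL:
            tunnel_sender = std::make_shared<LocalTunnelSender>(mode, tunnel_id);
            break;
        }
        return tunnel_sender;
    }

private:
    std::string tunnel_id;
    TunnelSenderMode mode;
    std::shared_ptr<TunnelSender> tunnel_sender;
};
```

Because the sender is shared through `std::shared_ptr`, whoever called `connect()` keeps it alive even after the MPPTunnel object is gone, which is one way to let the consumer own the sender rather than reaching it through the tunnel.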

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

None

ti-chi-bot · 2022-07-05T05:21:10Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

fuzhe1989
windtalker

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

yibin87 · 2022-07-05T08:39:59Z

/run-unit-tests

sre-bot · 2022-07-05T08:58:35Z

Coverage for changed files

Filename                                Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Common/MPMCQueue.h                           76                 0   100.00%          29                 0   100.00%         166                 1    99.40%          44                 2    95.45%
Flash/EstablishCall.h                         2                 2     0.00%           2                 2     0.00%           2                 2     0.00%           0                 0         -
Flash/Mpp/GRPCReceiverContext.cpp            51                51     0.00%          21                21     0.00%         129               129     0.00%          28                28     0.00%
Flash/Mpp/MPPTunnel.cpp                     368               108    70.65%          21                 1    95.24%         299                32    89.30%         150                52    65.33%
Flash/Mpp/MPPTunnel.h                        20                 6    70.00%          20                 6    70.00%          36                 8    77.78%           0                 0         -
Flash/Mpp/PacketWriter.h                      4                 2    50.00%           4                 2    50.00%           4                 2    50.00%           0                 0         -
Flash/Mpp/tests/gtest_mpptunnel.cpp        1673               461    72.44%          52                 0   100.00%         462                 0   100.00%         468               242    48.29%
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                      2194               630    71.29%         149                32    78.52%        1098               174    84.15%         690               324    53.04%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18422      9640             47.67%    207129  96449        53.44%

full coverage report (for internal network access only)

windtalker · 2022-07-11T03:41:34Z

dbms/src/Flash/EstablishCall.cpp

@@ -117,7 +117,7 @@ void EstablishCallData::writeDone(const ::grpc::Status & status)
    state = FINISH;
    if (stopwatch)
    {
-        LOG_FMT_INFO(mpp_tunnel->getLogger(), "connection for {} cost {} ms.", mpp_tunnel->id(), stopwatch->elapsedMilliseconds());
+        LOG_FMT_INFO(async_tunnel_sender->getLogger(), "connection cost {} ms.", stopwatch->elapsedMilliseconds());


Why not log the tunnel id?

In the previous version, tunnel_sender didn't have a tunnel_id field, and the Logger already carries the tunnel_id as a prefix, so I just removed the tunnel_id.
Now that tunnel_sender has a tunnel_id field, I'll add it back.

windtalker · 2022-07-11T05:17:33Z

dbms/src/Flash/Mpp/MPPTunnel.cpp

+            async_tunnel_sender = std::make_shared<AsyncTunnelSender>(mode, send_queue, writer, log, tunnel_id);
+            tunnel_sender = async_tunnel_sender;
+            RUNTIME_ASSERT(writer != nullptr, log, "Async writer shouldn't be null");
+            RUNTIME_ASSERT(tunnel_sender != nullptr, log, "Tunnel sender shouldn't be null");


Why add this assert? Line 210 already makes sure that tunnel_sender is not nullptr.

You're right. I'll remove the check.

windtalker · 2022-07-11T05:26:42Z

dbms/src/Flash/Mpp/MPPTunnel.cpp

-                    send_queue.push(std::make_shared<mpp::MPPDataPacket>(getPacketWithError(reason)));
-                    if (!is_local && is_async)
-                        writer->tryFlushOne();
+                    send_queue->push(std::make_shared<mpp::MPPDataPacket>(getPacketWithError(reason)));


In the WaitingForSenderFinish state, I think send_queue is already finished or cancelled?

It seems that in the WaitingForSenderFinish state, MPPTunnel should just ignore the close event.
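The reviewer's point can be sketched as a state-dependent `close()`. This is a hypothetical simplification: the status names follow the enum added in this PR, but the transitions and the vector standing in for `send_queue` are assumptions. Once the tunnel is in WaitingForSenderFinish or Finished, the queue has already been finished or cancelled, so a late `close()` is ignored instead of pushing another error packet.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Status names follow the enum in this PR; the transitions below are assumptions.
enum class TunnelStatus
{
    Unconnected,
    Connected,
    WaitingForSenderFinish,
    Finished
};

class TunnelCloseSketch
{
public:
    void connect()
    {
        std::lock_guard<std::mutex> lock(mu);
        if (status == TunnelStatus::Unconnected)
            status = TunnelStatus::Connected;
    }

    void close(const std::string & reason)
    {
        std::lock_guard<std::mutex> lock(mu);
        switch (status)
        {
        case TunnelStatus::Unconnected:
            // No writer yet: nobody to notify, just finish.
            status = TunnelStatus::Finished;
            break;
        case TunnelStatus::Connected:
            // Stand-in for pushing an error packet and finishing send_queue.
            if (!reason.empty())
                sent_packets.push_back("error: " + reason);
            status = TunnelStatus::WaitingForSenderFinish;
            break;
        case TunnelStatus::WaitingForSenderFinish:
        case TunnelStatus::Finished:
            // send_queue is already finished or cancelled: ignore the close event.
            break;
        }
    }

    TunnelStatus getStatus() const
    {
        std::lock_guard<std::mutex> lock(mu);
        return status;
    }

    std::vector<std::string> sentPackets() const
    {
        std::lock_guard<std::mutex> lock(mu);
        return sent_packets;
    }

private:
    mutable std::mutex mu;
    TunnelStatus status = TunnelStatus::Unconnected;
    std::vector<std::string> sent_packets;
};
```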

windtalker · 2022-07-11T05:33:03Z

dbms/src/Flash/Mpp/MPPTunnel.cpp

+    if (!err_msg.empty())
+    {
+        err_msg = fmt::format("{} meet error: {}", tunnel_id, err_msg);
+        LOG_ERROR(log, err_msg);


I suggest trimming the stack trace information from err_msg after it is logged; otherwise the error message will be too long and hard to read. Please refer to #5304 for details.

Got it, nice suggestion!
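One way to follow the trimming suggestion, sketched under assumptions: the helper name `trimStackTrace`, the `"Stack trace:"` marker, and the vector used as a log sink are all hypothetical (the approach in #5304 is not reproduced here), and plain string concatenation replaces `fmt::format`. The full message is logged first, and only the trimmed form is propagated.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: drop everything from the stack-trace marker onward,
// then strip the trailing whitespace that preceded it.
std::string trimStackTrace(const std::string & msg)
{
    const std::string marker = "Stack trace:";
    const std::string::size_type pos = msg.find(marker);
    const std::string head = (pos == std::string::npos) ? msg : msg.substr(0, pos);
    const std::string::size_type last = head.find_last_not_of(" \t\r\n");
    return last == std::string::npos ? std::string() : head.substr(0, last + 1);
}

// Logs the complete message (stack trace included), then returns the short
// form that gets propagated, e.g. into the error packet.
std::string reportTunnelError(const std::string & tunnel_id, const std::string & err_msg, std::vector<std::string> & log_sink)
{
    const std::string full = tunnel_id + " meet error: " + err_msg;
    log_sink.push_back(full);
    return trimStackTrace(full);
}
```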

windtalker · 2022-07-11T05:44:34Z

dbms/src/Flash/Mpp/MPPTunnel.h

- * - Consumer may close `send_queue` to notify MPPTunnel that an error occurs.
- * - After `connect` only the consumer can set `finished` to `true`.
- * - Consumer's state is saved in `consumer_state` and be available after consumer finished.
+ * - MPPTunnel may close `send_queue` to notify Sender normally finish.


There are two ways to close send_queue: finish and cancel. Is there any difference between the two?

Changed 'close' to 'finish'.

fuzhe1989 · 2022-07-11T11:15:18Z

dbms/src/Common/MPMCQueue.h

@@ -76,7 +76,8 @@ class MPMCQueue

    /// Block until:
    /// 1. Pop succeeds with a valid T: return true.
-    /// 2. The queue is cancelled or finished: return false.
+    /// 2. The queue is cancelled: return false.
+    /// 3. The queue is finished: return true if the queue is not empty.


The first case already contains "queue is finished but not empty".

How about rephrasing rule 1 as "The queue is normal or finished and pop succeeds with a valid T: return true", to make it clearer?

It feels backwards to me: as the user, I don't need to know the queue state until pop fails. A normal queue and a finished queue look the same to me as long as both are non-empty.

It's not a big issue though. It's ok if you think the new desc is more friendly to you.

I see. The previous comment was a little confusing to me: it made me think popping from a finished queue always returns false, because it didn't distinguish cancel from finish.
To make it simple and clear, I'll add a table describing the behavior of both pop and push.
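The pop/push behavior discussed above can be illustrated with a small unbounded queue. This is a simplified sketch, not TiFlash's MPMCQueue (which is bounded and tracks more than this); the class name `SimpleMPMCQueue` is hypothetical. The comment table shows the distinction from the thread: a non-empty finished queue still pops successfully, while a cancelled queue fails immediately.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// | queue state | push() | pop(), non-empty | pop(), empty |
// | normal      | true   | true             | blocks       |
// | finished    | false  | true             | false        |
// | cancelled   | false  | false            | false        |
template <typename T>
class SimpleMPMCQueue
{
public:
    bool push(T value)
    {
        std::lock_guard<std::mutex> lock(mu);
        if (status != Status::Normal)
            return false;
        items.push_back(std::move(value));
        cv.notify_one();
        return true;
    }

    // Blocks until an item is available or the queue leaves the normal state.
    bool pop(T & out)
    {
        std::unique_lock<std::mutex> lock(mu);
        cv.wait(lock, [this] { return status != Status::Normal || !items.empty(); });
        if (status == Status::Cancelled || items.empty())
            return false;
        out = std::move(items.front());
        items.pop_front();
        return true;
    }

    void finish() { setTerminal(Status::Finished); }
    void cancel() { setTerminal(Status::Cancelled); }

private:
    enum class Status
    {
        Normal,
        Finished,
        Cancelled
    };

    // The first terminal state wins; later finish()/cancel() calls are no-ops.
    void setTerminal(Status s)
    {
        std::lock_guard<std::mutex> lock(mu);
        if (status != Status::Normal)
            return;
        status = s;
        cv.notify_all();
    }

    std::mutex mu;
    std::condition_variable cv;
    std::deque<T> items;
    Status status = Status::Normal;
};
```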

fuzhe1989 · 2022-07-11T11:16:16Z

dbms/src/Flash/EstablishCall.h

@@ -115,7 +116,7 @@ class EstablishCallData : public PacketWriter
        FINISH
    };
    CallStatus state; // The current serving state.
-    std::shared_ptr<DB::MPPTunnel> mpp_tunnel = nullptr;
+    std::shared_ptr<DB::AsyncTunnelSender> async_tunnel_sender = nullptr;


No need to give it a default value.

You're right, I'll remove it.

fuzhe1989 · 2022-07-11T13:52:10Z

dbms/src/Flash/Mpp/MPPTunnel.h

+        Unconnected, // Not connect to any writer, not able to accept new data
+        Connected, // Connected to some writer, accepting data
+        WaitingForSenderFinish, // Accepting all data already, wait for sender to finish
+        Finished // Final state, no more work to do


How about differentiating Finished (finished after connect) from Cancelled (closed before connect)?

Yes, they are different states, though not very different in practice, since a cancel event may lead to the Finished state too :)
On the other hand, it might be more meaningful to differentiate cancel from finish; I'll consider it later.

windtalker · 2022-07-12T03:44:53Z

dbms/src/Flash/Mpp/MPPTunnel.h

+        {
+        }
+
+        // before finished, must be called without protection of mu


Looks like this comment is out of date; there is no mu in TunnelSender.

Yeah, I'll remove it.

windtalker · 2022-07-12T03:50:03Z

dbms/src/Flash/Mpp/MPPTunnel.h

+        }
+        void setMsg(const String & msg)
+        {
+            promise.set_value(msg);


Not related to this PR, but do we need some protection to make sure promise.set_value is not called multiple times? The documentation says: "An exception is thrown if there is no shared state or the shared state already stores a value or exception."

Currently, the intended way to set consumer_state's msg is to call consumerFinish, which has this protection: it checks whether msg is already set, and if so, it does nothing and returns.

How about we enhance setMsg as follows:

void setMsg(const String & msg)
{
    bool old_value = false;
    if (!msg_has_set.compare_exchange_strong(old_value, true, std::memory_order_seq_cst, std::memory_order_relaxed))
        return;
    promise.set_value(msg);
}

Then consumerFinish doesn't need to take a lock.

The tricky part is that calling promise.set_value more than once is an obvious bug. Should we guard against a potential bug and pretend nothing happened, or just let it throw so that we find the bug quickly?

Done. Good point from Fu. I would choose to protect this, since we already allow multiple invocations of consumerFinish to ensure that the MPPTunnel is not blocked, and I don't see any significant downside here.
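A self-contained version of the agreed-upon protection, as a sketch: `ConsumerState` mirrors the name used in the thread, but its members here (a `std::promise`/`std::shared_future` pair plus an atomic flag) and `std::string` in place of `String` are assumptions. Only the first `setMsg` call wins, so `std::promise::set_value` is never called twice and cannot throw `std::future_error` for an already-satisfied promise.

```cpp
#include <atomic>
#include <future>
#include <string>

class ConsumerState
{
public:
    ConsumerState()
        : future(promise.get_future())
    {}

    // Only the first caller flips msg_has_set and stores the message;
    // later calls return early, so set_value runs at most once.
    void setMsg(const std::string & msg)
    {
        bool old_value = false;
        if (!msg_has_set.compare_exchange_strong(old_value, true, std::memory_order_seq_cst, std::memory_order_relaxed))
            return;
        promise.set_value(msg);
    }

    // Blocks until the first setMsg call has stored a message.
    std::string getMsg() const { return future.get(); }

private:
    std::promise<std::string> promise; // declared before future so it is initialized first
    std::shared_future<std::string> future;
    std::atomic<bool> msg_has_set{false};
};
```

The trade-off raised in the thread still applies: a duplicate call is silently ignored rather than surfacing as an exception.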

windtalker · 2022-07-12T03:57:35Z

dbms/src/Flash/Mpp/MPPTunnel.h

    /// close() finishes the tunnel, if the tunnel is connected already, it will
    /// write the error message to the tunnel, otherwise it just close the tunnel
    void close(const String & reason);

    // a MPPConn request has arrived. it will build connection by this tunnel;
-    void connect(Writer * writer_);
+    virtual void connect(PacketWriter * writer);


Why does it need to be a virtual function? I don't see an overriding version of it.

Good catch, it is a mistake, I'll fix it.

windtalker

Others LGTM

dbms/src/Flash/Mpp/MPPTunnel.h

yibin87 · 2022-07-12T09:31:01Z

/run-unit-tests

yibin87 · 2022-07-14T00:50:00Z

/merge

ti-chi-bot · 2022-07-14T00:50:00Z

@yibin87: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2022-07-14T00:50:03Z

This pull request has been accepted and is ready to merge.

Commit hash: 02a26f2

ti-chi-bot · 2022-07-14T00:50:15Z

@yibin87: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

sre-bot · 2022-07-14T01:08:03Z

Coverage for changed files

Filename                                Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Common/MPMCQueue.h                           82                 0   100.00%          31                 0   100.00%         178                 1    99.44%          44                 2    95.45%
Flash/EstablishCall.h                         2                 2     0.00%           2                 2     0.00%           2                 2     0.00%           0                 0         -
Flash/Mpp/GRPCReceiverContext.cpp            51                51     0.00%          21                21     0.00%         129               129     0.00%          28                28     0.00%
Flash/Mpp/MPPTunnel.cpp                     371               110    70.35%          21                 1    95.24%         302                34    88.74%         150                53    64.67%
Flash/Mpp/MPPTunnel.h                        24                 7    70.83%          21                 7    66.67%          41                11    73.17%           2                 0   100.00%
Flash/Mpp/PacketWriter.h                      4                 2    50.00%           4                 2    50.00%           4                 2    50.00%           0                 0         -
Flash/Mpp/tests/gtest_mpptunnel.cpp        1673               461    72.44%          52                 0   100.00%         462                 0   100.00%         468               242    48.29%
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                      2207               633    71.32%         152                33    78.29%        1118               179    83.99%         692               325    53.03%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18527      9572             48.33%    208792  96370        53.84%

full coverage report (for internal network access only)

yibin87 · 2022-07-14T01:30:51Z

/run-all-tests

sre-bot · 2022-07-14T01:41:43Z

Coverage for changed files

Filename                                Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Common/MPMCQueue.h                           82                 0   100.00%          31                 0   100.00%         178                 1    99.44%          44                 2    95.45%
Flash/EstablishCall.h                         2                 2     0.00%           2                 2     0.00%           2                 2     0.00%           0                 0         -
Flash/Mpp/GRPCReceiverContext.cpp            51                51     0.00%          21                21     0.00%         129               129     0.00%          28                28     0.00%
Flash/Mpp/MPPTunnel.cpp                     371               110    70.35%          21                 1    95.24%         302                34    88.74%         150                53    64.67%
Flash/Mpp/MPPTunnel.h                        24                 7    70.83%          21                 7    66.67%          41                11    73.17%           2                 0   100.00%
Flash/Mpp/PacketWriter.h                      4                 2    50.00%           4                 2    50.00%           4                 2    50.00%           0                 0         -
Flash/Mpp/tests/gtest_mpptunnel.cpp        1673               461    72.44%          52                 0   100.00%         462                 0   100.00%         468               242    48.29%
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                      2207               633    71.32%         152                33    78.29%        1118               179    83.99%         692               325    53.03%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18527      9572             48.33%    208792  96351        53.85%

full coverage report (for internal network access only)

yibin87 added 2 commits July 5, 2022 09:51

MPPTunnel refact first draft, TODO: 1.Merge connected and finished in…

f4729b2

…to one enum 2.Comment rewrite Signed-off-by: yibin <[email protected]>

Merge connected and finished flag, update comments

3ab908b

Signed-off-by: yibin <[email protected]>

ti-chi-bot added the release-note-none Denotes a PR that doesn't merit a release note. label Jul 5, 2022

yibin87 requested review from bestwoody and fuzhe1989 July 5, 2022 05:21

ti-chi-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 5, 2022

yibin87 requested a review from windtalker July 5, 2022 05:21

yibin87 added 2 commits July 6, 2022 10:12

Add related comments

4a423ec

Signed-off-by: yibin <[email protected]>

Little refact

eea293b

Signed-off-by: yibin <[email protected]>

ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 7, 2022

Merge branch 'master' into mpptunnel_refactor

73bf97f

ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 8, 2022

Resolve conflicts

e8345e9

Signed-off-by: yibin <[email protected]>

windtalker reviewed Jul 11, 2022

Changes according to comments

f528c2f

Signed-off-by: yibin <[email protected]>

yibin87 requested review from windtalker and LittleFall July 11, 2022 06:59

fuzhe1989 reviewed Jul 11, 2022

Update according to comments

87eefed

Signed-off-by: yibin <[email protected]>

yibin87 requested a review from fuzhe1989 July 12, 2022 01:22

JaySon-Huang mentioned this pull request Jul 12, 2022

[WIP] coprocessor: separate thread for encoding data for batch cop #5349

Closed

12 tasks

windtalker reviewed Jul 12, 2022

yibin87 added 2 commits July 12, 2022 13:41

Remove virutal keyword for connect function

726448c

Signed-off-by: yibin <[email protected]>

Rewrite comments for MPMCQueue pop/push

2d6ad9a

Signed-off-by: yibin <[email protected]>

ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2022

yibin87 added 2 commits July 12, 2022 14:05

Remove outdated comment

d592310

Signed-off-by: yibin <[email protected]>

Updates according to comments

7d631e1

Signed-off-by: yibin <[email protected]>

yibin87 requested a review from windtalker July 12, 2022 06:39

yibin87 added 2 commits July 12, 2022 15:43

Updates according to comments

46b5373

Signed-off-by: yibin <[email protected]>

Merge branch 'master' into mpptunnel_refactor

742b911

ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2022

yibin87 added 2 commits July 12, 2022 16:23

Updates according to comments

e6ba3ac

Signed-off-by: yibin <[email protected]>

Merge branch 'mpptunnel_refactor' of github.com:yibin87/tiflash into …

c9c9e97

…mpptunnel_refactor

windtalker approved these changes Jul 12, 2022

dbms/src/Flash/Mpp/MPPTunnel.h (outdated, resolved)

ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 12, 2022

Remove useless virtual dctor

2df4eb8

Signed-off-by: yibin <[email protected]>

Remove useless friend class

02a26f2

Signed-off-by: yibin <[email protected]>

fuzhe1989 approved these changes Jul 13, 2022

ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jul 13, 2022

ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 14, 2022

Merge branch 'master' into mpptunnel_refactor

b2350fe

ti-chi-bot merged commit f65240e into pingcap:master Jul 14, 2022

Lloyd-Pottiger pushed a commit to Lloyd-Pottiger/tiflash that referenced this pull request Jul 19, 2022

Refact MPPTunnel class to encapsulate different tunnel mode (pingcap#…

69ae1ca

…5286) close pingcap#5095

windtalker mentioned this pull request Aug 16, 2022

Sync MPPTunnel hangs when it meet error #5631

Closed
