Refact MPPTunnel class to encapsulate different tunnel mode #5286

Merged
merged 19 commits into from
Jul 14, 2022

Conversation

yibin87
Contributor

@yibin87 yibin87 commented Jul 5, 2022

What problem does this PR solve?

Issue Number: close #5095

Problem Summary:

What is changed and how it works?

  1. Separate the 'Consumer' logic from MPPTunnel into a new class named TunnelSender, which has three possible modes
  2. The local receiver and EstablishCall now own a TunnelSender instead of an MPPTunnel
  3. Refactor MPPTunnel's status into an enum, and distinguish finished from consumerFinish
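
The ownership split described in the list above can be pictured roughly as follows. This is only a sketch under stated assumptions, not the code from this PR: the mode names (`SyncGrpc`, `AsyncGrpc`, `Local`), the classes `TunnelSenderSketch` and `TunnelSketch`, and the `std::queue` stand-in for the packet queue are all hypothetical, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <string>
#include <utility>

// Assumed mode names; the description only says there are three modes.
enum class SenderMode { SyncGrpc, AsyncGrpc, Local };

// Stand-in for the real packet queue shared by tunnel and sender.
using PacketQueue = std::queue<std::string>;

// Consumer side: drains packets that the tunnel has queued.
class TunnelSenderSketch
{
public:
    TunnelSenderSketch(SenderMode mode_, std::shared_ptr<PacketQueue> queue_)
        : mode(mode_), send_queue(std::move(queue_))
    {}

    SenderMode getMode() const { return mode; }

    // Moves the next pending packet into `out`; returns false if none is pending.
    bool sendOne(std::string & out)
    {
        if (send_queue->empty())
            return false;
        out = std::move(send_queue->front());
        send_queue->pop();
        return true;
    }

private:
    SenderMode mode;
    std::shared_ptr<PacketQueue> send_queue;
};

// Producer side: owns the queue and hands the sender to whoever consumes it.
class TunnelSketch
{
public:
    explicit TunnelSketch(SenderMode mode)
        : send_queue(std::make_shared<PacketQueue>())
        , sender(std::make_shared<TunnelSenderSketch>(mode, send_queue))
    {}

    void write(std::string packet) { send_queue->push(std::move(packet)); }
    std::shared_ptr<TunnelSenderSketch> getSender() const { return sender; }

private:
    std::shared_ptr<PacketQueue> send_queue;
    std::shared_ptr<TunnelSenderSketch> sender;
};
```

In this picture a receiver or an RPC call object would hold only the `std::shared_ptr<TunnelSenderSketch>` returned by `getSender()`, never the tunnel itself.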

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot
Member

ti-chi-bot commented Jul 5, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • fuzhe1989
  • windtalker


@ti-chi-bot ti-chi-bot added the release-note-none Denotes a PR that doesn't merit a release note. label Jul 5, 2022
@yibin87 yibin87 requested review from bestwoody and fuzhe1989 July 5, 2022 05:21
@ti-chi-bot ti-chi-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 5, 2022
@yibin87 yibin87 requested a review from windtalker July 5, 2022 05:21
@yibin87
Contributor Author

yibin87 commented Jul 5, 2022

/run-unit-tests

@sre-bot
Collaborator

sre-bot commented Jul 5, 2022

Coverage for changed files

Filename                                Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Common/MPMCQueue.h                           76                 0   100.00%          29                 0   100.00%         166                 1    99.40%          44                 2    95.45%
Flash/EstablishCall.h                         2                 2     0.00%           2                 2     0.00%           2                 2     0.00%           0                 0         -
Flash/Mpp/GRPCReceiverContext.cpp            51                51     0.00%          21                21     0.00%         129               129     0.00%          28                28     0.00%
Flash/Mpp/MPPTunnel.cpp                     368               108    70.65%          21                 1    95.24%         299                32    89.30%         150                52    65.33%
Flash/Mpp/MPPTunnel.h                        20                 6    70.00%          20                 6    70.00%          36                 8    77.78%           0                 0         -
Flash/Mpp/PacketWriter.h                      4                 2    50.00%           4                 2    50.00%           4                 2    50.00%           0                 0         -
Flash/Mpp/tests/gtest_mpptunnel.cpp        1673               461    72.44%          52                 0   100.00%         462                 0   100.00%         468               242    48.29%
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                      2194               630    71.29%         149                32    78.52%        1098               174    84.15%         690               324    53.04%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18422      9640             47.67%    207129  96449        53.44%

full coverage report (for internal network access only)

yibin87 added 2 commits July 6, 2022 10:12
Signed-off-by: yibin <[email protected]>
@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 7, 2022
@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 8, 2022
Signed-off-by: yibin <[email protected]>
@@ -117,7 +117,7 @@ void EstablishCallData::writeDone(const ::grpc::Status & status)
     state = FINISH;
     if (stopwatch)
     {
-        LOG_FMT_INFO(mpp_tunnel->getLogger(), "connection for {} cost {} ms.", mpp_tunnel->id(), stopwatch->elapsedMilliseconds());
+        LOG_FMT_INFO(async_tunnel_sender->getLogger(), "connection cost {} ms.", stopwatch->elapsedMilliseconds());
Contributor

Why not log the tunnel id?

Contributor Author

In the previous version, tunnel_sender didn't have a tunnel_id field, and the logger itself already carries the tunnel_id as a prefix, so I just removed the tunnel_id from the message.
Now that tunnel_sender has a tunnel_id field, I'll add it back.

async_tunnel_sender = std::make_shared<AsyncTunnelSender>(mode, send_queue, writer, log, tunnel_id);
tunnel_sender = async_tunnel_sender;
RUNTIME_ASSERT(writer != nullptr, log, "Async writer shouldn't be null");
RUNTIME_ASSERT(tunnel_sender != nullptr, log, "Tunnel sender shouldn't be null");
Contributor

Why add this assert? Line 210 already makes sure that tunnel_sender is not nullptr.

Contributor Author

You're right. I'll remove the check.

-send_queue.push(std::make_shared<mpp::MPPDataPacket>(getPacketWithError(reason)));
-if (!is_local && is_async)
-    writer->tryFlushOne();
+send_queue->push(std::make_shared<mpp::MPPDataPacket>(getPacketWithError(reason)));
Contributor

When in WaitingForSenderFinish state, I think the send_queue is already finished or cancelled?

Contributor Author

It seems that in the WaitingForSenderFinish state, MPPTunnel should just ignore the close event.

if (!err_msg.empty())
{
err_msg = fmt::format("{} meet error: {}", tunnel_id, err_msg);
LOG_ERROR(log, err_msg);
Contributor

I suggest trimming the stack trace information in err_msg after it is logged; otherwise the error message will be too long and hard to read. Please refer to #5304 for details.

Contributor Author

Got it!

Contributor Author

Got it, nice suggestions!

if (!err_msg.empty())
{
err_msg = fmt::format("{} meet error: {}", tunnel_id, err_msg);
LOG_ERROR(log, err_msg);
Contributor

Ditto

* - Consumer may close `send_queue` to notify MPPTunnel that an error occurs.
* - After `connect` only the consumer can set `finished` to `true`.
* - Consumer's state is saved in `consumer_state` and be available after consumer finished.
* - MPPTunnel may close `send_queue` to notify Sender normally finish.
Contributor

There are two ways to close send_queue: finish and cancel. Is there any difference between the two?

Contributor Author

Change 'close' to 'finish'.

@yibin87 yibin87 requested review from windtalker and LittleFall July 11, 2022 06:59
@@ -76,7 +76,8 @@ class MPMCQueue

     /// Block until:
     /// 1. Pop succeeds with a valid T: return true.
-    /// 2. The queue is cancelled or finished: return false.
+    /// 2. The queue is cancelled: return false.
+    /// 3. The queue is finished: return true if the queue is not empty.
Contributor

The first case already contains "queue is finished but not empty".

Contributor Author

@yibin87 yibin87 Jul 12, 2022

How about rewording rule 1 as "The queue is normal or finished, and pop succeeds with a valid T: return true" to make it clearer?

Contributor

@fuzhe1989 fuzhe1989 Jul 12, 2022

To me it's kind of the reverse: I (the user) don't need to know the queue state until a pop fails. That means a normal queue and a finished queue look the same to me if they are both non-empty.

Contributor

@fuzhe1989 fuzhe1989 Jul 12, 2022

It's not a big issue though. It's ok if you think the new desc is more friendly to you.

Contributor Author

@yibin87 yibin87 Jul 12, 2022

I see. The previous comment was a little confusing to me: it made me think that popping from a finished queue always returns false, because it did not distinguish cancel from finish.
To make it simple and clear, I'll add a table describing the behavior of both pop and push.
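
The pop and push behavior discussed in this thread can be illustrated with a small stand-alone queue. This is a minimal sketch, not the real MPMCQueue: `SketchQueue`, `QueueStatus` and the `std::optional` return type are hypothetical, and the rule that a cancelled queue drops any remaining elements is an assumption taken from the revised doc comment ("cancelled: return false").

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical, simplified stand-in for the queue discussed above.
enum class QueueStatus { Normal, Finished, Cancelled };

template <typename T>
class SketchQueue
{
public:
    // Fails once the queue has been finished or cancelled.
    bool push(T value)
    {
        std::lock_guard<std::mutex> lock(mu);
        if (status != QueueStatus::Normal)
            return false;
        items.push_back(std::move(value));
        cv.notify_one();
        return true;
    }

    // Blocks until an element is available or the queue is closed.
    // Normal or Finished with data left: returns the element.
    // Finished and drained, or Cancelled: returns std::nullopt.
    std::optional<T> pop()
    {
        std::unique_lock<std::mutex> lock(mu);
        cv.wait(lock, [&] { return !items.empty() || status != QueueStatus::Normal; });
        if (status == QueueStatus::Cancelled || items.empty())
            return std::nullopt;
        T value = std::move(items.front());
        items.pop_front();
        return value;
    }

    void finish() { close(QueueStatus::Finished); }
    void cancel() { close(QueueStatus::Cancelled); }

private:
    // Only the first close takes effect; waiters are always woken up.
    void close(QueueStatus new_status)
    {
        std::lock_guard<std::mutex> lock(mu);
        if (status == QueueStatus::Normal)
            status = new_status;
        cv.notify_all();
    }

    std::mutex mu;
    std::condition_variable cv;
    std::deque<T> items;
    QueueStatus status = QueueStatus::Normal;
};
```

With this shape, a caller that only pops cannot tell a normal queue from a finished one until the queue is empty, which is the point raised in the review.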

@@ -115,7 +116,7 @@ class EstablishCallData : public PacketWriter
         FINISH
     };
     CallStatus state; // The current serving state.
-    std::shared_ptr<DB::MPPTunnel> mpp_tunnel = nullptr;
+    std::shared_ptr<DB::AsyncTunnelSender> async_tunnel_sender = nullptr;
Contributor

There's no need to give it a default value.

Contributor Author

You're right, I'll remove it.

Unconnected, // Not connect to any writer, not able to accept new data
Connected, // Connected to some writer, accepting data
WaitingForSenderFinish, // Accepting all data already, wait for sender to finish
Finished // Final state, no more work to do
Contributor

How about differentiating Finished (finish after connect) from Cancelled (close before connect)?

Contributor Author

Yeah, they are different states, but I think they are not so different in practice, since a cancel event may lead to the Finished state too :)
On the other hand, it might be more meaningful to differentiate cancelled from finished. I'll consider it later.
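
For readers following the state discussion, here is a rough sketch of how the four statuses could be checked as a state machine. The enum values mirror the ones quoted in the diff; the `TunnelStatus` name, the `canTransit` helper and, above all, the set of allowed transitions are assumptions made for illustration, not rules taken from the PR.

```cpp
#include <cassert>

// Values as quoted in the review; the enum name is hypothetical.
enum class TunnelStatus
{
    Unconnected,            // not connected to any writer
    Connected,              // connected to a writer, accepting data
    WaitingForSenderFinish, // all data accepted, waiting for the sender
    Finished                // final state
};

// Assumed transition rules, for illustration only:
//   Unconnected            -> Connected | Finished (closed before connect)
//   Connected              -> WaitingForSenderFinish | Finished
//   WaitingForSenderFinish -> Finished
//   Finished               -> (none)
constexpr bool canTransit(TunnelStatus from, TunnelStatus to)
{
    switch (from)
    {
    case TunnelStatus::Unconnected:
        return to == TunnelStatus::Connected || to == TunnelStatus::Finished;
    case TunnelStatus::Connected:
        return to == TunnelStatus::WaitingForSenderFinish || to == TunnelStatus::Finished;
    case TunnelStatus::WaitingForSenderFinish:
        return to == TunnelStatus::Finished;
    case TunnelStatus::Finished:
        return false;
    }
    return false;
}
```

Under these assumed rules, the "closed before connect" case and the normal completion path both end in `Finished`, which is why a separate `Cancelled` value was suggested.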

{
}

// before finished, must be called without protection of mu
Contributor

Looks like this comment is out of date, there is no mu in TunnelSender

Contributor Author

Yeah, I'll remove it.

}
void setMsg(const String & msg)
{
promise.set_value(msg);
Contributor

Not related to this PR, but do we need some protection to make sure promise.set_value is not called multiple times? The linked reference says "An exception is thrown if there is no shared state or the shared state already stores a value or exception."

Contributor Author

@yibin87 yibin87 Jul 12, 2022

Currently, the suggested way to set consumer_state's msg is to call consumerFinish. consumerFinish has this protection: it checks whether msg is already set and, if so, does nothing and returns.

Contributor

How about we enhance setMsg as

void setMsg(const String & msg)
{
    bool old_value = false;
    if (!msg_has_set.compare_exchange_strong(old_value, true, std::memory_order_seq_cst, std::memory_order_relaxed))
        return;
    promise.set_value(msg);
}

Then consumerFinish doesn't need to take a lock.

Contributor

The tricky part is that it's an obvious bug if promise.set_value is called more than once. Do we need to guard against a potential bug and pretend nothing happened, or just let it throw so that we quickly know there's a bug?

Contributor Author

@yibin87 yibin87 Jul 12, 2022

Done. Good point from Fu. I would choose to protect this, since we used to allow multiple invocations of consumerFinish to ensure that MPPTunnel is not blocked, and I don't see any significant bad effects here.
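
The guarded setter discussed in this thread can be tried in isolation. The following is a minimal sketch under assumptions: `ConsumerStateSketch` is a hypothetical stand-in for the consumer state class, `std::string` replaces the project's `String`, and the `bool` return value is added here only so the effect of a repeated call is observable.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <string>

// Hypothetical stand-in for the consumer state discussed above.
class ConsumerStateSketch
{
public:
    ConsumerStateSketch() : future(promise.get_future()) {}

    // Stores the message only on the first call. Later calls return false
    // instead of reaching promise.set_value, which would throw std::future_error.
    bool setMsg(const std::string & msg)
    {
        bool expected = false;
        if (!msg_has_set.compare_exchange_strong(expected, true))
            return false;
        promise.set_value(msg);
        return true;
    }

    // Blocks until a message has been set, then returns a copy of it.
    std::string getMsg() const { return future.get(); }

private:
    std::atomic<bool> msg_has_set{false};
    std::promise<std::string> promise;
    std::shared_future<std::string> future;
};
```

The trade-off raised in the review still applies: the guard makes repeated calls harmless, but it also hides a caller that sets the message twice by mistake.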

     /// close() finishes the tunnel, if the tunnel is connected already, it will
     /// write the error message to the tunnel, otherwise it just close the tunnel
     void close(const String & reason);

     // a MPPConn request has arrived. it will build connection by this tunnel;
-    void connect(Writer * writer_);
+    virtual void connect(PacketWriter * writer);
Contributor

Why does this need to be a virtual function? I didn't find any override of it.

Contributor Author

Good catch, it is a mistake, I'll fix it.

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2022
@yibin87 yibin87 requested a review from windtalker July 12, 2022 06:39
@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2022
Contributor

@windtalker windtalker left a comment

Others LGTM

dbms/src/Flash/Mpp/MPPTunnel.h (outdated, resolved)
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 12, 2022
@yibin87
Contributor Author

yibin87 commented Jul 12, 2022

/run-unit-tests

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jul 13, 2022
@yibin87
Contributor Author

yibin87 commented Jul 14, 2022

/merge

@ti-chi-bot
Member

@yibin87: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 02a26f2

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 14, 2022
@ti-chi-bot
Member

@yibin87: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.


@sre-bot
Collaborator

sre-bot commented Jul 14, 2022

Coverage for changed files

Filename                                Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Common/MPMCQueue.h                           82                 0   100.00%          31                 0   100.00%         178                 1    99.44%          44                 2    95.45%
Flash/EstablishCall.h                         2                 2     0.00%           2                 2     0.00%           2                 2     0.00%           0                 0         -
Flash/Mpp/GRPCReceiverContext.cpp            51                51     0.00%          21                21     0.00%         129               129     0.00%          28                28     0.00%
Flash/Mpp/MPPTunnel.cpp                     371               110    70.35%          21                 1    95.24%         302                34    88.74%         150                53    64.67%
Flash/Mpp/MPPTunnel.h                        24                 7    70.83%          21                 7    66.67%          41                11    73.17%           2                 0   100.00%
Flash/Mpp/PacketWriter.h                      4                 2    50.00%           4                 2    50.00%           4                 2    50.00%           0                 0         -
Flash/Mpp/tests/gtest_mpptunnel.cpp        1673               461    72.44%          52                 0   100.00%         462                 0   100.00%         468               242    48.29%
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                      2207               633    71.32%         152                33    78.29%        1118               179    83.99%         692               325    53.03%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18527      9572             48.33%    208792  96370        53.84%

full coverage report (for internal network access only)

@yibin87
Contributor Author

yibin87 commented Jul 14, 2022

/run-all-tests

@sre-bot
Collaborator

sre-bot commented Jul 14, 2022

Coverage for changed files

Filename                                Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Common/MPMCQueue.h                           82                 0   100.00%          31                 0   100.00%         178                 1    99.44%          44                 2    95.45%
Flash/EstablishCall.h                         2                 2     0.00%           2                 2     0.00%           2                 2     0.00%           0                 0         -
Flash/Mpp/GRPCReceiverContext.cpp            51                51     0.00%          21                21     0.00%         129               129     0.00%          28                28     0.00%
Flash/Mpp/MPPTunnel.cpp                     371               110    70.35%          21                 1    95.24%         302                34    88.74%         150                53    64.67%
Flash/Mpp/MPPTunnel.h                        24                 7    70.83%          21                 7    66.67%          41                11    73.17%           2                 0   100.00%
Flash/Mpp/PacketWriter.h                      4                 2    50.00%           4                 2    50.00%           4                 2    50.00%           0                 0         -
Flash/Mpp/tests/gtest_mpptunnel.cpp        1673               461    72.44%          52                 0   100.00%         462                 0   100.00%         468               242    48.29%
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                      2207               633    71.32%         152                33    78.29%        1118               179    83.99%         692               325    53.03%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18527      9572             48.33%    208792  96351        53.85%

full coverage report (for internal network access only)

@ti-chi-bot ti-chi-bot merged commit f65240e into pingcap:master Jul 14, 2022
Lloyd-Pottiger pushed a commit to Lloyd-Pottiger/tiflash that referenced this pull request Jul 19, 2022
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Development

Successfully merging this pull request may close these issues.

Refine error handling/cancel in TiFlash MPP system
5 participants