Change one additional_input_at_end to many streams in ParallelInputsProcessor #5274

Merged
22 commits merged into pingcap:master from the speed-parallel-input branch on Jul 8, 2022

Conversation

gengliqi
Contributor

@gengliqi gengliqi commented Jul 1, 2022

What problem does this PR solve?

Issue Number: close #5263 close #4856

Problem Summary:
This PR fixes two performance issues: #5263 and #4856.

For #5263: when a query performs a join followed by an aggregation and the non-joined data is very large, this PR gives a large speedup, because multiple threads can now aggregate that data instead of the single thread used by the previous implementation.

For #4856: this PR optimizes the scenario where some input streams hold far more data than others, which previously let some threads exit too early. With this change, no thread exits early; every thread keeps working until all input streams are exhausted.

What is changed and how it works?

Refine the ParallelInputsProcessor code.
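
To illustrate the scheduling idea described in the problem summary, here is a minimal C++ sketch. It is not the code from this PR: `MockStream`, `StreamQueue`, and `processAll` are hypothetical names, and a mutex plus condition variable stands in for whatever queue the real `ParallelInputsProcessor` uses. The sketch assumes each worker takes one stream from a shared queue, reads a single block, and returns the stream to the queue, so a worker only stops once every stream has been exhausted.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an input stream: yields `remaining` blocks.
struct MockStream
{
    std::size_t id;
    std::size_t remaining;

    // Returns true if a block was produced, false once the stream is exhausted.
    bool read()
    {
        if (remaining == 0)
            return false;
        --remaining;
        return true;
    }
};

// Shared queue of streams that may still have data (illustrative only).
class StreamQueue
{
public:
    explicit StreamQueue(const std::vector<MockStream> & streams)
        : queue(streams.begin(), streams.end())
        , unfinished(streams.size())
    {}

    // Blocks until a stream is available or every stream is exhausted.
    // Returns false only when there is no work left at all.
    bool pop(MockStream & out)
    {
        std::unique_lock<std::mutex> lock(mu);
        cv.wait(lock, [this] { return !queue.empty() || unfinished == 0; });
        if (queue.empty())
            return false;
        out = queue.front();
        queue.pop_front();
        return true;
    }

    // Return a stream that still has data so any worker can pick it up.
    void giveBack(const MockStream & stream)
    {
        {
            std::lock_guard<std::mutex> lock(mu);
            queue.push_back(stream);
        }
        cv.notify_one();
    }

    // Record that one stream is exhausted and wake all waiting workers.
    void markExhausted()
    {
        {
            std::lock_guard<std::mutex> lock(mu);
            --unfinished;
        }
        cv.notify_all();
    }

private:
    std::mutex mu;
    std::condition_variable cv;
    std::deque<MockStream> queue;
    std::size_t unfinished;
};

// Runs `num_threads` workers and returns how many blocks each one read.
std::vector<std::size_t> processAll(const std::vector<MockStream> & streams, std::size_t num_threads)
{
    assert(num_threads > 0);
    StreamQueue queue(streams);
    std::vector<std::size_t> blocks_per_thread(num_threads, 0);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t)
    {
        workers.emplace_back([&queue, &blocks_per_thread, t] {
            MockStream stream{0, 0};
            while (queue.pop(stream))
            {
                if (stream.read())
                {
                    ++blocks_per_thread[t]; // each worker writes only its own slot
                    queue.giveBack(stream);
                }
                else
                {
                    queue.markExhausted();
                }
            }
        });
    }
    for (auto & worker : workers)
        worker.join();
    return blocks_per_thread;
}
```

With inputs such as one stream of 1000 blocks and one of 3 blocks, the per-thread counts depend on scheduling, but their sum is always the total number of blocks, and no worker returns while any stream still has data. A worker that finds the queue empty waits rather than exiting, which is the property the summary describes for #4856.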

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

gengliqi added 3 commits July 1, 2022 14:30
Signed-off-by: gengliqi <[email protected]>
@ti-chi-bot
Member

ti-chi-bot commented Jul 1, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • SeaRise
  • fuzhe1989

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 1, 2022
@gengliqi gengliqi force-pushed the speed-parallel-input branch 2 times, most recently from 2ed4141 to 858f851 Compare July 1, 2022 15:40
@SeaRise SeaRise self-requested a review July 4, 2022 03:09
Signed-off-by: gengliqi <[email protected]>
@SeaRise
Contributor

SeaRise commented Jul 4, 2022

It seems that #4856 can also be fixed in this PR.

gengliqi added 3 commits July 5, 2022 14:12
Signed-off-by: gengliqi <[email protected]>
Signed-off-by: gengliqi <[email protected]>
@ti-chi-bot ti-chi-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 6, 2022
gengliqi added 2 commits July 6, 2022 15:58
Signed-off-by: gengliqi <[email protected]>
@gengliqi
Contributor Author

gengliqi commented Jul 6, 2022

@fuzhe1989 @SeaRise I have refined the code by using MPMCQueue. PTAL again, thanks!

@gengliqi
Contributor Author

gengliqi commented Jul 6, 2022

/run-unit-test

@gengliqi
Contributor Author

gengliqi commented Jul 6, 2022

/run-integration-test

@fuzhe1989
Contributor

great refactoring!

@SeaRise SeaRise self-requested a review July 6, 2022 11:47
@gengliqi
Contributor Author

gengliqi commented Jul 6, 2022

/run-unit-test
/run-integration-test

@sre-bot
Collaborator

sre-bot commented Jul 6, 2022

Coverage for changed files

Filename                                                Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DataStreams/ParallelAggregatingBlockInputStream.cpp         111               100     9.91%          13                 9    30.77%         162               145    10.49%          70                66     5.71%
DataStreams/ParallelAggregatingBlockInputStream.h             6                 4    33.33%           6                 4    33.33%          14                12    14.29%           0                 0         -
DataStreams/ParallelInputsProcessor.h                        73                19    73.97%          20                 6    70.00%         132                18    86.36%          34                 7    79.41%
DataStreams/UnionBlockInputStream.h                         102                38    62.75%          24                 8    66.67%         132                56    57.58%          46                23    50.00%
Flash/Coprocessor/DAGQueryBlockInterpreter.cpp              241                75    68.88%          38                 4    89.47%         592               115    80.57%         152                53    65.13%
Flash/Coprocessor/InterpreterUtils.cpp                       18                 1    94.44%           3                 0   100.00%          30                 1    96.67%          14                 3    78.57%
Interpreters/InterpreterSelectQuery.cpp                     540               540     0.00%          52                52     0.00%         919               919     0.00%         442               442     0.00%
Interpreters/InterpreterSelectWithUnionQuery.cpp             73                73     0.00%           7                 7     0.00%         135               135     0.00%          56                56     0.00%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                      1164               850    26.98%         163                90    44.79%        2116              1401    33.79%         814               650    20.15%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18413      9642             47.63%    207173  96508        53.42%

full coverage report (for internal network access only)

Signed-off-by: gengliqi <[email protected]>
@ti-chi-bot ti-chi-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 8, 2022
@sre-bot
Collaborator

sre-bot commented Jul 8, 2022

Coverage for changed files

Filename                                                Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DataStreams/ParallelAggregatingBlockInputStream.cpp         111               100     9.91%          13                 9    30.77%         162               145    10.49%          70                66     5.71%
DataStreams/ParallelAggregatingBlockInputStream.h             6                 4    33.33%           6                 4    33.33%          14                12    14.29%           0                 0         -
DataStreams/ParallelInputsProcessor.h                        67                19    71.64%          20                 6    70.00%         124                17    86.29%          30                 7    76.67%
DataStreams/UnionBlockInputStream.h                         102                38    62.75%          24                 8    66.67%         132                56    57.58%          46                23    50.00%
Debug/astToExecutor.cpp                                     578               210    63.67%          53                 9    83.02%        1516               545    64.05%         570               246    56.84%
Flash/Coprocessor/DAGQueryBlockInterpreter.cpp              241                71    70.54%          38                 4    89.47%         592               100    83.11%         152                50    67.11%
Flash/Coprocessor/InterpreterUtils.cpp                       18                 1    94.44%           3                 0   100.00%          30                 1    96.67%          14                 3    78.57%
Flash/tests/gtest_interpreter.cpp                           166                95    42.77%           6                 0   100.00%         520                 0   100.00%          20                20     0.00%
Interpreters/InterpreterSelectQuery.cpp                     540               540     0.00%          52                52     0.00%         919               919     0.00%         442               442     0.00%
Interpreters/InterpreterSelectWithUnionQuery.cpp             73                73     0.00%           7                 7     0.00%         135               135     0.00%          56                56     0.00%
TestUtils/mockExecutor.h                                      6                 1    83.33%           6                 1    83.33%          15                 3    80.00%           0                 0         -
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                      1908              1152    39.62%         228               100    56.14%        4159              1933    53.52%        1400               913    34.79%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18464      9605             47.98%    207983  96462        53.62%

full coverage report (for internal network access only)

@ti-chi-bot ti-chi-bot removed the status/can-merge Indicates a PR has been approved by a committer. label Jul 8, 2022
@gengliqi gengliqi removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 8, 2022
@gengliqi
Contributor Author

gengliqi commented Jul 8, 2022

Changed InterpreterSelectQuery to fix the integration tests, and made some of its code match DAGQueryBlockInterpreter.

@gengliqi
Contributor Author

gengliqi commented Jul 8, 2022

/run-all-tests

@sre-bot
Collaborator

sre-bot commented Jul 8, 2022

Coverage for changed files

Filename                                                Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DataStreams/ParallelAggregatingBlockInputStream.cpp         111               100     9.91%          13                 9    30.77%         162               145    10.49%          70                66     5.71%
DataStreams/ParallelAggregatingBlockInputStream.h             6                 4    33.33%           6                 4    33.33%          14                12    14.29%           0                 0         -
DataStreams/ParallelInputsProcessor.h                        67                19    71.64%          20                 6    70.00%         124                17    86.29%          30                 7    76.67%
DataStreams/UnionBlockInputStream.h                         102                38    62.75%          24                 8    66.67%         132                56    57.58%          46                23    50.00%
Debug/astToExecutor.cpp                                     578               210    63.67%          53                 9    83.02%        1516               545    64.05%         570               246    56.84%
Flash/Coprocessor/DAGQueryBlockInterpreter.cpp              240                70    70.83%          38                 4    89.47%         592               100    83.11%         152                50    67.11%
Flash/Coprocessor/InterpreterUtils.cpp                       19                 3    84.21%           3                 0   100.00%          44                 5    88.64%          18                 4    77.78%
Flash/tests/gtest_interpreter.cpp                           166                95    42.77%           6                 0   100.00%         520                 0   100.00%          20                20     0.00%
Interpreters/InterpreterSelectQuery.cpp                     542               542     0.00%          52                52     0.00%         929               929     0.00%         448               448     0.00%
Interpreters/InterpreterSelectQuery.h                         5                 5     0.00%           3                 3     0.00%          10                10     0.00%           4                 4     0.00%
Interpreters/InterpreterSelectWithUnionQuery.cpp             73                73     0.00%           7                 7     0.00%         135               135     0.00%          56                56     0.00%
TestUtils/mockExecutor.h                                      6                 1    83.33%           6                 1    83.33%          15                 3    80.00%           0                 0         -
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                      1915              1160    39.43%         231               103    55.41%        4193              1957    53.33%        1414               924    34.65%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18464      9605             47.98%    208007  96495        53.61%

full coverage report (for internal network access only)

@gengliqi gengliqi force-pushed the speed-parallel-input branch from 321ad28 to 11367da Compare July 8, 2022 15:28
@gengliqi
Contributor Author

gengliqi commented Jul 8, 2022

/merge

@ti-chi-bot
Member

@gengliqi: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 11367da

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 8, 2022
@sre-bot
Collaborator

sre-bot commented Jul 8, 2022

Coverage for changed files

Filename                                                Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DataStreams/ParallelAggregatingBlockInputStream.cpp         111               100     9.91%          13                 9    30.77%         162               145    10.49%          70                66     5.71%
DataStreams/ParallelAggregatingBlockInputStream.h             6                 4    33.33%           6                 4    33.33%          14                12    14.29%           0                 0         -
DataStreams/ParallelInputsProcessor.h                        67                19    71.64%          20                 6    70.00%         124                17    86.29%          30                 7    76.67%
DataStreams/UnionBlockInputStream.h                         102                38    62.75%          24                 8    66.67%         132                56    57.58%          46                23    50.00%
Debug/astToExecutor.cpp                                     578               210    63.67%          53                 9    83.02%        1516               545    64.05%         570               246    56.84%
Flash/Coprocessor/DAGQueryBlockInterpreter.cpp              240                70    70.83%          38                 4    89.47%         592               100    83.11%         152                50    67.11%
Flash/Coprocessor/InterpreterUtils.cpp                       19                 3    84.21%           3                 0   100.00%          45                 6    86.67%          18                 4    77.78%
Flash/tests/gtest_interpreter.cpp                           166                95    42.77%           6                 0   100.00%         520                 0   100.00%          20                20     0.00%
Interpreters/InterpreterSelectQuery.cpp                     542               542     0.00%          52                52     0.00%         930               930     0.00%         448               448     0.00%
Interpreters/InterpreterSelectQuery.h                         5                 5     0.00%           3                 3     0.00%          10                10     0.00%           4                 4     0.00%
Interpreters/InterpreterSelectWithUnionQuery.cpp             73                73     0.00%           7                 7     0.00%         135               135     0.00%          56                56     0.00%
TestUtils/mockExecutor.h                                      6                 1    83.33%           6                 1    83.33%          15                 3    80.00%           0                 0         -
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                      1915              1160    39.43%         231               103    55.41%        4195              1959    53.30%        1414               924    34.65%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18464      9605             47.98%    208009  96482        53.62%

full coverage report (for internal network access only)

@ti-chi-bot ti-chi-bot merged commit b62dc6a into pingcap:master Jul 8, 2022
SeaRise pushed a commit to SeaRise/tiflash that referenced this pull request Jul 8, 2022
SeaRise added a commit to SeaRise/tiflash that referenced this pull request Jul 8, 2022
Lloyd-Pottiger pushed a commit to Lloyd-Pottiger/tiflash that referenced this pull request Jul 12, 2022
…s in README (pingcap#5182)

close pingcap#5172, ref pingcap#5178

Enhancement: add a integrated test on DDL module (pingcap#5130)

ref pingcap#5129

Revert "Revise default background threads size" (pingcap#5176)

close pingcap#5177

chore: remove extra dyn cast (pingcap#5186)

close pingcap#5185

Add MPPReceiverSet, which includes ExchangeReceiver and CoprocessorReader (pingcap#5175)

ref pingcap#5095

DDL: Use Column Name Instead of Offset to Find the common handle cluster index (pingcap#5166)

close pingcap#5154

Add random failpoint in critical paths (pingcap#4876)

close pingcap#4807

Segment test framework (pingcap#5150)

close pingcap#5151

optimize ps v3 restore (pingcap#5163)

ref pingcap#4914

Fix build failed (pingcap#5196)

close pingcap#5195

feat: delta tree dispatching (pingcap#5199)

close pingcap#5200

feat: introduce specialized API to write fixed length data rapidly (pingcap#5181)

close pingcap#5183

Add gtest for Limit, TopN, Projection (pingcap#5187) (pingcap#5188)

close pingcap#5187

add `MPPTask::handleError()` (pingcap#5202)

ref pingcap#5095

Check result of starting grpc server (pingcap#5257)

close pingcap#5255

feat: add optimized routines for aarch64 (pingcap#5231)

close pingcap#5240

fix: aarch64-quick-fix (pingcap#5259)

close pingcap#5260

Update client-c to support ipv6 (pingcap#5270)

close pingcap#5247

upgrade prometheus-cpp to v1.0.1 (pingcap#5279)

ref pingcap#2103, close pingcap#5278

Fix README type error (pingcap#5273)

ref pingcap#5178

fix(cmake): make sure libc++ is utilized by tiflash-proxy (pingcap#5281)

close pingcap#5282

fix the wrong order of execution summary for list based executors (pingcap#5242)

close pingcap#5241

Schema: allow loading empty schema diff when the version grows up. (pingcap#5245)

close pingcap#5244

Optimize apply speed under heavy write pressure (pingcap#4883)

ref pingcap#4728

update proxy to raftstore-proxy-6.2 (pingcap#5287)

ref pingcap#4982

Flush segment cache when doing the compaction (pingcap#5284)

close pingcap#5179

metrics: Fix incorrect metrics for delta_merge tasks (pingcap#5061)

close pingcap#5055

dep: upgrade jemalloc (pingcap#5197)

close pingcap#5258

*: TiFlash pagectl/dttool use only-decryption mode (pingcap#5271)

close pingcap#5122

suppresion false positive report from tsan (pingcap#5303)

close pingcap#5088

Refine test framework code and tests (pingcap#5261)

close pingcap#5262

feat: add logical cpu cores and memory into grafana (pingcap#5124)

close pingcap#3821

Implement TimeToSec function push down (pingcap#5235)

close pingcap#5116

feat: implement shiftRight function push down (pingcap#5156)

close pingcap#5100

schema : make update to partition tables when 'set tiflash replica' (pingcap#5267)

close pingcap#5266

Replace initializer_list with vector for planner test framework (pingcap#5307)

close pingcap#5295

KVStore: decouple flush region and CompactLog with a new FFI fn_try_flush_data (pingcap#5283)

ref pingcap#5170

refine error message in mpptask (pingcap#5304)

ref pingcap#5095

Implement ReverseUTF8/Reverse function push down (pingcap#5233)

close pingcap#5111

Optimize comparision for collation `UTF8_BIN` and `UTF8MB4_BIN` (pingcap#5299)

ref pingcap#5294

feat : support set tiflash mode ddl action (pingcap#5256)

ref pingcap#5252

Add non-blocking functions for MPMCQueue (pingcap#5311)

close pingcap#5310

add random segment test for CI weekly (pingcap#5300)

close pingcap#5301

*: tidy FunctionString.cpp (pingcap#5312)

close pingcap#5313

ci: fix check-license github action (pingcap#5318)

close pingcap#5317

update proxy to raftstore-proxy-6.2 (pingcap#5316)

ref pingcap#4982

Change one `additional_input_at_end` to many streams in `ParallelInputsProcessor`  (pingcap#5274)

close pingcap#4856, close pingcap#5263

support fine grained shuffle for window function (pingcap#5048)

close pingcap#5142

feat: pushdown get_format into TiFlash (pingcap#5269)

close pingcap#5115

fix: format throw data truncated error (pingcap#5272)

close pingcap#4891

Print content of columns for gtest (pingcap#5243)

close pingcap#5203

*: also enable O3 for aarch64 (pingcap#5338)

close pingcap#5342

Add debug image build target for CentOS7 (pingcap#5344)

close pingcap#5343

*: mini refactor (pingcap#5326)

close pingcap#4739

Refactor initialize of background pool (pingcap#5190)

close pingcap#5189

delete copy/move ctor of MPMCQueue explicitly (pingcap#5328)

close pingcap#5329

Introduce proxy_server and new-mock-engine-store (pingcap#5319)

ref pingcap#5170

fix: incorrect uptime in grafana panel

Signed-off-by: Lloyd-Pottiger <[email protected]>
Lloyd-Pottiger pushed a commit to Lloyd-Pottiger/tiflash that referenced this pull request Jul 19, 2022
ywqzzy added a commit to ywqdev/tiflash that referenced this pull request Aug 18, 2022
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
5 participants