Optimize apply speed under heavy write pressure #4883

Merged: 10 commits into pingcap:master from the optimize-apply-0512 branch, Jul 4, 2022

lidezhu
Contributor

@lidezhu lidezhu commented May 13, 2022

What problem does this PR solve?

Issue Number: ref #4728

Problem Summary: When TiFlash is under heavy write pressure, it consumes a lot of write throughput, which adds latency to applying raft logs and in turn makes TiFlash consume a lot of memory.

What is changed and how it works?

  1. Allow only one delta flush on a segment at any time, to avoid unnecessary throughput consumption.
  2. When processing a CompactLog raft command, try to flush the unsaved data in storage only once, and return Persist if the flush succeeds or None if it fails. (Previously the flush was retried until it succeeded.)
  3. Increase the foreground flush threshold for write threads.
  4. Avoid adding a minor compaction task to the background if the segment is flushing, to save some throughput and reduce lock contention on the segment.
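
Changes 1, 2 and 4 above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `DeltaSketch`, `ApplyResult`, `handleCompactLog` and `shouldQueueMinorCompaction` are hypothetical names, and the atomic flag is only one assumed way to keep a single flush in flight per segment.

```cpp
#include <atomic>
#include <functional>

// Hypothetical outcome of handling a CompactLog command, mirroring the
// "Persist" / "None" results named in the PR description.
enum class ApplyResult { None, Persist };

// Hypothetical stand-in for a segment's delta; not the real DeltaValueSpace.
class DeltaSketch
{
public:
    // Claims the flush slot; returns false if another flush already holds it.
    bool tryLockFlushing()
    {
        bool expected = false;
        return is_flushing.compare_exchange_strong(expected, true);
    }

    void releaseFlushing() { is_flushing.store(false); }

    bool isFlushing() const { return is_flushing.load(); }

    // Runs `do_flush` at most once, and only if no other flush is in progress.
    bool tryFlush(const std::function<bool()> & do_flush)
    {
        if (!tryLockFlushing())
            return false; // someone else is flushing: give up instead of waiting

        // Release the slot even if do_flush throws.
        struct Release
        {
            std::atomic<bool> & flag;
            ~Release() { flag.store(false); }
        } guard{is_flushing};

        return do_flush();
    }

private:
    std::atomic<bool> is_flushing{false};
};

// A single attempt: Persist on success, None otherwise (no retry loop).
inline ApplyResult handleCompactLog(DeltaSketch & delta, const std::function<bool()> & do_flush)
{
    return delta.tryFlush(do_flush) ? ApplyResult::Persist : ApplyResult::None;
}

// Do not queue a background minor compaction while the segment is flushing.
inline bool shouldQueueMinorCompaction(const DeltaSketch & delta)
{
    return !delta.isFlushing();
}
```

In this sketch, `handleCompactLog(delta, [] { return true; })` yields `ApplyResult::Persist` when no other flush holds the flag, and `ApplyResult::None` without invoking the callback when one does.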

Check List

Tests

  • Manual test (add detailed scripts or steps below)
    Test steps:
  1. Perform large transaction updates through 5 TiDB instances at the same time; the total transaction size across all TiDB instances is about 24.5 GB.
    Before optimization:
    image
    After optimization:
    image

Detailed results (unit: GB)

                     1st run  2nd run  3rd run  4th run  5th run  6th run  7th run  average value  amplification factor
before optimization  40.8     37.2     37.3     36       34.7     33.6     34.5     36.3           1.48
after optimization   38.1     29.4     29       34.3     29.6     32.6     37.7     32.9           1.34

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot
Member

ti-chi-bot commented May 13, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • CalvinNeo
  • flowbehappy

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/needs-linked-issue labels May 13, 2022
@lidezhu lidezhu changed the title Optimize apply speed under heavy write pressure [WIP] Optimize apply speed under heavy write pressure May 13, 2022
@ti-chi-bot ti-chi-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 13, 2022
@lidezhu lidezhu force-pushed the optimize-apply-0512 branch from 98e9c3e to e148557 Compare May 26, 2022 13:45
@lidezhu lidezhu changed the title [WIP] Optimize apply speed under heavy write pressure Optimize apply speed under heavy write pressure May 26, 2022
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 26, 2022
@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 23, 2022
@lidezhu lidezhu force-pushed the optimize-apply-0512 branch from 887b2a7 to f8f7206 Compare June 24, 2022 13:27
@lidezhu lidezhu force-pushed the optimize-apply-0512 branch from c551a81 to c16bb0a Compare June 26, 2022 12:23
@lidezhu
Contributor Author

lidezhu commented Jun 26, 2022

/run-all-tests

@sre-bot
Collaborator

sre-bot commented Jun 26, 2022

Coverage for changed files

Filename                                 Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DeltaMerge/Delta/DeltaValueSpace.cpp         217                92    57.60%          16                 0   100.00%         175                41    76.57%          98                58    40.82%
DeltaMerge/Delta/DeltaValueSpace.h           108                21    80.56%          49                 3    93.88%         122                19    84.43%          36                13    63.89%
DeltaMerge/DeltaMergeStore.cpp              1465               537    63.34%          67                 7    89.55%        2054               521    74.63%         852               405    52.46%
DeltaMerge/DeltaMergeStore.h                  41                11    73.17%          19                 2    89.47%          91                26    71.43%          42                 9    78.57%
DeltaMerge/Segment.h                          35                 5    85.71%          23                 3    86.96%          32                 4    87.50%           6                 3    50.00%
IManageableStorage.h                          20                18    10.00%          20                18    10.00%          38                36     5.26%           0                 0         -
StorageDeltaMerge.cpp                        679               328    51.69%          58                26    55.17%        1307               725    44.53%         378               243    35.71%
StorageDeltaMerge.h                           11                 6    45.45%          11                 6    45.45%          17                 8    52.94%           0                 0         -
Transaction/KVStore.cpp                      391                79    79.80%          41                 3    92.68%         682                84    87.68%         210                78    62.86%
Transaction/KVStore.h                          5                 3    40.00%           5                 3    40.00%           5                 3    40.00%           0                 0         -
Transaction/RegionTable.cpp                  280               141    49.64%          27                 9    66.67%         347               141    59.37%         124                82    33.87%
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                       3252              1241    61.84%         336                80    76.19%        4870              1608    66.98%        1746               891    48.97%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18385      9659             47.46%    206720  96734        53.21%

full coverage report (for internal network access only)

Contributor

@flowbehappy flowbehappy left a comment

LGTM

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jun 27, 2022
@flowbehappy
Contributor

Can you add some metrics to show the result of this optimization? @lidezhu

@lidezhu
Contributor Author

lidezhu commented Jun 27, 2022

Can you add some metrics to show the result of this optimization? @lidezhu

Done

@lidezhu
Contributor Author

lidezhu commented Jun 27, 2022

/rebuild

@lidezhu
Contributor Author

lidezhu commented Jun 27, 2022

/run-all-tests

@sre-bot
Collaborator

sre-bot commented Jun 27, 2022

Coverage for changed files

Filename                                 Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DeltaMerge/Delta/DeltaValueSpace.cpp         217                92    57.60%          16                 0   100.00%         175                41    76.57%          98                58    40.82%
DeltaMerge/Delta/DeltaValueSpace.h           108                21    80.56%          49                 3    93.88%         122                19    84.43%          36                13    63.89%
DeltaMerge/DeltaMergeStore.cpp              1465               537    63.34%          67                 7    89.55%        2054               521    74.63%         852               406    52.35%
DeltaMerge/DeltaMergeStore.h                  41                11    73.17%          19                 2    89.47%          91                26    71.43%          42                 9    78.57%
DeltaMerge/Segment.h                          35                 5    85.71%          23                 3    86.96%          32                 4    87.50%           6                 3    50.00%
IManageableStorage.h                          20                18    10.00%          20                18    10.00%          38                36     5.26%           0                 0         -
StorageDeltaMerge.cpp                        679               328    51.69%          58                26    55.17%        1307               725    44.53%         378               243    35.71%
StorageDeltaMerge.h                           11                 6    45.45%          11                 6    45.45%          17                 8    52.94%           0                 0         -
Transaction/KVStore.cpp                      391                79    79.80%          41                 3    92.68%         682                84    87.68%         210                78    62.86%
Transaction/KVStore.h                          5                 3    40.00%           5                 3    40.00%           5                 3    40.00%           0                 0         -
Transaction/RegionTable.cpp                  280               141    49.64%          27                 9    66.67%         347               141    59.37%         124                82    33.87%
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                       3252              1241    61.84%         336                80    76.19%        4870              1608    66.98%        1746               892    48.91%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18389      9660             47.47%    206766  96777        53.19%

full coverage report (for internal network access only)

Member

@CalvinNeo CalvinNeo left a comment

LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jul 4, 2022
@lidezhu
Contributor Author

lidezhu commented Jul 4, 2022

/merge

@ti-chi-bot
Member

@lidezhu: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 3e6a777

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 4, 2022
@sre-bot
Collaborator

sre-bot commented Jul 4, 2022

Coverage for changed files

Filename                                 Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DeltaMerge/Delta/DeltaValueSpace.cpp         217                92    57.60%          16                 0   100.00%         175                41    76.57%          98                58    40.82%
DeltaMerge/Delta/DeltaValueSpace.h           108                21    80.56%          49                 3    93.88%         122                19    84.43%          36                13    63.89%
DeltaMerge/DeltaMergeStore.cpp              1465               537    63.34%          67                 7    89.55%        2054               521    74.63%         852               406    52.35%
DeltaMerge/DeltaMergeStore.h                  41                11    73.17%          19                 2    89.47%          91                26    71.43%          42                 9    78.57%
DeltaMerge/Segment.h                          35                 5    85.71%          23                 3    86.96%          32                 4    87.50%           6                 3    50.00%
IManageableStorage.h                          20                18    10.00%          20                18    10.00%          38                36     5.26%           0                 0         -
StorageDeltaMerge.cpp                        679               328    51.69%          58                26    55.17%        1307               725    44.53%         378               243    35.71%
StorageDeltaMerge.h                           11                 6    45.45%          11                 6    45.45%          17                 8    52.94%           0                 0         -
Transaction/KVStore.cpp                      391                78    80.05%          41                 3    92.68%         682                84    87.68%         210                76    63.81%
Transaction/KVStore.h                          5                 3    40.00%           5                 3    40.00%           5                 3    40.00%           0                 0         -
Transaction/RegionTable.cpp                  280               141    49.64%          27                 9    66.67%         347               141    59.37%         124                82    33.87%
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                       3252              1240    61.87%         336                80    76.19%        4870              1608    66.98%        1746               890    49.03%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18410      9642             47.63%    207179  96569        53.39%

full coverage report (for internal network access only)

@lidezhu
Contributor Author

lidezhu commented Jul 4, 2022

/run-integration-test

@ti-chi-bot ti-chi-bot merged commit 6da631c into pingcap:master Jul 4, 2022
@lidezhu lidezhu deleted the optimize-apply-0512 branch July 4, 2022 08:58
Lloyd-Pottiger pushed a commit to Lloyd-Pottiger/tiflash that referenced this pull request Jul 12, 2022
Lloyd-Pottiger pushed a commit to Lloyd-Pottiger/tiflash that referenced this pull request Jul 19, 2022
@lidezhu lidezhu added the needs-cherry-pick-release-6.1 Should cherry pick this PR to release-6.1 branch. label Aug 22, 2022
@lidezhu
Contributor Author

lidezhu commented Aug 22, 2022

/run-cherry-picker

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #5668.

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Aug 22, 2022
Labels
needs-cherry-pick-release-6.1 Should cherry pick this PR to release-6.1 branch. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.