util/chunk: optimize (*ListInDisk).GetChunk and add a fast row container reader #45130

Merged: ti-chi-bot merged 3 commits into pingcap:master from row-container-reader on Jul 4, 2023

Conversation

YangKeao
Member

@YangKeao YangKeao commented Jul 3, 2023

What problem does this PR solve?

Issue Number: close #45125

Problem Summary:

The existing reading method of RowContainer (GetChunk(...)) is not fast enough for dumping a lot of rows from disk (for the cursorFetch use case).

The existing Iterator4RowContainer is even slower, as it allocates a new chunk for each row 🤦.
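To illustrate the allocation pattern described above, here is a minimal sketch that contrasts requesting a fresh buffer for every row with reusing a single buffer. It does not use TiDB's chunk types: `decodeRow`, `sumFreshBuffers`, and `sumReusedBuffer` are hypothetical names, and a plain `[]int64` stands in for a chunk.

```go
package main

import "fmt"

// decodeRow is a hypothetical decoder: it fills dst with the values of row i.
func decodeRow(dst []int64, i int) {
	for j := range dst {
		dst[j] = int64(i + j)
	}
}

// sumFreshBuffers requests a new buffer for every row, which is the pattern
// the description calls out as slow.
func sumFreshBuffers(rows, width int) int64 {
	var sum int64
	for i := 0; i < rows; i++ {
		buf := make([]int64, width) // one buffer request per row
		decodeRow(buf, i)
		for _, v := range buf {
			sum += v
		}
	}
	return sum
}

// sumReusedBuffer creates one buffer up front and overwrites it for each row.
func sumReusedBuffer(rows, width int) int64 {
	var sum int64
	buf := make([]int64, width) // single buffer shared by all rows
	for i := 0; i < rows; i++ {
		decodeRow(buf, i)
		for _, v := range buf {
			sum += v
		}
	}
	return sum
}

func main() {
	// Both variants compute the same result; only the buffer handling differs.
	fmt.Println(sumFreshBuffers(1000, 8) == sumReusedBuffer(1000, 8))
}
```

Reuse is only safe when the caller is done with a row before the next one is decoded, which is an assumption of this sketch.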

This PR is extracted from #44730 (with some refactoring).

What is changed and how it works?

This PR pipelines disk I/O with CPU work to make full use of the I/O bandwidth. It should also help other features that use RowContainer, since GetChunk is now much faster.

The existing benchmark BenchmarkListInDisk_GetChunk improves from 2877471 ns/op to 462864 ns/op (about 6.2x faster).

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

@YangKeao YangKeao added the release-note-none Denotes a PR that doesn't merit a release note. label Jul 3, 2023
@ti-chi-bot ti-chi-bot bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 3, 2023
@YangKeao YangKeao force-pushed the row-container-reader branch 2 times, most recently from c24a452 to 2b1480b Compare July 3, 2023 09:24
@YangKeao YangKeao changed the title util/chunk: add a fast row container reader util/chunk: optimize GetChunk and add a fast row container reader Jul 3, 2023
@YangKeao YangKeao changed the title util/chunk: optimize GetChunk and add a fast row container reader util/chunk: optimize (*ListInDisk).GetChunk and add a fast row container reader Jul 3, 2023
@YangKeao YangKeao force-pushed the row-container-reader branch from 2b1480b to 108efd2 Compare July 3, 2023 09:29
@YangKeao
Member Author

YangKeao commented Jul 4, 2023

As tested, spawning a new goroutine to read is faster for long rows but slower for short rows, because creating a new goroutine just to read 64KB is a waste 🤦.

            4096             1024            512             64
serial      8562687 ns/op    2573457 ns/op   1363930 ns/op   370070 ns/op
parallel    6035161 ns/op    2086350 ns/op   1249451 ns/op   439846 ns/op
master      10592875 ns/op   3968269 ns/op   3223435 ns/op   2217039 ns/op

But both methods are much faster than the original implementation (master).
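To make the comparison concrete, the ratios against master can be computed directly from the timings in the table above. The sketch below only does that arithmetic. The column labels are copied as given, and `speedup` is a hypothetical helper name.

```go
package main

import "fmt"

// Timings in ns/op copied from the table in the comment; the column labels
// (4096, 1024, 512, 64) are kept exactly as given there.
var (
	labels   = []string{"4096", "1024", "512", "64"}
	serial   = []float64{8562687, 2573457, 1363930, 370070}
	parallel = []float64{6035161, 2086350, 1249451, 439846}
	master   = []float64{10592875, 3968269, 3223435, 2217039}
)

// speedup reports how many times faster a variant is than the baseline.
func speedup(baseline, variant float64) float64 {
	return baseline / variant
}

func main() {
	for i, l := range labels {
		fmt.Printf("%-5s serial %.2fx  parallel %.2fx\n",
			l, speedup(master[i], serial[i]), speedup(master[i], parallel[i]))
	}
}
```

Across the four columns, both variants land between roughly 1.2x and 6x faster than master. The parallel variant wins in the first three columns and the serial variant wins in the last one.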

util/chunk/row_container_test.go
}
}

func TestCloseRowContainerReader(t *testing.T) {
Contributor

What does this test check? It seems the same as the previous one.

Member Author

It reads only 8.5 chunks and does not drain the whole row container. It tests whether the reader can be closed successfully (with no goroutine leaks) when it has not reached the end.
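The property described here, closing a background reader before it reaches the end without leaking its goroutine, can be sketched as follows. This is an illustration under assumptions, not the reader from this PR: `reader`, `newReader`, and `Close` are hypothetical, and integers stand in for rows.

```go
package main

import (
	"fmt"
	"sync"
)

// reader is a hypothetical stand-in for a background row reader.
type reader struct {
	rows chan int
	done chan struct{}
	wg   sync.WaitGroup
	once sync.Once
}

// newReader starts a goroutine that produces total rows on r.rows.
func newReader(total int) *reader {
	r := &reader{rows: make(chan int), done: make(chan struct{})}
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		defer close(r.rows)
		for i := 0; i < total; i++ {
			select {
			case r.rows <- i:
			case <-r.done:
				return // the consumer stopped early
			}
		}
	}()
	return r
}

// Close signals the producer to stop and blocks until its goroutine returns.
// It is safe to call more than once.
func (r *reader) Close() {
	r.once.Do(func() { close(r.done) })
	r.wg.Wait()
}

func main() {
	r := newReader(1000)
	for i := 0; i < 5; i++ {
		fmt.Println(<-r.rows) // read only a few rows, far from the end
	}
	r.Close()
	_, ok := <-r.rows
	fmt.Println("rows channel open after Close:", ok)
}
```

Without the `done` case in the `select`, the producer would block forever on the send once the consumer stops reading, which is the kind of leak such a test guards against.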

@YangKeao YangKeao requested a review from wshwsh12 July 4, 2023 06:48
@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jul 4, 2023
@YangKeao YangKeao requested a review from xhebox July 4, 2023 09:26
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 4, 2023
@ti-chi-bot

ti-chi-bot bot commented Jul 4, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wshwsh12, xhebox

@ti-chi-bot

ti-chi-bot bot commented Jul 4, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-07-04 06:52:48.292586709 +0000 UTC m=+100400.226220123: ☑️ agreed by wshwsh12.
  • 2023-07-04 09:33:27.626487348 +0000 UTC m=+110039.560120770: ☑️ agreed by xhebox.

@ti-chi-bot ti-chi-bot bot merged commit ab4c06a into pingcap:master Jul 4, 2023
@YangKeao YangKeao added needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-7.2 labels Jul 6, 2023
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #45203.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-6.5: #45204.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.2: #45205.

Labels
approved lgtm needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement a fast row container reader to dump rows from disk
4 participants