Fix the bug that duplicated page file block GC (#2170) #2186

Merged


ti-srebot
Collaborator

@ti-srebot ti-srebot commented Jun 17, 2021

cherry-pick #2170 to release-5.1
You can switch your code base to this Pull Request by using git-extras:

# In tics repo:
git pr https://github.com/pingcap/tics/pull/2186

After applying your modifications, you can push your changes to this PR via:

git push git@github.com:ti-srebot/tics.git pr/2186:release-5.1-ec5f976a8fb8

What problem does this PR solve?

Issue Number: close #2169

Problem Summary:
In DataCompactor::migratePages, we avoid generating a PageFile that already exists, but we do not check whether a "Legacy" PageFile with the same ID and level already exists.
https://github.com/pingcap/tics/blob/74c69fb1d35da3582cb9279ecb4d8597e4a78d00/dbms/src/Storages/Page/gc/DataCompactor.cpp#L150-L158
https://github.com/pingcap/tics/blob/74c69fb1d35da3582cb9279ecb4d8597e4a78d00/dbms/src/Storages/Page/PageStorage.cpp#L1137-L1145

For example,

  1. We generate a PageFile "page_1000_1" to store GC data.
  2. The data in "page_1000_1" is later migrated to another file, and "page_1000_1" becomes "legacy.page_1000_1".
  3. If some old files are held by a snapshot for a long time, we may happen to generate a PageFile "page_1000_1" again. Both "page_1000_1" and "legacy.page_1000_1" then exist at the same time.
  4. Once the "page_1000_1" generated in step 3 becomes useless, we try to set it to "legacy" and remove its data. Because "legacy.page_1000_1" already exists, an exception is thrown, which stops us from GCing the useless data.
  5. Eventually, the TiFlash node fills up with data in "t_{table_id}/log" (almost 1 TiB in our case), which leads to poor load balance across multiple TiFlash nodes.
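
Steps 1 to 4 can be replayed with a small toy model. This is only a sketch under stated assumptions: `ToyPageDir`, `FileKey`, `createIfNoFormal`, `setLegacy`, and `replayScenario` are hypothetical names invented for illustration and are not the TiFlash PageFile API. The model keeps page files as an in-memory set of <id, level, type> and only checks for a Formal file before creating one, which is the pre-fix behaviour described above.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <stdexcept>
#include <tuple>

// Hypothetical, simplified model of page files on disk (not the TiFlash API).
enum class FileType { Formal, Legacy };

struct FileKey
{
    uint64_t id;
    uint32_t level;
    FileType type;
    bool operator<(const FileKey & rhs) const
    {
        return std::tie(id, level, type) < std::tie(rhs.id, rhs.level, rhs.type);
    }
};

class ToyPageDir
{
public:
    // Pre-fix behaviour: only a Formal file with the same <id, level> blocks creation.
    bool createIfNoFormal(uint64_t id, uint32_t level)
    {
        return files.insert(FileKey{id, level, FileType::Formal}).second;
    }

    // Turn a Formal file into its Legacy form; throws if the Legacy name is taken.
    void setLegacy(uint64_t id, uint32_t level)
    {
        if (files.count(FileKey{id, level, FileType::Legacy}) > 0)
            throw std::runtime_error("legacy page file already exists");
        if (files.erase(FileKey{id, level, FileType::Formal}) == 0)
            throw std::runtime_error("formal page file not found");
        files.insert(FileKey{id, level, FileType::Legacy});
    }

private:
    std::set<FileKey> files;
};

struct Outcome
{
    bool duplicate_created; // step 3 succeeded in re-creating page_1000_1
    bool gc_blocked;        // step 4 threw, so the useless data cannot be removed
};

Outcome replayScenario()
{
    ToyPageDir dir;
    dir.createIfNoFormal(1000, 1);                        // step 1
    dir.setLegacy(1000, 1);                               // step 2
    const bool duplicate = dir.createIfNoFormal(1000, 1); // step 3
    bool blocked = false;
    try
    {
        dir.setLegacy(1000, 1);                           // step 4
    }
    catch (const std::runtime_error &)
    {
        blocked = true;
    }
    return Outcome{duplicate, blocked};
}

// Usage example: both conditions hold in this toy model.
void checkScenario()
{
    const Outcome o = replayScenario();
    assert(o.duplicate_created);
    assert(o.gc_blocked);
}
```

In this model the duplicate is only possible because creation ignores the Legacy entry, which matches the missing check described in the problem summary.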

What is changed and how it works?

Before generating a PageFile for GC data, check whether a page file with the same <id, level> already exists with a status of either Formal or Legacy.
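
The check can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: `FileKey`, `FileSet`, `existsWithType`, and `canGenerateGcFile` are hypothetical names, and existing page files are modelled as an in-memory set rather than files on disk. What the caller does when the check fails (for example, skipping this round of migration) is left out here.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <tuple>

// Hypothetical, simplified model of page files (not the TiFlash API).
enum class FileType { Formal, Legacy };

struct FileKey
{
    uint64_t id;
    uint32_t level;
    FileType type;
    bool operator<(const FileKey & rhs) const
    {
        return std::tie(id, level, type) < std::tie(rhs.id, rhs.level, rhs.type);
    }
};

using FileSet = std::set<FileKey>;

bool existsWithType(const FileSet & files, uint64_t id, uint32_t level, FileType type)
{
    return files.count(FileKey{id, level, type}) > 0;
}

// Post-fix check: a GC output file may be generated only if neither the
// Formal nor the Legacy file with the same <id, level> exists.
bool canGenerateGcFile(const FileSet & files, uint64_t id, uint32_t level)
{
    return !existsWithType(files, id, level, FileType::Formal)
        && !existsWithType(files, id, level, FileType::Legacy);
}

// Usage example: a leftover legacy.page_1000_1 now blocks re-creating page_1000_1.
void exampleUsage()
{
    FileSet files{FileKey{1000, 1, FileType::Legacy}};
    assert(!canGenerateGcFile(files, 1000, 1));
    assert(canGenerateGcFile(files, 1000, 2));
}
```

Checking both statuses up front means the later "set to legacy" step can no longer collide with an existing legacy file of the same name.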

Related changes

  • Need to cherry-pick to the release branch: 5.1, 5.0, 4.0

Check List

Tests

  • Unit test

Side effects

Release note

  • Fix the bug that TiFlash cannot GC delta data in rare cases

@ti-srebot ti-srebot added the CHERRY-PICK, status/LGT1, and type/bugfix labels Jun 17, 2021
@ti-srebot ti-srebot requested a review from flowbehappy June 17, 2021 07:35
@ti-srebot ti-srebot added this to the v5.1.0 milestone Jun 17, 2021
Contributor

@flowbehappy flowbehappy left a comment

LGTM

@JaySon-Huang
Contributor

/run-all-tests

@JaySon-Huang JaySon-Huang merged commit 4da169f into pingcap:release-5.1 Jun 18, 2021
@JaySon-Huang JaySon-Huang deleted the release-5.1-ec5f976a8fb8 branch June 18, 2021 03:43