
PageStorage: improvement for workloads where some pages are not updated for a long time #2473

Closed
JaySon-Huang opened this issue Jul 23, 2021 · 1 comment · Fixed by #3094
Labels: type/bugfix (This PR fixes a bug.)

JaySon-Huang commented Jul 23, 2021

After #2436, we can keep the number of legacy files at around 100.
But GC may generate a huge PageFile containing lots of pages as it keeps pushing forward.

I split the migrated entries into smaller batches, trying to control the memory peak while running DataCompactor::mergeValidPages (a sketch of the batching idea follows).
However, if some pages have not been updated for a long time, we need to rewrite them into another PageFile with a higher "compact_sequence" so that they do not stop the LegacyCompactor from compacting the meta part. This rewriting causes high read and write amplification during GC.
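A rough, self-contained sketch of the batching idea, not the actual DataCompactor code: migrate valid pages in bounded-size batches so that only roughly `batch_bytes_limit` of page data is held in memory at any time. `PageEntry`, `migrateInBatches`, and the `migrate_batch` callback are hypothetical stand-ins for the real PageStorage types.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

struct PageEntry
{
    std::uint64_t page_id;
    std::uint64_t size; // size of the page data in bytes
};

// Split the valid entries into batches whose total data size stays under
// batch_bytes_limit, and hand each batch to the migration callback
// (read from the old PageFiles, append to the new one).
void migrateInBatches(
    const std::vector<PageEntry> & valid_entries,
    const std::function<void(const std::vector<PageEntry> &)> & migrate_batch,
    std::uint64_t batch_bytes_limit)
{
    std::vector<PageEntry> batch;
    std::uint64_t batch_bytes = 0;
    for (const auto & entry : valid_entries)
    {
        batch.push_back(entry);
        batch_bytes += entry.size;
        // Flush once the accumulated page data reaches the limit, bounding
        // the memory peak of a mergeValidPages-style migration.
        if (batch_bytes >= batch_bytes_limit)
        {
            migrate_batch(batch);
            batch.clear();
            batch_bytes = 0;
        }
    }
    if (!batch.empty())
        migrate_batch(batch);
}
```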

Assume that:

  1. There is a PageFile generated by DataCompactor (call it pf1_0) whose size is 2 GiB, and all pages in it belong to a WriteBatch with sequence=100. The pages in pf1_0 have not been updated for a long time, so its valid rate is high (0.8).
  2. There are some PageFiles with a lower valid rate (0.2) (call them pf2_0, pf3_0), whose highest WriteBatch sequence is 900.
  3. There are many legacy PageFiles, and we need to compact pf1_0, pf2_0, pf3_0 into a new PageFile pf3_1 with sequence=900 to push the GC forward. Otherwise it will cause problems like the one in "Reduce the memory cost when there are stale snapshots for PageStorage" #2199 (comment).

The key points are:

  • We need to generate pf3_1 with sequence=900 instead of sequence=100; otherwise it will block the LegacyCompactor from compacting those legacy files.
  • We cannot simply rename pf1_0/data to pf3_1/data, because it may still be being read by other threads.
  • There are still some valid pages in pf2_0 and pf3_0 that we need to migrate.

Here is a possible solution for reducing the read/write amplification without rewriting the whole PageStorage:

  1. Create .tmp.pf3_1/data as a hard link to pf1_0/data.
  2. Read the entries in pf1_0/meta, re-point them to .tmp.pf3_1/data with sequence=900, and save them to .tmp.pf3_1/meta.
  3. Rename .tmp.pf3_1 to pf3_1 (data "compaction" for the high-valid-rate PageFile).
  4. Create another PageFile .tmp.pf3_2 and migrate the valid pages from pf2_0 and pf3_0 to .tmp.pf3_2 with sequence=900.
  5. Rename .tmp.pf3_2 to pf3_2 (data compaction for the low-valid-rate PageFiles).

Because pf3_1/data is a hard link to pf1_0/data, they share the same inode, so the read/write amplification is greatly reduced.
But we should not append the valid pages from pf2_0 and pf3_0 to pf3_1, because doing so may cause other problems(?) for threads that are already reading pf1_0. A sketch of the filesystem side of this idea is shown below.
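Below is a minimal, self-contained C++ sketch of the filesystem side of steps 1 and 3, using plain POSIX calls rather than the real PageStorage/PageFile code. The path names follow the example above; the meta rewrite in step 2 is only indicated by a comment because it depends on PageStorage internals not shown in this issue.

```cpp
#include <cstdio>     // std::rename
#include <stdexcept>
#include <string>
#include <sys/stat.h> // ::mkdir
#include <unistd.h>   // ::link (POSIX)

// Step 1: share pf1_0/data with the new PageFile via a hard link. Both
// directory entries point at the same inode, so the 2 GiB of page data is
// never copied or re-read.
void linkDataFile(const std::string & src, const std::string & dst)
{
    if (::link(src.c_str(), dst.c_str()) != 0)
        throw std::runtime_error("hard link failed: " + src + " -> " + dst);
}

// Step 3: once .tmp.pf3_1/meta has been written (entries re-pointed to the
// linked data file and stamped with sequence=900), publish the new PageFile
// by renaming the temporary directory.
void publishPageFile(const std::string & tmp_dir, const std::string & final_dir)
{
    if (std::rename(tmp_dir.c_str(), final_dir.c_str()) != 0)
        throw std::runtime_error("rename failed: " + tmp_dir + " -> " + final_dir);
}

int main()
{
    // Create the temporary PageFile directory first (error handling elided).
    ::mkdir(".tmp.pf3_1", 0755);
    linkDataFile("pf1_0/data", ".tmp.pf3_1/data");
    // ... step 2: rewrite the entries from pf1_0/meta into .tmp.pf3_1/meta,
    //     pointing at .tmp.pf3_1/data and carrying sequence=900 ...
    publishPageFile(".tmp.pf3_1", "pf3_1");
    return 0;
}
```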

@flowbehappy @jiaqizho PTAL


Finally, we need to redesign PageStorage in the near future. Splitting the "meta" part from the "data" part could greatly reduce the complexity of the GC strategies, but that is not discussed in this issue.

Originally posted by @JaySon-Huang in #2382 (comment)


JaySon-Huang commented Dec 28, 2021

Follow-up bug (#3438) and its fix (#3645).
