After #2436, we can keep the number of legacy files around 100. But GC may still generate a HUGE PageFile with lots of pages as it keeps pushing forward. I have split the migrated entries into smaller batches to control the memory peak while running `DataCompactor::mergeValidPages`. However, if some pages have not been updated for a long time, we need to rewrite them into another PageFile with a higher "compact_sequence" so that they do not stop the LegacyCompactor from compacting the meta part, and doing so causes high read/write amplification during this GC.
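As a rough illustration of the batching idea, here is a minimal sketch (assumptions: `MigrateEntry`, `migrateInBatches`, and the byte budget are hypothetical names for illustration, not the actual `DataCompactor` API). Valid entries are flushed batch by batch so the buffered page data stays under a fixed limit instead of materializing the whole PageFile in memory:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: migrate valid page entries in bounded batches so the
// buffered page data never exceeds `max_batch_bytes`.
struct MigrateEntry
{
    size_t page_id;
    size_t size; // bytes of page data for this entry
};

void migrateInBatches(
    const std::vector<MigrateEntry> & entries,
    size_t max_batch_bytes,
    const std::function<void(const std::vector<MigrateEntry> &)> & flush_batch)
{
    std::vector<MigrateEntry> batch;
    size_t batch_bytes = 0;
    for (const auto & entry : entries)
    {
        if (batch_bytes + entry.size > max_batch_bytes && !batch.empty())
        {
            flush_batch(batch); // read + write this batch, then release its memory
            batch.clear();
            batch_bytes = 0;
        }
        batch.push_back(entry);
        batch_bytes += entry.size;
    }
    if (!batch.empty())
        flush_batch(batch);
}
```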
Assume that:

- There is a PageFile (generated by `DataCompactor`, naming it `pf1_0`) whose size is 2GiB, and all pages on it belong to a WriteBatch with sequence=100. The pages on `pf1_0` have not been updated for a long time, so its valid rate is high (0.8).
- There are some PageFiles with a lower valid rate (0.2) (naming them `pf2_0`, `pf3_0`), and the highest WriteBatch sequence among them is 900.

We need to compact `pf1_0`, `pf2_0`, `pf3_0` into a new PageFile `pf3_1` with sequence=900 to push the GC forward, or it will cause problems like the one described in "Reduce the memory cost when there are stale snapshots for PageStorage" (#2199 (comment)). The key points are:

- We need to generate `pf3_1` with sequence=900 instead of sequence=100, or it will block the LegacyCompactor from compacting those legacy files.
- We can not directly rename `pf1_0/data` to `pf3_1/data`, because it may be being read by other threads.
- There are still some valid pages on `pf2_0`, `pf3_0` that we need to migrate.
Here may be a solution for reducing the read/write amplification without rewriting the whole PageStorage (a rough sketch of steps 1-3 follows the list):

1. Create `.tmp.pf3_1/data` as a hard link to `pf1_0/data`.
2. Read the entries from `pf1_0/meta`, point them to `.tmp.pf3_1/data` with sequence=900, and save them to `.tmp.pf3_1/meta`.
3. Rename `.tmp.pf3_1` to `pf3_1` (data "compaction" for high valid rate PageFiles).
4. Create another PageFile `.tmp.pf3_2` and migrate the valid pages on `pf2_0`, `pf3_0` to `.tmp.pf3_2` with sequence=900.
5. Rename `.tmp.pf3_2` to `pf3_2` (data compaction for low valid rate PageFiles).
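A minimal filesystem-level sketch of steps 1-3, assuming the directory layout from the example above. `rewriteMetaWithSequence` is a hypothetical placeholder, not an existing PageStorage function; the real code would re-encode the meta entries rather than copy the file:

```cpp
#include <cstdint>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical placeholder: the real code would decode each entry of src_meta,
// repoint it to the hard-linked data file, attach `sequence`, and encode the
// result into dst_meta. A plain copy keeps this sketch self-contained.
void rewriteMetaWithSequence(const fs::path & src_meta, const fs::path & dst_meta, uint64_t sequence)
{
    fs::copy_file(src_meta, dst_meta);
    (void)sequence;
}

void compactHighValidRatePageFile()
{
    const fs::path src = "pf1_0";
    const fs::path tmp = ".tmp.pf3_1";
    const fs::path dst = "pf3_1";

    fs::create_directory(tmp);
    // Step 1: hard link the data file; no page data is copied, both paths share one inode.
    fs::create_hard_link(src / "data", tmp / "data");
    // Step 2: rewrite the meta entries with the new sequence, pointing at the linked data file.
    rewriteMetaWithSequence(src / "meta", tmp / "meta", /*sequence=*/900);
    // Step 3: publish the new PageFile by renaming the temporary directory.
    fs::rename(tmp, dst);
}
```

Because adding a hard link and renaming a directory only touch filesystem metadata, steps 1-3 never copy the 2GiB of page data, and readers that already opened `pf1_0/data` keep reading the same inode undisturbed.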
Because `pf3_1/data` is a hard link to `pf1_0/data`, the two paths share the same inode, so the read/write amplification is greatly reduced. But we should not append the valid pages from `pf2_0`, `pf3_0` to `pf3_1`, or it may cause other problems(?) for threads that are already reading `pf1_0`.
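The shared-inode property can be verified with a small standalone POSIX check (illustration only, not PageStorage code; the paths are just the ones from the example above):

```cpp
#include <sys/stat.h>

// Returns true when the two paths refer to the same inode on the same device,
// i.e. they are hard links to the same underlying data.
bool sharesInode(const char * path_a, const char * path_b)
{
    struct stat sa, sb;
    if (stat(path_a, &sa) != 0 || stat(path_b, &sb) != 0)
        return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

// Usage: sharesInode("pf1_0/data", "pf3_1/data") should return true after step 3.
```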
Finally, we need to redesign the PageStorage in the near future. Splitting the "meta" part from the "data" part could greatly reduce the complexity of GC strategies. But it is not discussed in this issue.
Originally posted by @JaySon-Huang in #2382 (comment)