-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the local cache fallback behavior for corrupted pages #18498
Improve the local cache fallback behavior for corrupted pages #18498
Conversation
0d4b82f
to
c715238
Compare
c715238
to
9b39619
Compare
@@ -927,10 +929,20 @@ private int getPage(PageInfo pageInfo, int pageOffset, int bytesToRead, | |||
// data read from page store is inconsistent from the metastore | |||
LOG.error("Failed to read page {}: supposed to read {} bytes, {} bytes actually read", | |||
pageInfo.getPageId(), bytesToRead, ret); | |||
target.offset(originOffset); //reset the offset | |||
//best efforts to delete the corrupted file without acquire the write lock | |||
deletePage(pageInfo, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to delete the metadata as well?
LOG.error("Data corrupted page {} from pageStore", pageInfo.getPageId(), e); | ||
target.offset(originOffset); //reset the offset | ||
//best efforts to delete the corrupted file without acquire the write lock | ||
deletePage(pageInfo, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to delete the metadata as well?
@Test | ||
public void testPageDataFileCorrupted() throws Exception | ||
{ | ||
int pages = 10; | ||
int fileSize = mPageSize * pages; | ||
byte[] testData = BufferUtils.getIncreasingByteArray(fileSize); | ||
ByteArrayCacheManager manager = new ByteArrayCacheManager(); | ||
//by default local cache fallback is not enabled, the read should fail for any error | ||
LocalCacheFileInStream streamWithOutFallback = setupWithSingleFile(testData, manager); | ||
|
||
sConf.set(PropertyKey.USER_CLIENT_CACHE_FALLBACK_ENABLED, true); | ||
LocalCacheFileInStream streamWithFallback = setupWithSingleFile(testData, manager); | ||
Assert.assertEquals(100, streamWithFallback.positionedRead(0, new byte[10], 100, 100)); | ||
Assert.assertEquals(1, | ||
MetricsSystem.counter(MetricKey.CLIENT_CACHE_POSITION_READ_FALLBACK.getName()).getCount()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method duplicated with the method testPositionReadFallBack()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhhh, I see, I got your point, yes, this test is covered by testPositionReadFallBack. I will remove this test case. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM, and I left some comments.
alluxio-bot, merge this please. |
What changes are proposed in this pull request?
Throw a PageCorruptedException when the length of page inconsistent with the metadata
Do our best efforts to delete the corrupted page file
Reset the offset of buffer when we found the data has been corrupted to avoid ArrayOutOfBound exception.
Why are the changes needed?
We found the cache make presto keep failing when some of the page file got corrupted
Does this PR introduce any user facing changes?
No