
The store used_size of compute node is not correct after restarting with smaller capacity #8920

Closed
JaySon-Huang opened this issue Apr 9, 2024 · 0 comments · Fixed by #8921
Labels
affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. component/storage severity/minor type/bug The issue is confirmed as a bug.


@JaySon-Huang
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. Deploy the disaggregated architecture and let the compute node download cache files to its local disk with a large capacity, for example, 500GB.
  2. Restart the compute node with a smaller capacity, for example, 100GB, without cleaning the locally cached files.
  3. The used_size becomes a negative number.
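The arithmetic behind these steps can be reproduced with a toy model. This is not TiFlash code: `ToyCapacityMetrics` and `restoreWithBug` are hypothetical names, sizes are plain integers (think GB), and the capacity metrics are reduced to a single signed counter. It only mimics the accounting described in this issue: after a restart the counter starts at zero, files that fit are counted, and files that do not fit are subtracted even though they were never counted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for capacity_metrics: a single signed size counter.
struct ToyCapacityMetrics
{
    int64_t used_size = 0;
    void addUsedSize(uint64_t size) { used_size += static_cast<int64_t>(size); }
    void freeUsedSize(uint64_t size) { used_size -= static_cast<int64_t>(size); }
};

// Mimics the buggy restore: files that fit are counted, and files that do not
// fit are removed through a path that also calls freeUsedSize, although they
// were never added to used_size.
int64_t restoreWithBug(const std::vector<uint64_t> & file_sizes, uint64_t capacity)
{
    ToyCapacityMetrics metrics; // starts at 0 after the restart
    uint64_t cache_used = 0;
    for (uint64_t size : file_sizes)
    {
        if (capacity - cache_used >= size)
        {
            metrics.addUsedSize(size);
            cache_used += size;
        }
        else
        {
            metrics.freeUsedSize(size); // the unmatched decrement
        }
        // cache_used never exceeds capacity, so `capacity - cache_used` cannot wrap.
        assert(cache_used <= capacity);
    }
    return metrics.used_size;
}
```

With five 100GB files restored into a 100GB cache, `restoreWithBug(std::vector<uint64_t>(5, 100), 100)` returns -300: the first file adds 100 and each of the other four subtracts 100.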

Reason:
FileCache::restoreDMFile calls FileCache::removeDiskFile to remove the files that do not fit into the new capacity. However, removeDiskFile also calls capacity_metrics->freeUsedSize to subtract their sizes from used_size. Those files were never added through capacity_metrics->addUsedSize during the restore, so the subtraction makes used_size negative.

```cpp
void FileCache::restoreDMFile(const std::filesystem::directory_entry & dmfile_entry)
{
    RUNTIME_CHECK_MSG(dmfile_entry.is_directory(), "{} is not a directory", dmfile_entry.path());
    for (const auto & file_entry : std::filesystem::directory_iterator(dmfile_entry.path()))
    {
        RUNTIME_CHECK_MSG(file_entry.is_regular_file(), "{} is not a regular file", file_entry.path());
        auto fname = file_entry.path().string();
        if (unlikely(isTemporaryFilename(fname)))
        {
            // Leftover temporary file: delete it.
            removeDiskFile(fname);
        }
        else
        {
            auto file_type = getFileType(fname);
            auto & table = tables[static_cast<UInt64>(file_type)];
            auto size = file_entry.file_size();
            if (canCache(file_type) && cache_capacity - cache_used >= size)
            {
                // The file fits: register it and count it in used_size.
                table.set(
                    toS3Key(fname),
                    std::make_shared<FileSegment>(fname, FileSegment::Status::Complete, size, file_type));
                capacity_metrics->addUsedSize(fname, size);
                cache_used += size;
                CurrentMetrics::set(CurrentMetrics::DTFileCacheUsed, cache_used);
            }
            else
            {
                // The file is deleted, but removeDiskFile also calls
                // capacity_metrics->freeUsedSize although this file was never
                // added via addUsedSize, so used_size can become negative.
                removeDiskFile(fname);
            }
        }
    }
}
```
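One way to keep the counter consistent is to decrement used_size only for files that were previously counted. The sketch below is a hypothetical illustration under the same toy model (single signed counter, integer sizes), not the change merged in #8921: `removeToyFile` and its `update_metrics` flag stand in for a removal path that deletes a file during restore without calling freeUsedSize.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for capacity_metrics: a single signed size counter.
struct ToyCapacityMetrics
{
    int64_t used_size = 0;
    void addUsedSize(uint64_t size) { used_size += static_cast<int64_t>(size); }
    void freeUsedSize(uint64_t size) { used_size -= static_cast<int64_t>(size); }
};

// Hypothetical removal helper: adjusts the metrics only when the caller says
// the file was previously counted. Deleting the file on disk is omitted here.
void removeToyFile(ToyCapacityMetrics & metrics, uint64_t size, bool update_metrics)
{
    if (update_metrics)
        metrics.freeUsedSize(size);
}

// Restore that never subtracts sizes it did not add.
int64_t restoreFixed(const std::vector<uint64_t> & file_sizes, uint64_t capacity)
{
    ToyCapacityMetrics metrics; // starts at 0 after the restart
    uint64_t cache_used = 0;
    for (uint64_t size : file_sizes)
    {
        if (capacity - cache_used >= size)
        {
            metrics.addUsedSize(size);
            cache_used += size;
        }
        else
        {
            // Never counted, so remove it without touching used_size.
            removeToyFile(metrics, size, /*update_metrics=*/false);
        }
        assert(metrics.used_size >= 0);
    }
    return metrics.used_size;
}
```

With five 100GB files and a 100GB capacity, `restoreFixed` returns 100, the size of the one file that was kept. Tying the decrement to whether the file was counted is preferable to clamping used_size at zero, which would hide the mismatch rather than fix it.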

Note that this does not affect any functionality; only the used_size shown in Grafana is negative. Restarting the compute node again works around the issue.

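2. What did you expect to see? (Required)

The used_size stays non-negative and reflects only the cached files kept after the restart.

3. What did you see instead? (Required)

The used_size is a negative number.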

4. What is your TiFlash version? (Required)

master, 7.5

@JaySon-Huang JaySon-Huang added type/bug The issue is confirmed as a bug. severity/minor component/storage labels Apr 9, 2024
@JinheLin JinheLin added affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. labels Apr 9, 2024
@JinheLin JinheLin self-assigned this Apr 9, 2024
@ti-chi-bot ti-chi-bot bot closed this as completed in #8921 Apr 9, 2024
ti-chi-bot bot pushed a commit that referenced this issue Apr 9, 2024

2 participants