[BugFix] fix the CachingFileIO had wrong length and support pin/unpin for disk cache #18892

Merged
merged 5 commits into from
Mar 10, 2023

Conversation

Contributor

@zombee0 zombee0 commented Mar 3, 2023

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Which issues of this PR fixes :

Fixes #

Problem Summary(Required) :

Fix CachingFileIO returning the wrong length(), and add pin/unpin support to the disk cache so that a file that is still in use is not deleted.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr will affect users' behaviors
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto backported to target branch
    • 3.0
    • 2.5
    • 2.4
    • 2.3

@mergify mergify bot assigned zombee0 Mar 3, 2023
@github-actions github-actions bot removed the be-build label Mar 6, 2023
@github-actions github-actions bot removed the be-build label Mar 7, 2023
@github-actions github-actions bot removed the be-build label Mar 7, 2023
@github-actions github-actions bot removed the be-build label Mar 8, 2023
@zombee0 zombee0 changed the title from "[Just for Test] turn on disk cache" to "[BugFix] fix the CachingFileIO had wrong length and support pin/unpin for disk cache" Mar 8, 2023
Contributor Author

zombee0 commented Mar 8, 2023

run starrocks_fe_unittest

@@ -142,10 +147,12 @@ private CacheEntry(long length, List<ByteBuffer> buffers) {
private static class DiskCacheEntry {
private final long length;
private final InputFile inputFile;
private int useCount;
Contributor

atomic int?

Contributor Author

I don't think that is needed. We update the entry through cache.asMap().computeIfPresent, which is thread-safe for the cache, so pin() and unpin() are both safe. As discussed in ben-manes/caffeine#513, compute replaces the entry with the result of the compute() call. In addition, "weights are measured and recorded when entries are inserted into or updated in the cache, and are thus effectively static during the lifetime of a cache entry". The weight is computed on insert, so reading useCount in the weigher is also safe.
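A minimal sketch of the pin()/unpin() pattern described in this reply, not the code from the PR. It assumes a plain `ConcurrentHashMap` in place of Caffeine's `cache.asMap()`, and the names `PinnableEntry` and `PinSketch` are hypothetical. Each update returns a fresh immutable entry from `computeIfPresent`, so the counter needs no atomic type: the map serializes updates per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for the PR's DiskCacheEntry, reduced to what the sketch needs. */
final class PinnableEntry {
    final long length;   // file length, fixed when the entry is created
    final int useCount;  // number of readers currently holding the file

    PinnableEntry(long length, int useCount) {
        this.length = length;
        this.useCount = useCount;
    }
}

final class PinSketch {
    // Assumption: a plain ConcurrentHashMap plays the role of cache.asMap().
    private final ConcurrentMap<String, PinnableEntry> entries = new ConcurrentHashMap<>();

    void put(String key, long length) {
        entries.put(key, new PinnableEntry(length, 0));
    }

    /** Atomically increments useCount; returns false if the key is absent. */
    boolean pin(String key) {
        return entries.computeIfPresent(
                key, (k, e) -> new PinnableEntry(e.length, e.useCount + 1)) != null;
    }

    /** Atomically decrements useCount (never below zero); returns false if the key is absent. */
    boolean unpin(String key) {
        return entries.computeIfPresent(
                key, (k, e) -> new PinnableEntry(e.length, Math.max(0, e.useCount - 1))) != null;
    }

    /** Returns the current useCount, or -1 if the key is absent. */
    int useCount(String key) {
        PinnableEntry e = entries.get(key);
        return e == null ? -1 : e.useCount;
    }
}
```

Replacing the entry rather than mutating a shared field also means a reader that calls `get` always sees a consistent `length`/`useCount` pair.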

} catch (Exception e) {
LOG.warn("failed on deleting file :" + hadoopOutputFile.getPath());
// use sync CacheWriter to avoid delete file newly generated by another thread
.writer(new CacheWriter<String, DiskCacheEntry>() {

Can you please upgrade and use .evictionListener((key, value, cause) -> ...)? This handles evictions only, so you'd need to handle explicit removals via a Map compute.

CacheWriter was deprecated in v2 and removed in v3. The write method was not as useful and had a lot of confusing details. That interface approach just didn't work well with Map or AsyncCache, where computations and an eviction listener were more explicit and clear. Plus upgrading means improvements and bug fixes :)

Contributor Author

Hi @ben-manes, that's very kind of you.
In my case I only use CacheWriter's delete() to remove the file on disk when a DiskCacheEntry is evicted. I don't want the file deleted when I call computeIfPresent() to update useCount, so I don't think removalListener is suitable.
You have given me another option, .evictionListener. I will check whether I can upgrade and use it.
I have another question: do .evictionListener((key, value, cause) -> ...) and .removalListener run synchronously with the eviction? Is it possible for evictionListener to remove a resource that a new entry is already using? For example, I evict an entry and delete its file, then insert a new entry (same key, same file name) with a new file. I want these steps to run one after the other. If the delete happened after the insert, my new file would be deleted. Is that possible? Thank you.

Caffeine.evictionListener runs within the atomic scope of the ConcurrentHashMap.compute method, so other writes to that key are blocked (just like CacheWriter.delete). This way the deletion cannot run concurrent with the insertion because the entry lock must be held, so you get key-ordered operations. If an eviction selects a victim that is no longer eligible by the time it acquired the entry lock (pinned), then it was "resurrected" and the attempt no-ops (likely finding another victim). Similarly if the client tries to pin and the entry was being evicted, then when the client acquires the lock it will be to an entry reservation (ala computeIfAbsent). So it should give you the atomicity that you need.

As you said, Caffeine.removalListener runs after the atomic operation completed so you lose key-ordered actions and may suffer races. Of course the benefit is that you are not making writes more expensive to do some other work, so it's ideal when ordering does not matter.

AsyncCache means that a synchronous listener for removals only makes sense for evictions, as a Map.remove could be of an in-flight future. Obviously we don't want to force Map.remove to perform a future.join to wait in order to notify the listener under the entry's lock. That's why evictionListener only applies to expired / size / collected causes and not the explicit / replaced ones. Therefore if you have an evictionListener to delete the file, an explicit Map.remove won't call the delete for you and you need to use a compute to both delete it and remove the entry. Therefore, CacheWriter was replaced by using Map.compute and evictionListener instead.
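The pinned-victim case described above can be modeled with a short sketch. This is a simplified illustration of key-ordered eviction, not Caffeine's implementation: it assumes a plain `ConcurrentHashMap`, and `EvictionSketch`, `Entry`, `tryEvict`, and `deletedFiles` are hypothetical names. The eviction attempt and the cleanup both run inside `computeIfPresent`, so a concurrent `pin` on the same key either completes first (and the eviction no-ops) or finds the entry already gone.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

final class EvictionSketch {
    /** Hypothetical entry; useCount > 0 means some reader has pinned the file. */
    static final class Entry {
        final String fileName;
        final int useCount;

        Entry(String fileName, int useCount) {
            this.fileName = fileName;
            this.useCount = useCount;
        }
    }

    private final ConcurrentMap<String, Entry> entries = new ConcurrentHashMap<>();
    // Records "deleted" file names instead of touching the file system.
    final List<String> deletedFiles = new CopyOnWriteArrayList<>();

    void put(String key, String fileName) {
        entries.put(key, new Entry(fileName, 0));
    }

    boolean pin(String key) {
        return entries.computeIfPresent(
                key, (k, e) -> new Entry(e.fileName, e.useCount + 1)) != null;
    }

    boolean unpin(String key) {
        return entries.computeIfPresent(
                key, (k, e) -> new Entry(e.fileName, Math.max(0, e.useCount - 1))) != null;
    }

    /** Tries to evict the key; returns true only if the entry was removed. */
    boolean tryEvict(String key) {
        AtomicBoolean evicted = new AtomicBoolean(false);
        entries.computeIfPresent(key, (k, e) -> {
            if (e.useCount > 0) {
                return e; // pinned in the meantime: keep the entry, the eviction no-ops
            }
            deletedFiles.add(e.fileName); // cleanup runs while the key is still locked
            evicted.set(true);
            return null; // returning null removes the mapping
        });
        return evicted.get();
    }
}
```

Because the cleanup happens before the mapping disappears, a later `put` for the same key cannot have its file removed by a stale eviction.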

Contributor Author

@ben-manes Thank you so much! I think CacheWriter is OK in my case. Because I never use Map.compute to produce a null entry, Map.compute only ever replaces an entry and never removes one, so evictionListener would also be enough.
If Map.compute did produce a null entry, then the Map.compute call would have to handle the removal itself, am I right?
For invalidate, CacheWriter handles the deletion, but with evictionListener we would need to handle it explicitly, right?
Thanks again.

You’re right. My concern is that CacheWriter was removed in v3 (2/2021) so you’d have to stay on v2. I just don’t want that to be an annoying surprise so chimed in.

Contributor Author

@ben-manes You are so kind. I will consider moving to the new version; the upgrade requires checking every use case in our system. I will look into it. Thank you again.

Contributor Author

Hi @ben-manes, I upgraded to 2.9.3 because we still run on JDK 8. I now use evictionListener to clean up evicted resources. For invalidation I use Map.computeIfPresent to clean up the invalidated resource and return null to remove the entry from the cache.
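A minimal sketch of the explicit-invalidation half of that approach, under stated assumptions: a plain `ConcurrentHashMap` stands in for `cache.asMap()`, the map value is just the backing file path, and `InvalidateSketch` and `putTempFile` are hypothetical names rather than code from the PR. The file is deleted inside `computeIfPresent`, and returning `null` removes the mapping in the same atomic step.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class InvalidateSketch {
    // Assumption: maps a cache key to the local file that backs it.
    private final ConcurrentMap<String, Path> files = new ConcurrentHashMap<>();

    /** Creates a temp file and registers it under the key (hypothetical helper for the demo). */
    Path putTempFile(String key) {
        try {
            Path path = Files.createTempFile("disk-cache-sketch", ".bin");
            files.put(key, path);
            return path;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes the backing file and removes the mapping while the key is locked. */
    void invalidate(String key) {
        files.computeIfPresent(key, (k, path) -> {
            try {
                Files.deleteIfExists(path);
            } catch (IOException e) {
                // Propagating the exception leaves the mapping in place.
                throw new UncheckedIOException(e);
            }
            return null; // returning null removes the entry
        });
    }

    boolean contains(String key) {
        return files.containsKey(key);
    }
}
```

If the delete fails, the exception propagates out of `computeIfPresent` and the entry stays in the map, so the caller can retry instead of leaking an untracked file.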

Youngwb
Youngwb previously approved these changes Mar 10, 2023
zombee0 added 5 commits March 10, 2023 14:17
fix CachingFileIO get wrong length(),
diskCache support pin and unpin to avoid deleting
a file which is using.

Signed-off-by: zombee0 <[email protected]>
@sonarqubecloud

SonarCloud Quality Gate failed.

Bugs: 2 (rating D)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 7 (rating A)

Coverage: 0.0%
Duplication: 0.0%

@wanpengfei-git
Collaborator

[FE PR Coverage Check]

😞 fail : 8 / 133 (06.02%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/connector/iceberg/io/IcebergCachingFileIO.java 5 112 04.46% [152, 160, 161, 163, 165, 166, 167, 173, 174, 176, 177, 180, 181, 183, 185, 186, 188, 189, 190, 191, 193, 196, 197, 199, 201, 202, 203, 204, 205, 206, 207, 211, 213, 214, 216, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 257, 323, 324, 325, 328, 329, 330, 331, 340, 344, 345, 364, 365, 366, 399, 405, 406, 407, 414, 415, 416, 417, 419, 421, 426, 427, 428, 430, 432, 433, 434, 435, 436, 437, 439, 448, 457, 458, 459, 460, 461, 465, 467, 468, 469, 472, 476, 481, 482, 486, 491, 495, 527, 528, 549, 550, 580, 581, 583, 592, 601]
🔵 com/starrocks/connector/iceberg/io/IOUtil.java 2 20 10.00% [100, 104, 108, 113, 147, 150, 151, 152, 153, 154, 158, 159, 161, 162, 166, 167, 169, 170]
🔵 com/starrocks/common/Config.java 1 1 100.00% []

@Youngwb Youngwb enabled auto-merge (squash) March 10, 2023 07:41
@wanpengfei-git wanpengfei-git added the Approved Ready to merge label Mar 10, 2023
@wanpengfei-git
Collaborator

run starrocks_admit_test

@Youngwb Youngwb merged commit 4a3be4c into StarRocks:main Mar 10, 2023
@github-actions github-actions bot removed the Approved Ready to merge label Mar 10, 2023
@wanpengfei-git
Collaborator

@Mergifyio backport branch-3.0

@github-actions github-actions bot removed the be-build label Mar 10, 2023
@mergify
Contributor

mergify bot commented Mar 10, 2023

backport branch-3.0

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Mar 10, 2023
… for disk cache (#18892)

Signed-off-by: zombee0 <[email protected]>
(cherry picked from commit 4a3be4c)
wanpengfei-git pushed a commit that referenced this pull request Mar 10, 2023
… for disk cache (#18892)

Signed-off-by: zombee0 <[email protected]>
(cherry picked from commit 4a3be4c)
Jay-ju pushed a commit to Jay-ju/starrocks that referenced this pull request Mar 19, 2023
abc982627271 pushed a commit to abc982627271/starrocks that referenced this pull request Mar 28, 2023
numbernumberone pushed a commit to numbernumberone/starrocks that referenced this pull request May 31, 2023
abc982627271 pushed a commit to abc982627271/starrocks that referenced this pull request Jun 5, 2023