Core: Support committing delete files with multiple specs #2985
protected PartitionSpec writeSpec() {
  Preconditions.checkState(spec != null,
      "Cannot determine partition spec: no data or delete files have been added");
protected PartitionSpec dataSpec() {
Renaming this does require touching more places, but I think keeping it writeSpec will be confusing.
PartitionSpec fileSpec = ops.current().spec(file.specId());
List<DeleteFile> deleteFiles = newDeleteFiles.computeIfAbsent(file.specId(), specId -> Lists.newArrayList());
deleteFiles.add(file);
addedFilesSummary.addedFile(fileSpec, file);
The file spec is only used for partition summaries. I added a test that shows it works as expected.
Took a look and change looks good to me
addedFilesSummary.addedFile(writeSpec(), file);
Preconditions.checkNotNull(file, "Invalid delete file: null");
PartitionSpec fileSpec = ops.current().spec(file.specId());
List<DeleteFile> deleteFiles = newDeleteFiles.computeIfAbsent(file.specId(), specId -> Lists.newArrayList());
Not a big deal, but for me this would be easier to understand if it was deleteFilesForSpec
Sure, I'll update that. You refer to the map name, right?
Updated the map name.
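For readers following the thread, here is a minimal, self-contained sketch of the grouping pattern under discussion: delete files are bucketed by their spec ID with `computeIfAbsent`. It does not use the Iceberg classes; `FakeDeleteFile` and the map name `deleteFilesBySpecId` are hypothetical stand-ins chosen for illustration, and the thread does not state the final map name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the code from the PR.
final class DeleteFilesBySpecSketch {

  // Hypothetical stand-in for a delete file: only the path and spec ID matter here.
  static final class FakeDeleteFile {
    private final String path;
    private final int specId;

    FakeDeleteFile(String path, int specId) {
      this.path = path;
      this.specId = specId;
    }

    String path() {
      return path;
    }

    int specId() {
      return specId;
    }
  }

  // Hypothetical map name: spec ID -> delete files written with that spec.
  private final Map<Integer, List<FakeDeleteFile>> deleteFilesBySpecId = new HashMap<>();

  void add(FakeDeleteFile file) {
    if (file == null) {
      throw new NullPointerException("Invalid delete file: null");
    }
    // Create the per-spec list on first use, then append to it.
    deleteFilesBySpecId.computeIfAbsent(file.specId(), specId -> new ArrayList<>()).add(file);
  }

  Map<Integer, List<FakeDeleteFile>> filesBySpecId() {
    return deleteFilesBySpecId;
  }

  public static void main(String[] args) {
    DeleteFilesBySpecSketch sketch = new DeleteFilesBySpecSketch();
    sketch.add(new FakeDeleteFile("deletes-a.parquet", 0));
    sketch.add(new FakeDeleteFile("deletes-b.parquet", 1));
    sketch.add(new FakeDeleteFile("deletes-c.parquet", 0));
    sketch.filesBySpecId().forEach((specId, files) ->
        System.out.println("spec " + specId + " has " + files.size() + " delete file(s)"));
  }
}
```

Keying the map by spec ID lets each group be written out separately later, which is what allows a single operation to carry delete files from more than one spec.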
}
this.cachedNewDeleteManifests.clear();
Do we need the explicit clear here? Are we just trying to free it up for GC early?
This logic seems a little different than it was previously?
Before
  if committed doesn't contain cachedNewDeleteManifest
    deleteFile()
    clear cachedNewDeleteManifest
After
  for any cachedNewDeleteManifest
    if committed doesn't contain cachedNewDeleteManifest
      deleteFile
  clear all cachedDeleteManifests
I'm still trying to understand the check here, but it seems like we will clear out all manifests even if some of them are committed?
Seems like the equivalent would be something like
for (ManifestFile cachedNewDeleteManifest : cachedNewDeleteManifests) {
  if (!committed.contains(cachedNewDeleteManifest)) {
    deleteFile(cachedNewDeleteManifest.path());
    // Although this would be modifying the list as we iterated through it, but you get the idea
    this.cachedNewDeleteManifests.remove(cachedNewDeleteManifest);
  }
}
You are right, I'll update this place.
Updated to use LinkedList and listIterator.
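As an illustration of the approach mentioned here, the following is a minimal sketch of removing only the uncommitted entries from a `LinkedList` through its `ListIterator`, which avoids the concurrent-modification problem noted earlier in the thread. Manifests are represented by plain path strings, and the class and method names (`CleanUncommittedSketch`, `cleanUncommitted`) are hypothetical; the real code operates on `ManifestFile` objects and calls `deleteFile`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;
import java.util.Set;

// Illustrative sketch only; manifests are modeled as path strings.
final class CleanUncommittedSketch {

  // Removes every cached manifest that was not committed and returns the
  // paths that would be deleted. Committed manifests stay in the cache.
  static List<String> cleanUncommitted(List<String> cachedManifests, Set<String> committed) {
    List<String> deleted = new ArrayList<>();
    ListIterator<String> iterator = cachedManifests.listIterator();
    while (iterator.hasNext()) {
      String manifest = iterator.next();
      if (!committed.contains(manifest)) {
        deleted.add(manifest); // stands in for deleteFile(manifest.path())
        iterator.remove();     // safe removal while iterating
      }
    }
    return deleted;
  }

  public static void main(String[] args) {
    List<String> cached = new LinkedList<>(Arrays.asList("m1.avro", "m2.avro", "m3.avro"));
    Set<String> committed = new HashSet<>(Arrays.asList("m2.avro"));
    List<String> deleted = cleanUncommitted(cached, committed);
    System.out.println("deleted: " + deleted);
    System.out.println("still cached: " + cached);
  }
}
```

Removing through the iterator is constant time on a `LinkedList`, whereas calling `remove` on the list inside a for-each loop would throw `ConcurrentModificationException` on the next iteration.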
44fdaad to b199536
This one is ready for another review round.
for (ManifestFile cachedNewDeleteManifest : cachedNewDeleteManifests) {
  deleteFile(cachedNewDeleteManifest.path());
}
cachedNewDeleteManifests.clear();
Minor: this will rewrite all delete manifests even if there is only one new delete file. I think it's fine to keep it simple right now since we don't expect this case very often, but it would be good to add a code comment noting that this is something we can improve.
Added a comment. I think this will be rare enough in the real world, so it should be fine to optimize later.
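To make the trade-off above concrete, here is a small sketch of the simplified invalidation strategy: any newly added delete file drops every cached manifest, so the next call rewrites one manifest per spec even for specs that did not change. Everything in it is hypothetical (the class `DeleteManifestCacheSketch`, the `hasNewDeleteFiles` flag, the string "manifests"); it only models the behavior described in the comment, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; manifests are modeled as descriptive strings.
final class DeleteManifestCacheSketch {
  private final Map<Integer, List<String>> deleteFilesBySpecId = new TreeMap<>();
  private final List<String> cachedManifests = new ArrayList<>();
  private boolean hasNewDeleteFiles = false; // hypothetical flag
  private int manifestsWritten = 0;

  void addDeleteFile(int specId, String path) {
    deleteFilesBySpecId.computeIfAbsent(specId, id -> new ArrayList<>()).add(path);
    hasNewDeleteFiles = true;
  }

  List<String> manifests() {
    if (hasNewDeleteFiles) {
      // Simplification: drop all cached manifests, not only the affected spec's.
      // A possible later optimization is to rewrite only the specs that changed.
      cachedManifests.clear();
      hasNewDeleteFiles = false;
    }
    if (cachedManifests.isEmpty()) {
      // Write one manifest per spec ID.
      for (Map.Entry<Integer, List<String>> entry : deleteFilesBySpecId.entrySet()) {
        cachedManifests.add("spec-" + entry.getKey() + ":" + entry.getValue().size() + "-files");
        manifestsWritten++;
      }
    }
    return cachedManifests;
  }

  int manifestsWritten() {
    return manifestsWritten;
  }

  public static void main(String[] args) {
    DeleteManifestCacheSketch cache = new DeleteManifestCacheSketch();
    cache.addDeleteFile(0, "a.parquet");
    cache.addDeleteFile(1, "b.parquet");
    cache.manifests();
    cache.addDeleteFile(1, "c.parquet"); // only spec 1 changed
    cache.manifests();                   // but both manifests are rewritten
    System.out.println("manifests written in total: " + cache.manifestsWritten());
  }
}
```

In this model, adding a single file to spec 1 after the first write causes two more manifest writes instead of one, which is the extra cost the reviewers agreed to accept for now.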
@aokolnychyi, this looks good to me. I had a couple of minor comments, but merge when you're ready.
Thanks for reviewing, @szehon-ho @rdblue @RussellSpitzer!
This PR enables committing delete files that belong to different specs in a single operation. Previously, we only supported row deltas where all delete and data files were part of the same spec.