fix(s3): restore working test for DynamoDb log store repair log on read #2120
Conversation
@dispanser how about the python S3 round trip test?
This test isn't related to the changes in this PR. @roeap, is this a known regression related to your changes?
For this test, first
second
The underlying problem is that we detect the attempt to write to a table that can't be written to at a very late stage of writing. As a result, some files (both parquet files and a temporary commit entry) are already present, which makes the next incarnation of delta lake think that the table already exists. Both Potential fixes:
I think it's valuable to avoid accidentally overwriting an existing table, and our existing method for detecting an existing table is too limiting, so option 1 seems the most sensible to me. Opinions? @ion-elgreco @rtyler @roeap
Yes, that failure was introduced when we moved to the arrow backend. Also, thanks for digging into it and figuring out why the round-trip test fails. I agree with your assessment that a better way to detect whether a table exists is the way to go. The current approach is to check if there is a Maybe we could just try to create a new
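To make the idea of a stricter table check concrete, here is a minimal sketch. It assumes that a location should count as a table only if it holds at least one committed log entry named `_delta_log/<20-digit version>.json`, so stray parquet files or a leftover temporary commit do not count. The helper names `commit_version` and `is_delta_table` and the temporary file name in the usage note are hypothetical and are not the delta-rs API.

```rust
/// Returns the version if `path` names a committed log entry of the form
/// `_delta_log/<20 digits>.json`; returns `None` for anything else.
/// (Hypothetical helper for illustration, not part of delta-rs.)
fn commit_version(path: &str) -> Option<u64> {
    let name = path.strip_prefix("_delta_log/")?;
    let stem = name.strip_suffix(".json")?;
    if stem.len() == 20 && stem.bytes().all(|b| b.is_ascii_digit()) {
        // Parsing fails (and yields None) only if the number overflows u64.
        stem.parse().ok()
    } else {
        None
    }
}

/// Assumption: a location is a table only if at least one committed entry
/// exists. Data files and temporary commit files are ignored.
fn is_delta_table<'a>(paths: impl IntoIterator<Item = &'a str>) -> bool {
    paths.into_iter().any(|p| commit_version(p).is_some())
}
```

With this check, a listing such as `["part-00000.parquet", "_delta_log/_commit_abc.json.tmp"]` is not treated as a table, while a listing containing `_delta_log/00000000000000000000.json` is.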
Thanks for fixing this @dispanser - this brings us much closer to a releasable state again 👍.
) -> DeltaResult<Self> {
    debug!(
        "try_new_slice: start_version: {}, end_version: {:?}",
        start_version, end_version
    );
    let max_version_2 = log_store.get_latest_version(start_version).await?;
Should we add a TODO here (or similar) explaining why this statement exists and that we want to get rid of it?
For most of our users this should not be too bad, but for some tables this can be very expensive. Particularly in streaming scenarios we sometimes see A LOT of versions ... Ultimately I hope that we can converge on some consolidated storage trait, but first I hope that we can find out what the write abstractions will look like in a post-kernel world.
@dispanser, when looking into this, I thought that updating this line
to use
# Description Right now we have an [issue](#2120 (comment)) with how we identify whether a location is a delta table. This disables an affected test so we can merge PRs again without having to ignore required CI runs.
Yeah, that would also work. To avoid this repeated interaction with DynamoDb, it would be good to replace
@dispanser and I discussed this: we'll wait and explore an alternative before merging this one ...
I interpret @roeap's last comment to mean that this is no longer approved. Please dismiss this review when it's mergeable so nobody hits the tempting green Merge button 😄
…er/delta-rs into s3-dynamodb-fix-repair-on-update
5f18623 to 73980cf
…er/delta-rs into s3-dynamodb-fix-repair-on-update
This reverts commit 73980cf.
…ad (delta-io#2120) # Description Make sure the read path for delta table commit entries passes through the log store, enabling it to ensure the invariants and potentially repair a broken commit in the context of S3 / DynamoDb log store implementation. This also adds another test in the context of S3 log store: repairing a log store on load was not implemented previously. Note that this a stopgap and not a complete solution: it comes with a performance penalty as we're triggering a redundant object store list operation just for the purpose of "triggering" the log store functionality. fixes delta-io#2109 --------- Co-authored-by: Ion Koutsouris <[email protected]> Co-authored-by: R. Tyler Croy <[email protected]>
Description
Make sure the read path for delta table commit entries passes through the log store, enabling it to ensure the invariants and potentially repair a broken commit in the context of S3 / DynamoDb log store implementation.
This also adds another test in the context of S3 log store: repairing a log store on load was not implemented previously.
Note that this is a stopgap and not a complete solution: it comes with a performance penalty, as we're triggering a redundant object store list operation just for the purpose of "triggering" the log store functionality.
fixes #2109
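As a rough illustration of "repair on read", the sketch below models the idea in memory only. It assumes each commit is first staged at a temporary path and registered as an incomplete entry, and that asking the log store for the latest version finishes any such entry by copying the staged file to its final location. `ModelLogStore`, `CommitEntry`, `stage_commit`, `repair` and the temporary path are hypothetical stand-ins and do not reflect the actual delta-rs log store implementation.

```rust
use std::collections::BTreeMap;

/// One row of a hypothetical DynamoDb-style table: where the commit was
/// staged, and whether it has been copied to its final location.
#[derive(Debug, Clone)]
struct CommitEntry {
    temp_path: String,
    complete: bool,
}

/// In-memory stand-in for an object store plus the commit-entry table.
#[derive(Default)]
struct ModelLogStore {
    objects: BTreeMap<String, Vec<u8>>,
    entries: BTreeMap<u64, CommitEntry>,
}

/// Final location of a commit: a 20-digit zero-padded version number.
fn commit_path(version: u64) -> String {
    format!("_delta_log/{:020}.json", version)
}

impl ModelLogStore {
    /// Simulates a writer that registered its entry and then crashed
    /// before copying the staged file to the final commit path.
    fn stage_commit(&mut self, version: u64, temp_path: &str, body: &[u8]) {
        self.objects.insert(temp_path.to_string(), body.to_vec());
        self.entries.insert(
            version,
            CommitEntry { temp_path: temp_path.to_string(), complete: false },
        );
    }

    /// Finishes an incomplete commit: copy the staged file into place
    /// (without overwriting an existing commit), then mark the entry complete.
    fn repair(&mut self, version: u64) {
        let temp_path = match self.entries.get(&version) {
            Some(entry) if !entry.complete => entry.temp_path.clone(),
            _ => return, // unknown or already complete: nothing to do
        };
        if let Some(body) = self.objects.get(&temp_path).cloned() {
            self.objects.entry(commit_path(version)).or_insert(body);
            if let Some(entry) = self.entries.get_mut(&version) {
                entry.complete = true;
            }
        }
    }

    /// The read path goes through the log store, so a reader that asks for
    /// the latest version also repairs a half-finished latest commit.
    fn get_latest_version(&mut self) -> Option<u64> {
        let latest = *self.entries.keys().next_back()?;
        self.repair(latest);
        Some(latest)
    }
}
```

In this model, a reader that bypasses `get_latest_version` and lists the object store directly would not see version 0 after a crashed write, which is the situation the read path change is meant to avoid.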