feat: Ensure data read order reflects commit sequence in Iceberg tables #6341
This looks good.
One small thing left.
...ions/iceberg/src/main/java/io/deephaven/iceberg/location/IcebergTableParquetLocationKey.java
        return uri.compareTo(otherTyped.uri);
    }
    // When comparing with non-iceberg location key, we want to compare both partitions and URI
    return super.compareTo(other);
This is a good first step, but I'm realizing we have some trouble here.
We may have neglected this, but TLKs really must be comparable to all TLKs, not just TLKs of the same provenance.
We need fallback comparisons.
We have two paths forward:
- Keep behavior as-is, and defer to a new issue: Go back to throwing an exception on this line for now. Create an issue to define an ordering.
- Fix the problem now.
My proposal for fixing, whether we do it here or in a new issue:
- Make unrelated implementations compare to one another based on fully-qualified class name. This imposes an arbitrary but consistent ordering, and should roughly keep distinct sources together.
- Make PartitionedTLK use order + partitions, followed by classname for tie-breaking.
- Make URITLK use order + partitions, then break ties by URI (if the other is a URITLK), and lastly break ties by classname.
- Make IcebergTLK compare to other IcebergTLKs by Catalog name and TableIdentifier (which should also be added to equals and hashCode), then as currently defined in this PR, then to other URITLKs as defined by the URITLK rules, and so on.
Might want opinions from @abaranec and @devinrsmith
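To make the proposal above concrete, here is a minimal sketch of the fallback idea for the two simplest cases. Everything in it is hypothetical: `Key`, `PartitionedKey`, and `UriKey` are stand-ins invented for illustration, not the real Deephaven `TableLocationKey` classes, and a single `String` partition replaces the real partition map. It only shows order + partitions first, then URI when both sides have one, then fully-qualified class name as the last tie-breaker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a location key; NOT the Deephaven interface.
interface Key extends Comparable<Key> {
    int order();

    String partition();

    // Last-resort tie-breaker: arbitrary but consistent across implementations.
    static int compareByClassName(Key a, Key b) {
        return a.getClass().getName().compareTo(b.getClass().getName());
    }
}

// Hypothetical key that only knows its order and partition.
class PartitionedKey implements Key {
    private final int order;
    private final String partition;

    PartitionedKey(int order, String partition) {
        this.order = order;
        this.partition = partition;
    }

    public int order() {
        return order;
    }

    public String partition() {
        return partition;
    }

    // Shared prefix of every comparison: order, then partition.
    protected int compareOrderAndPartition(Key other) {
        int c = Integer.compare(order, other.order());
        return c != 0 ? c : partition.compareTo(other.partition());
    }

    @Override
    public int compareTo(Key other) {
        int c = compareOrderAndPartition(other);
        return c != 0 ? c : Key.compareByClassName(this, other);
    }
}

// Hypothetical key that additionally carries a URI.
class UriKey extends PartitionedKey {
    private final String uri;

    UriKey(int order, String partition, String uri) {
        super(order, partition);
        this.uri = uri;
    }

    @Override
    public int compareTo(Key other) {
        int c = compareOrderAndPartition(other);
        if (c != 0) {
            return c;
        }
        if (other instanceof UriKey) {
            // Both sides have a URI, so use it before falling back to class name.
            c = uri.compareTo(((UriKey) other).uri);
            if (c != 0) {
                return c;
            }
        }
        return Key.compareByClassName(this, other);
    }
}

class FallbackOrderingSketch {
    public static void main(String[] args) {
        List<Key> keys = new ArrayList<>();
        keys.add(new UriKey(0, "a", "s3://bucket/2.parquet"));
        keys.add(new PartitionedKey(0, "a"));
        keys.add(new UriKey(0, "a", "s3://bucket/1.parquet"));
        keys.sort(null); // natural ordering via compareTo
        for (Key k : keys) {
            System.out.println(k.getClass().getName() + " " + k.partition());
        }
    }
}
```

With only these two classes the comparison is symmetric, since both sides fall back to the same class-name check. Adding further subclasses (for example an Iceberg-specific key) would need the same care so that the overall ordering stays consistent.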
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergBaseLayout.java
    return new IcebergTableParquetLocationKey(tableUuid, catalogName, tableIdentifier, manifestFile, dataFile,
            fileUri, 0, partitions, parquetInstructions, channelsProvider);
That sure is a lot of constructor args. Sorry. Builder? (Probably not worth it for something called in only one place.)
I'm happy if @abaranec and @devinrsmith are.
Before this change:
Data files were read in the order they appeared in the snapshot, which sometimes led to new data appearing at the top of the table instead of at the bottom.
This PR:
TableLocationKey implementations using fully qualified class names as a fallback.
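As a rough illustration of the before/after behavior described above, the sketch below sorts discovered data files so that files from earlier commits are read first. It is a sketch under stated assumptions, not the code in this PR: `DataFileRef`, its `sequenceNumber` field, and `CommitOrderSketch` are hypothetical names, and the assumption is that each data file can be tagged with the sequence number of the commit that added it, with the URI used only to break ties.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a discovered data file; not an Iceberg or Deephaven type.
final class DataFileRef {
    final long sequenceNumber; // assumed: sequence number of the commit that added the file
    final String uri;

    DataFileRef(long sequenceNumber, String uri) {
        this.sequenceNumber = sequenceNumber;
        this.uri = uri;
    }
}

final class CommitOrderSketch {
    // Earlier commits first; the URI only breaks ties within a commit.
    static final Comparator<DataFileRef> COMMIT_ORDER =
            Comparator.comparingLong((DataFileRef f) -> f.sequenceNumber)
                    .thenComparing(f -> f.uri);

    // Returns the URIs in the order they would be read under this assumption.
    static List<String> readOrder(List<DataFileRef> discovered) {
        List<DataFileRef> sorted = new ArrayList<>(discovered);
        sorted.sort(COMMIT_ORDER);
        List<String> uris = new ArrayList<>();
        for (DataFileRef f : sorted) {
            uris.add(f.uri);
        }
        return uris;
    }

    public static void main(String[] args) {
        // Discovery order puts the newest file first, as in the "before" behavior.
        List<DataFileRef> discovered = Arrays.asList(
                new DataFileRef(2, "s3://bucket/new.parquet"),
                new DataFileRef(1, "s3://bucket/old.parquet"));
        System.out.println(readOrder(discovered));
    }
}
```

Sorting by a per-commit sequence number rather than by discovery order is what keeps newly appended data at the bottom of the table in this sketch.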