All stores created using DeltaObjectStore::new have an identical object_store_url #1188
Comments
Thanks for reporting! While the generated URLs are currently not unique, they should differ at least in the table prefix / path within the store. One reason here might be that both stores are rooted directly in the delta table. The URLs with no host or any other part would suggest as much. That said, we can probably generate a more unique URL from the table state itself... Eventually we want to support absolute file paths in the log as well, but this is so far unscoped work. I'll dig a bit deeper.
Hey, thank you for a fast reply!
I think the reason is simply that the
What I've had in mind is something along these lines:
diff --git a/rust/src/storage/mod.rs b/rust/src/storage/mod.rs
index 360c241..ed90ce6 100644
--- a/rust/src/storage/mod.rs
+++ b/rust/src/storage/mod.rs
@@ -66,15 +66,16 @@ impl std::fmt::Display for DeltaObjectStore {
impl DeltaObjectStore {
/// Create a new instance of [`DeltaObjectStore`]
///
- /// # Arguemnts
+ /// # Arguments
///
/// * `storage` - A shared reference to an [`ObjectStore`](object_store::ObjectStore) with "/" pointing at delta table root (i.e. where `_delta_log` is located).
- /// * `location` - A url corresponding to the storagle location of `storage`.
+ /// * `location` - A url corresponding to the storage location of `storage`.
pub fn new(storage: Arc<DynObjectStore>, location: Url) -> Self {
+ let prefix = Path::from(location.path());
Self {
storage,
location,
- prefix: Path::from("/"),
+ prefix,
options: HashMap::new().into(),
}
}
which then produces a unique
Object store url 1: delta-rs://var-folders-b7-zvxnq_n96rj09ggrc_c5qts00000gn-T-table_1.06FYeNS7SdpL/
Object store url 2: delta-rs://var-folders-b7-zvxnq_n96rj09ggrc_c5qts00000gn-T-table_2.iWQ5DpnHBPzN/
However, that change reveals another problem, perhaps more appropriate for Discussions or the Slack workspace (which I can't seem to join). Namely, it seems to me that the logic inside
What it does is simply try to overwrite the object store for the scanned table that was already registered in
Arguably, this makes sense in case of multi-node deployments, where the physical plan needs to be serialized and transferred in-between creation and execution. However, even then someone needs to register that
Of course I could be missing some important points here, so I'd be glad to learn more.
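As a rough illustration of why deriving the prefix from the location yields distinct URLs, here is a minimal standalone sketch. The function `object_store_url_from_path` is hypothetical and is not the delta-rs implementation; it only assumes, based on the two URLs printed above, that path separators end up flattened into `-` under a `delta-rs://` scheme.

```rust
/// Hypothetical helper (an assumption for illustration, not the delta-rs API):
/// build a registry URL from a table location path by flattening the
/// path separators into '-'.
fn object_store_url_from_path(location_path: &str) -> String {
    let flattened = location_path
        // Drop leading and trailing separators so "/" collapses to "".
        .trim_matches('/')
        // Flatten the remaining separators so the path fits in one URL segment.
        .replace('/', "-")
        .replace(':', "-");
    format!("delta-rs://{}/", flattened)
}
```

With a prefix hard-coded to `/`, every call produces the same string, whatever table the store points at. With the path taken from the location, two temporary directories such as `/tmp/table_1` and `/tmp/table_2` (paths assumed here) map to different URLs.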
In case it is of any help I can also present my confusion with Here are the noteworthy points:
This is a silent problem currently, but once I apply the diff from my previous comment this is no longer the case. Namely, everything is the same up to the points 5, 6 and 7, which now are:
UPDATE: on the off chance that
# Description
Make the object store url be unique for stores created via `DeltaObjectStore::new`, by generating it from the location instead of the prefix (which was previously hard-coded to `/`), in the same manner as for `try_new`. Also, in the (unlikely) case that I'm not mistaken about `DeltaScan::execute` logic being redundant (see #1188 for more details), I've removed it and added a couple of tests.
# Related Issue(s)
Closes #1188
# Documentation
Environment
Delta-rs version: 0.7.0 (latest main)
Binding: Rust
Environment:
Bug
What happened:
Using `DeltaObjectStore::new` to create new object stores leads to those stores having an identical `object_store_url`. Consequently, in situations where multiple Delta tables are being scanned concurrently, they will overwrite each other's object stores in the registry, leading to errors.
What you expected to happen:
`DeltaObjectStore::new` should parse the prefix from the provided location url, as is done in `DeltaObjectStore::try_new`, which will ensure that the `object_store_url` values are unique across the different tables.
How to reproduce it:
Currently, running
leads to:
More details:
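The overwrite described under "What happened" can be pictured with a simplified, self-contained sketch. `TableStore`, `Registry`, and `resolve_after_registering` are hypothetical stand-ins written for this illustration; they are not the delta-rs or DataFusion types, and the only assumption carried over is that the registry is keyed by the object store URL.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an object store rooted at one table.
#[derive(Debug, Clone, PartialEq)]
struct TableStore {
    root: String,
}

/// Hypothetical stand-in for a runtime registry keyed by object store URL.
#[derive(Default)]
struct Registry {
    stores: HashMap<String, TableStore>,
}

impl Registry {
    /// Registers `store` under `url`, returning the store it displaced, if any.
    fn register(&mut self, url: &str, store: TableStore) -> Option<TableStore> {
        self.stores.insert(url.to_string(), store)
    }

    /// Looks up the store currently registered under `url`.
    fn get(&self, url: &str) -> Option<&TableStore> {
        self.stores.get(url)
    }
}

/// Registers two tables under the given URLs and reports which table root
/// each URL resolves to afterwards.
fn resolve_after_registering(url_1: &str, url_2: &str) -> (String, String) {
    let mut registry = Registry::default();
    registry.register(url_1, TableStore { root: "table_1".into() });
    registry.register(url_2, TableStore { root: "table_2".into() });
    (
        registry.get(url_1).unwrap().root.clone(),
        registry.get(url_2).unwrap().root.clone(),
    )
}
```

When both URLs are the same string, the second registration displaces the first, so a scan of the first table would be handed the second table's store. With distinct URLs each lookup resolves to its own table.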