-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migration of storage layer to delta-rs #307
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it's on the right track, great work!
Re: the /[dbname]/[schemaname]/[tablename]
vs /[table_uuid]
path prefix schema, I'm inclined towards the latter. It looks like it'd help us avoid a lot of complexity and edge cases with managing object store prefixes and paths. Re: potential downsides:
- We'll have to have the catalog DB anyway to store UDF definitions and future metadata (unless we somehow manage to keep everything in the delta lake as flat files and avoid the catalog DB altogether, which could be an interesting thought experiment)
- The delta spec doesn't seem to concern itself with how the tables are named/arranged (it only specifies what happens inside the table directory), so I don't think many other tools would rely on a specific way the table directory is named/accessed vs. just operating on the contents of that directory
- Depending on the object store implementation, there might be some argument against having a giant root directory with all the tables vs a hierarchy of
db/schema/table
(performance reasons, fewer items in a single directory) - Similarly, we might be able to use some object store specific authorization code / IAMs to implement multitenancy -- this might mean sharding just by db-id/table-id? Not sure.
How do information schema queries work under this proposed model? It looks like we use the catalog DB to satisfy them instead of crawling the object store for all tables and their schemas?
Yes, that is correct, we pre-load the schema information from our catalog. |
This uses some private delta-rs code, as well as some WIP code for native delta-rs DELETE support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing and should let us close #175 + delete a lot of our homegrown table version tracking code! Great work 👍
src/catalog.rs
Outdated
// Build a delta table but don't load it yet; we'll do that only for tables that are | ||
// actually referenced in a statement, via the async `table` method of the schema provider. | ||
|
||
// TODO: if the table has no columns, the result set will be empty, so we use the default UUID (all zeros). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any implications? do we use this UUID to apply some DDL modifications to 0-column tables later down the line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes there are—we use this UUID to scope our root object store for a particular table by adding it to the end of the path (or in other words, prefixing all table paths with it, relative to the root store uri).
I'll think about a way to better handle no-column situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that this comment is misleading—there's no way that the vec in build_table
is empty since before we pass it down we group the original collection_columns_vec
by table name (.group_by(|col| &col.table_name)
), which means there was at least one element of the vector to begin with in order for grouping to work (likewise for build_legacy_table
).
Moreover, in case of new tables, we only really need the uuid besides the name, which we can also fetch with the group by call as it's unique per table. In other words the rest of the AllDatabaseColumnsResult
fields don't play a role in instantiating new tables, and I aim to simplify that loading logic after dropping the legacy tables completely.
On the other hand, this means that tables without columns wouldn't be loaded at all in the schema provider's map. However, this problem is averted since delta-rs doesn't allow column-less table creation (and we don't yet support ALTER table DROP column
). Therefore, for now I'll just add a test to lock that in so it doesn't change underneath us.
This is the initial effort of switching our custom MVCC/Lakehouse protocol to use the Delta lake protocol instead.
A couple of general points:
DELETE
&UPDATE
VACUUM
still not implemented for new tablesdropped_table
table and related methods)As for the implementation details:
inner
ininternal_object_store
), and then "scoping" it for each DeltaTable accordingly. This means wrapping it up with some other object stores so that/
points to the table root uri (see below). The other alternative that delta-rs offers is providing the store_options, out of which it will then build the store for a given uri, but I didn't like this mostly because the memory store implementation then doesn't work (it get's re-created from scratch for each create/write/table builder instantiation).<inner_store_root>/<table_uuid>/
(that is the place where delta-rs puts all the table-related stuff as per the protocol). For instances3://my_bucket/e62c26e2-b6e8-414a-a37a-66dd77430b5d/
orfile:///data/dir/e62c26e2-b6e8-414a-a37a-66dd77430b5d/
. Thetable_uuid
is kept in thetable
catalog table and created randomly for each new table.VACUUM
s in chunked batches) because table uri is unique to a given table and independent of db/schema/table name. Likewise, renaming a table is only a catalog operation.