-
-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: store learns to delete prefixes when overwriting/creating hierarchy nodes #2430
Feature: store learns to delete prefixes when overwriting/creating hierarchy nodes #2430
Conversation
overwriting nodes - add Store.delete_dir and Store.delete_prefix - update array and group creation methods to call delete_dir - change list_prefix to return absolue keys
…into feature/store_erase_prefix
…python into feature/store_erase_prefix
@@ -371,6 +371,36 @@ def list_dir(self, prefix: str) -> AsyncGenerator[str, None]: | |||
""" | |||
... | |||
|
|||
async def delete_dir(self, prefix: str, recursive: bool = True) -> None: | |||
""" | |||
Remove all keys and prefixes in the store that begin with a given prefix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make life much easier for Store
implementers if we declare that prefix
will always be the path of a group or a array. Otherwise, all implementations need to deal with unnecessary edge cases like delete_dir("path/to/array/c/0")
or worse.
Icechunk is an example of a store that could very cleanly and efficiently implement the case for a group/array prefix, but it would struggle to handle other prefixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use case for recursive = False
? If zarr itself doesn't have one, I'd remove the argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see a use case for delete_dir("path/to/array/c/")
(remove all chunks from an array) so I think we should allow that -- even if it makes life a little harder in icechunk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's totally fine. Let's document the three possible prefix cases, and say that stores are allowed to do undefined behavior for other prefixes
if not self.supports_listing: | ||
raise NotImplementedError | ||
self._check_writable() | ||
async for key in self.list_prefix(prefix): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we document what's the expected return value of list_prefix
? Absolute vs. relative, final "/" or not, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is done in the list_prefix docstring
…into feature/store_erase_prefix
Looks good overall. I've been struggling a bit with whether Stores should be a bit more aware about the context the operation is occurring in (e.g. deleting a group vs a specific chunk of an array; I think this relates to the conversation at #2430 (comment)). But I think this is a good improvement on its own. |
@TomAugspurger / @d-v-b / @paraseba - any final comments here. I'd like to get this in ahead of the larger store mode refactor in #2442 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. regarding the change to list_prefix
, I will note that the definition of "absolute paths" is a bit shaky across different stores, since RemoteStore
will return keys relative to its path
attribute, and similarly LocalStore
with its root
attribute.
Would defining these as "relative to the root of the store" be clearer? |
pre-commit.ci autofix |
This PR implements
delete_dir
teaches the Group and Array classes to use it when overwriting nodes in a hierarchy.Also included here:
-delete_prefix
- not used but perhaps we should just use it instead ofdelete_dir
list_prefix
to return absolute paths instead of strippingprefix
. Given thatlist_prefix
was unused before, this wasn't very disruptive but calling it out so others can help decide if this is what we want.Still needs:
delete_dir
delete_dir
closes #2191
first steps toward #2108 and #2359
TODO: