Add option to start listing at a particular key #3970
Comments
Seems like a useful feature, I'll have a think about what an API for this could look like. I wonder if we should just add a […]. What is support for this like in other object stores? I presume they support it, but I've learnt never to assume anything when it comes to object stores 😅
It looks like support isn't that wide:
So this is mostly providing a useful optimization for S3 and GCS. There can be a default implementation that just throws out earlier entries. Also, for consistency between S3 and GCS, we would have to make the lower bound exclusive, since that seems to be the S3 behavior.
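To make the fallback idea concrete, here is a minimal, language-agnostic sketch in Python (not the crate's actual Rust code). The function name `list_with_offset_fallback` and the sample keys are hypothetical. It assumes the store can only produce a full listing, and it applies the exclusive lower bound on the client side:

```python
from typing import Iterable, Iterator


def list_with_offset_fallback(keys: Iterable[str], offset: str) -> Iterator[str]:
    """Hypothetical default: list everything, then drop keys <= offset.

    The lower bound is exclusive, mirroring the S3 start-after behavior
    described above: the offset key itself is never returned.
    """
    for key in keys:
        if key > offset:
            yield key


# Hypothetical keys standing in for a full listing from the store.
all_keys = ["data/part-0001", "data/part-0002", "data/part-0003"]
new_keys = list(list_with_offset_fallback(all_keys, "data/part-0002"))
```

This still pays for the full listing, so it only gives consistent semantics across stores; the actual saving comes from backends that can apply the offset server-side.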
Speaking selfishly, supporting S3-based optimizations goes a long way given its dominance in the market. Most data workloads I see are on AWS or GCP, so it's great that you found a compatible API in GCS, @wjones127.
I've created #3973; if we like the interface, I can flesh it out.
The Azure Data Lake Storage Gen2 REST API has endpoints for filesystem list:
path list:
# Description
Adds the `list_with_offset` delegation method to `DeltaObjectStore`.

# Related Issue(s)
- closes delta-io#1252

# Documentation
apache/arrow-rs#3970

Signed-off-by: Shingo OKAWA <[email protected]>
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In an object store, we might have a bunch of sequential files being written:
We'd like to be able to query for all the "new" files starting at a certain point, skipping all the earlier files.
S3 has a `start-after` parameter we can use for this: https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html#API_ListObjectsV2_RequestParameters
TBD on other systems.
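One caveat for the sequential-files use case: as far as I know, S3 lists keys in UTF-8 binary (lexicographic) order, so an offset is compared as a string rather than as a number. The sketch below simulates that ordering in memory with plain Python; the helper `keys_after` and the `log-` key names are made up for illustration and are not part of any real API:

```python
from typing import Iterable, List


def keys_after(keys: Iterable[str], start_after: str) -> List[str]:
    """Simulate a lexicographically ordered listing with an exclusive offset."""
    return [k for k in sorted(keys) if k > start_after]


# Hypothetical sequence numbers 1, 2 and 10, written with and without padding.
unpadded = [f"log-{i}" for i in (1, 2, 10)]
padded = [f"log-{i:04d}" for i in (1, 2, 10)]

# "log-10" sorts before "log-2", so the newest file is silently skipped.
unpadded_new = keys_after(unpadded, "log-2")

# Zero-padded names sort in numeric order, so the offset behaves as intended.
padded_new = keys_after(padded, "log-0002")
```

With unpadded names the listing after `log-2` comes back empty, while the zero-padded variant returns `log-0010`, so the feature is only useful when key names sort in write order.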
Describe the solution you'd like
Not sure of the best way to add the parameter. Does it belong in a new method? Should we introduce a more complex "ListCallBuilder" API?
Describe alternatives you've considered
Not sure if there is an easier way to do it.
Additional context