Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new
board_gcs()
for Google Cloud Storage #695Add new
board_gcs()
for Google Cloud Storage #695Changes from 7 commits
9cf74ba
a87bd7d
f8fe5ba
70b0866
835952e
6a509c4
76973ae
a919975
8afce68
3677308
a89c865
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up taking the approach outlined in the gargle vignette "Managing tokens securely", and it seems to be working great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice solution for local testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like I discussed earlier in this PR, no
pin_list()
for the time being because we don't have good tooling for handling the flat namespace in a bucket. This is different from AWS and Azure. We can try coming back to this later if it is a high priority issue for folks.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess to enable it this would need to download a list of all objects, then parse through it locally for those ending with /. Its doable, but could get slow if a very large (10k+) number of objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see how we handle that for AWS and Azure. Both of those rely on underlying functionality already available in paws.storage and AzureStor (ways to identify directory-like structure, common prefixes, etc). I don't think I want to move forward with writing that parsing code here in pins for now; maybe we can collaborate in the future on adding features like this to googleCloudStorageR if it is a high priority for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we use
gcs_list_objects()
plus a prefix? Because it returns all objects "recursively"?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right:
Created on 2023-01-12 with reprex v2.0.2
In the packages for AWS and Azure, there is existing support for identifying directory-like structures or to find "common prefixes" but googleCloudStorageR doesn't have that as of today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I did to deal with the flat namespace situation; suggestions welcome for a nicer option!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is the concept of object versioning in GCS, does this match what pin_versions does? https://cloud.google.com/storage/docs/object-versioning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, one of the ideas of pins is to provide versioning, even if the data practitioner is not allowed to turn on GCS versioning because of, say, the bucket retention policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is probably not needed unless when
nrow()
is a 0, there's noname
column.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that is the situation:
Created on 2023-01-12 with reprex v2.0.2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one function doesn't seem to respect
googleAuthR.verbose
so for now we'll usesuppressMessages()
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess its the use of the cli:: message bar - yes please open an issue as should be a quick fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just opened here: cloudyr/googleCloudStorageR#170
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do this via
gcs_get_object("obj_name", meta=TRUE)
and thentryCatch(http_404 = FALSE)
- will be much quicker and direct only inspecting HTTP headers and not content. Custom error types are supported in the HTTP responses to help this workflow. Listing objects could be slow for large amount of pinsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example here for a pull request I did for targets a while ago https://github.com/MarkEdmondson1234/targets/blob/8204e4268553a2680f5d5ff3b0fc006b0c40d45a/R/utils_gcp.R#L7-22
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you're saying! Hmmm...
It's possible this function could have a better name, but it's not always looking for a specific object; sometimes it is looking for a directory-like thing. Given the information at hand when calling, for example,
pin_exists()
, I think we do need to list the objects in the bucket rather than get a specific object.