Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(storage): return file metadata on read #11212

Merged
merged 8 commits into from
Dec 19, 2024

Conversation

jshore1296
Copy link
Contributor

implements #11211

This isn't in a mergeable state quite yet, but I wanted to start the discussion. This is only supported by the XML and gRPC apis. I'm unsure of how to handle the JSON api. Should I just document that the JSON api does not support metadata on read? Or is there a more elegant way to block the user from trying to read it and being surprised by it being empty?

@jshore1296 jshore1296 requested review from a team as code owners December 3, 2024 17:48
@product-auto-label product-auto-label bot added the api: storage Issues related to the Cloud Storage API. label Dec 3, 2024
@tritone
Copy link
Contributor

tritone commented Dec 4, 2024

implements #11211

This isn't in a mergeable state quite yet, but I wanted to start the discussion. This is only supported by the XML and gRPC apis. I'm unsure of how to handle the JSON api. Should I just document that the JSON api does not support metadata on read? Or is there a more elegant way to block the user from trying to read it and being surprised by it being empty?

I'd say we can just document that it is unavailable via the JSON API. There are other fields in ReaderObjectAttrs which are only populated sometimes.

I need to look at the rest of the code more deeply; just let me know when it's ready for review.

@jshore1296
Copy link
Contributor Author

Alright, this is ready for review.

@jshore1296
Copy link
Contributor Author

@tritone sending a ping in case the previous comment didn't notify you

Copy link
Contributor

@tritone tritone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments; overall looks good. Thanks for adding tests!

@@ -795,6 +795,62 @@ func TestOpenReaderEmulated(t *testing.T) {
})
}

func TestOpenReaderEmulated_Metadata(t *testing.T) {
transportClientTest(skipHTTP("metadata on read not supported by JSON api"), t, func(t *testing.T, ctx context.Context, project, bucket string, client storageClient) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the HTTP client should use XML by default, does the test not pass with a default HTTP client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right you are.

t.Fatalf("closing object: %v", err)
}
if _, err := veneerClient.Bucket(bucket).Object(want.Name).Update(ctx, ObjectAttrsToUpdate{
Metadata: map[string]string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do at least 2 keys in here just to make sure the decoding logic works correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -1133,6 +1133,7 @@ func (c *grpcStorageClient) NewRangeReader(ctx context.Context, params *newRange
CacheControl: obj.GetCacheControl(),
LastModified: obj.GetUpdateTime().AsTime(),
Metageneration: obj.GetMetageneration(),
Metadata: obj.GetMetadata(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was concerned this wouldn't be sufficient because we didn't have it in the unmarshaler for gRPC, but it looks like we did that part already so this should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was surprised that was already in there!

@@ -204,10 +205,10 @@ func initIntegrationTest() func() error {
if err != nil {
log.Fatalf("NewStorageControlClient: %v", err)
}
if err := client.Bucket(bucketName).Create(ctx, testutil.ProjID(), nil); err != nil {
if err := client.Bucket(bucketName).Create(ctx, testutil.ProjID(), &BucketAttrs{SoftDeletePolicy: &SoftDeletePolicy{RetentionDuration: 0}}); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should maybe be a separate change in our test harness? Was it impossible to delete the bucket if this wasn't included?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, this is leftover. My org has a policy set that made this necessary for me to be able to run the tests. I'll remove this before merge.

@@ -5041,7 +5042,58 @@ func TestIntegration_ReaderAttrs(t *testing.T) {
Metageneration: attrs.Metageneration,
CRC32C: crc32c(c),
}
if got != want {
if !reflect.DeepEqual(got, want) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use https://pkg.go.dev/github.com/google/go-cmp/cmp#Diff as elsewhere in these tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

if err := writeObject(ctx, o, defaultType, c); err != nil {
t.Errorf("Write for %v failed with %v", o.ObjectName(), err)
}
defer func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use t.Cleanup to ensure this is run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
}()

oa, err := o.Update(ctx, ObjectAttrsToUpdate{Metadata: map[string]string{"Custom-Key": "custom-value"}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, let's use at least 2 keys here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Metageneration: attrs.Metageneration,
CRC32C: crc32c(c),
}
if !reflect.DeepEqual(got, want) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use cmp.Diff and let's only check the Metadata field in this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -59,6 +59,10 @@ type ReaderObjectAttrs struct {
// Generation is the generation number of the object's content.
Generation int64

// Metadata represents user-provided metadata, in key/value pairs.
// Not supported by the JSON api. Use ObjectHandle.Attrs instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rephrase this as follows:

// Metadata represents user-provided metadata, in key/value pairs.
//
// It can be nil if no metadata is present, or if the client uses the JSON
// API for downloads. Only the XML and gRPC APIs support getting
// custom metadata via the Reader; for JSON make a separate call to 
// ObjectHandle.Attrs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -586,6 +587,9 @@ func TestRetryConformance(t *testing.T) {
if host == "" {
t.Skip("This test must use the testbench emulator; set STORAGE_EMULATOR_HOST to run.")
}
if runtime.GOOS == "darwin" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this from the PR; seems non-relevant to this particular issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 there was handling for this in the test script so I moved it in here, good to remove though.

Copy link
Contributor

@tritone tritone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more minor nit; otherwise looks good. Thanks for your contribution!

expectedMetadata := map[string]string{
"Custom-Metadata-Key": "custom-metadata-value",
}
if !reflect.DeepEqual(rd.Attrs.Metadata, expectedMetadata) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more place to switch to cmp.Diff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old habits die hard, haha.

@tritone tritone changed the title storage: return file metadata on read feat(storage): return file metadata on read Dec 16, 2024
@tritone
Copy link
Contributor

tritone commented Dec 19, 2024

Okay so looks like there are some presubmit issues:

  1. Check for formatting; you can fix this with gofmt.
  2. Technically this is a breaking change to Reader because, with an exported map field, ReaderObjectAttrs becomes a non-comparable type.

(2) is kind of an annoying check (who is comparing storage.Reader using comparison operators?) but I think we should fix it. Perhaps we can make a GetMetadata() method on Reader rather than making it a field of ReaderObjectAttrs? Or maybe we could change the Metadata field to a pointer to a map; I think pointers are good to go.

Sorry for not flagging that sooner and LMK if that is sufficient clarity on what needs to change...

Adding a map field to Attrs would have made the object uncomparable
@jshore1296
Copy link
Contributor Author

makes sense @tritone . Pushed the updates, hopefully I caught all the formatting stuff.

@tritone tritone added the kokoro:run Add this label to force Kokoro to re-run the tests. label Dec 19, 2024
@kokoro-team kokoro-team removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Dec 19, 2024
@@ -222,7 +222,9 @@ var emptyBody = ioutil.NopCloser(strings.NewReader(""))
// the stored CRC, returning an error from Read if there is a mismatch. This integrity check
// is skipped if transcoding occurs. See https://cloud.google.com/storage/docs/transcoding.
type Reader struct {
Attrs ReaderObjectAttrs
Attrs ReaderObjectAttrs
objectMetadata map[string]string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like even a non-exported map here makes the comparable check unhappy.

Let's make this a pointer, and you can keep the method to return the map.

to preserve API compatibility - keeps the Reader struct comparable
@tritone tritone added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 19, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 19, 2024
@tritone tritone merged commit d49263b into googleapis:main Dec 19, 2024
7 checks passed
@jshore1296 jshore1296 deleted the metadata-on-read branch December 19, 2024 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants