Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return PutResult with an ETag from ObjectStore::put (#4934) #4944

Merged
merged 5 commits into from
Oct 19, 2023

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #4934

Rationale for this change

Returning the ETag on ObjectStore::put is important for write-through caching

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the object-store Object Store Interface label Oct 17, 2023
@tustvold tustvold added the api-change Changes to the arrow API label Oct 17, 2023
@alamb alamb changed the title Return ETag from ObjectStore::put (#4934) Return PutResult with an ETag from ObjectStore::put (#4934) Oct 17, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thank you @tustvold

@@ -243,12 +249,14 @@ impl S3Client {
}

/// Make an S3 PUT request <https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html>
///
/// Returns the ETag
pub async fn put_request<T: Serialize + ?Sized + Sync>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this structure (S3Client) is not public, so this is not a breaking API change in case anyone else was wondering

let part = (part_idx + 1).to_string();

let response = self
let content_id = self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason this is called content_id here and etag elsewhere? I just found the difference in terminology confusing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because PartId is used across multiple stores, and for some of them this isn't an etag but a part number

/// Result for a put request
#[derive(Debug)]
pub struct PutResult {
/// The unique identifier for the object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this could also include a link to what an ETAG is for those who are not familiar

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag perhaps

@@ -850,6 +850,13 @@ impl GetResult {
}
}

/// Result for a put request
#[derive(Debug)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always find it annoying when these traits are not derived for simple objects.

Suggested change
#[derive(Debug)]
#[derive(Debug, Hash, Clone, PartialOrd)]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added Clone, but I'm not sure about Hash and PartialOrd here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clone is probably good enough to start with

Comment on lines +129 to +131
Ok(PutResult {
e_tag: Some(etag.to_string()),
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you made PutResult have non pub fields, maybe this could be a builder style API like this

OK(PutResult::new().with_etag(etag))

@@ -850,6 +850,13 @@ impl GetResult {
}
}

/// Result for a put request
#[derive(Debug)]
pub struct PutResult {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if we should makd this a non pub field and add an accessor / into_etag() function instead to allow flexibility in adding new fields later without making a breaking API change

However, the existing GetResult https://docs.rs/object_store/latest/object_store/struct.GetResult.html has pub fields so this just follows the same pattern

.put_request(location, Some(bytes), false, &())
.await?;
Ok(())
let e_tag = Some(get_etag(response.headers()).context(MetadataSnafu)?);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it guaranteed that these services (and their emulators, etc) all return an ETag? I wonder if this should be an error, or if the etag isn't present in the response, this should return Ok(PutResult{ etag: None}) 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual object stores guarantee it is present, the same is technically true of WebDav, but given HTTPStore gets used for more than that I opted to not make it so strict

@tustvold tustvold merged commit 4cca029 into apache:master Oct 19, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Return ETag and Version on Put
2 participants