Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate ETags for InMemory and LocalFileSystem (#4879) #4922

Merged
merged 4 commits into from
Oct 17, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
133 changes: 114 additions & 19 deletions object_store/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -718,25 +718,41 @@ pub struct GetOptions {

impl GetOptions {
/// Returns an error if the modification conditions on this request are not satisfied
fn check_modified(
&self,
location: &Path,
last_modified: DateTime<Utc>,
) -> Result<()> {
if let Some(date) = self.if_modified_since {
if last_modified <= date {
return Err(Error::NotModified {
path: location.to_string(),
source: format!("{} >= {}", date, last_modified).into(),
///
/// <https://datatracker.ietf.org/doc/html/rfc7232#section-6>
fn test(&self, meta: &ObjectMeta) -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn test(&self, meta: &ObjectMeta) -> Result<()> {
fn check_modification_conditions(&self, meta: &ObjectMeta) -> Result<()> {

"test" is rather non-descriptive and is easily confused w/ unit tests.

// The use of the invalid etag "*" means no ETag is equivalent to never matching
let etag = meta.e_tag.as_deref().unwrap_or("*");
let last_modified = meta.last_modified;

if let Some(m) = &self.if_match {
if m != "*" && m.split(',').map(str::trim).all(|x| x != etag) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add an explicit mention of * handling as well as support for the , handling to the docstrings (so someone can see it on docs.rs)?

return Err(Error::Precondition {
path: meta.location.to_string(),
source: format!("{etag} does not match {m}").into(),
});
}
}

if let Some(date) = self.if_unmodified_since {
} else if let Some(date) = self.if_unmodified_since {
if last_modified > date {
return Err(Error::Precondition {
path: location.to_string(),
source: format!("{} < {}", date, last_modified).into(),
path: meta.location.to_string(),
source: format!("{date} < {last_modified}").into(),
});
}
}

if let Some(m) = &self.if_none_match {
if m == "*" || m.split(',').map(str::trim).any(|x| x == etag) {
return Err(Error::NotModified {
path: meta.location.to_string(),
source: format!("{etag} matches {m}").into(),
});
}
} else if let Some(date) = self.if_modified_since {
if last_modified <= date {
return Err(Error::NotModified {
path: meta.location.to_string(),
source: format!("{date} >= {last_modified}").into(),
});
}
}
Expand Down Expand Up @@ -940,6 +956,7 @@ mod test_util {
mod tests {
use super::*;
use crate::test_util::flatten_list_stream;
use chrono::TimeZone;
use rand::{thread_rng, Rng};
use tokio::io::AsyncWriteExt;

Expand Down Expand Up @@ -1685,8 +1702,86 @@ mod tests {
assert!(stream.next().await.is_none());
}

// Tests TODO:
// GET nonexisting location (in_memory/file)
// DELETE nonexisting location
// PUT overwriting
#[test]
fn test_get_options() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really a test of check_preconditions isn't it?

Suggested change
fn test_get_options() {
fn test_pre_conditions() {

let mut meta = ObjectMeta {
location: Path::from("test"),
last_modified: Utc.timestamp_nanos(100),
size: 100,
e_tag: Some("123".to_string()),
};

let mut options = GetOptions::default();
options.test(&meta).unwrap();

options.if_modified_since = Some(Utc.timestamp_nanos(50));
options.test(&meta).unwrap();

options.if_modified_since = Some(Utc.timestamp_nanos(100));
options.test(&meta).unwrap_err();

options.if_modified_since = Some(Utc.timestamp_nanos(101));
options.test(&meta).unwrap_err();

options = GetOptions::default();

options.if_unmodified_since = Some(Utc.timestamp_nanos(50));
options.test(&meta).unwrap_err();

options.if_unmodified_since = Some(Utc.timestamp_nanos(100));
options.test(&meta).unwrap();

options.if_unmodified_since = Some(Utc.timestamp_nanos(101));
options.test(&meta).unwrap();

options = GetOptions::default();

options.if_match = Some("123".to_string());
options.test(&meta).unwrap();

options.if_match = Some("123,354".to_string());
options.test(&meta).unwrap();

options.if_match = Some("354, 123,".to_string());
options.test(&meta).unwrap();

options.if_match = Some("354".to_string());
options.test(&meta).unwrap_err();

options.if_match = Some("*".to_string());
options.test(&meta).unwrap();

// If-Match takes precedence
options.if_unmodified_since = Some(Utc.timestamp_nanos(200));
options.test(&meta).unwrap();

options = GetOptions::default();

options.if_none_match = Some("123".to_string());
options.test(&meta).unwrap_err();

options.if_none_match = Some("*".to_string());
options.test(&meta).unwrap_err();

options.if_none_match = Some("1232".to_string());
options.test(&meta).unwrap();

options.if_none_match = Some("23, 123".to_string());
options.test(&meta).unwrap_err();

// If-None-Match takes precedence
options.if_modified_since = Some(Utc.timestamp_nanos(10));
options.test(&meta).unwrap_err();

// Check missing ETag
meta.e_tag = None;
options = GetOptions::default();

options.if_none_match = Some("*".to_string()); // Fails if any file exists
options.test(&meta).unwrap_err();

options = GetOptions::default();
options.if_match = Some("*".to_string()); // Passes if file exists
options.test(&meta).unwrap();
}
}
34 changes: 20 additions & 14 deletions object_store/src/local.rs
Original file line number Diff line number Diff line change
Expand Up @@ -365,23 +365,12 @@ impl ObjectStore for LocalFileSystem {
}

async fn get_opts(&self, location: &Path, options: GetOptions) -> Result<GetResult> {
if options.if_match.is_some() || options.if_none_match.is_some() {
return Err(super::Error::NotSupported {
source: "ETags not supported by LocalFileSystem".to_string().into(),
});
}

let location = location.clone();
let path = self.config.path_to_filesystem(&location)?;
maybe_spawn_blocking(move || {
let (file, metadata) = open_file(&path)?;
if options.if_unmodified_since.is_some()
|| options.if_modified_since.is_some()
{
options.check_modified(&location, last_modified(&metadata))?;
}

let meta = convert_metadata(metadata, location)?;
options.test(&meta)?;

Ok(GetResult {
payload: GetResultPayload::File(file, path),
Expand Down Expand Up @@ -994,7 +983,7 @@ fn convert_entry(entry: DirEntry, location: Path) -> Result<ObjectMeta> {
convert_metadata(metadata, location)
}

fn last_modified(metadata: &std::fs::Metadata) -> DateTime<Utc> {
fn last_modified(metadata: &Metadata) -> DateTime<Utc> {
metadata
.modified()
.expect("Modified file time should be supported on this platform")
Expand All @@ -1006,15 +995,32 @@ fn convert_metadata(metadata: Metadata, location: Path) -> Result<ObjectMeta> {
let size = usize::try_from(metadata.len()).context(FileSizeOverflowedUsizeSnafu {
path: location.as_ref(),
})?;
let inode = get_inode(&metadata);
let mtime = last_modified.timestamp_micros();

// Use an ETag scheme based on that used by many popular HTTP servers
// <https://httpd.apache.org/docs/2.2/mod/core.html#fileetag>
// <https://stackoverflow.com/questions/47512043/how-etags-are-generated-and-configured>
let etag = format!("{inode:x}-{mtime:x}-{size:x}");

Ok(ObjectMeta {
location,
last_modified,
size,
e_tag: None,
e_tag: Some(etag),
})
}

#[cfg(unix)]
fn get_inode(metadata: &Metadata) -> u64 {
std::os::unix::fs::MetadataExt::ino(metadata)
}

#[cfg(not(unix))]
fn get_inode(metadata: &Metadata) -> u64 {
0
Copy link
Contributor Author

@tustvold tustvold Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inode isn't strictly necessary, it is still a pretty strong ETag without it, but on Unix systems we might as well make use of it.

A happy consequence of the way most operations create a new file and then link it into place, is that the inode is actually a pretty good change detector

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment. If the inode isn't needed, then what value does including it add?

Either modification time and size (along with path) uniquely determines the result, or it isn't...

Perhaps the inode can detect file modifications that do not change the modification timestamp (some sort of symlink shenanigans, perhaps 🤔 )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any event I think this "inode is not necessary" context should be encoded as comments in the source code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you have clarified the comment that inode is used to be more resistent to collisions -- sounds good to me

}

/// Convert walkdir results and converts not-found errors into `None`.
/// Convert broken symlinks to `None`.
fn convert_walkdir_result(
Expand Down
Loading