OBS-463: Use custom time as modification time for GCS objects if set. #3112

Merged: 1 commit, Jan 31, 2025
2 changes: 1 addition & 1 deletion tecken/ext/gcs/storage.py
@@ -126,7 +126,7 @@ def get_object_metadata(self, key: str) -> Optional[ObjectMetadata]:
             content_encoding=blob.content_encoding,
             original_content_length=original_content_length,
             original_md5_sum=gcs_metadata.get("original_md5_hash"),
-            last_modified=blob.updated,
+            last_modified=blob.custom_time or blob.updated,
Contributor

I'm not sure I understand this. For files that we migrated from S3 to GCS, custom_time will be set and will be the same value type as updated (e.g. seconds since epoch, Python datetime in UTC, etc). Is that right? What is that value type?

For files that were not migrated from S3, what does custom_time get set to? Does it exist as an attribute of the blob object?

@relud (Member), Jan 30, 2025

Migrated files will have a custom time that matches the S3 upload time and an upload time in the last week. And yes, they will have the same type, though I don't know what that is.

When custom time isn't present, the attribute will be None, which is falsy.

@smarnach (Contributor, Author), Jan 31, 2025

The type of both blob.custom_time and blob.updated is datetime.datetime | None. The latter will always be set in this case, since it can only be None for objects that haven't been uploaded yet.

For objects directly uploaded to GCS, blob.updated is the time the upload finished, and blob.custom_time is None.

For objects migrated from S3, blob.updated is the time the object was migrated, and blob.custom_time is the original upload time.

In both cases, blob.custom_time or blob.updated evaluates to the original upload time, which is what we want.
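
The two cases can be reproduced without a GCS connection. The following is a minimal sketch, assuming a hypothetical `FakeBlob` stand-in that mirrors only the `custom_time` and `updated` attributes discussed here; the `last_modified` helper and all timestamps are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FakeBlob:
    """Hypothetical stand-in exposing only the two attributes discussed above."""

    updated: Optional[datetime]
    custom_time: Optional[datetime] = None


def last_modified(blob: FakeBlob) -> Optional[datetime]:
    # None is falsy, so `or` falls back to `updated` when no custom time is set.
    return blob.custom_time or blob.updated


# Illustrative timestamps only.
s3_upload = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
migration = datetime(2025, 1, 28, 9, 30, tzinfo=timezone.utc)
direct_upload = datetime(2025, 1, 30, 15, 0, tzinfo=timezone.utc)

migrated = FakeBlob(updated=migration, custom_time=s3_upload)
direct = FakeBlob(updated=direct_upload)

# Migrated object: the original S3 upload time wins over the migration time.
assert last_modified(migrated) == s3_upload
# Directly uploaded object: custom_time is None, so `updated` is used.
assert last_modified(direct) == direct_upload
```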

I added type annotations to ObjectMetadata, and the GCS Python client also has type hints, so VS Code shows me type information for everything while typing. I also tested this by manually creating a GCS client and retrieving blobs with and without custom time metadata.
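
A manual check along those lines might look like the sketch below. It assumes the bucket object follows the `google-cloud-storage` client's `Bucket.get_blob()` interface, which returns `None` for a missing key. The helper name `effective_last_modified` and the bucket and key names in the trailing comment are hypothetical and not part of this PR.

```python
from datetime import datetime
from typing import Any, Optional


def effective_last_modified(bucket: Any, key: str) -> Optional[datetime]:
    """Return custom_time if set, else updated; None if the object is missing."""
    blob = bucket.get_blob(key)  # expected to return None when the key is absent
    if blob is None:
        return None
    return blob.custom_time or blob.updated


# Hypothetical usage against a real bucket (needs google-cloud-storage and credentials):
#   from google.cloud import storage
#   bucket = storage.Client().bucket("example-bucket")
#   print(effective_last_modified(bucket, "example/key"))
```

Passing the bucket in as a parameter keeps the helper usable with a stub object, so the fallback can be exercised without network access.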

)
return metadata
