OBS-463: Use custom time as modification time for GCS objects if set. #3112

Merged
merged 1 commit into from
Jan 31, 2025

smarnach
Contributor

@smarnach smarnach requested a review from a team as a code owner January 30, 2025 14:51
@@ -126,7 +126,7 @@ def get_object_metadata(self, key: str) -> Optional[ObjectMetadata]:
  content_encoding=blob.content_encoding,
  original_content_length=original_content_length,
  original_md5_sum=gcs_metadata.get("original_md5_hash"),
- last_modified=blob.updated,
+ last_modified=blob.custom_time or blob.updated,
Contributor

I'm not sure I understand this. For files that we migrated from S3 to GCS, custom_time will be set and will be the same value type as updated (e.g. seconds since epoch, Python datetime in UTC, etc). Is that right? What is that value type?

For files that were not migrated from S3, what does custom_time get set to? Does it exist as an attribute of the blob object?

Member

@relud relud Jan 30, 2025

Migrated files will have a custom time that matches the S3 upload time and a GCS upload time within the last week. Yes, both values have the same type, though I don't know what that type is.

When custom time isn't set, the attribute is None, which is falsy.
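
To make the falsy fallback concrete, here is a minimal sketch using plain Python values rather than a real GCS blob. The helper name `pick_timestamp` and the timestamps are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def pick_timestamp(
    custom_time: Optional[datetime], updated: Optional[datetime]
) -> Optional[datetime]:
    # Hypothetical helper: `or` returns the left operand if it is truthy,
    # otherwise the right operand.
    return custom_time or updated


# Invented timestamps, for illustration only.
updated = datetime(2025, 1, 28, 12, 0, tzinfo=timezone.utc)
custom_time = datetime(2023, 6, 1, 9, 30, tzinfo=timezone.utc)

# None is falsy, so the expression falls through to `updated`.
fallback = pick_timestamp(None, updated)

# datetime objects are always truthy, so a set custom time wins.
preferred = pick_timestamp(custom_time, updated)

# Even the earliest representable datetime is truthy, so only None
# triggers the fallback.
earliest = pick_timestamp(datetime.min, updated)
```

Because `datetime.datetime` defines no custom truthiness, this behaves the same as the more explicit `custom_time if custom_time is not None else updated`.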

Contributor Author

@smarnach smarnach Jan 31, 2025

The type of both blob.custom_time and blob.updated is datetime.datetime | None. The latter will always be set in this case, since it can only be None for objects that haven't been uploaded yet.

For objects directly uploaded to GCS, blob.updated is the time the upload finished, and blob.custom_time is None.

For objects migrated from S3, blob.updated is the time the object was migrated, and blob.custom_time is the original upload time.

In both cases, blob.custom_time or blob.updated evaluates to the original upload time, which is what we want.
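
The two cases above can be sketched without touching GCS at all. `FakeBlob` below is a hypothetical stand-in that carries only the two attributes under discussion, not the real client class, and all timestamps are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FakeBlob:
    """Hypothetical stand-in for a GCS blob (not the real client class)."""

    updated: Optional[datetime]
    custom_time: Optional[datetime] = None


def last_modified(blob: FakeBlob) -> Optional[datetime]:
    # Same expression as the changed line in the diff.
    return blob.custom_time or blob.updated


# Invented timestamps, for illustration only.
s3_upload = datetime(2023, 6, 1, 9, 30, tzinfo=timezone.utc)
migration = datetime(2025, 1, 28, 12, 0, tzinfo=timezone.utc)
direct_upload = datetime(2025, 1, 30, 8, 15, tzinfo=timezone.utc)

# Case 1: uploaded directly to GCS, so no custom time is set.
direct = FakeBlob(updated=direct_upload)

# Case 2: migrated from S3, so custom time holds the original upload time.
migrated = FakeBlob(updated=migration, custom_time=s3_upload)

print(last_modified(direct))
print(last_modified(migrated))
```

In this sketch the directly uploaded blob reports its GCS upload time, while the migrated blob reports the S3 upload time instead of the migration time.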

I added type annotations to ObjectMetadata, and the GCS Python client also has type hints, so VS Code shows me type information for everything while typing. I also tested this by manually creating a GCS client and retrieving blobs with and without custom time metadata.

@smarnach smarnach added this pull request to the merge queue Jan 31, 2025
Merged via the queue into main with commit c4e49f7 Jan 31, 2025
1 check passed
@smarnach smarnach deleted the use-gcs-custom-time branch January 31, 2025 14:50
3 participants