"Filter newer" #19
Comments
Hello.
Thanks for the quick reply. Does `--filter-modified` compare against the source timestamp or the destination timestamp?
It applies to the source file's timestamp.
OK, I see what you mean. My use case is slightly different: if I Ctrl-C in the middle of a sync, I'd like s3sync to pick up where it left off when I restart it, and not re-upload files that it already uploaded on the previous run. In this case there are no "new" files in the source, so I can't use `--filter-modified`.
Besides, mtime, that is, "last modified", is not the same as the creation time. If I copy a file, its last-modified time gets copied with it, so I can have a file that is older than the timestamp of the last sync but was created (copied into the source folder) after the sync. Please correct me if I'm wrong on this.
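To make the copied-mtime point concrete, here is a small Go sketch (the file paths are made up) of a copy that preserves the source's mtime, the way `cp -p` and many file managers do:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"time"
)

// copyPreservingMtime copies src to dst and carries the mtime over,
// mimicking what many copy tools do by default.
func copyPreservingMtime(src, dst string) error {
	info, err := os.Stat(src)
	if err != nil {
		return err
	}
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	if err := out.Close(); err != nil {
		return err
	}
	// Preserve the modification time from the source file.
	return os.Chtimes(dst, time.Now(), info.ModTime())
}

func main() {
	lastSync := time.Now() // pretend a sync just finished

	// Copy a pre-existing file into the sync source after the sync mark.
	if err := copyPreservingMtime("old-file.txt", "source/old-file.txt"); err != nil {
		log.Fatal(err)
	}
	info, err := os.Stat("source/old-file.txt")
	if err != nil {
		log.Fatal(err)
	}
	// The file is new to the source folder, yet its mtime predates the
	// last sync, so an mtime-based filter would skip it.
	fmt.Println("skipped by mtime filter:", info.ModTime().Before(lastSync))
}
```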
I understand. In this case, your filter would be something like this:
S3 does not have a creation time (or I could not find one). And no: for a copied file, last-modified is set to the current time, not to the original file's mtime. You also cannot set a custom mtime; it is set by the S3 server automatically.
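A short aws-sdk-go-v2 sketch of that behavior (bucket and key names are placeholders): after a server-side copy, HeadObject reports the time of the copy as LastModified, not the original object's timestamp.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// Server-side copy: S3 assigns LastModified itself; there is no
	// request parameter to carry the source object's timestamp over.
	_, err = client.CopyObject(ctx, &s3.CopyObjectInput{
		Bucket:     aws.String("my-bucket"),
		CopySource: aws.String("my-bucket/original-key"),
		Key:        aws.String("copied-key"),
	})
	if err != nil {
		log.Fatal(err)
	}

	head, err := client.HeadObject(ctx, &s3.HeadObjectInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("copied-key"),
	})
	if err != nil {
		log.Fatal(err)
	}
	// Prints the copy time, not the original object's LastModified.
	fmt.Println("LastModified:", head.LastModified)
}
```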
I may have been unclear. My use case is syncing a local folder to S3. I can sync a local folder to S3 and then add files to it that have an mtime earlier than the time of the sync, because mtime also gets copied when you copy files. In this situation I won't be able to use `--filter-modified`. Looking at the new filter, it seems to be exactly what I need. When can I test it? :)
Looking at the filter code again, it still does an extra S3 request per object. Ideally, if I sync a local folder to S3 that already has the exact same files in it, the only S3 operation s3sync should perform is a ListObjectsV2 call to check for existence and freshness. That would give you the maximum throughput when doing incremental syncs.
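As an illustration of that suggestion, here is a rough aws-sdk-go-v2 sketch that lists the destination once and then decides per source file; the helper names and the prefix parameter are hypothetical, not s3sync's actual internals:

```go
package sketch

import (
	"context"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// destIndex lists the destination once and records each key's LastModified.
// For huge destinations this map itself becomes the memory cost.
func destIndex(ctx context.Context, client *s3.Client, bucket, prefix string) (map[string]time.Time, error) {
	index := make(map[string]time.Time)
	p := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
	})
	for p.HasMorePages() {
		page, err := p.NextPage(ctx)
		if err != nil {
			return nil, err
		}
		for _, obj := range page.Contents {
			index[aws.ToString(obj.Key)] = aws.ToTime(obj.LastModified)
		}
	}
	return index, nil
}

// shouldUpload skips a source file whose destination copy is at least as new.
func shouldUpload(key string, srcMtime time.Time, index map[string]time.Time) bool {
	destMtime, ok := index[key]
	return !ok || srcMtime.After(destMtime)
}
```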
ListObjectsV2 is not a universal solution. The destination can have millions of files in one directory, so we would have to list them all and store them in memory or a local cache just to get the mtime of the files (in the worst case, for a single file). Please keep in mind that the tool is designed to sync very large buckets with millions of files, and this is a key requirement.
I agree, but it covers the most important case, I think: when you use syncing to keep two folders in sync. In that case you'd have a billion objects both at the source and at the destination, and then ListObjectsV2 is faster than checking each object separately. I hadn't thought about the cache size needed to store all the metadata, though.
https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysUsingAPIs.html
Per that page, list results are returned in UTF-8 binary order, which means the object list returned by ListObjectsV2 is always sorted by key. You don't need to cache the folder list: you can list both the source and the destination in parallel and compare the entries in order. This is how the AWS CLI makes sync efficient.
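A minimal Go sketch of that merge-style comparison; the Entry type and the channel-based producers are assumptions for illustration, not s3sync's actual types:

```go
package sketch

import "time"

// Entry is a hypothetical listing record: one key plus its timestamp.
type Entry struct {
	Key   string
	Mtime time.Time
}

// diffSorted merges two key-ordered listings and emits the keys that need
// uploading: missing at the destination, or newer at the source. Memory
// use is O(1) regardless of bucket size, since nothing is cached.
func diffSorted(src, dst <-chan Entry, upload chan<- string) {
	s, sOK := <-src
	d, dOK := <-dst
	for sOK {
		switch {
		case !dOK || s.Key < d.Key: // not present at destination
			upload <- s.Key
			s, sOK = <-src
		case s.Key > d.Key: // present only at destination; skip (or delete)
			d, dOK = <-dst
		default: // same key: compare timestamps
			if s.Mtime.After(d.Mtime) {
				upload <- s.Key
			}
			s, sOK = <-src
			d, dOK = <-dst
		}
	}
	close(upload)
}
```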
I mean this case: a small number of new files at the source, and millions of existing objects at the destination.
In this case we would have to list millions of files at the destination just to upload a small number of files. This is why aws sync has such issues. Additionally, this synchronization algorithm fits very poorly with the architecture of the current application.
Nothing to do about this one now, is there? :)
With the ListObjectsV2-based algorithm, yes.
Any progress on this? We need to sync a large bucket (currently ~300 million keys) for local backup, with incremental (e.g. daily) invocations, to ensure we have an up-to-date backup locally.
It would be great if there were an option to not sync files that already exist at the destination and are newer than the source file. I think this is even the default behavior of `aws s3 sync`. This would really help with incremental syncing. `--filter-modified` seems to actually do double work, instead of saving work, in the case of incremental syncing.
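For reference, a minimal Go sketch of the requested skip rule, assuming the destination object's metadata is already at hand from a listing or a HeadObject call; the function and parameter names are illustrative:

```go
package sketch

import (
	"os"
	"time"
)

// skipUpload approximates aws s3 sync's default rule: skip when the
// destination object exists, has the same size, and is not older than
// the source file.
func skipUpload(src os.FileInfo, destExists bool, destSize int64, destMtime time.Time) bool {
	if !destExists {
		return false // nothing at the destination: always upload
	}
	sameSize := destSize == src.Size()
	destNotOlder := !destMtime.Before(src.ModTime())
	return sameSize && destNotOlder
}
```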