add workarounds for servers that don't paginate correctly #167
I discovered this problem while trying to use rclone to copy data from an OpenStack Swift cluster into a Ceph RADOSGW. Listings of some RADOSGW containers would terminate after 1999 entries even though there were over 10,000 objects in them. With this change applied to a local build of rclone I'm now getting full listings and clean syncs with no spurious copies.
This change will do one more API transaction than it needs to on conforming servers, won't it?
So for your average directory listing this will now do two transactions rather than one.
I can't see a way round this in the protocol - can you?
It will indeed result in an extra transaction, and unfortunately I can't see another way around it. (The number of objects in a container can be looked up by issuing a HEAD request, but of course that's still an extra request, and racy to boot.)

Correcting myself, I found a separate page describing the Swift API's pagination and RADOSGW is clearly in the wrong, at least in terms of that document. However, I also discovered that Swift's own Python client code (which I think we could probably call the reference client) doesn't implement pagination as described above when fetching full listings, rather just fetching new pages until it receives an empty one. (Link is into The code dates from 2012, so perhaps this was implemented before API pagination was nailed down. Either way,
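For reference, the "fetch until an empty page" behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR or in python-swiftclient: `pageFetcher`, `listUntilEmpty` and `shortPageServer` are hypothetical names, and the server is simulated in memory.

```go
package main

import "fmt"

// pageFetcher is a hypothetical stand-in for a single container-listing
// request: it returns up to limit names that sort after marker.
type pageFetcher func(marker string, limit int) ([]string, error)

// listUntilEmpty requests pages until the server returns an empty one, using
// the last name of each page as the marker for the next request.
func listUntilEmpty(fetch pageFetcher, limit int) ([]string, int, error) {
	var names []string
	marker, requests := "", 0
	for {
		page, err := fetch(marker, limit)
		requests++
		if err != nil {
			return nil, requests, err
		}
		if len(page) == 0 {
			return names, requests, nil
		}
		names = append(names, page...)
		marker = page[len(page)-1]
	}
}

// shortPageServer simulates a server holding total objects that never returns
// more than maxReturned names per request, whatever limit was asked for.
func shortPageServer(total, maxReturned int) pageFetcher {
	all := make([]string, total)
	for i := range all {
		all[i] = fmt.Sprintf("obj%06d", i)
	}
	return func(marker string, limit int) ([]string, error) {
		// Skip everything up to and including the marker.
		start := 0
		for start < len(all) && all[start] <= marker {
			start++
		}
		n := limit
		if maxReturned < n {
			n = maxReturned
		}
		end := start + n
		if end > len(all) {
			end = len(all)
		}
		return all[start:end], nil
	}
}

func main() {
	names, requests, err := listUntilEmpty(shortPageServer(25, 9), 10)
	if err != nil {
		fmt.Println("listing failed:", err)
		return
	}
	fmt.Printf("listed %d objects in %d requests\n", len(names), requests)
}
```

The cost is visible in the request count: the loop always ends with one request that returns nothing, which is the extra transaction discussed above.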
One thing we could do is make a feature flag for this and only do the new behaviour if the feature flag is set. So make a new flag in the Connection struct and check it to enable the new behaviour. I have a feeling that this has already been reported as a radosgw bug - it would be worth searching their issue tracker to see. I hate the idea of doubling the number of transactions for directory traversals - I can see that being very bad for performance.
...or we could use some kind of heuristic - if we got more than 90% of the max listing then do an extra transaction just to check. This would probably work quite well but has the potential to go wrong.
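A rough sketch of what such a heuristic check might look like. The function name `morePagesLikely` and its `thresholdPercent` parameter are assumptions made for illustration, not names from this PR, and the sketch uses "at least" rather than "more than" the threshold.

```go
package main

import "fmt"

// morePagesLikely reports whether a page of pageLen entries, requested with
// the given limit, is full enough that another request is worth making.
// thresholdPercent is a hypothetical setting: 90 means "at least 90% full".
func morePagesLikely(pageLen, limit, thresholdPercent int) bool {
	if pageLen <= 0 || limit <= 0 {
		return false // an empty page always ends the listing
	}
	if pageLen >= limit {
		return true // a full page always needs a follow-up request
	}
	// Integer form of pageLen/limit >= thresholdPercent/100, avoiding floats.
	return pageLen*100 >= limit*thresholdPercent
}

func main() {
	// Illustrative numbers only: a limit of 1000 is assumed here.
	fmt.Println(morePagesLikely(999, 1000, 90)) // nearly full page
	fmt.Println(morePagesLikely(500, 1000, 90)) // half-full page
}
```

With a 90% threshold, a page one entry short of the limit triggers another request, while a half-full page is treated as the end of the listing. A server that returns a much shorter page in the middle of a listing would still cut the listing short, which is the way this can go wrong.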
The 90% heuristic should work nicely, based on the lengths of the replies I'm getting from RADOSGW for my problematic buckets:
I was not able to find a matching bug in the Ceph tracker, but I'll start a thread on ceph-users to confirm. And I'll also take a look at implementing the flag and heuristic.
Useful data, thanks. Those missing items are probably filtered-out items (e.g. deleted items) or something like that.
If you find something out, can you link it here?
:-)
dfa1f5b to 1818271
1818271 to ac42f5d
I'm reasonably certain that it's not deleted items, at least -- it's a fairly new cluster, and these buckets have only ever been written to by
Most definitely!
Implemented, and from my testing with hacked rclone builds, the workarounds seem to perform as expected when enabled.
I forgot to mention I made a draft rclone PR that makes this change much easier to try out: rclone/rclone#5224
Apologies for the delay!
This looks great now - I'll merge it
Thank you :-)
Ceph's Swift API emulation does not fully conform to the API spec. As a result, it sometimes returns fewer items in a container listing than the requested limit, which according to the spec should mean that there are no more objects left in the container. (Note that python-swiftclient always fetches another page unless the current page is empty.)

This commit adds a pair of new Swift backend settings to handle this.

Set `fetch_until_empty_page` to true to always fetch another page of the container listing unless there are no items left.

Alternatively, set `partial_page_fetch_threshold` to an integer percentage. In this case rclone will fetch a new page only when the current page is within this percentage of the limit.

Swift API reference: https://docs.openstack.org/swift/latest/api/pagination.html

PR against ncw/swift with research and discussion: ncw/swift#167
Fixes rclone#7924
Some Swift API implementations (I've observed this with Ceph
RADOSGW) can return fewer results than specified by the "limit"
parameter, even when we have not reached the end of the listing.
It's unclear to me from reading the API docs whether this is
a violation of the API specification, but since it happens in the
wild, it's best to be able to handle it.
One way of doing this is to simply keep fetching pages until we
receive an empty page. Another is to assume that pages within
a certain percentage of limit are not the last page. Given the
tradeoffs involved, let's support both.
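
To make the tradeoffs concrete, here is a small simulation that counts requests and listed objects for each approach. It is only a sketch under stated assumptions: the `strategy` type, the three strategy functions and `simulate` are hypothetical, the server is modelled as returning at most `maxPage` entries per request, and `limit` is assumed to be positive.

```go
package main

import "fmt"

// strategy decides, from the size of the page just received and the limit
// that was requested, whether to ask for another page.
type strategy func(pageLen, limit int) bool

// specOnly follows the documented rule: a short page is the last page.
func specOnly(pageLen, limit int) bool { return pageLen >= limit }

// untilEmpty keeps going until an empty page arrives.
func untilEmpty(pageLen, limit int) bool { return pageLen > 0 }

// threshold returns a strategy that continues when the page is at least
// percent full (a hypothetical parameter for this sketch).
func threshold(percent int) strategy {
	return func(pageLen, limit int) bool {
		return pageLen > 0 && pageLen*100 >= limit*percent
	}
}

// simulate lists a container of total objects from a server that returns at
// most maxPage entries per request. It reports how many objects were listed
// and how many requests were made. limit must be positive.
func simulate(total, maxPage, limit int, more strategy) (listed, requests int) {
	for {
		n := total - listed
		if n > maxPage {
			n = maxPage
		}
		if n > limit {
			n = limit
		}
		requests++
		listed += n
		if !more(n, limit) {
			return listed, requests
		}
	}
}

func main() {
	cases := []struct {
		name string
		more strategy
	}{
		{"spec only", specOnly},
		{"until empty", untilEmpty},
		{"threshold 90%", threshold(90)},
	}
	for _, c := range cases {
		// Conforming server: pages are full until the last one.
		l1, r1 := simulate(2500, 1000, 1000, c.more)
		// Non-conforming server: every page is one entry short.
		l2, r2 := simulate(2500, 999, 1000, c.more)
		fmt.Printf("%-14s conforming: %d objects/%d requests, short pages: %d objects/%d requests\n",
			c.name, l1, r1, l2, r2)
	}
}
```

In this model, fetching until an empty page is always complete but costs one extra request per listing, while the threshold avoids that extra request on conforming servers and only fails if a server returns a page well below the threshold before the end of the listing.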