Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pagination and Sorting for Get Snapshots API #73952

Merged

Conversation

original-brownbear
Copy link
Member

@original-brownbear original-brownbear commented Jun 9, 2021

Pagination and snapshots for get snapshots API, build on top of the current implementation to enable work that needs this API for testing. A follow-up will leverage the changes to make things more efficient via pagination.

Relates #73570 which does part of the under-the-hood changes required to efficiently implement this API on the repository layer.

@original-brownbear original-brownbear added >enhancement WIP :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Jun 9, 2021
@@ -98,6 +98,36 @@ comprising the number of shards in the index, the total size of the index in
bytes, and the maximum number of segments per shard in the index. Defaults to
`false`, meaning that this information is omitted.

`sort`::
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main question left from my end here is on what to do about ?verbose=false requests. For these, the sorting as implemented here will not work out nicely if the request hits and old-version snapshot that doesn't yet track the timestamps in the RepositoryData but will instead just sort by name in this case.

My thinking here would be to maybe just not allow pagination and sorting with ?verbose=false at all for now. Or alternatively, fail these requests if old version snapshots are found in the repo.
Another option would be falling back to loading the missing information needed for pagination when running ?verbose=false queries to work around old repositories but I'm not sure this is worth the effort compared to the other two options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer your first option. Making it dependent on data makes testing difficult.

This would only apply if you specify sort, search or size then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would only apply if you specify sort, search or size then?

Yes I think just not allowing any of the three for ?verbose=false would be the plan for option 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with option 1 in this PR now => no pagination allowed with verbose=false

Sort snapshots by the number of indices they contain and break ties by snapshot name.
====

`size`::
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we talked about this before, we want an upper bound here probably in some implicit + explicit form. I think from a response size perspective I'd go with something like 200 for the upper limit on the size in case it's explicitly set and also return a 400 in case a request that does not have size set would return more than a certain limit number of snapshots. The only issue with this is that currently there are users for whom something like 3k snapshots barely works and having an implicit limit would break their use case potentially.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we leave this untouched if none of sort, search, size are specified at least in 7.x?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we leave this untouched if none of sort, search, size are specified at least in 7.x?

I wonder if we can/should :) Obviously we don't want to break the API here, but we are moving towards creating situations where users will have a lot more snapshots that before. We did have problems with the memory consumption of the unpaginated API before and although it's better now, I think we are creating situations where a call getting all snapshot verbose could OOM/destabilize a master easily (thinking about shard failures in particular here). If we do keep the unlimited option untouched in 7.x (which makes sense don't get me wrong :)), should we then maybe invest some effort into some sort of circuit breaking during collecting the SnapshotInfo instances to prevent adding a widely used API that can OOM master easily (we could estimate memory from the amount of bytes we read from the blob store or so)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see it, we are not adding such a new API, the existing API has those properties.

I think limiting explicit size in 7.x makes sense and also if you in any way opt-in to the new parts of the API (search, sort, after).

That said, adding some kind of a memory breaker does make sense (but I would prefer to do that after having done all the pagination/scalability improvement work).

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jun 9, 2021
Drying up a few spots of code duplication with these tests. Partly to
reduce the size of PR elastic#73952 that makes use of the smoke test infrastructure.
@original-brownbear
Copy link
Member Author

Thanks Henning + David! I addressed all points now I think and added the possibility to specify sort order as well. This should be good for another round :)

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments, otherwise looking good.

@@ -101,6 +136,20 @@ public ActionRequestValidationException validate() {
if (repositories == null || repositories.length == 0) {
validationException = addValidationError("repositories are missing", validationException);
}
if (size == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (size == 0) {
if (size == 0 || size < NO_LIMIT) {

}
if (after != null) {
validationException = addValidationError("can't use after with verbose=false", validationException);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a check of order here too.

Also, we should add a minimal test of these validation errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ added that to the validation as well + added a UT for all the validation branches

Sort order. Valid values are `asc` for ascending and `desc` for descending order. Defaults to `asc`, meaning ascending order.

NOTE: The pagination parameters `size`, `order`, and `sort` are not supported when using `verbose=false` and the sort order for
requests with `verbose=false` is undefined.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we had a well-defined sort order of snapshot name, then with the introduction of start time, it changed to be start-time, then name. Now we are making it "undefined". I see that the implementation still order by start-time and then name by default, so we really did not change anything with this PR. I am inclined to leave this as is, though we could consider calling this a breaking change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we're only sorting by name for non-verbose (we sort these thin SnapshotInfo that we build in org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction#buildSimpleSnapshotInfos). I liked undefined better than starting to explain that when we never actually documented a defined sort order for non-verbose. This change for now won't change anything about that APIs return, just a docs adjustment so not really a breaking change? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++, I missed that we still fill in 0L for start time, thanks.

int startIndex = 0;
if (after != null) {
final String name = after.snapshotName();
switch (sortBy) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse the comparator to avoid the switching and repetition here? I suppose this is what you meant by your comment above, but unless I am mistaken this would be easy to do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really right? We don't have a SnapshotInfo instance to compare against so we can't use the comparator directly. I refactored the logic here a little now to be drier and use streams to work around the list mutability situation as well, let me know what you think :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok, though I think it could be simplified a bit more. But let us tackle that later once we are nearer completion, for the purpose of this PR this is fine.

@@ -98,6 +98,37 @@ comprising the number of shards in the index, the total size of the index in
bytes, and the maximum number of segments per shard in the index. Defaults to
`false`, meaning that this information is omitted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new arguments are indicated as specified in the body, but I think they have to be specified as a query param?

In fact, looking at some of the other params here, it looks as though those also need to be specified as query param, at least ignore_unavailable? I feel like I am missing something that our REST framework handles automatically?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, this seems to be a doc bug, sorry for not noticing. The docs are simply wrong here IMO, all of these parameters only work as parts of the URL. We never parse the request body.
I'll adjust the docs accordingly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One second look, we have the same bug in at least the snapshot status API as well where we document all query parameters as body elements but don't actually parse the body. Shall we fix this in a separate PR and audit other docs for the same mistake and leave it as is here maybe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we would just move the new parameters introduced in this PR to the query params section and then fix it in another PR.

But I am also good with your suggestion here if you prefer.

return startFullSnapshot(logger, repoName, snapshotName, partial);
}

public static ActionFuture<CreateSnapshotResponse> startFullSnapshot(Logger logger,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to weigh in that I also have a preference (but not a requirement) for static loggers.

import static org.hamcrest.Matchers.in;
import static org.hamcrest.Matchers.is;

// TODO: dry up duplication across this suite and org.elasticsearch.snapshots.GetSnapshotsIT more
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would be nice. Can be done in a follow-up. I would be OK to have a more minimal REST style test and keep the main sort, pagination and stability testing as an internal cluster IT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I basically just wanted to make extra sure all the param parsing is correct and it seemed easiest to test this by simply running a few cases. My reasoning for this one was that we probably won't save that much code but will lose coverage if we do a simpler rest test that still covers all possible param combinations.

@original-brownbear
Copy link
Member Author

Thanks @henningandersen addressed all points I think, this should be ready for another round

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the extra iterations Armin.

@@ -98,6 +98,37 @@ comprising the number of shards in the index, the total size of the index in
bytes, and the maximum number of segments per shard in the index. Defaults to
`false`, meaning that this information is omitted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we would just move the new parameters introduced in this PR to the query params section and then fix it in another PR.

But I am also good with your suggestion here if you prefer.

Sort order. Valid values are `asc` for ascending and `desc` for descending order. Defaults to `asc`, meaning ascending order.

NOTE: The pagination parameters `size`, `order`, and `sort` are not supported when using `verbose=false` and the sort order for
requests with `verbose=false` is undefined.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++, I missed that we still fill in 0L for start time, thanks.

int startIndex = 0;
if (after != null) {
final String name = after.snapshotName();
switch (sortBy) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok, though I think it could be simplified a bit more. But let us tackle that later once we are nearer completion, for the purpose of this PR this is fine.

final ActionRequestValidationException e = request.validate();
assertThat(e.getMessage(), containsString("can't use non-default sort order with verbose=false"));
}
request.after(new GetSnapshotsRequest.After("foo", "bar"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We are accumulating errors here, I wonder if we should either do them one by one or just once with all the errors. I prefer one by one, i.e., a new request object for every validation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ this was kind of lazy :) I adjusted to testing multiple requests.

@original-brownbear
Copy link
Member Author

Thanks Henning + David! Merging here and moving on to implementing next and the search field now.

@original-brownbear original-brownbear merged commit c1e9590 into elastic:master Jun 17, 2021
@original-brownbear original-brownbear deleted the paginated-get-snapshots-api branch June 17, 2021 07:00
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jun 17, 2021
Follow up to elastic#73952 adding documentation for the `after` query parameter
and the related `next` response field.
original-brownbear added a commit that referenced this pull request Jun 28, 2021
Follow up to #73952 adding documentation for the `after` query parameter
and the related `next` response field.
original-brownbear added a commit that referenced this pull request Jun 29, 2021
)

Backport of the recently introduced snapshot pagination and scalability improvements listed below.
Merged as a single backport because the `7.x` and master snapshot status API logic had massively diverged between master and 7.x. With the work in the below PRs, the logic in master and 7.x once again has been aligned very closely again.

#72842
#73172
#73199
#73570 
#73952
#74236 
#74451 (this one is only partly applicable as it was mainly a change to master to align `master` and `7.x` branches)
@original-brownbear original-brownbear restored the paginated-get-snapshots-api branch April 18, 2023 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.14.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants