Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Allow text formats to work in async mode #73729

Merged
merged 4 commits into from
Jun 9, 2021

Conversation

bpintea
Copy link
Contributor

@bpintea bpintea commented Jun 3, 2021

This extends the async implementation to support working with the text
formats (txt, csv, tsv).

A test validating the administrator operation of a user with the
"manage" permission has also been added.

@bpintea bpintea added the :Analytics/SQL SQL querying label Jun 3, 2021
@bpintea bpintea requested review from costin, astefan and matriv June 3, 2021 16:35
@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Jun 3, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Bogdan, left some comments, mostly worrying about flakiness because of the async nature of things.

@@ -101,6 +102,24 @@ private void testCase(String user, String otherUser) throws Exception {
assertThat(exc.getResponse().getStatusLine().getStatusCode(), equalTo(400));
}

// user with manage privilege can check status and delete
public void testWithManager() throws IOException {
Response submitResp = submitAsyncSqlSearch("SELECT event_type FROM \"index\" WHERE val=0", TimeValue.timeValueSeconds(10), "user1");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this time value safe for any case?
Could it be better to make use of some other async test mechanism like assertBusy to make sure that it's robust?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular value has been c&p'ed from the similar EQL tests.

But the way the test works is to get exceptions when the "other" user tries to get to this async request. For that to work, the async task would only need to be created. Which is checked both by asserting an OK answer to the initial request, as well as by getting its status, subsequently. So the task wouldn't even have to be finished, just started, which is guaranteed by the first checks.

So the "10s" aren't actually relevant for the outcome of the test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thx!


class SqlResponseFormatter extends RestResponseListener<SqlQueryResponse> {

private final long startNanos = System.nanoTime();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the correct place to count the elapsed time for the request?
(just asking)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how we counted before -- it's pretty much when the request enters SQL, but yeh, otherwise maybe "a bit" later than when it enters ES.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thx!


@Override
protected Set<String> responseParams() {
return Collections.singleton(URL_PARAM_DELIMITER);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems strange, why do we return the Delimiter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's used to signal that this param can be checked on, but not consumed. BaseRestHandler.java#prepareRequest and responseParams() details how that works.

}
request.setJsonEntity(bulk.toString());
client().performRequest(request);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra empty line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, will remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.


Response deleteResp = deleteAsyncSqlSearch(id, "manage_user");
assertOK(deleteResp);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra empty line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

expected.put(IS_RUNNING_NAME, false);
expected.put(COLUMNS_NAME, singletonList(columnInfo(mode, "1", "integer", JDBCType.INTEGER, 11)));
expected.put(ROWS_NAME, singletonList(singletonList(1)));
assertAsyncResponse(expected, runSql(builder, mode));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worrying again here and for the rest of the aysnc tests about flakiness. Should we use the assertBusy mechanism to make sure that it always works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertBusy could be used in this case, true (tho with extra complexity). How would you see it failing in a flaky way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not easy to achieve I guess, If you want to just check locally maybe try a simple Thread.sleep() somewhere in the SQL chain (e.g. Analyzer), so that the execution is delayed for let's say 5secs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC your suggestion, the (merged) local cancellation tests do that, they start a task and make sure it blocks at certain stages along its execution.

The tests here are all remote ones though, that are only mean to check the correct response formatting of the textual answers. The timing is irrelevant, it makes no difference between "10s" or "1d", for as long as the query finishes. But even if it doesn't finish, or the query goes wrong, it'd result in a failed test due to mismatch between expected and returned answer. That's why I wasn't sure how you see it failing in a flaky way.

Using assertBusy to check on task's actual completion before checking its result is possible, but don't think it's necessary. But let me know if you see it differently still.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bpintea Thx Bogdan, I was confused with the functionality here.
Since we have the waitForCompletionTimeout that ensures that we have the response (or a timeout if the test runs for 1 day :D) when we try to assert it, so no need for the assertBusy.

return runSqlAsTextFormat(builder, null);
}

static Response runSqlAsTextFormat(RequestObjectBuilder builder, @Nullable String format) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a better name is runSqlWithFormat

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, will change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. to runSqlAsTextWithFormat since there are existing runSqlAsText methods and only text formats need to be provided (as it checks for a Cursor header, only available with those formats).

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left a comment on two methods that seem to share code.
The sync/async code paths complicate things though I don't have any good advice on how to move forward without splitting the code apart.
We just have to be careful and use proper formatting to delimitate the two and where possible use different methods.

@@ -85,7 +85,7 @@ public void testIncorrectAcceptHeader() throws IOException {
request.setEntity(stringEntity);
expectBadRequest(
() -> toMap(client().performRequest(request), "plain"),
containsString("Invalid response content type: Accept=[application/fff]")
containsString("Invalid request content type: Accept=[application/fff]")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -194,7 +194,7 @@ public void writeTo(StreamOutput out) throws IOException {
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject();
{
if (Strings.hasText(asyncExecutionId)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if using different class hierarchy instead of dedicated properties might help encapsulate the difference in response between sync/async better.
The hasId is obscure - is it used anywhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if using different class hierarchy instead of dedicated properties might help encapsulate the difference in response between sync/async better.

That'd be ideal, but I didn't find a way to cross the hierarchies.

The hasId is obscure - is it used anywhere else?

hasId() is also used in SqlResponseFormatter and TextFormat, yes. It is a bit obscure and I hesitated using it, but eventually did since there is a hasCursor() (which returns the cursor field) and there is already the id() method. Now the id() method is maybe also generic, but I wanted to keep the symmetry with EqlSearchResponse.
(We could change them in both classes to something like asyncId() & co., but I'd do that subsequently, if needed.)

return getResponseMediaType(request);
}

public static MediaType getResponseMediaType(RestRequest request) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should delegate to the one above or vice-versa.
The URL_PARAM_FORMAT & co is shared across the two.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check on URL_PARAM_FORMAT needs to stay in both functions, since the functions are called from different context and need to act differently on its presence. But I've refactored the functions to avoid the double check.

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thx!

@bpintea
Copy link
Contributor Author

bpintea commented Jun 8, 2021

@elasticmachine run elasticsearch-ci/rest-compatibility

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general. Left few comments.

Character csvDelimiter = ',';

assertEquals(200, response.getStatusLine().getStatusCode());
assertEquals(response.getHeader(HEADER_NAME_ASYNC_PARTIAL), response.getHeader(HEADER_NAME_ASYNC_RUNNING));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you are comparing these two. assertEquals is about comparing a (actual) value to something that's expected. Comparing something from the response with something else from the response breaks this contract. If both are true then it should be assertEquals(true, response.....) or assertTrue(response.get....). Just few lines above you used the right way for these assertions: https://github.com/elastic/elasticsearch/pull/73729/files#diff-6f95aacef2988567618b4250a265a6eadcebe2eb971653f7267a1a86c8e73c4fR1243-R1244

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertEquals is about comparing a (actual) value to something that's expected.

Not sure, I might have taken editor's suggestion to "simplify" the code. The documentation is not "contractual" enough, IMO, but parameter naming might support your case. :-) So I see your point and I've refactored to assertTrue() (tho maybe a bit less easy to read).

Just few lines above you used the right way for these assertions

This case differs from above since both "true" or "false" are valid, as long as both headers have the same value.

long millis = 1;
for (boolean isRunning = true; isRunning; Thread.sleep(millis *= 2)) {
asyncStatus = toMap(client().performRequest(request), null);
isRunning = (boolean) asyncStatus.get(IS_RUNNING_NAME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a danger here of this for not existing and running into a test suite timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would require the server so somehow hang with the search task alive (which I guess would also trigger a timeout).

if (randomBoolean()) {
request.addParameter(URL_PARAM_FORMAT, format);
if (format.equals("csv")) {
csvDelimiter = ';';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this well, you use the default delimiter (,) but depending on a randomBoolean you change that (and is set explicitly) to ;. I would move Character csvDelimiter = ','; closer to this randomBoolean. If I wouldn't have put the code in the IDE and look at what happens with the csvDelimiter it would have been more difficult to see what the logic is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The randomBoolean() splits between passing the format as a URL parameter or HTTP header. Then, if the chosen branch is to use the parameter and the format - also randomly chosen - is csv then the splitting character is changed.
The randomBoolean() isn't solely responsible for the delimiter char, so not sure how to make it clearer, but if you have a suggestion I'll apply it.

Comment on lines -63 to -81
MediaType responseMediaType = sqlMediaTypeParser.getResponseMediaType(request, sqlRequest);
if (responseMediaType == null) {
String msg = String.format(Locale.ROOT, "Invalid response content type: Accept=[%s], Content-Type=[%s], format=[%s]",
request.header("Accept"), request.header("Content-Type"), request.param("format"));
throw new IllegalArgumentException(msg);
}

/*
* Special handling for the "delimiter" parameter which should only be
* checked for being present or not in the case of CSV format. We cannot
* override {@link BaseRestHandler#responseParams()} because this
* parameter should only be checked for CSV, not always.
*/
if ((responseMediaType instanceof XContentType || ((TextFormat) responseMediaType) != TextFormat.CSV)
&& request.hasParam(URL_PARAM_DELIMITER)) {
throw new IllegalArgumentException(unrecognized(request, Collections.singleton(URL_PARAM_DELIMITER), emptySet(), "parameter"));
}

long startNanos = System.nanoTime();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I agree with this code being "externalized" in a separate class that is a RestResponseListener called SqlResponseFormatter. If it's a Listener, this should be reflected in its name imo. Also, why its a separate class? Maybe an internal class in RestSqlQueryAction if you really like to refactor the code (to me it looks like a forced refactoring without any real benefit, maybe it's just me)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[...] class that is a RestResponseListener called SqlResponseFormatter

Good point, I've renamed it to SqlResponseListener.

why its a separate class?

The listener is used by both RestSqlQueryAction and RestSqlAsyncGetResultsAction.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for addressing my review.

bpintea added 3 commits June 9, 2021 17:42
This extends the async implementation to support working with the text
formats (txt, csv, tsv).

A test validating the administrator operation of a user with the
"manage" permission has also been added.
- refactor media type calculation to avoid redundant check on URL param.
- rename methods in tests.
- remove empty lines.
- renamed listener;
- minor test refactorings.
@bpintea bpintea force-pushed the feature/sql-async-text branch from e955d30 to ea85d2b Compare June 9, 2021 15:44
- Fix Nullable and TimeValue imports after classes relocation.
@bpintea bpintea merged commit 52001bd into elastic:feat/sql-async Jun 9, 2021
@bpintea bpintea deleted the feature/sql-async-text branch June 9, 2021 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/SQL SQL querying Team:QL (Deprecated) Meta label for query languages team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants