SQL: Allow text formats to work in async mode #73729

bpintea · 2021-06-03T16:35:06Z

This extends the async implementation to support working with the text
formats (txt, csv, tsv).

A test validating the administrator operation of a user with the
"manage" permission has also been added.

elasticmachine · 2021-06-03T16:35:09Z

Pinging @elastic/es-ql (Team:QL)

matriv

Thanks Bogdan, left some comments, mostly worrying about flakiness because of the async nature of things.

matriv · 2021-06-04T08:55:55Z

...r/security/src/test/java/org/elasticsearch/xpack/sql/qa/security/RestSqlSecurityAsyncIT.java

@@ -101,6 +102,24 @@ private void testCase(String user, String otherUser) throws Exception {
        assertThat(exc.getResponse().getStatusLine().getStatusCode(), equalTo(400));
    }

+    // user with manage privilege can check status and delete
+    public void testWithManager() throws IOException {
+        Response submitResp = submitAsyncSqlSearch("SELECT event_type FROM \"index\" WHERE val=0", TimeValue.timeValueSeconds(10), "user1");


Is this time value safe for any case?
Could it be better to make use of some other async test mechanism like assertBusy to make sure that it's robust?

This particular value has been c&p'ed from the similar EQL tests.

But the way the test works is to get exceptions when the "other" user tries to get to this async request. For that to work, the async task would only need to be created. Which is checked both by asserting an OK answer to the initial request, as well as by getting its status, subsequently. So the task wouldn't even have to be finished, just started, which is guaranteed by the first checks.

So the "10s" aren't actually relevant for the outcome of the test.

Great, thx!

matriv · 2021-06-04T08:59:21Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/plugin/SqlResponseFormatter.java

+
+class SqlResponseFormatter extends RestResponseListener<SqlQueryResponse> {
+
+    private final long startNanos = System.nanoTime();


Is this the correct place to count the elapsed time for the request?
(just asking)

That's how we counted before -- it's pretty much when the request enters SQL, but yeh, otherwise maybe "a bit" later than when it enters ES.

matriv · 2021-06-04T09:00:39Z

...lugin/sql/src/main/java/org/elasticsearch/xpack/sql/plugin/RestSqlAsyncGetResultsAction.java

+
+    @Override
+    protected Set<String> responseParams() {
+        return Collections.singleton(URL_PARAM_DELIMITER);


This seems strange, why do we return the Delimiter?

That's used to signal that this param can be checked on, but not consumed. BaseRestHandler.java#prepareRequest and responseParams() details how that works.

matriv · 2021-06-04T09:03:38Z

.../plugin/sql/qa/server/src/main/java/org/elasticsearch/xpack/sql/qa/rest/RestSqlTestCase.java

+        }
+        request.setJsonEntity(bulk.toString());
+        client().performRequest(request);
+


nit: extra empty line.

Thx, will remove.

matriv · 2021-06-04T09:03:55Z

...r/security/src/test/java/org/elasticsearch/xpack/sql/qa/security/RestSqlSecurityAsyncIT.java

+
+        Response deleteResp = deleteAsyncSqlSearch(id, "manage_user");
+        assertOK(deleteResp);
+


nit: extra empty line

matriv · 2021-06-04T09:06:08Z

.../plugin/sql/qa/server/src/main/java/org/elasticsearch/xpack/sql/qa/rest/RestSqlTestCase.java

+        expected.put(IS_RUNNING_NAME, false);
+        expected.put(COLUMNS_NAME, singletonList(columnInfo(mode, "1", "integer", JDBCType.INTEGER, 11)));
+        expected.put(ROWS_NAME, singletonList(singletonList(1)));
+        assertAsyncResponse(expected, runSql(builder, mode));


I'm worrying again here and for the rest of the aysnc tests about flakiness. Should we use the assertBusy mechanism to make sure that it always works?

The assertBusy could be used in this case, true (tho with extra complexity). How would you see it failing in a flaky way?

Not easy to achieve I guess, If you want to just check locally maybe try a simple Thread.sleep() somewhere in the SQL chain (e.g. Analyzer), so that the execution is delayed for let's say 5secs.

IIUC your suggestion, the (merged) local cancellation tests do that, they start a task and make sure it blocks at certain stages along its execution.

The tests here are all remote ones though, that are only mean to check the correct response formatting of the textual answers. The timing is irrelevant, it makes no difference between "10s" or "1d", for as long as the query finishes. But even if it doesn't finish, or the query goes wrong, it'd result in a failed test due to mismatch between expected and returned answer. That's why I wasn't sure how you see it failing in a flaky way.

Using assertBusy to check on task's actual completion before checking its result is possible, but don't think it's necessary. But let me know if you see it differently still.

@bpintea Thx Bogdan, I was confused with the functionality here.
Since we have the waitForCompletionTimeout that ensures that we have the response (or a timeout if the test runs for 1 day :D) when we try to assert it, so no need for the assertBusy.

matriv · 2021-06-04T10:07:11Z

.../plugin/sql/qa/server/src/main/java/org/elasticsearch/xpack/sql/qa/rest/RestSqlTestCase.java

+        return runSqlAsTextFormat(builder, null);
+    }
+
+    static Response runSqlAsTextFormat(RequestObjectBuilder builder, @Nullable String format) throws IOException {


maybe a better name is runSqlWithFormat

Makes sense, will change.

Changed. to runSqlAsTextWithFormat since there are existing runSqlAsText methods and only text formats need to be provided (as it checks for a Cursor header, only available with those formats).

costin

LGTM. I left a comment on two methods that seem to share code.
The sync/async code paths complicate things though I don't have any good advice on how to move forward without splitting the code apart.
We just have to be careful and use proper formatting to delimitate the two and where possible use different methods.

costin · 2021-06-08T09:03:20Z

...a/server/single-node/src/test/java/org/elasticsearch/xpack/sql/qa/single_node/RestSqlIT.java

@@ -85,7 +85,7 @@ public void testIncorrectAcceptHeader() throws IOException {
        request.setEntity(stringEntity);
        expectBadRequest(
            () -> toMap(client().performRequest(request), "plain"),
-            containsString("Invalid response content type: Accept=[application/fff]")
+            containsString("Invalid request content type: Accept=[application/fff]")


costin · 2021-06-08T09:09:03Z

...plugin/sql/sql-action/src/main/java/org/elasticsearch/xpack/sql/action/SqlQueryResponse.java

@@ -194,7 +194,7 @@ public void writeTo(StreamOutput out) throws IOException {
    public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
        builder.startObject();
        {
-            if (Strings.hasText(asyncExecutionId)) {


I wonder if using different class hierarchy instead of dedicated properties might help encapsulate the difference in response between sync/async better.
The hasId is obscure - is it used anywhere else?

I wonder if using different class hierarchy instead of dedicated properties might help encapsulate the difference in response between sync/async better.

That'd be ideal, but I didn't find a way to cross the hierarchies.

The hasId is obscure - is it used anywhere else?

hasId() is also used in SqlResponseFormatter and TextFormat, yes. It is a bit obscure and I hesitated using it, but eventually did since there is a hasCursor() (which returns the cursor field) and there is already the id() method. Now the id() method is maybe also generic, but I wanted to keep the symmetry with EqlSearchResponse.
(We could change them in both classes to something like asyncId() & co., but I'd do that subsequently, if needed.)

costin · 2021-06-08T09:12:27Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/plugin/SqlMediaTypeParser.java

+        return getResponseMediaType(request);
+    }
+
+    public static MediaType getResponseMediaType(RestRequest request) {


This method should delegate to the one above or vice-versa.
The URL_PARAM_FORMAT & co is shared across the two.

The check on URL_PARAM_FORMAT needs to stay in both functions, since the functions are called from different context and need to act differently on its presence. But I've refactored the functions to avoid the double check.

matriv

LGTM, thx!

bpintea · 2021-06-08T19:16:41Z

@elasticmachine run elasticsearch-ci/rest-compatibility

astefan

Looks good in general. Left few comments.

astefan · 2021-06-09T12:57:44Z

.../plugin/sql/qa/server/src/main/java/org/elasticsearch/xpack/sql/qa/rest/RestSqlTestCase.java

+            Character csvDelimiter = ',';
+
+            assertEquals(200, response.getStatusLine().getStatusCode());
+            assertEquals(response.getHeader(HEADER_NAME_ASYNC_PARTIAL), response.getHeader(HEADER_NAME_ASYNC_RUNNING));


I don't understand why you are comparing these two. assertEquals is about comparing a (actual) value to something that's expected. Comparing something from the response with something else from the response breaks this contract. If both are true then it should be assertEquals(true, response.....) or assertTrue(response.get....). Just few lines above you used the right way for these assertions: https://github.com/elastic/elasticsearch/pull/73729/files#diff-6f95aacef2988567618b4250a265a6eadcebe2eb971653f7267a1a86c8e73c4fR1243-R1244

assertEquals is about comparing a (actual) value to something that's expected.

Not sure, I might have taken editor's suggestion to "simplify" the code. The documentation is not "contractual" enough, IMO, but parameter naming might support your case. :-) So I see your point and I've refactored to assertTrue() (tho maybe a bit less easy to read).

Just few lines above you used the right way for these assertions

This case differs from above since both "true" or "false" are valid, as long as both headers have the same value.

astefan · 2021-06-09T13:02:00Z

.../plugin/sql/qa/server/src/main/java/org/elasticsearch/xpack/sql/qa/rest/RestSqlTestCase.java

+                    long millis = 1;
+                    for (boolean isRunning = true; isRunning; Thread.sleep(millis *= 2)) {
+                        asyncStatus = toMap(client().performRequest(request), null);
+                        isRunning = (boolean) asyncStatus.get(IS_RUNNING_NAME);


Is there a danger here of this for not existing and running into a test suite timeout?

That would require the server so somehow hang with the search task alive (which I guess would also trigger a timeout).

astefan · 2021-06-09T13:13:11Z

.../plugin/sql/qa/server/src/main/java/org/elasticsearch/xpack/sql/qa/rest/RestSqlTestCase.java

+                if (randomBoolean()) {
+                    request.addParameter(URL_PARAM_FORMAT, format);
+                    if (format.equals("csv")) {
+                        csvDelimiter = ';';


If I understand this well, you use the default delimiter (,) but depending on a randomBoolean you change that (and is set explicitly) to ;. I would move Character csvDelimiter = ','; closer to this randomBoolean. If I wouldn't have put the code in the IDE and look at what happens with the csvDelimiter it would have been more difficult to see what the logic is.

The randomBoolean() splits between passing the format as a URL parameter or HTTP header. Then, if the chosen branch is to use the parameter and the format - also randomly chosen - is csv then the splitting character is changed.
The randomBoolean() isn't solely responsible for the delimiter char, so not sure how to make it clearer, but if you have a suggestion I'll apply it.

astefan · 2021-06-09T13:32:09Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/plugin/RestSqlQueryAction.java

-        MediaType responseMediaType = sqlMediaTypeParser.getResponseMediaType(request, sqlRequest);
-        if (responseMediaType == null) {
-            String msg = String.format(Locale.ROOT, "Invalid response content type: Accept=[%s], Content-Type=[%s], format=[%s]",
-                request.header("Accept"), request.header("Content-Type"), request.param("format"));
-            throw new IllegalArgumentException(msg);
-        }
-
-        /*
-         * Special handling for the "delimiter" parameter which should only be
-         * checked for being present or not in the case of CSV format. We cannot
-         * override {@link BaseRestHandler#responseParams()} because this
-         * parameter should only be checked for CSV, not always.
-         */
-        if ((responseMediaType instanceof XContentType || ((TextFormat) responseMediaType) != TextFormat.CSV)
-            && request.hasParam(URL_PARAM_DELIMITER)) {
-            throw new IllegalArgumentException(unrecognized(request, Collections.singleton(URL_PARAM_DELIMITER), emptySet(), "parameter"));
-        }
-
-        long startNanos = System.nanoTime();


I am not sure I agree with this code being "externalized" in a separate class that is a RestResponseListener called SqlResponseFormatter. If it's a Listener, this should be reflected in its name imo. Also, why its a separate class? Maybe an internal class in RestSqlQueryAction if you really like to refactor the code (to me it looks like a forced refactoring without any real benefit, maybe it's just me)?

[...] class that is a RestResponseListener called SqlResponseFormatter

Good point, I've renamed it to SqlResponseListener.

why its a separate class?

The listener is used by both RestSqlQueryAction and RestSqlAsyncGetResultsAction.

astefan

LGTM. Thank you for addressing my review.

This extends the async implementation to support working with the text formats (txt, csv, tsv). A test validating the administrator operation of a user with the "manage" permission has also been added.

- refactor media type calculation to avoid redundant check on URL param. - rename methods in tests. - remove empty lines.

- renamed listener; - minor test refactorings.

- Fix Nullable and TimeValue imports after classes relocation.

bpintea added the :Analytics/SQL SQL querying label Jun 3, 2021

bpintea requested review from costin, astefan and matriv June 3, 2021 16:35

elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Jun 3, 2021

matriv reviewed Jun 4, 2021

View reviewed changes

costin approved these changes Jun 8, 2021

View reviewed changes

matriv approved these changes Jun 8, 2021

View reviewed changes

astefan reviewed Jun 9, 2021

View reviewed changes

astefan approved these changes Jun 9, 2021

View reviewed changes

bpintea added 3 commits June 9, 2021 17:42

Allow text formats to work in async mode

817c8a2

This extends the async implementation to support working with the text formats (txt, csv, tsv). A test validating the administrator operation of a user with the "manage" permission has also been added.

Address review comments

67c8e03

- refactor media type calculation to avoid redundant check on URL param. - rename methods in tests. - remove empty lines.

Address review comments

ea85d2b

- renamed listener; - minor test refactorings.

bpintea force-pushed the feature/sql-async-text branch from e955d30 to ea85d2b Compare June 9, 2021 15:44

Fix imports past master merge

6455ff3

- Fix Nullable and TimeValue imports after classes relocation.

bpintea merged commit 52001bd into elastic:feat/sql-async Jun 9, 2021

bpintea deleted the feature/sql-async-text branch June 9, 2021 17:11


		class SqlResponseFormatter extends RestResponseListener<SqlQueryResponse> {

		private final long startNanos = System.nanoTime();


		Response deleteResp = deleteAsyncSqlSearch(id, "manage_user");
		assertOK(deleteResp);

SQL: Allow text formats to work in async mode #73729

SQL: Allow text formats to work in async mode #73729

Conversation

bpintea commented Jun 3, 2021

elasticmachine commented Jun 3, 2021

matriv left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

costin left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

matriv left a comment

Choose a reason for hiding this comment

bpintea commented Jun 8, 2021

astefan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

astefan left a comment

Choose a reason for hiding this comment

costin left a comment •

edited

Loading