Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: pull queries available on /query rest & ws endpoint #3820

Merged
merged 3 commits into from
Nov 14, 2019

Conversation

big-andy-coates
Copy link
Contributor

@big-andy-coates big-andy-coates commented Nov 12, 2019

Description

fixes: #3672 by providing alternative way of issuing pull queries that does NOT log
fixes: #3806

We should look to include #3819 as part of this fix.

Makes pull queries available on the /query RESTful and Websocket endpoints, in the same way that push queries are.

Note: this change does not remove pull query support from the /ksql endpoint, nor does it switch the CLI over to use
the /query endpoint. The CLI continues to use the /ksql endpoint for pull queries.

Push and pull queries to the /query rest endpoint now return the schema of the rows in the first message.
This is required as the 'DESCRIBE' that CLI was previously running to get column headers doesn't work for pull queries yet. (Known issue: #3495).
This is similar to the pattern used by the websocket endpoint, which also sends the schema in the first message.

In addition, I've hidden null fields and added a 'header' row to return the schema of the data. The output now looks like:

{"header":{"queryId":"someId","schema":"`USERID` STRING, `PAGEID` STRING, `VIEWTIME` BIGINT, `ROWKEY` STRING"}}
{"row":{"columns":["USER_1","PAGE_1",1,"1"]}}
{"row":{"columns":["USER_2","PAGE_2",2,"2"]}}
{"finalMessage":"Limit Reached"}"

Note: With this PR the payload from the rest endpoint continues to be invalid JSON, just like it is for push queries. #3819 addresses this.

BREAKING CHANGE: the response from the RESTful API for push queries has changed: it now returns a line with the schema and query id in a header field and null fields are not included in the payload.

The CLI is backwards compatible with older versions of the server, though it won't output column headings from older versions.

Note: This PR does NOT address issue #3663. At the moment I have a feeling an admin client is still made per-request.

Outstanding tasks:

  • the schema being returned for pull queries is incorrect. (Working on this)
  • Decide if we want to switch CLI to use /query endpoint.
  • Decide if we want to drop pull query support from /ksql endpoint.
  • Perf test this new endpoint

Testing done

Lots of testing, including manual

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

fixes: confluentinc#3672 by providing alternative way of issuing pull queries that does NOT log

Makes pull queries available on the `/query` RESTful and Websocket endpoints, in the same way that push queries are.

Note: this change does not _remove_ pull query support from the `/ksql` endpoint, nor does it switch the CLI over to use
the `/query` endpoint. The CLI continues to use the `/ksql` endpoint for pull queries.

Push and pull queries to the `/query` rest endpoint now return the schema of the rows in the first message.
This is required as the 'DESCRIBE' that CLI was previously running to get column headers doesn't work for pull queries yet. (Known issue: confluentinc#3495).
This is similar to the pattern used by the websocket endpoint, which also sends the schema in the first message.

In addition, I've hidden null fields and added a 'header' row to return the schema of the data. The output now looks like:

```json
[{"header":{"queryId":"someId","schema":"`USERID` STRING, `PAGEID` STRING, `VIEWTIME` BIGINT, `ROWKEY` STRING"}},
{"row":{"columns":["USER_1","PAGE_1",1,"1"]}},
{"row":{"columns":["USER_2","PAGE_2",2,"2"]}},
{"finalMessage":"Limit Reached"}]"
```

BREAKING CHANGE: the response from the RESTful API for push queries has changed: it now returns a line with the schema and query id in a `header` field and null fields are not included in the payload.

The CLI is backwards compatible with older versions of the server, though it won't output column headings from older versions.
@big-andy-coates big-andy-coates requested a review from a team as a code owner November 12, 2019 14:42
Copy link
Member

@vpapavas vpapavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please get another +1 as I am not really familiar with the web socket code.

@big-andy-coates big-andy-coates requested a review from a team November 12, 2019 21:00
output.write("\n".getBytes(StandardCharsets.UTF_8));
output.flush();
}

private StreamedRow buildHeader() {
// Push queries only return value columns, but query metadata schema includes key and meta:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the comment, you mean you want the schema to only contain the value columns, but currently they return also the meta columns and that's what you are working on fixing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. The schema in the query metadata contains key and meta columns. But push queries only return value columns. Hence in this method I create a new schema with only the value columns.

The longer term fix is to store the right schema in the query metadata, but that's outside the scope of this PR.

The outstanding issue with schemas is for pull queries, not push. Pull queries return key columns, but the returned schema currently doesn't include this.

Copy link
Contributor

@vinothchandar vinothchandar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM . Just one clarification.

@@ -316,18 +314,6 @@ private void handleStreamedQuery(
final String query,
final SqlBaseParser.QueryStatementContext ignored
) {
final RestResponse<KsqlEntityList> explainResponse = restClient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we have #3495 , is the plan to introduce some header to control whether schema is part of the response or not? It may make sense for CLI to request the schema always, it may add a constant overhead for queries from real applications.?

Copy link
Contributor Author

@big-andy-coates big-andy-coates Nov 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea. Though not sure how #3495 is related.

I think a query parameter might make more sense than a header. What do you think?

Raised #3835 to track

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah parameter makes sense.

@big-andy-coates
Copy link
Contributor Author

test this please

@big-andy-coates
Copy link
Contributor Author

retest this please

@vinothchandar
Copy link
Contributor

I rekicked it and it still failed with checkstyle

[2019-11-13T19:21:11.288Z] [INFO] 
[2019-11-13T19:21:11.288Z] [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ ksql-rest-app ---
[2019-11-13T19:21:11.847Z] [INFO] Starting audit...
[2019-11-13T19:21:11.847Z] [ERROR] /home/****/workspace/confluentinc-pr_ksql_PR-3820/ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/resources/streaming/PullQueryPublisher.java:75: Class PullQuerySubscription should be declared as final. [FinalClass]
[2019-11-13T19:21:11.847Z] Audit done.
[2019-11-13T19:21:11.847Z] [INFO] BUILD FAILURE
[2019-11-13T19:21:11.847Z] [INFO] ------------------------------------------------------------------------
[2019-11-13T19:21:11.847Z] [INFO] Total time: 17:46 min
[2019-11-13T19:21:11.847Z] [INFO] Finished at: 2019-11-13T19:21:11Z
[2019-11-13T19:21:12.405Z] [INFO] Final Memory: 277M/7839M
[2019-11-13T19:21:12.405Z] [INFO] ------------------------------------------------------------------------
[2019-11-13T19:21:12.405Z] [INFO] [****-event-spy] Generated /home/****/workspace/confluentinc-pr_ksql_PR-3820@tmp/withMavend43ff676/maven-spy-20191113-190324-9881154326827730460957.log
[2019-11-13T19:21:12.406Z] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (validate) on project ksql-rest-app: Failed during checkstyle execution: There is 1 error reported by Checkstyle 8.18 with checkstyle/checkstyle.xml ruleset. -> [Help 1]
[2019-11-13T19:21:12.406Z] [ERROR] 
[2019-11-13T19:21:12.406Z] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[2019-11-13T19:21:12.406Z] [ERROR] Re-run Maven using the -X switch to enable full debug logging.
[2019-11-13T19:21:12.406Z] [ERROR] 
[2019-11-13T19:21:12.406Z] [ERROR] For more information about the errors and possible solutions, please read the following articles:
[2019-11-13T19:21:12.406Z] [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[2019-11-13T19:21:12.406Z] [ERROR] 
[2019-11-13T19:21:12.406Z] [ERROR] After correcting the problems, you can resume the build with the command
[2019-11-13T19:21:12.406Z] [ERROR]   mvn <goals> -rf :ksql-rest-app

@big-andy-coates big-andy-coates merged commit e2321f5 into confluentinc:master Nov 14, 2019
@big-andy-coates big-andy-coates deleted the pull_to_query branch November 14, 2019 13:02
big-andy-coates added a commit that referenced this pull request Nov 14, 2019
* fix: pull queries available on `/query` rest & ws endpoint

fixes: #3672 by providing alternative way of issuing pull queries that does NOT log

Makes pull queries available on the `/query` RESTful and Websocket endpoints, in the same way that push queries are.

Note: this change does not _remove_ pull query support from the `/ksql` endpoint, nor does it switch the CLI over to use
the `/query` endpoint. The CLI continues to use the `/ksql` endpoint for pull queries.

Push and pull queries to the `/query` rest endpoint now return the schema of the rows in the first message.
This is required as the 'DESCRIBE' that CLI was previously running to get column headers doesn't work for pull queries yet. (Known issue: #3495).
This is similar to the pattern used by the websocket endpoint, which also sends the schema in the first message.

In addition, I've hidden null fields and added a 'header' row to return the schema of the data. The output now looks like:

```json
[{"header":{"queryId":"someId","schema":"`USERID` STRING, `PAGEID` STRING, `VIEWTIME` BIGINT, `ROWKEY` STRING"}},
{"row":{"columns":["USER_1","PAGE_1",1,"1"]}},
{"row":{"columns":["USER_2","PAGE_2",2,"2"]}},
{"finalMessage":"Limit Reached"}]"
```

BREAKING CHANGE: the response from the RESTful API for push queries has changed: it now returns a line with the schema and query id in a `header` field and null fields are not included in the payload.

The CLI is backwards compatible with older versions of the server, though it won't output column headings from older versions.

(cherry picked from commit e2321f5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants