Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cross cluster search in PPL #1512

Merged

Conversation

seankao-az
Copy link
Collaborator

@seankao-az seankao-az commented Apr 10, 2023

Description

Support cross cluster search in PPL.

Issues Resolved

#789

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ppl/src/main/antlr/OpenSearchPPLParser.g4 Outdated Show resolved Hide resolved
@@ -590,6 +600,10 @@ qualifiedName
: ident (DOT ident)* #identsAsQualifiedName
;

clusterQualifiedName
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other to do this would be to allow the use of : in index names.

Copy link
Collaborator Author

@seankao-az seankao-az Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've considered this option. The reason why I didn't think it work very well is because we really only want : in search commands. We'll need to reject the : symbol else where such as describe command.

I will take this into consideration. Which do you think is better?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowed use of : in index name

YANG-DB
YANG-DB previously approved these changes Apr 11, 2023
@seankao-az seankao-az force-pushed the feature/cross-cluster-search branch 2 times, most recently from 93d441c to 6ac5d70 Compare April 18, 2023 23:51
@codecov-commenter
Copy link

codecov-commenter commented Apr 19, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.16%. Comparing base (3fc11a4) to head (04d03ea).
Report is 344 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #1512   +/-   ##
=========================================
  Coverage     97.16%   97.16%           
- Complexity     4116     4120    +4     
=========================================
  Files           371      371           
  Lines         10366    10373    +7     
  Branches        704      704           
=========================================
+ Hits          10072    10079    +7     
  Misses          287      287           
  Partials          7        7           
Flag Coverage Δ
sql-engine 97.16% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Yury-Fridlyand
Copy link
Collaborator

Can I use this for SQL ITs? How can I configure number of nodes? How can I set per-node settings in IT?
This will be very helpful for pagination ITs.

@seankao-az
Copy link
Collaborator Author

Can I use this for SQL ITs?

Yes if it's based on OpenSearchSQLRestTestCase

How can I configure number of nodes?

Set numberOfNodes in gradle build script for the cluster, for example:

numberOfNodes = 3

How can I set per-node settings in IT?

I'm not sure about per-node settings...
But for cluster settings we have this function:

protected static JSONObject updateClusterSettings(ClusterSetting setting) throws IOException {

in this PR I allow it to also update remote cluster settings

@Yury-Fridlyand
Copy link
Collaborator

Thanks!

@seankao-az seankao-az force-pushed the feature/cross-cluster-search branch from 88bbcc1 to 7797ba7 Compare April 19, 2023 20:05
@seankao-az seankao-az marked this pull request as ready for review April 20, 2023 17:13
@seankao-az seankao-az requested review from penghuo and YANG-DB April 20, 2023 17:14
@seankao-az seankao-az force-pushed the feature/cross-cluster-search branch from 7797ba7 to d94322c Compare April 20, 2023 20:29
docs/user/ppl/admin/cross_cluster_search.rst Show resolved Hide resolved
docs/user/ppl/admin/cross_cluster_search.rst Outdated Show resolved Hide resolved
@@ -34,7 +34,8 @@ Example: Create the ppl_role for test_user. then test_user could use PPL to quer
],
"allowed_actions": [
"indices:data/read/search*",
"indices:admin/mappings/get"
"indices:admin/mappings/get",
"indices:monitor/settings/get"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you explain more why this permission is required?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found during testing that this permission is needed. I believe it's this part to get cluster max result window setting that needs this permission.

I shall remove this change and make a separate PR for it, as it's not directly related to this PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ok to add to this PR.
Please create an follow up issue to create the PPL_ACCESS in security plugin.

Copy link
Collaborator Author

@seankao-az seankao-az Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added back to the doc. will make a PR to security plugin to create roles for both ppl and ccs

ppl/src/main/antlr/OpenSearchPPLParser.g4 Outdated Show resolved Hide resolved
@seankao-az seankao-az force-pushed the feature/cross-cluster-search branch from d94322c to 38f4a15 Compare April 24, 2023 22:06
@seankao-az seankao-az requested a review from penghuo April 25, 2023 00:13
@seankao-az seankao-az force-pushed the feature/cross-cluster-search branch from 00410d3 to c03b69e Compare April 27, 2023 19:36
@seankao-az seankao-az force-pushed the feature/cross-cluster-search branch from 90a0e96 to 30971b1 Compare April 27, 2023 21:37
@seankao-az seankao-az requested a review from penghuo April 27, 2023 23:29
@dai-chen dai-chen added feature PPL Piped processing language labels May 1, 2023
@seankao-az seankao-az requested a review from YANG-DB May 1, 2023 18:13
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1512-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 1bda5fe73312017b3bc6029e2eaf378c2e72306c
# Push it to GitHub
git push --set-upstream origin backport/backport-1512-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1512-to-2.x.

seankao-az added a commit to seankao-az/sql that referenced this pull request May 12, 2023
* feat: PPL parser for ccs

Signed-off-by: Sean Kao <[email protected]>

* feat: disable describe remote cluster index in PPL

Allowing the syntax will lead to misunderstanding in the query result,
because we will do a local cluster query for index mapping, even for
remote indices. This is due to the restriction that OpenSearch doesn't
support remote cluster index mapping query at the moment.

Signed-off-by: Sean Kao <[email protected]>

* feat: Query system index without cluster name

We require system index query to happen at the local cluster.
Currently, OpenSearch does not support cross cluster system index
query. Thus, mapping of a remote index is unavailable.

Therefore, we require the local cluster to have the system index
of the remote cluster index.

The full "cluster:index" name is still
used to query OpenSearch for datarows, as CCS is natively supported.

Signed-off-by: Sean Kao <[email protected]>

* fix: index name parsing for datasources

To identify datasources in the index qualified names, they need to
be parsed into parts (separated only by dots). clusterQualifiedName
can't contain custom datasources, hence the distinction.

Signed-off-by: Sean Kao <[email protected]>

* multi clusters setup for integration test

Signed-off-by: Sean Kao <[email protected]>

* Add IT test case

Signed-off-by: Sean Kao <[email protected]>

* Document ccs for ppl

Signed-off-by: Sean Kao <[email protected]>

* documentation update

Signed-off-by: Sean Kao <[email protected]>

* feat: allow describe remote cluster index in PPL

Signed-off-by: Sean Kao <[email protected]>

* feat: allow "*:index" to match all remote clusters

Signed-off-by: Sean Kao <[email protected]>

* use local index names for field mappings request

Signed-off-by: Sean Kao <[email protected]>

* allow ':' in index identifier

Signed-off-by: Sean Kao <[email protected]>

* docs update

Signed-off-by: Sean Kao <[email protected]>

* limit cluster prefix to table names only

Signed-off-by: Sean Kao <[email protected]>

* move multicluster capability to sql rest test case

Signed-off-by: Sean Kao <[email protected]>

* add IT for failure case

Signed-off-by: Sean Kao <[email protected]>

* remove logger info for connection in IT test case

Signed-off-by: Sean Kao <[email protected]>

---------

Signed-off-by: Sean Kao <[email protected]>
(cherry picked from commit 1bda5fe)
seankao-az added a commit to seankao-az/sql that referenced this pull request May 12, 2023
* feat: PPL parser for ccs

Signed-off-by: Sean Kao <[email protected]>

* feat: disable describe remote cluster index in PPL

Allowing the syntax will lead to misunderstanding in the query result,
because we will do a local cluster query for index mapping, even for
remote indices. This is due to the restriction that OpenSearch doesn't
support remote cluster index mapping query at the moment.

Signed-off-by: Sean Kao <[email protected]>

* feat: Query system index without cluster name

We require system index query to happen at the local cluster.
Currently, OpenSearch does not support cross cluster system index
query. Thus, mapping of a remote index is unavailable.

Therefore, we require the local cluster to have the system index
of the remote cluster index.

The full "cluster:index" name is still
used to query OpenSearch for datarows, as CCS is natively supported.

Signed-off-by: Sean Kao <[email protected]>

* fix: index name parsing for datasources

To identify datasources in the index qualified names, they need to
be parsed into parts (separated only by dots). clusterQualifiedName
can't contain custom datasources, hence the distinction.

Signed-off-by: Sean Kao <[email protected]>

* multi clusters setup for integration test

Signed-off-by: Sean Kao <[email protected]>

* Add IT test case

Signed-off-by: Sean Kao <[email protected]>

* Document ccs for ppl

Signed-off-by: Sean Kao <[email protected]>

* documentation update

Signed-off-by: Sean Kao <[email protected]>

* feat: allow describe remote cluster index in PPL

Signed-off-by: Sean Kao <[email protected]>

* feat: allow "*:index" to match all remote clusters

Signed-off-by: Sean Kao <[email protected]>

* use local index names for field mappings request

Signed-off-by: Sean Kao <[email protected]>

* allow ':' in index identifier

Signed-off-by: Sean Kao <[email protected]>

* docs update

Signed-off-by: Sean Kao <[email protected]>

* limit cluster prefix to table names only

Signed-off-by: Sean Kao <[email protected]>

* move multicluster capability to sql rest test case

Signed-off-by: Sean Kao <[email protected]>

* add IT for failure case

Signed-off-by: Sean Kao <[email protected]>

* remove logger info for connection in IT test case

Signed-off-by: Sean Kao <[email protected]>

---------

Signed-off-by: Sean Kao <[email protected]>
(cherry picked from commit 1bda5fe)
seankao-az added a commit to seankao-az/sql that referenced this pull request May 12, 2023
* feat: PPL parser for ccs

Signed-off-by: Sean Kao <[email protected]>

* feat: disable describe remote cluster index in PPL

Allowing the syntax will lead to misunderstanding in the query result,
because we will do a local cluster query for index mapping, even for
remote indices. This is due to the restriction that OpenSearch doesn't
support remote cluster index mapping query at the moment.

Signed-off-by: Sean Kao <[email protected]>

* feat: Query system index without cluster name

We require system index query to happen at the local cluster.
Currently, OpenSearch does not support cross cluster system index
query. Thus, mapping of a remote index is unavailable.

Therefore, we require the local cluster to have the system index
of the remote cluster index.

The full "cluster:index" name is still
used to query OpenSearch for datarows, as CCS is natively supported.

Signed-off-by: Sean Kao <[email protected]>

* fix: index name parsing for datasources

To identify datasources in the index qualified names, they need to
be parsed into parts (separated only by dots). clusterQualifiedName
can't contain custom datasources, hence the distinction.

Signed-off-by: Sean Kao <[email protected]>

* multi clusters setup for integration test

Signed-off-by: Sean Kao <[email protected]>

* Add IT test case

Signed-off-by: Sean Kao <[email protected]>

* Document ccs for ppl

Signed-off-by: Sean Kao <[email protected]>

* documentation update

Signed-off-by: Sean Kao <[email protected]>

* feat: allow describe remote cluster index in PPL

Signed-off-by: Sean Kao <[email protected]>

* feat: allow "*:index" to match all remote clusters

Signed-off-by: Sean Kao <[email protected]>

* use local index names for field mappings request

Signed-off-by: Sean Kao <[email protected]>

* allow ':' in index identifier

Signed-off-by: Sean Kao <[email protected]>

* docs update

Signed-off-by: Sean Kao <[email protected]>

* limit cluster prefix to table names only

Signed-off-by: Sean Kao <[email protected]>

* move multicluster capability to sql rest test case

Signed-off-by: Sean Kao <[email protected]>

* add IT for failure case

Signed-off-by: Sean Kao <[email protected]>

* remove logger info for connection in IT test case

Signed-off-by: Sean Kao <[email protected]>

---------

Signed-off-by: Sean Kao <[email protected]>
(cherry picked from commit 1bda5fe)
seankao-az added a commit that referenced this pull request May 12, 2023
* feat: PPL parser for ccs

Signed-off-by: Sean Kao <[email protected]>

* feat: disable describe remote cluster index in PPL

Allowing the syntax will lead to misunderstanding in the query result,
because we will do a local cluster query for index mapping, even for
remote indices. This is due to the restriction that OpenSearch doesn't
support remote cluster index mapping query at the moment.

Signed-off-by: Sean Kao <[email protected]>

* feat: Query system index without cluster name

We require system index query to happen at the local cluster.
Currently, OpenSearch does not support cross cluster system index
query. Thus, mapping of a remote index is unavailable.

Therefore, we require the local cluster to have the system index
of the remote cluster index.

The full "cluster:index" name is still
used to query OpenSearch for datarows, as CCS is natively supported.

Signed-off-by: Sean Kao <[email protected]>

* fix: index name parsing for datasources

To identify datasources in the index qualified names, they need to
be parsed into parts (separated only by dots). clusterQualifiedName
can't contain custom datasources, hence the distinction.

Signed-off-by: Sean Kao <[email protected]>

* multi clusters setup for integration test

Signed-off-by: Sean Kao <[email protected]>

* Add IT test case

Signed-off-by: Sean Kao <[email protected]>

* Document ccs for ppl

Signed-off-by: Sean Kao <[email protected]>

* documentation update

Signed-off-by: Sean Kao <[email protected]>

* feat: allow describe remote cluster index in PPL

Signed-off-by: Sean Kao <[email protected]>

* feat: allow "*:index" to match all remote clusters

Signed-off-by: Sean Kao <[email protected]>

* use local index names for field mappings request

Signed-off-by: Sean Kao <[email protected]>

* allow ':' in index identifier

Signed-off-by: Sean Kao <[email protected]>

* docs update

Signed-off-by: Sean Kao <[email protected]>

* limit cluster prefix to table names only

Signed-off-by: Sean Kao <[email protected]>

* move multicluster capability to sql rest test case

Signed-off-by: Sean Kao <[email protected]>

* add IT for failure case

Signed-off-by: Sean Kao <[email protected]>

* remove logger info for connection in IT test case

Signed-off-by: Sean Kao <[email protected]>

---------

Signed-off-by: Sean Kao <[email protected]>
(cherry picked from commit 1bda5fe)
MitchellGale pushed a commit to Bit-Quill/opensearch-project-sql that referenced this pull request Jun 12, 2023
* feat: PPL parser for ccs

Signed-off-by: Sean Kao <[email protected]>

* feat: disable describe remote cluster index in PPL

Allowing the syntax will lead to misunderstanding in the query result,
because we will do a local cluster query for index mapping, even for
remote indices. This is due to the restriction that OpenSearch doesn't
support remote cluster index mapping query at the moment.

Signed-off-by: Sean Kao <[email protected]>

* feat: Query system index without cluster name

We require system index query to happen at the local cluster.
Currently, OpenSearch does not support cross cluster system index
query. Thus, mapping of a remote index is unavailable.

Therefore, we require the local cluster to have the system index
of the remote cluster index.

The full "cluster:index" name is still
used to query OpenSearch for datarows, as CCS is natively supported.

Signed-off-by: Sean Kao <[email protected]>

* fix: index name parsing for datasources

To identify datasources in the index qualified names, they need to
be parsed into parts (separated only by dots). clusterQualifiedName
can't contain custom datasources, hence the distinction.

Signed-off-by: Sean Kao <[email protected]>

* multi clusters setup for integration test

Signed-off-by: Sean Kao <[email protected]>

* Add IT test case

Signed-off-by: Sean Kao <[email protected]>

* Document ccs for ppl

Signed-off-by: Sean Kao <[email protected]>

* documentation update

Signed-off-by: Sean Kao <[email protected]>

* feat: allow describe remote cluster index in PPL

Signed-off-by: Sean Kao <[email protected]>

* feat: allow "*:index" to match all remote clusters

Signed-off-by: Sean Kao <[email protected]>

* use local index names for field mappings request

Signed-off-by: Sean Kao <[email protected]>

* allow ':' in index identifier

Signed-off-by: Sean Kao <[email protected]>

* docs update

Signed-off-by: Sean Kao <[email protected]>

* limit cluster prefix to table names only

Signed-off-by: Sean Kao <[email protected]>

* move multicluster capability to sql rest test case

Signed-off-by: Sean Kao <[email protected]>

* add IT for failure case

Signed-off-by: Sean Kao <[email protected]>

* remove logger info for connection in IT test case

Signed-off-by: Sean Kao <[email protected]>

---------

Signed-off-by: Sean Kao <[email protected]>
Signed-off-by: Mitchell Gale <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants