Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Add transport interface for SQL query API #1327

Open
Tracked by #521
dai-chen opened this issue Feb 7, 2023 · 2 comments
Open
Tracked by #521

[FEATURE] Add transport interface for SQL query API #1327

dai-chen opened this issue Feb 7, 2023 · 2 comments

Comments

@dai-chen
Copy link
Collaborator

dai-chen commented Feb 7, 2023

Overview

Background

  • Goal: enable frontend and other backend plugins to have SQL query support over OpenSearch transport action
  • Motivation
    • SQL/PPL is easier to learn, more compact and powerful than native DSL
    • Database-independent: Not require OpenSearch knowledge, ex. how to parse SearchHits, how to do use scroll API for pagination etc
    • Access to OpenSearch functions as needed: search relevancy for example is still accessible via SQL functions
    • Enhanced query support: post-processing capability to support complex expression, window function, subqueries, JOIN etc
  • Previous Work: PPL has added transport API earlier in Add PPL Transport interface, and refactor existing Rest Interface to move handler logic to transport handler #667

Requirements

  • Target Users: other plugin developer for now; all OpenSearch plugin and application developer in future
  • Functional: we want to expose both SQL and PPL by single consistent query API to users
  • Non-Functional: the API should work with security plugin without much impact on existing SQL/PPL plugin users
  • Non-Goal: focus on generic query interface. Leave the choice of Object-Relation Mapping (ORM) and application-specific modeling work to user

Design: Query Interface

This section discuss the following design options for user query interface.

Screenshot 2023-02-07 at 2 23 43 PM

Low Level Option: Native Transport API

First option is to just follow the convention and add transport API classes to OpenSearch common-utils module. This is also what current PPL transport API looks like.

TransportPPLQueryRequest request =
  new TransportPPLQueryRequest(
    query="source = index | where age > 30",
    isExplain=false,
    format="default");

// Same for DQL, DDL, DML
transportService.sendRequest(
  clusterService.state(),
  PPLQueryAction.NAME,
  request,
  new TransportResponseHandler<TransportPPLQueryResponse>() {
    public void handleResponse(TransportPPLQueryResponse response) {
      String results = response.getResult();
      
      // Parse JSON results and populate to specific data model class
      JSONObject json = ...
    }
  }
);
  • Pros
    • This follows the same way as interaction between other plugin, ex. Alerting, AD, Notification
  • Cons
    • Query string is hardcoding in code
    • User needs to deal with transport API directly
    • Very low-level API that requires extra work to build query request and deserialize response

Mid Level Option: JDBC API over Transport

One option on higher level is to leverage JDBC API:

// Open connection
try (Connection conn = DriverManager.getConnection(
 "jdbc:opensearch://http://localhost:9200", "user1", "pass")) {

  // Query metadata
  DatabaseMetaData databaseMetaData = connection.getMetaData();
  try(ResultSet resultSet = databaseMetaData.getTables(null, null, null, new String[]{"TABLE"})){ 
    while(resultSet.next()) { 
      String tableName = resultSet.getString("TABLE_NAME");
      ...
    }
  }

  // DQL
  Statement stmt = conn.createStatement();
  List<ADResult> adResults = new ArrayList<>();
  ResultSet rs = stmt.executeQuery("SELECT ... FROM ...");
  while (rs.hasNext()) {
    ADResult result = new AdResult();
    result.setId(rs.getInt("ad_id");
    ...
    adResults.add(result);
  }

  // DDL & DML
  stmt.executeUpdate("CREATE TABLE test ...");
  stmt.executeUpdate("DELETE FROM test WHERE ...");
}
  • Pros
    • JDBC is standard in Java world
    • JDBC spec is stable
    • User is unaware of underlying transport API
    • We can provide same experience for Java developer no matter whether their application running inside OpenSearch or not
  • Cons
    • Query string is hardcoding in code
    • Client needs to load our JDBC driver which is not lightweight with many shadowed dependencies inside
    • Only SQL is supported and not clear if it’s possible to support PPL
    • JDBC driver always sends query to OpenSearch REST endpoint
    • JDBC driver doesn't support different response format, such as CSV, TXT and RAW
    • With all above changes done, we overload JDBC driver with responsibility for both internal and external use

Mid Level Option: Transport API with Data Modeling

Alternatively, we can provide thin wrapper on top of transport API and take care of de-serialization work by some data model class.

// Same for DQL, DDL, DML
transportService.sendRequest(
  clusterService.state(),
  PPLQueryAction.NAME,
  request,
  new TransportResponseHandler<TransportPPLQueryResponse>() {
    public void handleResponse(TransportPPLQueryResponse response) {

      // Result is already mapped to SQL/PPL data model 
      QueryResult result = response.getResult();
      Schema schema = result.getSchema();
      for (Row row : result.getRows()) {
        ...
      }
    }
  }
);
  • Pros
    • This follows the same way as other plugins, ex. AD has different data model class per API
    • Response deserialization is taken care of by API as well
  • Cons
    • Query string is hardcoding in code
    • User needs to deal with transport API directly
    • Need to add to common-utils and change whenever API or data model changes

High Level Option: Query DSL over Transport

Similarly as JOOQ, LINQ and other query DSL library, we can provide higher level query API than aforementioned options.

QueryDSL.init(transportService, clusterService ...)

// DQL
QueryDSL
   .sql()
   .from("index")
   .where("match(address, 'Seattle')")
   .select("name, age")
   .execute(format="csv")

QueryDSL
   .ppl()
   .search("index")
   .sort("name, age")
   .explain()

// DDL
QueryDSL
    .table("test").
    .column("age", "INT")
    .create();

// DML
QueryDSL
    .from("index")
    .where("age > 30")
    .delete();
  • Pros
    • No need to handle transport API setting and async callback
    • User writes query in fluent API without hardcoding query string
    • Internal implementation changes are hidden by DSL API
  • Cons
    • Need to add all code to common-utils module
    • Depends on our choice, the DSL maybe subject to change sometimes or frequently

Design: Permission Control on SQL Transport Action

As new transport action added, new action permission is required correspondingly. Here is an example from @penghuo that shows what's required from user side after SQL transport added:

PUT _plugins/_security/api/roles/sql_role
{
  "cluster_permissions": [
    **"cluster:admin/opensearch/sql"**
  ],
  "index_permissions": [{
    "index_patterns": [
      "sql-test-index"
    ],
    "allowed_actions": [
      "indices:data/read/search*",
      "indices:admin/mappings/get"
    ]
  }]
}

PUT _plugins/_security/api/rolesmapping/ppl_role
{
  "backend_roles" : [],
  "hosts" : [],
  "users" : ["test_user"]
}

Alternatively, user can create permission group for the new action permission as below:

PUT _plugins/_security/api/actiongroups/sql_access
{
  "allowed_actions": [
    "cluster:admin/opensearch/sql"
  ]
}

Questions: What' the impact on existing user who upgrade to latest version? Is it possible to avoid this and make user unaware?

@Yury-Fridlyand
Copy link
Collaborator

An idea for PPL use in Mid Level Option: JDBC API over Transport:

var properties = new Properties();
properties.setProperty("user", "looser");
properties.setProperty("password", "123");
properties.setProperty("lang", "PPL"); // <- The trick happens here
// Open connection
try (Connection conn = DriverManager.getConnection(
 "jdbc:opensearch://http://localhost:9200", properties)) {

  // Query metadata
  Statement stmt = conn.createStatement();

  // Query data
  List<ADResult> adResults = new ArrayList<>();
  ResultSet rs = stmt.executeQuery("search source=<index> | ... <rest PPL query>");
  while (rs.hasNext()) {
    ADResult result = new AdResult();
    result.setId(rs.getInt("ad_id");
    ...
    adResults.add(result);
  }
}

@lezzago
Copy link
Member

lezzago commented Apr 4, 2023

Coming from the Alerting plugin perspective, I believe the option, High Level Option: Query DSL over Transport, seems the best for other plugins to utilize the SQL and PPL query support. This allows plugin to not require in depth knowledge of SQL and PPL and lets the experts (the SQL plugin) deal with the lower level implementations.

The JDBC driver option and the low level option don't seem good coming from a consuming plugin perspective since they require a fair amount of duplicated setup work as well as detailed knowledge on how to use SQL queries. This logic should be ideally abstracted away from this query work and let plugins focus on their plugin features. Also by abstracting it to a high level design, the SQL plugin can ensure the right practices are enforced when doing SQL/PPL queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants