Extract common/reusable components from SQL for EQL #49773

Closed
6 tasks done
costin opened this issue Dec 2, 2019 · 4 comments

costin commented Dec 2, 2019

A meta issue for identifying and extracting the reusable parts from SQL for EQL.
The new package (`org.elasticsearch.xpack.ql`) will be a separate plugin that can be reused
by EQL and SQL.

The issue for EQL support is #49581

The steps below are broken down by concept; in practice, however, due to dependencies, some
steps might be tied together.

Parsing and AST

Tree package

Node, Source & co, which form the foundation of all types of ASTs. Similarly, the rule package can be shared.
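
To make the shape of the tree package concrete, here is a minimal sketch of an immutable, self-typed node with a source location and a bottom-up transform. The classes below (including `Label` and `render`) are simplified, hypothetical stand-ins written for illustration; they are not the actual SQL classes and the real signatures may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified stand-ins for the shared tree classes (not the actual x-pack code).
final class Source {
    final int line;
    final int column;

    Source(int line, int column) {
        this.line = line;
        this.column = column;
    }

    @Override
    public String toString() {
        return "line " + line + ":" + column;
    }
}

abstract class Node<T extends Node<T>> {
    private final Source source;
    private final List<T> children;

    Node(Source source, List<T> children) {
        this.source = source;
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    Source source() { return source; }

    List<T> children() { return children; }

    // Each concrete node knows how to rebuild itself with new children.
    abstract T replaceChildren(List<T> newChildren);

    // Bottom-up rewrite: transform the children first, then this node.
    @SuppressWarnings("unchecked")
    T transformUp(Function<T, T> rule) {
        List<T> rewritten = new ArrayList<>();
        boolean changed = false;
        for (T child : children) {
            T next = child.transformUp(rule);
            changed |= next != child;
            rewritten.add(next);
        }
        T self = changed ? replaceChildren(rewritten) : (T) this;
        return rule.apply(self);
    }
}

// Illustrative concrete node; any AST (SQL, EQL) would define its own node types.
final class Label extends Node<Label> {
    final String name;

    Label(Source source, String name, List<Label> children) {
        super(source, children);
        this.name = name;
    }

    @Override
    Label replaceChildren(List<Label> newChildren) {
        return new Label(source(), name, newChildren);
    }

    String render() {
        if (children().isEmpty()) {
            return name;
        }
        StringBuilder sb = new StringBuilder(name).append('(');
        for (int i = 0; i < children().size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(children().get(i).render());
        }
        return sb.append(')').toString();
    }
}
```

Because nodes are never mutated, a rule returns either the same instance or a rebuilt one, which is what lets the same rule machinery run over any language's tree.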

Utility Parsing classes

sql.parser and sql.utils for handling identifiers and stream manipulation inside ANTLR

Backing exception hierarchy

The query package has a handful of exceptions which are used for parsing, analysis and verification.
Due to their specialized message ("error occurred at X") it's useful to share them.
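
As a rough illustration of why sharing the hierarchy helps, the sketch below puts the location-aware message in one base class that parsing and verification errors both extend. All class names and the exact `line X:Y: message` format are assumptions made for this example, not the names or format used in the SQL code.

```java
// Hypothetical exception hierarchy; names and message format are illustrative only.
class QlBaseException extends RuntimeException {
    QlBaseException(String message) {
        super(message);
    }
}

// Carries the location so every subclass reports errors the same way.
class LocatedException extends QlBaseException {
    private final int line;
    private final int column;

    LocatedException(int line, int column, String message) {
        super("line " + line + ":" + column + ": " + message);
        this.line = line;
        this.column = column;
    }

    int line() { return line; }

    int column() { return column; }
}

// Raised while parsing the query text.
final class ParseError extends LocatedException {
    ParseError(int line, int column, String message) {
        super(line, column, message);
    }
}

// Raised while verifying the resolved tree.
final class VerificationError extends LocatedException {
    VerificationError(int line, int column, String message) {
        super(line, column, message);
    }
}
```

With this layout, callers in either language can catch the shared base type and still get a consistent location prefix.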

Data types

sql.type contains mappings for most ES types.
SQL-specific time types, such as Intervals and the utilities around precision, will remain in SQL.
This likely means DataType will transition from an enum into an interface.
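
One possible shape for the enum-to-interface transition is sketched below: the shared module defines the interface and the common types, while SQL contributes its interval types through the same interface. The method names, the `DataTypeRegistry` class and the specific type entries are assumptions for illustration, not the final design.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical shared contract; method names are illustrative.
interface DataType {
    String typeName();

    boolean isNumeric();
}

// Types that could live in the shared module.
enum CoreTypes implements DataType {
    KEYWORD("keyword", false),
    LONG("long", true),
    DOUBLE("double", true);

    private final String typeName;
    private final boolean numeric;

    CoreTypes(String typeName, boolean numeric) {
        this.typeName = typeName;
        this.numeric = numeric;
    }

    @Override
    public String typeName() { return typeName; }

    @Override
    public boolean isNumeric() { return numeric; }
}

// Types that would stay inside SQL.
enum SqlIntervalTypes implements DataType {
    INTERVAL_YEAR("interval_year"),
    INTERVAL_DAY("interval_day");

    private final String typeName;

    SqlIntervalTypes(String typeName) {
        this.typeName = typeName;
    }

    @Override
    public String typeName() { return typeName; }

    @Override
    public boolean isNumeric() { return false; }
}

// Each language registers the types it understands.
final class DataTypeRegistry {
    private final Map<String, DataType> byName = new LinkedHashMap<>();

    DataTypeRegistry register(DataType... types) {
        for (DataType type : types) {
            byName.put(type.typeName(), type);
        }
        return this;
    }

    // Returns null when the type is not known to this language.
    DataType fromTypeName(String name) {
        return byName.get(name);
    }
}
```

An EQL registry would register only `CoreTypes.values()`, while a SQL registry would register both sets, so the interval types never leak into the shared code.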

Query plan / Logical Plan

The logical AST and common logical plans such as Filter, EsRelation and OrderBy are shared, as is
the infrastructure around them, such as the classes LeafPlan and Unresolved**.
SQL-specific plans, such as Pivot and With, will remain inside SQL.

Expression tree

The sql.expression package is for the most part reusable, as things are quite similar.
In the first stage only the core expressions will be moved, with a minimal set of functions, mainly around math.
The same goes for predicates, which would be restricted to arithmetic and comparison operators.
While the expectation is that in time more functions could be shared (string, datetime, etc.), that's outside the initial scope.
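
To illustrate the kind of core that the first stage covers, here is a minimal sketch of an expression tree with one arithmetic operator, one comparison operator and constant folding. The classes are simplified, hypothetical versions written for this example (restricted to `long` values); the real expression classes carry much more, such as data types and source information.

```java
// Hypothetical, simplified expression classes restricted to long values.
abstract class Expression {
    // True when the expression can be evaluated without touching any document.
    abstract boolean foldable();

    abstract Object fold();
}

final class Literal extends Expression {
    private final Object value;

    Literal(Object value) {
        this.value = value;
    }

    @Override
    boolean foldable() { return true; }

    @Override
    Object fold() { return value; }
}

final class FieldRef extends Expression {
    final String name;

    FieldRef(String name) {
        this.name = name;
    }

    @Override
    boolean foldable() { return false; }

    @Override
    Object fold() {
        throw new IllegalStateException("field [" + name + "] cannot be folded");
    }
}

final class Add extends Expression {
    private final Expression left;
    private final Expression right;

    Add(Expression left, Expression right) {
        this.left = left;
        this.right = right;
    }

    @Override
    boolean foldable() { return left.foldable() && right.foldable(); }

    @Override
    Object fold() {
        long l = (Long) left.fold();
        long r = (Long) right.fold();
        return l + r;
    }
}

final class GreaterThan extends Expression {
    private final Expression left;
    private final Expression right;

    GreaterThan(Expression left, Expression right) {
        this.left = left;
        this.right = right;
    }

    @Override
    boolean foldable() { return left.foldable() && right.foldable(); }

    @Override
    Object fold() {
        long l = (Long) left.fold();
        long r = (Long) right.fold();
        return l > r;
    }
}
```

Nothing here is SQL-specific, which is the argument for moving this layer first: `1 + 2 > 2` folds the same way whichever language produced it.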

QueryDSL

The expressions are tied to the querydsl package, which performs the logical translation of the query. As this is specific to ES, I would expect most of it to be shareable, regardless of the xQL.

Rule, Analysis and Verification

This set of modules takes care of executing rules to identify patterns either for doing resolution or verification of created trees.

Rules

The rule package is fairly small and contained.

Extract Analyzer, Verifier and Optimizer rules

Extract the common rules into their own package. EQL and SQL could then import said rules (likely in the same declaration order) and add their own language-specific rules on top in their Analyzer or Verifier.
Currently the reuse seems to be around 60-70%, with the remainder down to differences between the languages.
At first glance, the Optimizer seems to have the highest reusability factor, as it works mainly on expressions.
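
The import-then-extend idea can be sketched as follows. To keep the example short, a "plan" is modelled as a plain string and each rule as a string rewrite; the class names and the three rules are invented for illustration and do not correspond to actual Analyzer, Verifier or Optimizer rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Runs the rules in declaration order and repeats until the plan stops changing.
final class RuleExecutor {
    private static final int MAX_PASSES = 100;
    private final List<UnaryOperator<String>> rules;

    RuleExecutor(List<UnaryOperator<String>> rules) {
        this.rules = new ArrayList<>(rules);
    }

    String execute(String plan) {
        String current = plan;
        for (int pass = 0; pass < MAX_PASSES; pass++) {
            String next = current;
            for (UnaryOperator<String> rule : rules) {
                next = rule.apply(next);
            }
            if (next.equals(current)) {
                return current;
            }
            current = next;
        }
        return current;
    }
}

// Hypothetical rules that would live in the shared package.
final class CommonRules {
    static List<UnaryOperator<String>> all() {
        List<UnaryOperator<String>> rules = new ArrayList<>();
        rules.add(p -> p.replace("NOT NOT ", ""));   // double negation
        rules.add(p -> p.replace(" AND true", ""));  // redundant conjunct
        return rules;
    }
}

// SQL imports the common rules in the same order, then adds its own on top.
final class SqlRules {
    static List<UnaryOperator<String>> all() {
        List<UnaryOperator<String>> rules = new ArrayList<>(CommonRules.all());
        rules.add(p -> p.replace(" LIMIT ALL", "")); // illustrative SQL-only rule
        return rules;
    }
}
```

An EQL executor built from `CommonRules.all()` alone leaves the SQL-only construct untouched, while the SQL executor applies the shared rules first and its own afterwards.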

QueryFolder & QueryTranslator

These two classes wrap the query generation by creating the expression and then folding the logical nodes onto each other. They are connected to each other but in the end they are still based on modular rules.

This would be the last step necessary to share query generation.

Implementation details

To avoid friction in SQL development, the plan is to:

  • notify the SQL team about upcoming development
  • merge any major PRs that affect code (such as Refactor named expression 3 #49693)
  • create the ql branch and mark it as a shared project for SQL
  • move the basic infrastructure such as exceptions, parsing utilities, node and rules; for validation, move the tests as well
  • push a PR and, once that passes and the SQL team confirms, push it into master
  • keep iterating through each item: create the appropriate PR, then repeat

As a significant chunk of classes from SQL will be moved, it's worth merging the changes back incrementally and having the SQL team ack them, instead of developing separately and then doing one big merge.
The upside is that any unforeseen issues can be handled by the whole SQL team instead of just one person.

costin added the :Analytics/EQL, :Analytics/SQL and Meta labels Dec 2, 2019
@elasticmachine

Pinging @elastic/es-search (:Search/EQL)

costin commented Jan 10, 2020

The first PR for extracting reusable components has landed in master:
#50815

costin commented Jan 23, 2020

The second PR (making data types pluggable) is ready for review: #51328

costin commented Jan 30, 2020

Marking this as done; the query translation bits will be covered by #49997
