A meta issue for identifying and extracting the reusable parts from SQL for EQL.
The new package (`org.elasticsearch.xpack.ql`) will be a separate plugin that can be reused
by EQL and SQL.
The steps below are broken down per concept; in practice, however, due to dependencies, some
steps might be tied together.
Parsing and AST
Tree package
Node, Source & co, which form the foundation of all types of ASTs. Similarly, the rule package can be shared.
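To illustrate why the tree classes are language-agnostic, here is a minimal sketch of an immutable node with a source location and a bottom-up transform. All class names below (`Source`, `Node`, `Expr`, `Literal`, `Add`, `ConstantFolding`) are simplified, hypothetical stand-ins and not the actual classes from the SQL plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in: where in the query text a node came from.
final class Source {
    final int line;
    final int column;
    final String text;

    Source(int line, int column, String text) {
        this.line = line;
        this.column = column;
        this.text = text;
    }
}

// Hypothetical stand-in: an immutable tree node that any AST (SQL, EQL, ...) could extend.
abstract class Node<T extends Node<T>> {
    private final Source source;
    private final List<T> children;

    Node(Source source, List<T> children) {
        this.source = source;
        this.children = List.copyOf(children);
    }

    Source source() { return source; }

    List<T> children() { return children; }

    // Rebuild this node with a new list of children.
    abstract T replaceChildren(List<T> newChildren);

    // Apply the rule to the children first, then to this node (bottom-up).
    @SuppressWarnings("unchecked")
    T transformUp(Function<T, T> rule) {
        List<T> transformed = new ArrayList<>();
        for (T child : children) {
            transformed.add(child.transformUp(rule));
        }
        T self = transformed.equals(children) ? (T) this : replaceChildren(transformed);
        return rule.apply(self);
    }
}

abstract class Expr extends Node<Expr> {
    Expr(Source source, List<Expr> children) { super(source, children); }
}

final class Literal extends Expr {
    final int value;

    Literal(Source source, int value) {
        super(source, List.of());
        this.value = value;
    }

    @Override
    Expr replaceChildren(List<Expr> newChildren) { return this; }
}

final class Add extends Expr {
    Add(Source source, Expr left, Expr right) { super(source, List.of(left, right)); }

    @Override
    Expr replaceChildren(List<Expr> c) { return new Add(source(), c.get(0), c.get(1)); }
}

// A sample rule: fold an addition of two literals into a single literal.
final class ConstantFolding {
    static Expr apply(Expr e) {
        if (e instanceof Add) {
            Expr l = e.children().get(0);
            Expr r = e.children().get(1);
            if (l instanceof Literal && r instanceof Literal) {
                return new Literal(e.source(), ((Literal) l).value + ((Literal) r).value);
            }
        }
        return e;
    }
}
```

Nothing in `Node` or `transformUp` knows about SQL or EQL, which is the property that makes this layer a natural first candidate for extraction.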
Utility Parsing classes
sql.parser and sql.utils for handling identifier and stream manipulation inside ANTLR
Backing exception hierarchy
The query package has a handful of exceptions which are used for parsing, analysis and verification.
Due to their specialized message ("error occurred at X") it's useful to share them.
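As a rough sketch of what such a shared hierarchy could look like, the example below uses a common base exception and a parsing exception that prefixes the message with the location. The class names and the exact message format are assumptions made for illustration, not the existing SQL classes.

```java
// Hypothetical base class for all query-language exceptions.
class QlException extends RuntimeException {
    QlException(String message) {
        super(message);
    }
}

// Hypothetical parsing exception that records where the error occurred.
class ParsingException extends QlException {
    private final int line;
    private final int column;

    ParsingException(int line, int column, String message) {
        // Assumed format: "line <line>:<column>: <message>"
        super("line " + line + ":" + column + ": " + message);
        this.line = line;
        this.column = column;
    }

    int line() { return line; }

    int column() { return column; }
}

// Hypothetical exception raised when analysis or verification of a tree fails.
class VerificationException extends QlException {
    VerificationException(String message) {
        super(message);
    }
}
```

With a shared base type, callers in either language can catch `QlException` once and still report a consistent, location-aware message.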
Data types
sql.type contains mapping for most ES types.
SQL-specific time types, such as Intervals, and the utilities around precision will remain in SQL.
This likely means DataType will transition from an enum into an interface.
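The enum-to-interface transition could be sketched as follows: shared types stay in one enum, while SQL contributes its own types through a second enum implementing the same interface. The names `CoreType` and `SqlIntervalType` and the method set are assumptions for illustration only.

```java
// Hypothetical shared contract; the real interface would likely expose more.
interface DataType {
    String typeName();

    boolean isNumeric();
}

// Types common to every query language, living in the shared package.
enum CoreType implements DataType {
    KEYWORD("keyword", false),
    LONG("long", true),
    DOUBLE("double", true),
    BOOLEAN("boolean", false);

    private final String typeName;
    private final boolean numeric;

    CoreType(String typeName, boolean numeric) {
        this.typeName = typeName;
        this.numeric = numeric;
    }

    @Override
    public String typeName() { return typeName; }

    @Override
    public boolean isNumeric() { return numeric; }
}

// SQL-only types stay in the SQL plugin but plug into the same interface.
enum SqlIntervalType implements DataType {
    INTERVAL_YEAR,
    INTERVAL_DAY;

    @Override
    public String typeName() { return name().toLowerCase(java.util.Locale.ROOT); }

    @Override
    public boolean isNumeric() { return false; }
}
```

Code in the shared package would then depend only on `DataType`, so it never needs to know that interval types exist.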
Query plan / Logical Plan
The Logical AST and common logical plans such as Filter, EsRelation, OrderBy are common as are
the infrastructure classes LeafPlan and Unresolved**.
SQL-specific plans, such as Pivot and With, will remain inside SQL.
Expression tree
The sql.expression package is for the most part reusable as things are quite similar.
In the first stage, only the core expressions will be moved, with a minimal set of functions, mainly around math.
The same goes for predicates, which would be restricted to arithmetic and comparison operators.
While the expectation is that, in time, more functions could be shared (string, datetime, etc.), that's outside the initial scope.
QueryDSL
The expression tree is tied to the querydsl package, which performs the logical translation of the query. As this is specific to ES, I would expect most of it to be shareable, regardless of the xQL.
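As a small illustration of why this translation does not depend on the source language, the sketch below turns a "greater than" comparison into the structure of an ES `range` query. The classes `GreaterThan` and `QueryDslSketch` are hypothetical, and plain maps stand in for whatever query builder classes the real querydsl package uses.

```java
import java.util.Map;

// Hypothetical, already-resolved comparison: field > value.
final class GreaterThan {
    final String field;
    final Object value;

    GreaterThan(String field, Object value) {
        this.field = field;
        this.value = value;
    }
}

final class QueryDslSketch {
    // Produces the shape {"range": {"<field>": {"gt": <value>}}} as nested maps.
    static Map<String, Object> toQuery(GreaterThan gt) {
        return Map.of("range", Map.of(gt.field, Map.of("gt", gt.value)));
    }
}
```

Whether the comparison was parsed from SQL or EQL makes no difference once it has been resolved into an expression like `GreaterThan`.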
Rule, Analysis and Verification
This set of modules takes care of executing rules to identify patterns either for doing resolution or verification of created trees.
Rules
The rule package is fairly small and contained.
Extract Analyzer, Verifier and Optimizer rules
Extract the rules that are common into their own package. EQL and SQL could then import said rules (likely in the same declaration order) and add their own language-specific rules on top, in their own Analyzer or Verifier.
Currently it seems the reuse is around 60-70%, with the remainder being due to differences between the languages.
At first glance, the Optimizer seems to have the highest reusability factor as it works mainly on expressions.
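The "import the common rules, then add language-specific ones on top" idea could look roughly like the sketch below. Strings stand in for plans to keep the example short, and every name here (`NamedRule`, `CommonRules`, `SqlOptimizerSketch`, `RuleRunner`, and the individual rules) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// A rule with a name and a rewrite; a String stands in for a real plan tree.
final class NamedRule {
    final String name;
    final UnaryOperator<String> rewrite;

    NamedRule(String name, UnaryOperator<String> rewrite) {
        this.name = name;
        this.rewrite = rewrite;
    }
}

// Rules shared by every language, kept in declaration order.
final class CommonRules {
    static List<NamedRule> optimizer() {
        List<NamedRule> rules = new ArrayList<>();
        rules.add(new NamedRule("BooleanSimplification", p -> p.replace(" AND true", "")));
        rules.add(new NamedRule("FoldDoubleNegation", p -> p.replace("NOT NOT ", "")));
        return rules;
    }
}

// A language-specific optimizer: the common rules first, its own rules after.
final class SqlOptimizerSketch {
    static List<NamedRule> rules() {
        List<NamedRule> rules = new ArrayList<>(CommonRules.optimizer());
        // Made-up SQL-only rule, purely for illustration.
        rules.add(new NamedRule("ReplaceCountStar", p -> p.replace("COUNT(*)", "COUNT(1)")));
        return rules;
    }
}

final class RuleRunner {
    // Applies each rule once, in order.
    static String execute(List<NamedRule> rules, String plan) {
        String current = plan;
        for (NamedRule rule : rules) {
            current = rule.rewrite.apply(current);
        }
        return current;
    }
}
```

An EQL optimizer would build its list the same way, starting from `CommonRules.optimizer()` and appending only what is unique to EQL.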
QueryFolder & QueryTranslator
These two classes wrap the query generation by creating the expression and then folding the logical nodes onto each other. They are connected to each other but in the end they are still based on modular rules.
This would be the last step necessary to share query generation.
Implementation details
To avoid friction in SQL development, the plan is to:
create the ql branch and mark it as a shared project for SQL.
move the basic infrastructure such as exceptions, parsing utilities, node and rules. For validation, move the tests as well.
push a PR and, once that passes and the SQL team confirms, push it into master.
keep iterating through each item, creating an appropriate PR for each.
As a significant chunk of classes from SQL will be moved, it's worth merging the changes back and having the SQL team ack them instead of developing separately and then doing a big merge.
The upside is that any unforeseen issues can be handled by the whole SQL team instead of just one person.
The issue for EQL support is #49581