[RFC] Move PPL Spec & AST into this repository #23

Open
YANG-DB opened this issue Nov 20, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@YANG-DB
Member

YANG-DB commented Nov 20, 2024

Is your feature request related to a problem?

The purpose of this RFC is to consolidate the PPL specification (ANTLR grammar) and the query-to-AST construction, which each PPL execution engine currently maintains on its own, into a single repository.

This repository will maintain the most up-to-date vocabulary and documentation and will serve as the reference for any downstream engine.

A single grammar location enables a simpler and more consistent way to evolve the language. It also moves the responsibility for updating a downstream engine onto the engine implementing the spec, rather than onto the grammar and language maintainers.

Implemented solution

Our goal is to remove the PPL grammar and AST structure from each of the downstream engines and consolidate them into a single artifact that will be used by any existing or future physical execution engine.

The sole responsibility of an execution engine would be to translate the PPL AST into that engine's logical plan, or into its physical plan when the engine has no logical layer (as in OpenSearch).

In the Spark PPL use case, for example, we implemented CatalystQueryPlanVisitor, which traverses the PPL AST and transforms the PPL logical plan into a Catalyst logical plan. That plan is then submitted to Spark, which generates the physical plan and executes the query.
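As a rough illustration of the translation step described above, here is a minimal visitor sketch. It is written in Python for brevity and does not use the real PPL AST classes or Spark's Catalyst API: the node types (`Relation`, `Filter`, `Project`), the `ToyPlanVisitor` class, and the dictionary-based "plan" are all hypothetical stand-ins for what an engine-specific visitor such as CatalystQueryPlanVisitor would produce.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, heavily simplified PPL AST nodes (not the real classes).
@dataclass
class Relation:
    table: str


@dataclass
class Filter:
    child: "Node"
    condition: str


@dataclass
class Project:
    child: "Node"
    fields: list


Node = Union[Relation, Filter, Project]


class ToyPlanVisitor:
    """Stand-in for an engine-specific visitor; emits a nested dict as the 'plan'."""

    def visit(self, node):
        # Dispatch on the node's class name, e.g. Filter -> visit_filter.
        method = getattr(self, "visit_" + type(node).__name__.lower())
        return method(node)

    def visit_relation(self, node):
        return {"op": "scan", "table": node.table}

    def visit_filter(self, node):
        return {
            "op": "filter",
            "condition": node.condition,
            "input": self.visit(node.child),
        }

    def visit_project(self, node):
        return {
            "op": "project",
            "fields": list(node.fields),
            "input": self.visit(node.child),
        }


# Hand-built AST for: source = logs | where status = 500 | fields host, status
ast = Project(Filter(Relation("logs"), "status = 500"), ["host", "status"])
plan = ToyPlanVisitor().visit(ast)
```

The point of the sketch is the division of labor: the AST nodes would come from the shared artifact, while each engine supplies only its own visitor.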

Advantages of the selected approach:

Today each engine has to maintain its own version of the PPL ANTLR grammar and documentation. These versions tend to diverge from one another because of each engine's specifics.
Once these components are extracted into a single location, the dependency becomes immutable and forces the engine implementations to follow the grammar more closely. In the rare cases where divergence is needed, a distinct UDF can be added to accommodate the difference in the actual grammar.

Advantages:

  • Reuse of existing PPL code that is tested and documented in one location and released as its own artifact.
  • Simpler development that relies on a well-known and well-structured codebase.
  • Long-term support: maintaining the grammar and documentation in a single location simplifies long-term support and language evolution.
  • Single place of maintenance: by reusing the PPL logical model, which relies on the PPL ANTLR parser, we can use a single repository to maintain and develop the PPL language without the need to constantly merge changes from upstream.

The following diagram shows the high-level architecture of the selected solution:

ppl logical architecture

The logical architecture shows the following artifacts:

  • Libraries:

    • PPL (the PPL core, protocol, parser & logical plan utils)
    • SQL (the SQL core, protocol, parser; depends on PPL for the logical plan utils)
  • Drivers:

    • PPL OpenSearch Driver (depends on OpenSearch core)
    • PPL Spark Driver (depends on Spark core)
    • PPL Prometheus Driver (directly translates PPL to PromQL)
    • SQL OpenSearch Driver (depends on OpenSearch core)

Tasks:

  • Extract the PPL logical component from the SQL plugin into a (non-plugin) library and publish the library to Maven
  • Separate the PPL and SQL drivers inside the OpenSearch PPL client to better distinguish between them
  • Create a thin PPL client capable of interacting with the PPL driver regardless of which driver is used (Spark, OpenSearch, Prometheus)
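
One way to picture the thin client from the last task is as a small dispatcher over interchangeable drivers. The sketch below is only an assumption about how that could look: `PPLDriver`, `StubDriver`, `PPLClient`, and their methods are hypothetical names that the RFC does not define, and the stub driver simply echoes the query instead of talking to Spark, OpenSearch, or Prometheus.

```python
from abc import ABC, abstractmethod


class PPLDriver(ABC):
    """Hypothetical driver contract; the RFC does not define this interface."""

    @abstractmethod
    def execute(self, ppl_query: str) -> dict:
        ...


class StubDriver(PPLDriver):
    """Placeholder driver that echoes the query instead of running it."""

    def __init__(self, engine: str):
        self.engine = engine

    def execute(self, ppl_query: str) -> dict:
        return {"engine": self.engine, "query": ppl_query}


class PPLClient:
    """Thin client: knows only the driver contract, not any engine details."""

    def __init__(self):
        self._drivers = {}

    def register(self, name: str, driver: PPLDriver) -> None:
        self._drivers[name] = driver

    def query(self, driver_name: str, ppl_query: str) -> dict:
        if driver_name not in self._drivers:
            raise ValueError(f"no PPL driver registered as {driver_name!r}")
        return self._drivers[driver_name].execute(ppl_query)


client = PPLClient()
for engine in ("spark", "opensearch", "prometheus"):
    client.register(engine, StubDriver(engine))

result = client.query("spark", "source = logs | fields host")
```

Keeping the client limited to the driver contract means a new engine could be supported by registering another driver, without changes to the client itself.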

Sub-Tasks

This project will be composed of sub-tasks for an incremental and continuous process:


Do you have any additional context?

ppl on spark
ppl on opensearch


PPL spark ANTLR grammar
PPL OpenSearch ANTLR grammar

ppl spark implementation issue

@normanj-bitquill

Step 6 is just a link to this issue.

Can the dashboard use a Jar artifact from Maven, or would it need something else like an NPM module?

@YANG-DB
Member Author

YANG-DB commented Nov 28, 2024

Step 6 is just a link to this issue.

Can the dashboard use a Jar artifact from Maven, or would it need something else like an NPM module?

@normanj-bitquill
Thanks for your comments ...

  • I'll update issue number 6 soon
  • Yes - the main idea is that any 3rd party using this artifact will have its own library in Maven/NPM for its specific dev language (Java, JS ...)

@dblock
Member

dblock commented Dec 9, 2024

[Catch All Triage - 1, 2, 3, 4]
