[RFC] Move PPL Spec & AST into this repository #23

YANG-DB · 2024-11-20T22:05:39Z

Is your feature request related to a problem?

The purpose of this RFC is to consolidate the PPL specification (ANTLR grammar) and the query-to-AST construction used by the different PPL execution engines into a single repository.

This repository will maintain the most up-to-date vocabulary and documentation and will serve as the reference for any downstream engine.

A single grammar location provides a simpler and more consistent way to evolve the language. It also moves the responsibility for updating a downstream engine to whoever implements the spec in that engine, rather than to the grammar and language maintainers.

Implemented solution

Our goal is to remove the PPL grammar and AST structure from each downstream engine and consolidate them into a single artifact that any existing or future physical execution engine can use.

The execution engine's single responsibility would be to translate the PPL AST into that engine's logical plan, or into its physical plan when the engine has no logical layer (as with OpenSearch).

In the Spark PPL use case, for example, we implemented CatalystQueryPlanVisitor, a PPL AST traverser that walks the PPL AST and transforms the PPL logical plan into a Catalyst logical plan. That plan is then submitted to Spark, which generates the physical plan and executes the query.
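To make the traversal idea concrete, here is a minimal sketch of a visitor that turns an AST into an engine plan. It is not the actual CatalystQueryPlanVisitor: the node classes (`Relation`, `Filter`, `Project`), the `EnginePlanVisitor` name, and the tuple-based "plan" are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


# Hypothetical PPL AST nodes; the real AST classes are not shown in this RFC.
@dataclass
class Relation:
    table: str


@dataclass
class Filter:
    child: object
    condition: str


@dataclass
class Project:
    child: object
    fields: list


class EnginePlanVisitor:
    """Walks a PPL AST and emits an engine-specific plan.

    The "plan" here is a nested tuple; a Spark driver would build
    Catalyst logical-plan nodes instead (assumption for illustration).
    """

    def visit(self, node):
        # Dispatch on the node's class name, e.g. Filter -> visit_Filter.
        method = getattr(self, "visit_" + type(node).__name__, None)
        if method is None:
            raise NotImplementedError(type(node).__name__)
        return method(node)

    def visit_Relation(self, node):
        return ("scan", node.table)

    def visit_Filter(self, node):
        return ("filter", node.condition, self.visit(node.child))

    def visit_Project(self, node):
        return ("project", tuple(node.fields), self.visit(node.child))


# AST for a query shaped like: source = logs | where status = 500 | fields host, status
ast = Project(Filter(Relation("logs"), "status = 500"), ["host", "status"])
plan = EnginePlanVisitor().visit(ast)
```

Under this shape, each engine would ship only its own visitor, while the AST node definitions come from the shared artifact.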

Advantages of the selected approach:

Today each engine has to maintain its own version of the PPL ANTLR grammar and documentation. These versions tend to diverge from one another because of each engine's specifics.
Once these components are extracted into a single location, the dependency becomes immutable, which forces engine implementations to follow the grammar more closely. In the rare cases where divergence is needed, an engine can add a distinct UDF to accommodate the difference from the shared grammar.
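One way to picture the UDF escape hatch is a per-engine function registry layered over a shared set of functions, so that the grammar stays untouched. This is only a sketch under assumptions: `SHARED_FUNCTIONS`, `EngineFunctionRegistry`, and the `spark_only_len` function are hypothetical names, not part of any existing PPL code.

```python
# Functions every engine is assumed to support (illustrative only).
SHARED_FUNCTIONS = {"abs": abs, "upper": str.upper}


class EngineFunctionRegistry:
    """Resolves function names for one engine: shared functions plus its own UDFs."""

    def __init__(self, engine, udfs=None):
        self.engine = engine
        self._functions = dict(SHARED_FUNCTIONS)
        # Engine-specific UDFs extend the shared set without changing the grammar.
        self._functions.update(udfs or {})

    def call(self, name, *args):
        if name not in self._functions:
            raise KeyError(f"{name} is not supported by {self.engine}")
        return self._functions[name](*args)


# Hypothetical UDF that only the Spark driver registers.
spark = EngineFunctionRegistry("spark", {"spark_only_len": len})
opensearch = EngineFunctionRegistry("opensearch")
```

Both engines parse the same function-call syntax; only the set of names each one can resolve differs.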

Advantages:

- Reuse of existing PPL code: the code is tested and documented in one location and released as its own artifact.
- Simplified development: engines rely on a well-known and structured codebase.
- Long-term support: maintaining the grammar and documentation in a single location simplifies long-term support and language evolution.
- Single place of maintenance: by reusing the PPL logical model, which relies on the PPL ANTLR parser, we can maintain and develop the PPL language in a single repository without constantly merging changes from upstream.

The following diagram shows the high-level architecture of the selected implementation:

The logical architecture shows the following artifacts:

Libraries:
- PPL (the PPL core, protocol, parser, and logical plan utils)
- SQL (the SQL core, protocol, and parser; depends on PPL for the logical plan utils)
Drivers:
- PPL OpenSearch Driver (depends on OpenSearch core)
- PPL Spark Driver (depends on Spark core)
- PPL Prometheus Driver (directly translates PPL to PromQL)
- SQL OpenSearch Driver (depends on OpenSearch core)
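The artifact list above can be read as a small dependency graph. The sketch below encodes it and derives one valid build order. The artifact identifiers are hypothetical, and the edges from each PPL driver to the PPL library (and from the SQL driver to the SQL library) are an assumption: the list only states the engine-core dependencies explicitly.

```python
from graphlib import TopologicalSorter

# artifact -> artifacts it depends on (names are illustrative, not published coordinates).
DEPENDENCIES = {
    "ppl": set(),
    "sql": {"ppl"},
    "ppl-opensearch-driver": {"ppl", "opensearch-core"},
    "ppl-spark-driver": {"ppl", "spark-core"},
    "ppl-prometheus-driver": {"ppl"},
    "sql-opensearch-driver": {"sql", "opensearch-core"},
}

# Any order returned here places each artifact after everything it depends on.
build_order = list(TopologicalSorter(DEPENDENCIES).static_order())
```

The point of the layout is that the `ppl` library has no engine dependency at all, so it can be released on its own.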

Tasks:

- Extract the PPL logical component from the SQL plugin into a (non-plugin) library and publish the library to Maven.
- Separate the PPL and SQL drivers inside the OpenSearch PPL client to better distinguish between them.
- Create a thin PPL client capable of interacting with the PPL driver regardless of which driver is used (Spark, OpenSearch, Prometheus).
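The thin-client task could look roughly like the following: a client that only knows a small driver interface and delegates execution to whichever driver is registered. Everything here is a hypothetical sketch; `PPLDriver`, `PPLClient`, and `EchoDriver` are invented names, and `EchoDriver` is a stand-in that does not talk to any real engine.

```python
from abc import ABC, abstractmethod


class PPLDriver(ABC):
    """Minimal interface every driver would implement (assumed shape)."""

    @abstractmethod
    def execute(self, query: str) -> list:
        ...


class EchoDriver(PPLDriver):
    """Stand-in driver: returns the query it received instead of running it."""

    def __init__(self, name):
        self.name = name

    def execute(self, query):
        return [{"driver": self.name, "query": query}]


class PPLClient:
    """Thin client: holds drivers by name and forwards queries unchanged."""

    def __init__(self):
        self._drivers = {}

    def register(self, name, driver):
        self._drivers[name] = driver

    def query(self, driver_name, ppl):
        return self._drivers[driver_name].execute(ppl)


client = PPLClient()
for name in ("spark", "opensearch", "prometheus"):
    client.register(name, EchoDriver(name))

rows = client.query("spark", "source = logs | head 5")
```

Because the client depends only on the interface, adding a new engine would mean registering one more driver rather than changing the client.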

Sub-Tasks

This project will be composed of sub-tasks to allow an incremental and continuous process:

Do you have any additional context?

ppl on spark
ppl on opensearch

PPL spark ANTLR grammar
PPL OpenSearch ANTLR grammar

ppl spark implementation issue

normanj-bitquill · 2024-11-28T19:36:34Z

Step 6 is just a link to this issue.

Can the dashboard use a Jar artifact from Maven, or would it need something else like an NPM module?

YANG-DB · 2024-11-28T20:21:36Z

> Step 6 is just a link to this issue.
>
> Can the dashboard use a Jar artifact from Maven, or would it need something else like an NPM module?

@normanj-bitquill
Thanks for your comments.

I'll update issue number 6 soon.
Yes - the main idea is that any third party using this artifact will have its own library in Maven/NPM for its specific development language (Java, JS, ...).

dblock · 2024-12-09T17:11:22Z

[Catch All Triage - 1, 2, 3, 4]

YANG-DB added enhancement New feature or request untriaged labels Nov 20, 2024

acarbonetto mentioned this issue Dec 3, 2024

Add FILLNULL command in PPL (#3032) opensearch-project/sql#3075

Merged

dblock removed the untriaged label Dec 9, 2024
