[RFC]PPL Extension Mechanism #27

Open
YANG-DB opened this issue Nov 28, 2024 · 2 comments
Labels
enhancement New feature or request

@YANG-DB
Member

YANG-DB commented Nov 28, 2024

Is your feature request related to a problem?

PPL is a language with many commands, and it can run on different engines that cover a wide variety of use cases and functionality.

For PPL to use these capabilities to their full extent, it has to be dynamically extensible with commands that are either domain specific or execution specific.

For example:

  • Security / Observability should have vocabularies that are distinct to these domains and that the PPL user can freely use in statements.
  • Geospatial / Textual should likewise provide text-specific or geospatial-specific capabilities designed for use within a text search engine or a geospatial search engine.

Such vocabulary should not become part of the standard PPL language, because it serves a specific use case or domain that is not relevant to other use cases.

What solution would you like?

The PPL grammar should provide an extension point that enables other plugins to define their own command names, parameters, and output schemas.

Backend plugins can also extend resulting data types and functions, making them accessible through PPL data types and functions.

Do you have any additional context?

@YANG-DB
Member Author

YANG-DB commented Nov 28, 2024

Extending the ANTLR grammar

To dynamically extend a language defined using ANTLR without changing the core code, we can consider a modular grammar design. ANTLR allows a grammar to be written in parts; the idea here is to merge those parts dynamically by loading external, pre-compiled grammar components at runtime.

  1. Grammar Inclusion
    ANTLR supports grammar file modularization through import statements, where new rules can be added via partial grammars. However, ANTLR generates code at compile time, so dynamic discovery of grammar components requires pre-compiling these fragments.

  2. Dynamic Loading with Delegated Parsers
    We could dynamically load additional rules or new syntactical constructs using delegated parsers. The core grammar would handle the base syntax, while extension grammars could handle specific constructs or operations dynamically.

OpenSearchPPLLexer.g4
OpenSearchPPLParser.g4

/extensions
  /grammar
    extensionParser1.g4         # Additional parser rules for an extension
    extensionLexer1.g4          # Additional lexer rules for an extension
  /compiled
    Extension1Lexer.class       # Compiled lexer for extension1
    Extension1Parser.class
    Extension1BaseVisitor.class
    Extension1BaseListener.class
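
To make the /extensions/compiled layout above more concrete, here is a minimal Java sketch of how a host could discover and load the pre-compiled parser classes. The class ExtensionClassLocator and its methods are hypothetical names introduced for illustration; they are not part of ANTLR or OpenSearch. It also assumes the compiled classes sit directly in the directory with no package.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (assumption, not an existing API): locates pre-compiled
// extension parsers in a directory such as /extensions/compiled.
final class ExtensionClassLocator {

    // Returns class names such as "Extension1Parser" for every *Parser.class file found.
    static List<String> findParserClassNames(Path compiledDir) throws IOException {
        try (Stream<Path> files = Files.list(compiledDir)) {
            return files
                .map(p -> p.getFileName().toString())
                .filter(name -> name.endsWith("Parser.class"))
                .map(name -> name.substring(0, name.length() - ".class".length()))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // Creates a class loader rooted at the compiled directory.
    // The caller owns the loader and should close it when the extension is unloaded.
    static URLClassLoader loaderFor(Path compiledDir) throws IOException {
        URL[] urls = { compiledDir.toUri().toURL() };
        return new URLClassLoader(urls, ExtensionClassLocator.class.getClassLoader());
    }

    // Loads one discovered class by name through the given loader.
    static Class<?> load(URLClassLoader loader, String className) throws ClassNotFoundException {
        return Class.forName(className, true, loader);
    }
}
```

A dedicated class loader per extension keeps extension classes isolated from the core parser, which is one possible design choice rather than a requirement of the proposal.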

Workflow for Extension Implementors

Define Grammar Rules:
Extension implementors create new grammar files (e.g., extension1.g4).

Validate Grammar:
Implementors use a validation tool (provided by our utilities):
validate-grammar extension1.g4

The tool checks:
Grammar correctness.
Rule compliance with base grammar (e.g., naming conflicts, syntactical integrity, security).

Compile Grammar:
Implementors then generate and compile the extension classes:
antlr4 -o /compiled/ extensionParser1.g4
javac -d /compiled/ Extension1*.java
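
The naming-conflict part of the validation step could look roughly like the Java sketch below. It is only an illustration under stated assumptions: GrammarConflictCheck is a hypothetical name, the rule detection is a naive regular expression rather than a real ANTLR parse, and the idea of an explicit list of allowed hook rules (such as extensible_command) is an assumption drawn from the proposal, not an existing feature.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical validator sketch (assumption): flags extension rules that
// redefine base grammar rules other than the declared extension hooks.
final class GrammarConflictCheck {

    // Naive heuristic: an identifier at the start of a line, followed by ':'
    // (possibly on the next line), is treated as a rule definition.
    private static final Pattern RULE_HEADER =
        Pattern.compile("(?m)^([A-Za-z_][A-Za-z0-9_]*)\\s*:");

    // Collects the names of all rules that appear to be defined in the grammar text.
    static Set<String> ruleNames(String grammarText) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = RULE_HEADER.matcher(grammarText);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Returns the rules the extension redefines that are not allowed hooks.
    static Set<String> conflicts(Set<String> baseRules,
                                 Set<String> allowedHooks,
                                 String extensionGrammar) {
        Set<String> result = new LinkedHashSet<>(ruleNames(extensionGrammar));
        result.retainAll(baseRules);    // keep only names that also exist in the base grammar
        result.removeAll(allowedHooks); // overriding a declared hook is permitted
        return result;
    }
}
```

An empty result means the extension only touches the declared hooks; a production tool would more likely inspect the grammar through ANTLR's own tooling than through a regular expression.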

Main ANTLR Grammar (Root Folder)

The main grammar should include:
A Rule for Commands: A general rule that includes core commands and allows for extensible commands:

grammar MainGrammar;

query: command+; // Queries are composed of commands.

command
    : standard_command
    | extensible_command
    ;

standard_command
   : ...
   | ...
   ;

extensible_command
    : EXTENSIBLE_COMMAND // Placeholder for extension commands.
    ;
    
EXTENSIBLE_COMMAND
    : .+? // Match unknown commands dynamically (overridden by extensions).
    ;

Hook for Future Rules:
The extensible_command rule is a hook that expects extensions to replace or augment it with specific rules.

Extension Grammar Compatibility:

The EXTENSIBLE_COMMAND is a catch-all placeholder rule. When extensions are loaded, this rule is overridden to parse specific commands provided by the extensions.

Example Extension Grammar (/extensions/grammar/extensionParser1.g4)
The extension defines its additional command:

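grammar ProjectView;

extensible_command
    : project_view_command
    ;

// Parser rule (lowercase), so whitespace between the keywords is handled by the lexer.
// IDENTIFIER is assumed to be provided by the base lexer.
project_view_command
    : 'PROJECT' 'VIEW' 'AS' 'SELECT' '*' 'FROM' IDENTIFIER
    ;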

How This Works

  1. The Base Grammar:

The base ANTLR grammar (OpenSearchPPLParser.g4) defines a flexible extensible rule (extensible_command).
It is designed to allow unknown constructs but doesn’t itself define specifics.

  2. Extension Grammar:

Extensions redefine or extend the extensible_command rule, introducing specific syntax like PROJECT VIEW AS SELECT * FROM table.

  3. Runtime Loading:

The host loads the pre-compiled extension parsers at runtime and combines them with the base grammar. ANTLR itself generates parser code ahead of time, so this combination step has to be provided by the extension mechanism.

  4. Dynamic Behavior:

When the PROJECT VIEW AS SELECT * FROM table query is encountered, the dynamically loaded extension grammar handles the parsing.
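
The four steps above can be sketched as a small dispatcher in Java: core commands are recognized first, and anything else is offered to the registered extensions. Everything here is hypothetical and only illustrates the delegation idea. CommandExtension, CommandDispatcher and ProjectViewExtension are invented names, the core keyword set is supplied by the caller, and the regular expression stands in for what would really be an ANTLR-generated extension parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical contract (assumption): in a real integration this would wrap
// the extension's ANTLR-generated lexer and parser.
interface CommandExtension {
    String name();
    boolean accepts(String commandText);
}

// Routes a command either to the core grammar or to the first extension that accepts it.
final class CommandDispatcher {
    private final Set<String> coreKeywords;
    private final List<CommandExtension> extensions = new ArrayList<>();

    CommandDispatcher(Set<String> coreKeywords) {
        this.coreKeywords = coreKeywords;
    }

    void register(CommandExtension extension) {
        extensions.add(extension);
    }

    // Returns "core", the owning extension's name, or empty if nothing accepts the command.
    Optional<String> resolve(String commandText) {
        String trimmed = commandText.trim();
        String firstWord = trimmed.split("\\s+", 2)[0].toLowerCase(Locale.ROOT);
        if (coreKeywords.contains(firstWord)) {
            return Optional.of("core");
        }
        return extensions.stream()
            .filter(e -> e.accepts(trimmed))
            .map(CommandExtension::name)
            .findFirst();
    }
}

// Stand-in for the ProjectView extension grammar from the example above.
final class ProjectViewExtension implements CommandExtension {
    @Override
    public String name() {
        return "project-view";
    }

    @Override
    public boolean accepts(String commandText) {
        return commandText.toUpperCase(Locale.ROOT)
            .matches("PROJECT\\s+VIEW\\s+AS\\s+SELECT\\s+\\*\\s+FROM\\s+\\w+");
    }
}
```

Checking core commands before extensions means an extension cannot shadow a standard command, which matches the intent that extension vocabulary stays outside the standard PPL language.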

@YANG-DB YANG-DB moved this to Design in PPL Commands Nov 28, 2024
@dblock dblock removed the untriaged label Dec 16, 2024
@dblock
Member

dblock commented Dec 16, 2024

[Catch All Triage - 1, 2, 3]
