Add readme, modify setup_grammar.py to show proper usage. #7

Merged
merged 1 commit into from
Apr 16, 2024
71 changes: 71 additions & 0 deletions README.md
@@ -0,0 +1,71 @@
# cratedb-sqlparse

`ANTLR4` is a parser generator for reading, processing, and executing text. It can generate parsers
for several target languages (Java, Python, JavaScript, Dart...); CrateDB uses the Java target.

This repository holds libraries/packages generated for some of those target languages, so
far `Python` and `JavaScript`.
More may be added in the future if needed.

These libraries allow you to parse CrateDB's SQL dialect without sending it to a CrateDB instance.

- `Python`: https://github.com/crate/cratedb-sqlparse/tree/master/cratedb_sqlparse_py
- `JavaScript`: https://github.com/crate/cratedb-sqlparse/tree/master/cratedb_sqlparse_js

## Example

```python
from cratedb_sqlparse import sqlparse

query = """
SELECT * FROM SYS.SHARDS;
INSERT INTO doc.tbl VALUES (1);
"""
statements = sqlparse(query)

select_query = statements[0]

print(select_query.query)
# 'SELECT * FROM SYS.SHARDS'
```

## Limitations

Listeners are not implemented, which means that you can only validate SQL syntax,
split queries, and get some token metadata from the query. If you need more information,
such as what https://github.com/macbre/sql-metadata provides (e.g. the columns of a query),
open a new issue.

New features should preferably be implemented in all available targets.

## Adding a new target

The target language has to be available in ANTLR4;
see https://github.com/antlr/antlr4/blob/master/doc/targets.md.

Add the new target and paths to the build script, see `setup_grammar.py`.

Several features would need to be implemented for a new target, such as a case-insensitive input stream,
native exceptions as an error listener, dollar strings, and any features added later. See past commits for
how they were implemented in Python and JavaScript. Remember that CrateDB's SQLParser, written in Java,
is the most complete implementation and the default reference.
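
As an illustration of the first feature above, here is a minimal sketch of a case-insensitive input stream. It is not the implementation used in this repository: `StringStream` is a hypothetical stand-in for a runtime character stream, and `CaseInsensitiveStream` is a hypothetical wrapper name. The idea is that the lexer sees upper-cased lookahead characters, so upper-case grammar keywords match any casing, while the original text is left untouched.

```python
class StringStream:
    """Hypothetical stand-in for a character stream (not the ANTLR4 runtime class)."""

    EOF = -1

    def __init__(self, text: str):
        self.text = text
        self.index = 0

    def LA(self, offset: int) -> int:
        """Return the code point `offset` characters ahead (1 = next), or EOF.

        Only positive offsets are supported in this simplified stand-in.
        """
        pos = self.index + offset - 1
        if offset <= 0 or pos >= len(self.text):
            return self.EOF
        return ord(self.text[pos])

    def consume(self) -> None:
        """Advance past the current character."""
        self.index += 1

    def getText(self, start: int, stop: int) -> str:
        """Return the original text between start and stop (inclusive)."""
        return self.text[start:stop + 1]


class CaseInsensitiveStream:
    """Wrap a stream so that lookahead is upper-cased; everything else is delegated."""

    def __init__(self, stream):
        self._stream = stream

    def LA(self, offset: int) -> int:
        code = self._stream.LA(offset)
        if code <= 0:  # EOF or invalid: pass through unchanged
            return code
        upper = chr(code).upper()
        # Some characters upper-case to more than one character; keep those as-is.
        return ord(upper) if len(upper) == 1 else code

    def __getattr__(self, name):
        # Anything not overridden here (consume, getText, ...) goes to the wrapped stream.
        return getattr(self._stream, name)


if __name__ == "__main__":
    stream = CaseInsensitiveStream(StringStream("select 1"))
    print(chr(stream.LA(1)))     # lookahead is upper-cased
    print(stream.getText(0, 5))  # original casing is preserved
```

Wrapping the stream instead of upper-casing the whole query keeps string literals and identifiers in their original case in the parsed output.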

## Building locally & using a different CrateDB version

The generated parser is not uploaded to the repository because it is huge. To use the package locally or
to build a different version, use the build script.

The target and the version can be modified at the end of the build script, `setup_grammar.py`.

The script needs two dependencies: `pip install antlr4-python3-runtime requests`. You can either
install them manually or use the `pyproject.toml` from the Python target; either option works for building targets.

```python
if __name__ == '__main__':
    version = '5.6.4'
    target = Antlr4Target.python
    download_cratedb_grammar(version)
    compile_grammar(target)
    patch_lexer(target)
    set_version(target, version)
```
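
Before running the build script, it can help to confirm that the two dependencies named above are importable. The following is a small sketch, not part of `setup_grammar.py`: the helper name `missing_build_dependencies` is hypothetical, and it assumes the import names `antlr4` (for `antlr4-python3-runtime`) and `requests`.

```python
from importlib.util import find_spec

# Import name -> pip package name (assumed mapping for the two build dependencies).
REQUIRED_MODULES = {
    "antlr4": "antlr4-python3-runtime",
    "requests": "requests",
}


def missing_build_dependencies(required=REQUIRED_MODULES):
    """Return the pip names of required modules that cannot be imported."""
    return [
        pip_name
        for module, pip_name in required.items()
        if find_spec(module) is None  # None means the top-level module was not found
    ]


if __name__ == "__main__":
    missing = missing_build_dependencies()
    if missing:
        print(f"Missing build dependencies, try: pip install {' '.join(missing)}")
    else:
        print("All build dependencies are available.")
```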
13 changes: 8 additions & 5 deletions setup_grammar.py
@@ -123,8 +123,11 @@ def set_version(target: Antlr4Target, version: str):
     with open(target_path / index_file, "a") as f:
         f.write(f"{variable} = {version}\n")
 
-# if __name__ == '__main__':
-# download_cratedb_grammar('5.6.4')
-# compile_grammar(Antlr4Target.js)
-# patch_lexer(Antlr4Target.js)
-set_version(Antlr4Target.js, '5.45.4')
+
+if __name__ == '__main__':
+    version = '5.6.4'
+    target = Antlr4Target.python
+    download_cratedb_grammar(version)
+    compile_grammar(target)
+    patch_lexer(target)
+    set_version(target, version)