diff --git a/README.md b/README.md
new file mode 100644
index 0000000..c795280
--- /dev/null
+++ b/README.md
@@ -0,0 +1,71 @@
+# cratedb-sqlparse
+
+`Antlr4` is a parser generator for reading, processing and executing text. Several
+target languages (Java, Python, JavaScript, Dart...) are available; CrateDB uses the Java target.
+
+This repository holds libraries/packages generated for some of those target languages, so
+far `Python` and `JavaScript`.
+More might be added in the future if needed.
+
+These libraries allow you to parse CrateDB's SQL dialect without sending it to a CrateDB instance.
+
+- `Python`: https://github.com/crate/cratedb-sqlparse/tree/master/cratedb_sqlparse_py
+- `JavaScript`: https://github.com/crate/cratedb-sqlparse/tree/master/cratedb_sqlparse_js
+
+## Example
+
+```python
+from cratedb_sqlparse import sqlparse
+
+query = """
+    SELECT * FROM SYS.SHARDS;
+    INSERT INTO doc.tbl VALUES (1);
+"""
+statements = sqlparse(query)
+
+select_query = statements[0]
+
+print(select_query.query)
+# 'SELECT * FROM SYS.SHARDS'
+```
+
+## Limitations
+
+Listeners are not implemented, which means that you can only validate SQL syntax,
+split queries and get some token metadata from
+the query. If you need more information, like what https://github.com/macbre/sql-metadata provides
+(e.g. the columns of a query), open a new issue.
+
+New features should preferably be implemented in all available targets.
+
+## Adding a new target
+
+The target language has to be available in antlr4,
+see https://github.com/antlr/antlr4/blob/master/doc/targets.md.
+
+Add the new target and paths to the build script, see `setup_grammar.py`.
+
+Several features need to be implemented for each target, such as a case-insensitive input stream, native
+exceptions as error listener, dollar strings and any newer features. See past commits for how they were
+implemented in Python and JavaScript. Remember that CrateDB's SQLParser, written in Java, is the most
+complete implementation and the default reference.
+
+## Building locally & using a different CrateDB version
+
+The generated parser is not uploaded to the repository since it's huge. To use the package locally or
+to build a different version, use the build script.
+
+The target and the version can be modified at the end of the build script `setup_grammar.py`.
+
+The script needs two dependencies (`pip install antlr4-python3-runtime requests`). You can either
+install them manually or use the `pyproject.toml` from the Python target; either option can be used to build targets.
+
+```python
+if __name__ == '__main__':
+    version = '5.6.4'
+    target = Antlr4Target.python
+    download_cratedb_grammar(version)
+    compile_grammar(target)
+    patch_lexer(target)
+    set_version(target, version)
+```
diff --git a/setup_grammar.py b/setup_grammar.py
index 72a837f..6f20f95 100644
--- a/setup_grammar.py
+++ b/setup_grammar.py
@@ -123,8 +123,11 @@ def set_version(target: Antlr4Target, version: str):
     with open(target_path / index_file, "a") as f:
         f.write(f"{variable} = {version}\n")
 
-# if __name__ == '__main__':
-#     download_cratedb_grammar('5.6.4')
-#     compile_grammar(Antlr4Target.js)
-#     patch_lexer(Antlr4Target.js)
-set_version(Antlr4Target.js, '5.45.4')
+
+if __name__ == '__main__':
+    version = '5.6.4'
+    target = Antlr4Target.python
+    download_cratedb_grammar(version)
+    compile_grammar(target)
+    patch_lexer(target)
+    set_version(target, version)
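
The context lines of the `setup_grammar.py` hunk show `set_version` appending `{variable} = {version}` to the target's index file. The value of `variable` is not visible in this diff. Assuming it is something like `__version__` for the Python target, the appended line would read `__version__ = 5.6.4`, which is not a valid Python statement. A minimal sketch of a quoted variant follows; `append_version` and `is_valid_python` are hypothetical helpers written for illustration, not functions from the repository.

```python
import tempfile
from pathlib import Path


def append_version(index_file: Path, variable: str, version: str) -> None:
    """Append a quoted version assignment to index_file (hypothetical helper)."""
    with open(index_file, "a") as f:
        # !r applies repr(), which wraps the version string in quotes.
        f.write(f"{variable} = {version!r}\n")


def is_valid_python(source: str) -> bool:
    """Return True if source compiles as Python code."""
    try:
        compile(source, "<version line>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index = Path(tmp) / "__init__.py"
        # "__version__" is an assumed variable name, not taken from the diff.
        append_version(index, "__version__", "5.6.4")
        print(index.read_text(), end="")
        print(is_valid_python(index.read_text()))
```

Whether the same quoting suits the JavaScript target depends on what `variable` holds there, which this diff does not show.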