From 8850c740980cb3fe2512120c9b4497646fef5bf8 Mon Sep 17 00:00:00 2001
From: mechatroner
Date: Sun, 25 Aug 2019 16:34:48 -0400
Subject: [PATCH] readme

---
 README.md | 124 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 85 insertions(+), 39 deletions(-)

diff --git a/README.md b/README.md
index 935ec54..f0ccb66 100644
--- a/README.md
+++ b/README.md
@@ -93,59 +93,109 @@ Supported values: _"latin-1"_, _"utf-8"_
 * This Sublime Text plugin is an adaptation of Vim's rainbow_csv [plugin](https://github.com/mechatroner/rainbow_csv)
-# RBQL (RainBow Query Language) Description
+# RBQL (Rainbow Query Language) Description
+
 RBQL is a technology which provides SQL-like language that supports _SELECT_ and _UPDATE_ queries with Python or JavaScript expressions.
+RBQL is distributed with CLI apps, text editor plugins, Python and JS libraries and can work in web browsers.
 [Official Site](https://rbql.org/)
 ### Main Features
+
 * Use Python or JavaScript expressions inside _SELECT_, _UPDATE_, _WHERE_ and _ORDER BY_ statements
-* Result set of any query immediately becomes a first-class table on it's own.
-* Output entries appear in the same order as in input unless _ORDER BY_ is provided.
-* Input csv/tsv spreadsheet may contain varying number of entries (but select query must be written in a way that prevents output of missing values)
-* Works out of the box, no external dependencies.
+* Result set of any query immediately becomes a first-class table on its own
+* Supports input tables with an inconsistent number of fields per record
+* Output records appear in the same order as in input unless _ORDER BY_ is provided
+* Each record has a unique NR (line number) identifier
+* Supports all main SQL keywords
+* Supports aggregate functions and GROUP BY queries
+* Provides some new useful query modes which traditional SQL engines do not have
+* Supports both _TOP_ and _LIMIT_ keywords
+* Supports user-defined functions (UDF)
+* Works out of the box, no external dependencies
+
+#### Limitations:
+
+* RBQL doesn't support nested queries, but they can be emulated with consecutive queries
+* Number of tables in all JOIN queries is always 2 (input table and join table); use consecutive queries to join 3 or more tables
 ### Supported SQL Keywords (Keywords are case insensitive)
-* SELECT \[ TOP _N_ \] \[ DISTINCT [ COUNT ] \]
-* UPDATE \[ SET \]
+* SELECT
+* UPDATE
 * WHERE
 * ORDER BY ... [ DESC | ASC ]
-* [ [ STRICT ] LEFT | INNER ] JOIN
+* [ LEFT | INNER ] JOIN
+* DISTINCT
 * GROUP BY
+* TOP _N_
 * LIMIT _N_
-#### Keywords rules
 All keywords have the same meaning as in SQL queries. You can check them [online](https://www.w3schools.com/sql/default.asp)
-But there are also two new keywords: _DISTINCT COUNT_ and _STRICT LEFT JOIN_:
-* _DISTINCT COUNT_ is like _DISTINCT_, but adds a new column to the "distinct" result set: number of occurences of the entry, similar to _uniq -c_ unix command.
-* _STRICT LEFT JOIN_ is like _LEFT JOIN_, but generates an error if any key in left table "A" doesn't have exactly one matching key in the right table "B".
-Some other rules:
-* _UPDATE SET_ is synonym to _UPDATE_, because in RBQL there is no need to specify the source table.
-* _UPDATE_ has the same semantic as in SQL, but it is actually a special type of _SELECT_ query.
-* _JOIN_ statements must have the following form: _ (/path/to/table.tsv | table_name ) ON ai == bj_
-* _TOP_ and _LIMIT_ have identical semantic.
 ### Special variables
-| Variable Name | Variable Type | Variable Description |
-|------------------------|---------------|--------------------------------------|
+| Variable Name | Variable Type | Variable Description |
+|--------------------------|---------------|--------------------------------------|
 | _a1_, _a2_,..., _a{N}_ |string | Value of i-th column |
 | _b1_, _b2_,..., _b{N}_ |string | Value of i-th column in join table B |
 | _NR_ |integer | Line number (1-based) |
 | _NF_ |integer | Number of fields in line |
+
+### UPDATE statement
+
+_UPDATE_ query produces a new table where original values are replaced according to the UPDATE expression, so it can also be considered a special type of SELECT query. This prevents accidental data loss from poorly written queries.
+_UPDATE SET_ is a synonym for _UPDATE_, because in RBQL there is no need to specify the source table.
+
+
 ### Aggregate functions and queries
+
 RBQL supports the following aggregate functions, which can also be used with _GROUP BY_ keyword:
-_COUNT()_, _MIN()_, _MAX()_, _SUM()_, _AVG()_, _VARIANCE()_, _MEDIAN()_
+_COUNT()_, _ARRAY_AGG()_, _MIN()_, _MAX()_, _SUM()_, _AVG()_, _VARIANCE()_, _MEDIAN()_
+
+#### Limitations
+Aggregate functions inside Python (or JS) expressions are not supported, although you can use expressions inside aggregate functions.
+E.g. `MAX(float(a1) / 1000)` - valid; `MAX(a1) / 1000` - invalid
+
+
+### JOIN statements
+
+Join table B can be referenced either by its file path or by its name - an arbitrary string which the user should provide before executing the JOIN query.
+RBQL supports _STRICT LEFT JOIN_ which is like _LEFT JOIN_, but generates an error if any key in left table "A" doesn't have exactly one matching key in the right table "B".
+
+#### Limitations
+
+* _JOIN_ statements must have the following form: _ (/path/to/table.tsv | table_name ) ON ai == bj_
+
+
+### SELECT EXCEPT statement
-**Limitations:**
-* Aggregate function are CASE SENSITIVE and must be CAPITALIZED.
-* It is illegal to use aggregate functions inside Python (or JS) expressions. Although you can use expressions inside aggregate functions.
- E.g. `MAX(float(a1) / 1000)` - legal; `MAX(a1) / 1000` - illegal.
+SELECT EXCEPT can be used to select everything except specific columns. E.g. to select everything but columns 2 and 4, run: `SELECT * EXCEPT a2, a4`
+Traditional SQL engines do not support this query mode.
-### Examples of RBQL queries
+
+
+### SELECT DISTINCT COUNT statement
+
+RBQL supports the _DISTINCT COUNT_ keyword, which is like _DISTINCT_ but adds a new column to the "distinct" result set: the number of occurrences of the entry, similar to the _uniq -c_ unix command.
+`SELECT DISTINCT COUNT a1` is equivalent to `SELECT a1, COUNT(a1) GROUP BY a1`
+
+
+### UNNEST() operator
+UNNEST(list) takes a list/array as an argument and repeats the output record multiple times - one time for each value from the list argument.
+Example: `SELECT a1, UNNEST(a2.split(';'))`
+
+
+### User Defined Functions (UDF)
+
+RBQL supports User Defined Functions.
+You can define custom functions and/or import libraries in two special files:
+* `~/.rbql_init_source.py` - for Python
+* `~/.rbql_init_source.js` - for JavaScript
+
+
+## Examples of RBQL queries
 #### With Python expressions
@@ -182,7 +232,8 @@ _COUNT()_, _MIN()_, _MAX()_, _SUM()_, _AVG()_, _VARIANCE()_, _MEDIAN()_
 ### FAQ
 #### How does RBQL work?
-Python module rbql.py parses RBQL query, creates a new python worker module, then imports and executes it.
+
+RBQL parses the SQL-like user query, creates a new Python or JavaScript worker module, then imports and executes it.
 Explanation of simplified Python version of RBQL algorithm by example.
 1. User enters the following query, which is stored as a string _Q_:
@@ -222,24 +273,19 @@ Explanation of simplified Python version of RBQL algorithm by example.
 ```
 ./tmp_script.py < data.tsv > result.tsv
 ```
-Result set of the original query (`SELECT a3, int(a4) + 100, len(a2) WHERE a1 != 'SELL'`) is in the "result.tsv" file.
-It is clear that this simplified version can only work with tab-separated files.
+Result set of the original query (`SELECT a3, int(a4) + 100, len(a2) WHERE a1 != 'SELL'`) is in the "result.tsv" file.
+Adding support for TOP/LIMIT keywords is trivial, and to support "ORDER BY" we can introduce an intermediate array.
 #### Is this technology reliable?
+
 It should be: RBQL scripts have only 1000 - 2000 lines combined (depending on how you count them) and there are no external dependencies.
 There is no complex logic, even query parsing functions are very simple. If something goes wrong RBQL will show an error instead of producing incorrect output, also there are currently 5 different warning types.
-### Standalone CLI Apps
-
-You can also use two standalone RBQL Apps: with JavaScript and Python backends
-
-#### rbql-js
-Installation: `$ npm i rbql`
-Usage: `$ rbql-js --query "select a1, a2 order by a1" < input.tsv`
-
-#### rbql-py
-Installation: `$ pip install rbql`
-Usage: `$ rbql-py --query "select a1, a2 order by a1" < input.tsv`
+### References
+* [RBQL: Official Site](https://rbql.org/)
+RBQL is integrated with Rainbow CSV extensions in [Vim](https://github.com/mechatroner/rainbow_csv), [VSCode](https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv), [Sublime Text](https://packagecontrol.io/packages/rainbow_csv) editors.
+* [RBQL in npm](https://www.npmjs.com/package/rbql): `$ npm install -g rbql`
+* [RBQL in PyPI](https://pypi.org/project/rbql/): `$ pip install rbql`
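
Review note on the FAQ hunk above: the patch states that TOP/LIMIT is trivial to add to the simplified worker and that ORDER BY can be handled with an intermediate array. The following is a minimal Python sketch of that idea only, not RBQL's actual implementation. The function `run_query` and its parameters (`where`, `select`, `order_key`, `limit`) are hypothetical names introduced for illustration, and the input is assumed to be tab-separated as in the FAQ example.

```python
def run_query(lines, where, select, order_key=None, limit=None):
    """Hypothetical helper (not part of RBQL): filter, project, sort, truncate.

    lines     -- iterable of tab-separated text lines
    where     -- callable(fields, nr) -> bool, plays the role of WHERE
    select    -- callable(fields, nr) -> list, plays the role of SELECT
    order_key -- optional callable(fields) -> sortable value (ORDER BY)
    limit     -- optional int (TOP / LIMIT)
    """
    buffered = []  # the intermediate array; only ORDER BY really needs it
    for nr, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if not where(fields, nr):
            continue
        key = order_key(fields) if order_key is not None else None
        buffered.append((key, select(fields, nr)))
        # Without ORDER BY, LIMIT can stop reading input early.
        if order_key is None and limit is not None and len(buffered) >= limit:
            break
    if order_key is not None:
        # With ORDER BY, every matching record must be seen before sorting.
        buffered.sort(key=lambda pair: pair[0])
    records = [record for _, record in buffered]
    return records if limit is None else records[:limit]


if __name__ == "__main__":
    data = ["BUY\tabc\tx\t5\n", "SELL\tq\ty\t1\n", "BUY\tde\tz\t2\n"]
    # Roughly: SELECT a3, int(a4) + 100, len(a2) WHERE a1 != 'SELL' ORDER BY int(a4) LIMIT 1
    result = run_query(
        data,
        where=lambda a, nr: a[0] != "SELL",
        select=lambda a, nr: [a[2], int(a[3]) + 100, len(a[1])],
        order_key=lambda a: int(a[3]),
        limit=1,
    )
    print(result)
```

The sketch buffers records only because sorting requires the full filtered set; when no `order_key` is given, the loop can stream and stop as soon as `limit` records have been produced.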