Skip to content

lukas-mi/vertica-query-analyser

Repository files navigation

SQL analyser for Vertica. A wrapper of uber/queryparser.

Features:

  • Provides the following information (check out uber/queryparser for more detail):
    • What columns were accessed and in what clauses (SORT, GROUP, etc.)
    • What are the join columns (JOIN ON)
    • What tables were accessed
    • Table lineage
  • Built in catalog usage
  • Simple json output
  • HTTP server mode

Build:

  • stack setup
  • stack build

Docker:

Usage:

  • vq-analyser catalog_file [query_file] - specify catalog file and optionally query file, if query file is not specified then stdin is used.
  • vq-analyser catalog_file -d directory - specify catalog file and directory in which all *.sql files will be analysed. Will create files after the original ones: <original>.sql.json on analysis success and <original>.sql.txt on failure.
  • vq-analyser catalog_file -s -p port - specify catalog file and port on which to run http server
    • API
      • 200 and application/json on success
      • 400 and text/plain when input is incorrect (unparsable/unresolvable)
      • 404 and text/plain when resource path is specified
      • 405 and text/plain when not using POST
      • 500 and text/plain on server internal error
      • 503 and text/plain on server processing timeout

Examples: catalog.sql:

CREATE SCHEMA demo;
CREATE TABLE demo.foo (a INT, b INT);
CREATE TABLE demo.bar AS SELECT * FROM demo.foo;

Commands (pipes to stedolan/jq to format the json):

  • Single file - vq-analyser catalog.sql queries.sql | jq '.'
    • Using docker - docker run -it -v /host/path:/container/path lukasmi/vertica-query-analyser /container/path/catalog.sql /container/path/queries.sql | jq '.'
  • Using http server mode:
    • Start the server - vq-analyser catalog.sql -s -p 3000
      • Using docker - docker run -it -p 3000:3000 -v /host/path:/container/path lukasmi/vertica-query-analyser /container/path/catalog.sql -s -p 3000
    • Issue POST - cat queries.sql | curl localhost:3000 --data-binary @- | jq '.'
  1. Column resolving
    • queries.sql:
    SELECT * FROM demo.bar WHERE a IS NOT NULL ORDER BY a;
    
    • output:
    [
      {
        "statement": "SELECT",
        "lineage": [],
        "columns": [
          {
            "clause": "ORDER",
            "column": "demo.bar.a"
          },
          {
            "clause": "SELECT",
            "column": "demo.bar.a"
          },
          {
            "clause": "WHERE",
            "column": "demo.bar.a"
          },
          {
            "clause": "SELECT",
            "column": "demo.bar.b"
          }
        ],
        "tables": [
          "demo.bar"
        ],
        "joins": []
      }
    ]
  2. Joins
    • queries.sql:
    SELECT foo.a, foo.b
    FROM demo.foo foo
    JOIN demo.bar bar
    ON foo.a = bar.a;
    • output:
    [
      {
        "statement": "SELECT",
        "lineage": [],
        "columns": [
          {
            "clause": "JOIN",
            "column": "demo.bar.a"
          },
          {
            "clause": "JOIN",
            "column": "demo.foo.a"
          },
          {
            "clause": "SELECT",
            "column": "demo.foo.a"
          },
          {
            "clause": "SELECT",
            "column": "demo.foo.b"
          }
        ],
        "tables": [
          "demo.bar",
          "demo.foo"
        ],
        "joins": [
          {
            "left": "demo.bar.a",
            "right": "demo.foo.a"
          }
        ]
      }
    ]
  3. Lineage
    • queries.sql:
    CREATE TABLE demo.baz AS
        SELECT foo.a, foo.b
        FROM demo.foo foo
        JOIN demo.bar bar
        ON foo.a = bar.a;
    • output
    [
      {
        "statement": "CREATE_TABLE",
        "lineage": [
          {
            "decedent": "demo.baz",
            "ancestors": [
              "demo.bar",
              "demo.foo"
            ]
          }
        ],
        "columns": [
          {
            "clause": "JOIN",
            "column": "demo.bar.a"
          },
          {
            "clause": "JOIN",
            "column": "demo.foo.a"
          },
          {
            "clause": "SELECT",
            "column": "demo.foo.a"
          },
          {
            "clause": "SELECT",
            "column": "demo.foo.b"
          }
        ],
        "tables": [
          "demo.bar",
          "demo.baz",
          "demo.foo"
        ],
        "joins": [
          {
            "left": "demo.bar.a",
            "right": "demo.foo.a"
          }
        ]
      }
    ]