Add support for materialization and call blocks (#318)
* feat: support materialization, call statement blocks

* fix: reset sql depth after some jinja blocks

* chore: update primer refs
tconbeer authored Nov 14, 2022
1 parent 2ee0746 commit 10eccbc
Showing 8 changed files with 140 additions and 44 deletions.
64 changes: 33 additions & 31 deletions CHANGELOG.md
@@ -6,22 +6,24 @@ All notable changes to this project will be documented in this file.

 ### Formatting Changes + Bug Fixes

+- sqlfmt now supports `{% materialization ... %}` and `{% call statement(...) %}` blocks ([#309](https://github.com/tconbeer/sqlfmt/issues/309)).
+- sqlfmt now resets the SQL depth of a query after encountering an `{% endmacro %}`, `{% endtest %}`, `{% endcall %}`, or `{% endmaterialization %}` tag.
 - fixed a bug where we could have unsafely run *black* against jinja that contained Python keywords and their safe alternatives (e.g., `return(return_())`).

 ## [0.13.0] - 2022-11-01

 ### Formatting Changes + Bug Fixes

-- sqlfmt now supports `delete` statements and the associated keywords `using` and `returning` ([#281](https://github.com/tconbeer/sqlfmt/issues/281))
-- sqlfmt now supports `grant` and `revoke` statements and all associated keywords ([#283](https://github.com/tconbeer/sqlfmt/issues/283))
-- sqlfmt now supports `create function` statements and all associated keywords ([#282](https://github.com/tconbeer/sqlfmt/issues/282))
-- sqlfmt now supports the `explain` keyword ([#280](https://github.com/tconbeer/sqlfmt/issues/280))
-- sqlfmt now supports BigQuery typed table and struct definitions and literals, like `table<a int64, b bytes(5), c string>`
-- sqlfmt now supports variables like `$foo` as ordinary identifiers
+- sqlfmt now supports `delete` statements and the associated keywords `using` and `returning` ([#281](https://github.com/tconbeer/sqlfmt/issues/281)).
+- sqlfmt now supports `grant` and `revoke` statements and all associated keywords ([#283](https://github.com/tconbeer/sqlfmt/issues/283)).
+- sqlfmt now supports `create function` statements and all associated keywords ([#282](https://github.com/tconbeer/sqlfmt/issues/282)).
+- sqlfmt now supports the `explain` keyword ([#280](https://github.com/tconbeer/sqlfmt/issues/280)).
+- sqlfmt now supports BigQuery typed table and struct definitions and literals, like `table<a int64, b bytes(5), c string>`.
+- sqlfmt now supports variables like `$foo` as ordinary identifiers.

 ### Features

-- sqlfmt is now tested against Python 3.11 ([#242](https://github.com/tconbeer/sqlfmt/issues/242)). Previous versions of sqlfmt are also compatible
+- sqlfmt is now tested against Python 3.11 ([#242](https://github.com/tconbeer/sqlfmt/issues/242)). Previous versions of sqlfmt are also compatible.
 with Python 3.11. When installed in 3.11, sqlfmt no longer requires the `tomli` dependency.

 ## [0.12.0] - 2022-10-14
@@ -31,10 +33,10 @@ All notable changes to this project will be documented in this file.
 - DDL and DML statements (`create`, `insert`, `grant`, etc.) will no longer be formatted ([#243](https://github.com/tconbeer/sqlfmt/issues/243)).
 These statements were never supported by sqlfmt, and the existing algorithm produced bad formatting. Support for DDL and DML statements will be gradually added back in in future versions.
 For more information, see the [tracking issue for DDL support](https://github.com/tconbeer/sqlfmt/issues/262).
-- BigQuery typed array literals like `array<float64>[1, 2]` are now supported, and spaces will no longer be inserted around `<` and `>` ([#212](https://github.com/tconbeer/sqlfmt/issues/212))
-- SparkSQL-specific keywords `tablesample`, `cluster by`, `distribute by`, `sort by`, and `lateral view` are now supported by the polyglot dialect ([#264](https://github.com/tconbeer/sqlfmt/issues/264))
-- `pivot` and `unpivot` are now supported as word operators, and will have a space between the keyword and the following parentheses
-- `values` is now supported as an unterminated keyword; tuples of values will be indented from the `values` keyword if they span more than one line ([#263](https://github.com/tconbeer/sqlfmt/issues/263))
+- BigQuery typed array literals like `array<float64>[1, 2]` are now supported, and spaces will no longer be inserted around `<` and `>` ([#212](https://github.com/tconbeer/sqlfmt/issues/212)).
+- SparkSQL-specific keywords `tablesample`, `cluster by`, `distribute by`, `sort by`, and `lateral view` are now supported by the polyglot dialect ([#264](https://github.com/tconbeer/sqlfmt/issues/264)).
+- `pivot` and `unpivot` are now supported as word operators, and will have a space between the keyword and the following parentheses.
+- `values` is now supported as an unterminated keyword; tuples of values will be indented from the `values` keyword if they span more than one line ([#263](https://github.com/tconbeer/sqlfmt/issues/263)).

 ## [0.11.1] - 2022-09-17

@@ -50,49 +52,49 @@ All notable changes to this project will be documented in this file.

 ### Breaking API Changes

-- The `files` argument of `api.run` is now a `Collection[pathlib.Path]` that represents an exact collection of files to be formatted, instead of a list of paths to search for files. Use `api.get_matching_paths(paths, mode)` to return the set of exact paths expected by `api.run`
+- The `files` argument of `api.run` is now a `Collection[pathlib.Path]` that represents an exact collection of files to be formatted, instead of a list of paths to search for files. Use `api.get_matching_paths(paths, mode)` to return the set of exact paths expected by `api.run`.

 ### Features

-- sqlfmt will now display a progress bar for long runs ([#231](https://github.com/tconbeer/sqlfmt/pull/231)). You can disable this with the `--no-progressbar` option
+- sqlfmt will now display a progress bar for long runs ([#231](https://github.com/tconbeer/sqlfmt/pull/231)). You can disable this with the `--no-progressbar` option.
 - `api.run` now accepts an optional `callback` argument, which must be a `Callable[[Awaitable[SqlFormatResult]], None]`. Unless the `--single-process` option is used, the callback is executed after each file is formatted.
-- sqlfmt can now be called as a python module, with `python -m sqlfmt`
+- sqlfmt can now be called as a python module, with `python -m sqlfmt`.

 ### Formatting Changes + Bug Fixes

-- adds more granularity to operator precedence and will merge lines more aggressively that start with high-precedence operators ([#200](https://github.com/tconbeer/sqlfmt/issues/200))
-- improves the formatting of `between ... and ...`, especially in situations where the source includes a line break ([#207](https://github.com/tconbeer/sqlfmt/issues/207))
-- improves the consistency of formatting long chains of operators that include parentheses ([#214](https://github.com/tconbeer/sqlfmt/issues/214))
-- fixes a bug that caused unnecessary copying of the cache when using multiprocessing. Large projects should see dramatically faster (near-instant) runs once the cache is warm
-- fixes a bug that could cause lines with long jinja tags to be one character over the line length limit, and could result in unstable formatting ([#237](https://github.com/tconbeer/sqlfmt/issues/237) - thank you [@nfcampos](https://github.com/nfcampos)!)
-- fixes a bug that formatted array literals like they were indexing operations ([#235](https://github.com/tconbeer/sqlfmt/issues/235) - thank you [@nfcampos](https://github.com/nfcampos)!)
+- adds more granularity to operator precedence and will merge lines more aggressively that start with high-precedence operators ([#200](https://github.com/tconbeer/sqlfmt/issues/200)).
+- improves the formatting of `between ... and ...`, especially in situations where the source includes a line break ([#207](https://github.com/tconbeer/sqlfmt/issues/207)).
+- improves the consistency of formatting long chains of operators that include parentheses ([#214](https://github.com/tconbeer/sqlfmt/issues/214)).
+- fixes a bug that caused unnecessary copying of the cache when using multiprocessing. Large projects should see dramatically faster (near-instant) runs once the cache is warm.
+- fixes a bug that could cause lines with long jinja tags to be one character over the line length limit, and could result in unstable formatting ([#237](https://github.com/tconbeer/sqlfmt/issues/237) - thank you [@nfcampos](https://github.com/nfcampos)!).
+- fixes a bug that formatted array literals like they were indexing operations ([#235](https://github.com/tconbeer/sqlfmt/issues/235) - thank you [@nfcampos](https://github.com/nfcampos)!).

 ## [0.10.1] - 2022-08-05

 ### Features

-- sqlfmt now supports the psycopg placeholders `%s` and `%(name)s` ([#198](https://github.com/tconbeer/sqlfmt/issues/198) - thank you [@snorkysnark](https://github.com/snorkysnark)!)
+- sqlfmt now supports the psycopg placeholders `%s` and `%(name)s` ([#198](https://github.com/tconbeer/sqlfmt/issues/198) - thank you [@snorkysnark](https://github.com/snorkysnark)!).

 ### Formatting Changes + Bug Fixes

-- sqlfmt now standardizes whitespace inside word tokens ([#201](https://github.com/tconbeer/sqlfmt/issues/201))
-- `using` is now treated as a word operator. It gets a space before its brackets and merging with surrounding lines is now much improved ([#218](https://github.com/tconbeer/sqlfmt/issues/218) - thank you [@nfcampos](https://github.com/nfcampos)!)
-- `within group` and `filter` are now treated like `over`, and the formatting of those aggregate clauses is improved ([#205](https://github.com/tconbeer/sqlfmt/issues/205))
+- sqlfmt now standardizes whitespace inside word tokens ([#201](https://github.com/tconbeer/sqlfmt/issues/201)).
+- `using` is now treated as a word operator. It gets a space before its brackets and merging with surrounding lines is now much improved ([#218](https://github.com/tconbeer/sqlfmt/issues/218) - thank you [@nfcampos](https://github.com/nfcampos)!).
+- `within group` and `filter` are now treated like `over`, and the formatting of those aggregate clauses is improved ([#205](https://github.com/tconbeer/sqlfmt/issues/205)).

 ## [0.10.0] - 2022-08-02

 ### Features

-- sqlfmt now supports ClickHouse. When run with the `--dialect clickhouse` option, sqlfmt will not lowercase names that could be case-sensitive in ClickHouse, like function names, aliases, etc. ([#193](https://github.com/tconbeer/sqlfmt/issues/193) - thank you [@Shlomixg](https://github.com/Shlomixg)!)
+- sqlfmt now supports ClickHouse. When run with the `--dialect clickhouse` option, sqlfmt will not lowercase names that could be case-sensitive in ClickHouse, like function names, aliases, etc. ([#193](https://github.com/tconbeer/sqlfmt/issues/193) - thank you [@Shlomixg](https://github.com/Shlomixg)!).

 ### Formatting Changes + Bug Fixes

-- formatting for chained boolean operators with complex expressions is now significantly improved ([#189](https://github.com/tconbeer/sqlfmt/issues/189) - thank you [@Rainymood](https://github.com/Rainymood)!)
-- formatting for array indexing is now significantly improved ([#209](https://github.com/tconbeer/sqlfmt/issues/209)) and sqlfmt no longer inserts spaces between the `offset()` function and its brackets
-- set operators (like `union`) are now formatted differently. They must be on their own line, and will not cause subsequent blocks to be indented ([#188](https://github.com/tconbeer/sqlfmt/issues/188) - thank you [@Rainymood](https://github.com/Rainymood)!)
-- `select * except (...)` syntax is now explicitly supported, and formatting is improved. Support added for BigQuery and DuckDB star options: `except`, `exclude`, `replace`
-- sqlfmt no longer inserts spaces between nested or repeated brackets, like `(())` or `()[]`
-- a bug causing unstable formatting with long/multiline jinja tags has been fixed ([#175](https://github.com/tconbeer/sqlfmt/issues/175))
+- formatting for chained boolean operators with complex expressions is now significantly improved ([#189](https://github.com/tconbeer/sqlfmt/issues/189) - thank you [@Rainymood](https://github.com/Rainymood)!).
+- formatting for array indexing is now significantly improved ([#209](https://github.com/tconbeer/sqlfmt/issues/209)) and sqlfmt no longer inserts spaces between the `offset()` function and its brackets.
+- set operators (like `union`) are now formatted differently. They must be on their own line, and will not cause subsequent blocks to be indented ([#188](https://github.com/tconbeer/sqlfmt/issues/188) - thank you [@Rainymood](https://github.com/Rainymood)!).
+- `select * except (...)` syntax is now explicitly supported, and formatting is improved. Support added for BigQuery and DuckDB star options: `except`, `exclude`, `replace`.
+- sqlfmt no longer inserts spaces between nested or repeated brackets, like `(())` or `()[]`.
+- a bug causing unstable formatting with long/multiline jinja tags has been fixed ([#175](https://github.com/tconbeer/sqlfmt/issues/175)).

 ## [0.9.0] - 2022-06-02

8 changes: 8 additions & 0 deletions src/sqlfmt/actions.py
@@ -386,6 +386,7 @@ def handle_jinja_block(
     start_name: str,
     end_name: str,
     other_names: List[str],
+    end_reset_sql_depth: bool = False,
 ) -> None:
     """
     An if block, like {% if cond %}code{% else %}other_code{% endif %}
@@ -469,6 +470,13 @@ def simplify_regex(pattern: str) -> str:
                         match=next_tag_match,
                         token_type=TokenType.JINJA_BLOCK_END,
                     )
+                    if end_reset_sql_depth and analyzer.previous_node:
+                        if previous_node:
+                            analyzer.previous_node.open_brackets = (
+                                previous_node.open_brackets.copy()
+                            )
+                        else:
+                            analyzer.previous_node.open_brackets = []
                     break
                 # otherwise, this is an elif or else statement; we add it to
                 # the buffer, but with the previous node set to the node before
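The reset added to `handle_jinja_block` can be sketched in isolation as below. This is a minimal illustration, not sqlfmt's implementation: the `Node` dataclass and the `reset_sql_depth` helper are hypothetical stand-ins, and the sketch assumes that `previous_node` in the hunk above refers to the node that preceded the block's start tag.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for sqlfmt's Node; only the fields used here."""

    value: str
    open_brackets: List["Node"] = field(default_factory=list)


def reset_sql_depth(end_node: Node, node_before_block: Optional[Node]) -> None:
    """Restore the bracket stack that was open before the jinja block began."""
    if node_before_block:
        # Copy so that later mutation of one list cannot leak into the other.
        end_node.open_brackets = node_before_block.open_brackets.copy()
    else:
        # The block opened the query, so nothing was open before it.
        end_node.open_brackets = []


# Mirrors the shape of the new unit test: an outer select, then a call block
# containing its own select, then the end tag.
outer_select = Node("select")
before_block = Node("1,", open_brackets=[outer_select])
inner_select = Node("select")
endcall = Node("{% endcall %}", open_brackets=[outer_select, inner_select])

reset_sql_depth(endcall, before_block)
```

After the call, `endcall.open_brackets` holds only `outer_select`, so SQL that follows the end tag is treated as if the block's inner query had never opened anything.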
14 changes: 8 additions & 6 deletions src/sqlfmt/jinjafmt.py
@@ -214,7 +214,7 @@ class JinjaTag:
     """
     A simple representation of a jinja tag.
-    "verb" is one of {set, do, for, if, elif, else, test, macro}
+    "verb" is one of {set, do, for, if, elif, else, test, macro, materialization, call}
     For example, "{%- set my_var=4 %}" is split into it parts:
     (opening_marker, verb, code, closing_marker) = ("{%-", "set", "my_var=4", "%}")
@@ -233,7 +233,7 @@ def __str__(self) -> str:
             return self._multiline_str()
         elif self.is_indented_multiline_tag:
             return self.source_string
-        elif self.is_macro_or_test_def and self.is_blackened:
+        elif self.is_macro_like_def and self.is_blackened:
             return self._remove_trailing_comma(self._basic_str())
         else:
             return self._basic_str()
@@ -243,9 +243,9 @@ def is_indented_multiline_tag(self) -> bool:
         return self.code != "" and self.verb == "" and "\n" in self.code

     @property
-    def is_macro_or_test_def(self) -> bool:
-        return "%" in self.opening_marker and (
-            self.verb == "macro " or self.verb == "test "
+    def is_macro_like_def(self) -> bool:
+        return "%" in self.opening_marker and any(
+            [self.verb == f"{v} " for v in ["macro", "test", "materialization", "call"]]
         )

     def _multiline_str(self) -> str:
@@ -291,7 +291,9 @@ def from_string(cls, source_string: str, depth: int) -> "JinjaTag":
         closing_marker_len = 3 if source_string[-3] == "-" else 2
         closing_marker = source_string[-closing_marker_len:]

-        verb_pattern = r"\s*(set|do|for|if|elif|else|test|macro)\s+"
+        verb_pattern = (
+            r"\s*(set|do|for|if|elif|else|test|macro|materialization|call)\s+"
+        )
         verb_program = re.compile(verb_pattern, re.DOTALL | re.IGNORECASE)
         verb_match = verb_program.match(source_string[opening_marker_len:])
         if verb_match:
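To see what the widened `verb_pattern` accepts, the sketch below applies the new pattern to a few tag bodies (the text after the opening marker). The pattern and flags are copied from the hunk above; `extract_verb` is a hypothetical helper written for this illustration and is not part of `JinjaTag`.

```python
import re

# Pattern and flags copied from the new side of the jinjafmt.py hunk.
verb_pattern = r"\s*(set|do|for|if|elif|else|test|macro|materialization|call)\s+"
verb_program = re.compile(verb_pattern, re.DOTALL | re.IGNORECASE)


def extract_verb(tag_body: str) -> str:
    """Hypothetical helper: return the leading verb of a tag body, or ""."""
    match = verb_program.match(tag_body)
    return match.group(1).lower() if match else ""


# Tag bodies as they would look once the opening marker ("{%" or "{%-")
# has been sliced off.
materialization_verb = extract_verb(" materialization my_mat, default ")
call_verb = extract_verb(" call statement('main') ")
endcall_verb = extract_verb(" endcall ")
```

Here `materialization_verb` and `call_verb` come back as the new verbs, while an end tag such as `endcall` yields an empty string because it is not in the alternation.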
39 changes: 39 additions & 0 deletions src/sqlfmt/rule.py
@@ -135,6 +135,7 @@ def __post_init__(self) -> None:
             start_name="jinja_macro_block_start",
             end_name="jinja_macro_block_end",
             other_names=[],
+            end_reset_sql_depth=True,
         ),
     ),
     Rule(
@@ -152,6 +153,7 @@ def __post_init__(self) -> None:
             start_name="jinja_test_block_start",
             end_name="jinja_test_block_end",
             other_names=[],
+            end_reset_sql_depth=True,
         ),
     ),
     Rule(
@@ -169,6 +171,7 @@ def __post_init__(self) -> None:
             start_name="jinja_snapshot_block_start",
             end_name="jinja_snapshot_block_end",
             other_names=[],
+            end_reset_sql_depth=True,
         ),
     ),
     Rule(
@@ -177,6 +180,42 @@ def __post_init__(self) -> None:
         pattern=group(r"\{%-?\s*endsnapshot\s*-?%\}"),
         action=actions.raise_sqlfmt_bracket_error,
     ),
+    Rule(
+        name="jinja_materialization_block_start",
+        priority=250,
+        pattern=group(r"\{%-?\s*materialization\s+\w+\s*,.*?-?%\}"),
+        action=partial(
+            actions.handle_jinja_block,
+            start_name="jinja_materialization_block_start",
+            end_name="jinja_materialization_block_end",
+            other_names=[],
+            end_reset_sql_depth=True,
+        ),
+    ),
+    Rule(
+        name="jinja_materialization_block_end",
+        priority=251,
+        pattern=group(r"\{%-?\s*endmaterialization\s*-?%\}"),
+        action=actions.raise_sqlfmt_bracket_error,
+    ),
+    Rule(
+        name="jinja_call_block_start",
+        priority=260,
+        pattern=group(r"\{%-?\s*call\s+(noop_)?statement\(.*?\)\s*-?%\}"),
+        action=partial(
+            actions.handle_jinja_block,
+            start_name="jinja_call_block_start",
+            end_name="jinja_call_block_end",
+            other_names=[],
+            end_reset_sql_depth=True,
+        ),
+    ),
+    Rule(
+        name="jinja_call_block_end",
+        priority=261,
+        pattern=group(r"\{%-?\s*endcall\s*-?%\}"),
+        action=actions.raise_sqlfmt_bracket_error,
+    ),
     Rule(
         name="jinja_statement_start",
         priority=500,
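The four new rule patterns can be exercised directly with Python's `re` module, as in the sketch below. The regular expressions are copied from the hunk above without the `group(...)` wrapper; the compile flags and the `classify` helper are assumptions made for this illustration and may not match how sqlfmt compiles or dispatches its rules.

```python
import re

# Assumption: case-insensitive, dot-matches-newline, as a plausible default.
FLAGS = re.IGNORECASE | re.DOTALL

# Raw patterns copied from the new rules in rule.py.
PATTERNS = {
    "materialization_start": r"\{%-?\s*materialization\s+\w+\s*,.*?-?%\}",
    "materialization_end": r"\{%-?\s*endmaterialization\s*-?%\}",
    "call_start": r"\{%-?\s*call\s+(noop_)?statement\(.*?\)\s*-?%\}",
    "call_end": r"\{%-?\s*endcall\s*-?%\}",
}
PROGRAMS = {name: re.compile(p, FLAGS) for name, p in PATTERNS.items()}


def classify(tag: str) -> str:
    """Hypothetical helper: name the first pattern that matches a whole tag."""
    for name, program in PROGRAMS.items():
        if program.fullmatch(tag):
            return name
    return "other"


kinds = [
    classify("{% materialization table, default %}"),
    classify("{% call statement('main', fetch_result=True) %}"),
    classify("{%- call noop_statement('x') -%}"),
    classify("{% endcall %}"),
    classify("{% call my_macro() %}"),
]
```

Note that the call pattern only targets `statement(...)` and `noop_statement(...)`, so a call block that invokes some other macro is not matched by these rules, and the materialization pattern expects a comma after the materialization name.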
10 changes: 5 additions & 5 deletions src/sqlfmt_primer/primer.py
@@ -30,7 +30,7 @@ def get_projects() -> List[SQLProject]:
         SQLProject(
             name="gitlab",
             git_url="https://github.com/tconbeer/gitlab-analytics-sqlfmt.git",
-            git_ref="e6fe591",  # sqlfmt 6513e33
+            git_ref="2be3fb5",  # sqlfmt b792a79
             expected_changed=4,
             expected_unchanged=2412,
             expected_errored=1,
@@ -39,7 +39,7 @@ def get_projects() -> List[SQLProject]:
         SQLProject(
             name="rittman",
             git_url="https://github.com/tconbeer/rittman_ra_data_warehouse.git",
-            git_ref="5cab7e0",  # sqlfmt 3e0f900
+            git_ref="5d838dd",  # sqlfmt b792a79
             expected_changed=0,
             expected_unchanged=307,
             expected_errored=4,  # true mismatching brackets
@@ -75,9 +75,9 @@ def get_projects() -> List[SQLProject]:
         SQLProject(
             name="dbt_utils",
             git_url="https://github.com/tconbeer/dbt-utils.git",
-            git_ref="55c9199",  # sqlfmt 3e0f900
-            expected_changed=1,
-            expected_unchanged=130,
+            git_ref="a70e75d",  # sqlfmt b792a79
+            expected_changed=2,
+            expected_unchanged=129,
             expected_errored=0,
             sub_directory=Path(""),
         ),
19 changes: 19 additions & 0 deletions tests/unit_tests/test_actions.py
@@ -439,6 +439,25 @@ def test_handle_jinja_for_block(jinja_analyzer: Analyzer) -> None:
     assert jinja_analyzer.node_buffer[-1].token.type == TokenType.JINJA_BLOCK_END


+def test_handle_jinja_call_block(default_analyzer: Analyzer) -> None:
+    source_string = """
+    select 1,
+    {% call statement() %}
+    select 2 from foo
+    {% endcall %}
+    2
+    """.strip()
+    query = default_analyzer.parse_query(source_string=source_string.lstrip())
+
+    assert query.lines[1].nodes[0].token.type == TokenType.JINJA_BLOCK_START
+    assert query.lines[-2].nodes[0].token.type == TokenType.JINJA_BLOCK_END
+
+    # ensure endcall block resets sql depth
+    outer_select = query.nodes[0]
+    assert query.nodes[-1].depth == (1, 0)
+    assert query.nodes[-1].open_brackets == [outer_select]
+
+
 def test_handle_unsupported_ddl(default_analyzer: Analyzer) -> None:
     source_string = """
     create table foo (bar int);