Added documentation for all SDF utils directly in the README
eliasdefaria committed Jul 6, 2024
1 parent d211cb2 commit 7d2fff8
Showing 1 changed file with 134 additions and 1 deletion.
135 changes: 134 additions & 1 deletion README.md
@@ -10,9 +10,142 @@ SELECT
FROM
dates
WHERE
- date IN {{ generate_date_strings('2020-01-01', '2020-01-10') }}
+ date IN {{ sdf_utils.generate_date_strings('2020-01-01', '2020-01-10') }}
```

*Note: SDF is still pre-v1, so certain scenarios may result in unexpected behavior. Please follow the [contributing guidelines](./CONTRIBUTING.md) and create an issue in this repo if you find any bugs.*

For an in-depth guide on using Jinja macros in SDF, see the Jinja section of [our official docs](https://docs.sdf.com/guide/macro-processing/jinja).

## SDF Utils

| Macro |
| ----- |
| [`generate_date_values(from, to)`](#generate-date-values) |
| [`generate_date_strings(from, to)`](#generate-date-strings) |
| [`generate_integer_values(from, to)`](#generate-integer-values) |
| [`group_by(n)`](#group-by) |
| [`generate_surrogate_key(fields)`](#generate-surrogate-key) |


#### Generate Date Values

[Source Code](./macros/generate_date_values.jinja)

Generates a SQL SELECT statement that produces a single column of dates.
The dates are generated for each day in the range from the 'from' date to the 'to' date, inclusive.

_Parameters:_
- `from`: The start date of the range, as a string in 'YYYY-MM-DD' format.
- `to`: The end date of the range, as a string in 'YYYY-MM-DD' format.

_Returns:_
A string representing a SQL SELECT statement.

_Usage:_
```jinja
{{ sdf_utils.generate_date_values('2022-01-01', '2022-01-03') }}
```
_Output:_
```sql
(SELECT cast(value as date) as "date" FROM (VALUES '2022-01-01', '2022-01-02', '2022-01-03') as dates(value))
```

#### Generate Date Strings

[Source Code](./macros/generate_date_values.jinja)

Generates a SQL VALUES clause with a list of date strings. The date strings are generated for each day in the range from the 'from' date to the 'to' date, inclusive.

_Parameters:_
- `from`: The start date of the range, as a string in 'YYYY-MM-DD' format.
- `to`: The end date of the range, as a string in 'YYYY-MM-DD' format.

_Returns:_
A string representing a SQL VALUES clause with a list of date strings.

_Usage:_
```jinja
{{ sdf_utils.generate_date_strings('2022-01-01', '2022-01-03') }}
```
_Output:_
```sql
(VALUES '2022-01-01', '2022-01-02', '2022-01-03')
```
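
Both date macros expand the same inclusive day range. As a rough way to sanity-check the expected output outside SDF, the rendered strings can be mimicked in plain Python. This is only a sketch of the documented output, not the macros' actual Jinja implementation, and the helper names `date_strings_clause` and `date_values_select` are hypothetical.

```python
from datetime import date, timedelta


def date_strings_clause(start: str, end: str) -> str:
    """Mimic the documented output of generate_date_strings (hypothetical helper)."""
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    days = (last - first).days
    # range(days + 1) keeps the end date inclusive.
    quoted = [f"'{(first + timedelta(days=i)).isoformat()}'" for i in range(days + 1)]
    return "(VALUES " + ", ".join(quoted) + ")"


def date_values_select(start: str, end: str) -> str:
    """Mimic the documented output of generate_date_values (hypothetical helper)."""
    values = date_strings_clause(start, end)[1:-1]  # strip the outer parentheses
    return f'(SELECT cast(value as date) as "date" FROM ({values}) as dates(value))'


print(date_strings_clause("2022-01-01", "2022-01-03"))
print(date_values_select("2022-01-01", "2022-01-03"))
```

Running it for the range used above should print the same two strings shown in the _Output_ blocks of the date macros.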

#### Generate Integer Values

[Source Code](./macros/generate_integer_values.jinja)

Generates a SQL VALUES clause with a list of incrementing integers from 'from' to 'to', inclusive.

_Parameters:_
- `from`: The starting integer of the range.
- `to`: The ending integer of the range.

_Returns:_
A string representing a SQL VALUES clause with a list of incrementing integers.

_Usage:_
```jinja
{{ sdf_utils.generate_integer_values(1,5) }}
```
_Output:_
```sql
(VALUES 1,2,3,4,5)
```
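
The inclusive bounds are the detail most likely to trip people up. A minimal Python sketch of the documented output (not the macro's real implementation; `integer_values_clause` is a hypothetical name) makes the off-by-one explicit:

```python
def integer_values_clause(start: int, end: int) -> str:
    """Mimic the documented output of generate_integer_values (hypothetical helper)."""
    # range() excludes its stop value, so add 1 to keep `end` inclusive.
    return "(VALUES " + ",".join(str(i) for i in range(start, end + 1)) + ")"


print(integer_values_clause(1, 5))  # (VALUES 1,2,3,4,5)
```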

#### Group By

[Source Code](./macros/group_by.jinja)

Builds a GROUP BY clause for fields 1 through N, referenced by their ordinal position in the SELECT list.

_Parameters:_
- `n`: The number of fields to group by.

_Returns:_
A string representing a SQL GROUP BY clause with a list of incrementing integers.

_Usage:_
```jinja
{{ sdf_utils.group_by(5) }}
```

_Output:_
```sql
group by 1,2,3,4,5
```
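
To illustrate how the ordinals line up with a SELECT list, here is a small Python sketch that mimics the rendered clause and drops it into a query string. The helper name `group_by_clause`, the `sales` table, and its columns are all hypothetical and used only for illustration.

```python
def group_by_clause(n: int) -> str:
    """Mimic the documented output of group_by (hypothetical helper)."""
    return "group by " + ",".join(str(i) for i in range(1, n + 1))


# Hypothetical query: two grouping columns followed by one aggregate.
select_list = ["region", "product", "sum(amount) as total"]
query = f"select {', '.join(select_list)} from sales {group_by_clause(2)}"
print(query)
# select region, product, sum(amount) as total from sales group by 1,2
```

Because the clause always starts at position 1, the grouping columns need to come first in the SELECT list, ahead of any aggregates.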

#### Generate Surrogate Key

[Source Code](./macros/generate_surrogate_key.jinja)

Generates a surrogate key for a given list of fields. The surrogate key serves as an identifier for each record in a table. It is generated by casting each field to varchar, concatenating the values with a '-' delimiter, and applying the MD5 hash function to the result. If a field value is null, it is replaced with the string '_jinja_surrogate_key_null_'.

_Parameters:_
- `fields`: A list of field names for which the surrogate key should be generated.

_Returns:_
A string representing a SQL expression that computes the MD5 hash of the concatenated fields.

_Usage:_
```sql
SELECT
{{ sdf_utils.generate_surrogate_key(['column1', 'column2', 'column3']) }} AS surrogate_key,
```

_Output:_
```sql
SELECT
md5(
coalesce(cast(column1 as varchar),'_jinja_surrogate_key_null_')
|| '-' || coalesce(cast(column2 as varchar),'_jinja_surrogate_key_null_')
|| '-' || coalesce(cast(column3 as varchar),'_jinja_surrogate_key_null_')) AS surrogate_key,
```

_Why the concatenated dash (`|| '-' ||`) in the output?_

Without the delimiter, the inputs `firstName` = 'John Smith' with `lastName` = '' and `firstName` = 'John ' with `lastName` = 'Smith' would produce the same hash from `generate_surrogate_key(["firstName", "lastName"])`, because both concatenate to 'John Smith'. The delimiter prevents this collision.
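
The collision can be reproduced outside SQL. The sketch below mirrors the documented expression (null replacement, delimiter join, MD5) in plain Python. It is an approximation under assumptions: `surrogate_key` is a hypothetical helper, values are stringified with `str()`, and the hex digest may not match the exact format a given SQL engine's `md5` returns.

```python
import hashlib

NULL_MARKER = "_jinja_surrogate_key_null_"


def surrogate_key(values, delimiter="-"):
    """Mimic the documented SQL: md5 over delimiter-joined, null-coalesced values."""
    parts = [NULL_MARKER if v is None else str(v) for v in values]
    return hashlib.md5(delimiter.join(parts).encode("utf-8")).hexdigest()


row_a = ["John Smith", ""]
row_b = ["John ", "Smith"]

# Without a delimiter both rows concatenate to 'John Smith' and collide.
print(surrogate_key(row_a, delimiter="") == surrogate_key(row_b, delimiter=""))  # True
# With the dash the inputs become 'John Smith-' and 'John -Smith'.
print(surrogate_key(row_a) == surrogate_key(row_b))  # False
```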

Thanks @Sophie from Obie for the delimiter improvement ^ <3
