diff --git a/README.md b/README.md
index 1324e7e..6e0a426 100644
--- a/README.md
+++ b/README.md
@@ -10,9 +10,142 @@
 SELECT
 FROM dates
 WHERE
- date IN {{ generate_date_strings('2020-01-01', '2020-01-10') }}
+ date IN {{ sdf_utils.generate_date_strings('2020-01-01', '2020-01-10') }}
 ```
 
 *Note: SDF is still < v1, as such certain scenarios may result in unexpected behavior. Please follow the [contributing guidelines](./CONTRIBUTING.md) and create an issue in this repo if you find any bugs or issues.*
 
 For an in-depth guide on how to use Jinja macros in SDF, please see the Jinja section of [our official docs](https://docs.sdf.com/guide/macro-processing/jinja)
+
+## SDF Utils
+
+| Macro Name |
+| ---------- |
+| [`generate_date_values()`](#generate-date-values) |
+| [`generate_date_strings()`](#generate-date-strings) |
+| [`generate_integer_values()`](#generate-integer-values) |
+| [`group_by()`](#group-by) |
+| [`generate_surrogate_key()`](#generate-surrogate-key) |
+
+
+#### Generate Date Values
+
+[Source Code](./macros/generate_date_values.jinja)
+
+Generates a SQL SELECT statement that produces a single column of dates.
+One date is generated for each day in the range from the `from` date to the `to` date, inclusive.
+
+ _Parameters:_
+ - `from`: The start date of the range, as a string in 'YYYY-MM-DD' format.
+ - `to`: The end date of the range, as a string in 'YYYY-MM-DD' format.
+
+ _Returns:_
+ A string representing a SQL SELECT statement.
+
+ _Usage:_
+ ```jinja
+ {{ sdf_utils.generate_date_values('2022-01-01', '2022-01-03') }}
+ ```
+ _Output:_
+ ```sql
+ (SELECT cast(value as date) as "date" FROM (VALUES '2022-01-01', '2022-01-02', '2022-01-03') as dates(value))
+ ```
+
+#### Generate Date Strings
+
+[Source Code](./macros/generate_date_values.jinja)
+
+Generates a SQL VALUES clause with a list of date strings. One date string is generated for each day in the range from the `from` date to the `to` date, inclusive.
+
+ _Parameters:_
+ - `from`: The start date of the range, as a string in 'YYYY-MM-DD' format.
+ - `to`: The end date of the range, as a string in 'YYYY-MM-DD' format.
+
+ _Returns:_
+ A string representing a SQL VALUES clause with a list of date strings.
+
+ _Usage:_
+ ```jinja
+ {{ sdf_utils.generate_date_strings('2022-01-01', '2022-01-03') }}
+ ```
+ _Output:_
+ ```sql
+ (VALUES '2022-01-01', '2022-01-02', '2022-01-03')
+ ```
+
+#### Generate Integer Values
+
+[Source Code](./macros/generate_integer_values.jinja)
+
+Generates a SQL VALUES clause with a list of incrementing integers. The range from `from` to `to` is inclusive.
+
+ _Parameters:_
+ - `from`: The starting integer of the range.
+ - `to`: The ending integer of the range.
+
+ _Returns:_
+ A string representing a SQL VALUES clause with a list of incrementing integers.
+
+ _Usage:_
+ ```jinja
+ {{ sdf_utils.generate_integer_values(1,5) }}
+ ```
+ _Output:_
+ ```sql
+ (VALUES 1,2,3,4,5)
+ ```
+
+#### Group By
+
+[Source Code](./macros/group_by.jinja)
+
+Builds a GROUP BY clause for fields 1 through N.
+
+ _Parameters:_
+ - `n`: The number of fields to group by.
+
+ _Returns:_
+ A string representing a SQL GROUP BY clause with a list of incrementing integers.
+
+ _Usage:_
+ ```jinja
+ {{ sdf_utils.group_by(5) }}
+ ```
+
+ _Output:_
+ ```sql
+ group by 1,2,3,4,5
+ ```
+
+#### Generate Surrogate Key
+
+[Source Code](./macros/generate_surrogate_key.jinja)
+
+Generates a surrogate key for a given list of fields. The surrogate key is a unique identifier for each record in a table. It is generated by concatenating the values of the given fields, separated by '-', and applying the MD5 hash function to the result. The fields are cast to varchar before concatenation. If a field value is null, it is replaced with the string '_jinja_surrogate_key_null_'.
+
+ _Parameters:_
+ - `fields`: A list of field names for which the surrogate key should be generated.
+
+ _Returns:_
+ A string representing a SQL expression that computes the MD5 hash of the concatenated field values.
+
+ _Usage:_
+ ```sql
+ SELECT
+   {{ sdf_utils.generate_surrogate_key(['column1', 'column2', 'column3']) }} AS surrogate_key,
+ ```
+
+ _Output:_
+ ```sql
+ SELECT
+   md5(
+     coalesce(cast(column1 as varchar),'_jinja_surrogate_key_null_')
+     || '-' || coalesce(cast(column2 as varchar),'_jinja_surrogate_key_null_')
+     || '-' || coalesce(cast(column3 as varchar),'_jinja_surrogate_key_null_')) AS surrogate_key,
+ ```
+
+ _Why the concatenated dash (`|| '-' ||`) in the output?_
+
+ Without the delimiter, the inputs `firstName` = 'John Smith', `lastName` = '' and `firstName` = 'John ', `lastName` = 'Smith' would generate the same hash with `generate_surrogate_key(["firstName", "lastName"])`. The delimiter prevents this collision.
+
+ Thanks @Sophie from Obie for the delimiter improvement ^ <3
\ No newline at end of file
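
Review note: the delimiter rationale in the surrogate key section can be checked outside SQL. The Python sketch below is not the macro itself. It only mimics the SQL in the README's output example (replace nulls with the placeholder, join with a delimiter, MD5 the result), and the helper name `surrogate_key` and the two sample rows are hypothetical. Hash values from a real SQL engine may differ if its `md5` treats string encoding differently.

```python
import hashlib

# Placeholder used for NULL values, copied from the README's SQL output.
NULL_PLACEHOLDER = "_jinja_surrogate_key_null_"


def surrogate_key(values, delimiter="-"):
    """Hypothetical helper: coalesce None, join with a delimiter, then MD5."""
    parts = [NULL_PLACEHOLDER if v is None else str(v) for v in values]
    return hashlib.md5(delimiter.join(parts).encode("utf-8")).hexdigest()


# The two rows from the README's collision example (firstName, lastName).
row_a = ["John Smith", ""]
row_b = ["John ", "Smith"]

# With an empty delimiter both rows concatenate to "John Smith" and collide.
collide_without = surrogate_key(row_a, delimiter="") == surrogate_key(row_b, delimiter="")

# With "-" the inputs become "John Smith-" and "John -Smith", so they differ.
collide_with = surrogate_key(row_a) == surrogate_key(row_b)

print(collide_without)  # True
print(collide_with)     # False
```

Running it prints `True` for the undelimited case and `False` for the delimited one, which matches the explanation in the README.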