Skip to content

sdf-labs/utils

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SDF Workspace Compiles

Official SDF Utils Library

This SDF library contains the standard utilities available in the default namespace in any SDF workspace. As such, macros defined in this workspace do not need to be prefixed with the workspace name, they can be referenced directly.

Here's an example:

WITH date_spine AS (
  {{ sdf_utils.date_spine(
          datepart = 'day',
          start_date_raw = "2019-01-01",    
          end_date_raw = "2019-12-31"
    )
  }}
)

select * from date_spine;

You'll find documentation for each specific macro below. If no Dialect section is specified, this means the implementation is dialect-agnostic and should work across all supported dialects.

Note: SDF is still < v1, as such certain scenarios may result in unexpected behavior. Please follow the contributing guidelines and create an issue in this repo if you find any bugs or issues.

For an in-depth guide on how to use Jinja macros in SDF, please see the Jinja section of our official docs

SDF Utils

Test Name
date_spine()
generate_date_values()
generate_integer_values(condition)
group_by(condition)
generate_surrogate_key()

Date Spine

Source Code

This macro generates a SQL SELECT statement that produces a single column of dates. The dates are generated for each day in the range from the 'from' date to the 'to' date, where from is inclusive and to is not inclusive.

Parameters: - datepart: The date part to increment. This can be one of the following: 'day', 'week', 'month', 'quarter', 'year'. Others like hour, minute, second are dialect-specific and supported in Snowflake for example. - start_date: The start date of the range, as a SQL Date object. The best dialect-agnostic way is to pass "CAST('2019-01-01' AS DATE)" for example. - end_date: The noninclusive end date of the range, as a SQL Date object. The best dialect-agnostic way is to pass "CAST('2019-01-01' AS DATE)" for example. - start_date_raw: The start date of the range, as a string in 'YYYY-MM-DD' format. - end_date_raw: The end date of the range, as a string in 'YYYY-MM-DD' format.

Returns: A SQL SELECT statement that produces a table with a single column of dates.

Usage:

WITH date_spine AS (
    {{ 
        sdf_utils.date_spine(
            datepart = 'day',
            start_date = "CAST('2019-01-01' AS DATE)",    
            end_date = "CAST('2019-12-31' AS DATE)"
        )
    }}
)

Alternatively, using the raw parameters:

WITH date_spine AS (
    {{ 
        sdf_utils.date_spine(
            datepart = 'day',
            start_date_raw = "2019-01-01",    
            end_date_raw = "2019-12-31"
        )
    }}
)

Output:

WITH date_spine AS (
    WITH
    intervals_list AS (
        SELECT 
            ARRAY_GENERATE_RANGE(
                0, 
                DATEDIFF(day, CAST('2019-01-01' AS DATE), CAST('2019-12-31' AS DATE)) + 1
            ) AS i
    )

    SELECT 
        i.VALUE AS i,
        DATEADD(day, i.VALUE, CAST('2019-01-01' AS DATE)) AS date_spine
    FROM intervals_list l, LATERAL FLATTEN(input => l.i) i
)

Dialects:

Dialect Supported
Snowflake
BigQuery
Redshift
Trino
Postgres

Generate Date Values

Source Code

Generates a SQL VALUES clause with a list of date strings. The date strings are generated for each day in the range from the 'from' date to the 'to' date, inclusive.

Parameters: - from: The start date of the range, as a string in 'YYYY-MM-DD' format. - to: The end date of the range, as a string in 'YYYY-MM-DD' format.

Returns: A string representing a SQL VALUES clause with a list of date strings.

Usage:

{{ sdf_utils.generate_date_values('2022-01-01', '2022-01-03') }}

Output:

(VALUES ('2022-01-01'), ('2022-01-02'), ('2022-01-03'))

Generate Integer Values

Source Code

Generates a SQL VALUES clause with a list of incrementing integers. The 'from' int to the 'to' int are inclusive.

Parameters: - from: The starting integer of the range - to: The ending integer of the range

Returns: A string representing a SQL VALUES clause with a list of incrementing integers.

Usage:

{{ sdf_utils.generate_integer_values(1,5) }}

Output:

(VALUES 1,2,3,4,5)

Group By

Source Code

Builds a group by statement for fields 1...N

Parameters: - n: The number of fields to group by

Returns: A string representing a SQL GROUP BY clause with a list of incrementing integers.

Usage:

{{ sdf_utils.group_by(5) }}

Output:

group by 1,2,3,4,5

Generate Surrogate Key

Source Code

Generates a surrogate key for a given list of fields. The surrogate key is a unique identifier for each record in a table. It is generated by concatenating the values of the given fields and applying the MD5 hash function to the result. The fields are cast to varchar type before concatenation. If a field value is null, it is replaced with the string 'jinja_surrogate_key_null'.

Parameters: - fields: A list of field names for which the surrogate key should be generated.

Returns: A string representing a SQL GROUP BY clause with a list of incrementing integers.

Usage:

SELECT
    {{ sdf_utils.generate_surrogate_key(['column1', 'column2', 'column3']) }} AS surrogate_key,

Output:

SELECT
    md5(
        coalesce(cast(column1 as varchar),'_jinja_surrogate_key_null_')
         || '-' || coalesce(cast(column2 as varchar),'_jinja_surrogate_key_null_')
         || '-' || coalesce(cast(column3 as varchar),'_jinja_surrogate_key_null_')) AS surrogate_key,

Why the concatenated dash (|| '-' ||) in the output?

Without the delimiter, the inputs firstName = 'John Smith' and lastName = '', and firstName = 'John ' and lastName = 'Smith' would generate the same hash with generate_surrogate_key(["firstName", "lastName"]). The delimiter prevents this collision

Thanks @sophieml from Obie for the delimiter improvement ^ <3

About

Official utilities library for SDF

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages