Update docs to clarify the way aliases are used in CTEs #5795

18 changes: 11 additions & 7 deletions website/docs/docs/build/custom-aliases.md
@@ -6,18 +6,22 @@ id: "custom-aliases"

## Overview

When dbt runs a model, it will generally create a relation (either a `table` or a `view`) in the database. By default, dbt uses the filename of the model as the identifier for this relation in the database. This identifier can optionally be overridden using the [`alias`](/reference/resource-configs/alias) model configuration.
When dbt runs a model, it will generally create a relation (either a <Term id="table" /> or a <Term id="view" />) in the database, except in the case of an [ephemeral model](/docs/build/materializations), when it will create a <Term id="cte" /> for use in another model. By default, dbt uses the model's filename as the identifier for the relation or CTE it creates. This identifier can be overridden using the [`alias`](/reference/resource-configs/alias) model configuration.

### Why alias model names?
The names of schemas and tables are effectively the "user interface" of your <Term id="data-warehouse" />. Well-named schemas and tables can help provide clarity and direction for consumers of this data. In combination with [custom schemas](/docs/build/custom-schemas), model aliasing is a powerful mechanism for designing your warehouse.

### Usage
The `alias` config can be used to change the name of a model's identifier in the database. The following <Term id="table" /> shows examples of database identifiers for models both with, and without, a supplied `alias`.
Contributor Author

I tweaked this line to remove the misleading <Term> component, since in this case we're not actually talking about a relation (which is what the term documents), we're just referencing the markup table that follows.

Contributor

Good call! Thank you

The file naming scheme that you use to organize your models may also interfere with your data platform's requirements for identifiers. For example, you might wish to namespace your files using a period (`.`), but your data platform's SQL dialect may interpret periods to indicate a separation between schema names and table names in identifiers, or it may forbid periods from being used at all in CTE identifiers. In cases like these, model aliasing can allow you to retain flexibility in the way you name your model files without violating your data platform's identifier requirements.

| Model | Config | Database Identifier |
| ----- | ------ | ------------------- |
| ga_sessions.sql | &lt;None&gt; | "analytics"."ga_sessions" |
| ga_sessions.sql | {{ config(alias='sessions') }} | "analytics"."sessions" |
### Usage
The `alias` config can be used to change the name of a model's identifier in the database. The following table shows examples of database identifiers for models both with and without a supplied `alias`, and with different materializations.

| Model | Config | Relation Type | Database Identifier |
| ----- | ------ | ------------- | ------------------- |
| ga_sessions.sql | {{ config(materialized='view') }} | <Term id="view" /> | "analytics"."ga_sessions" |
| ga_sessions.sql | {{ config(materialized='view', alias='sessions') }} | <Term id="view" /> | "analytics"."sessions" |
| ga_sessions.sql | {{ config(materialized='ephemeral') }} | <Term id="cte" /> | "\__dbt\__cte\__ga_sessions" |
| ga_sessions.sql | {{ config(materialized='ephemeral', alias='sessions') }} | <Term id="cte" /> | "\__dbt\__cte\__sessions" |
Comment on lines +19 to +24
Contributor Author

I'm not actually sure whether it's helpful to document the CTE identifier behavior here, so I understand if you'd prefer to keep this table focused on the alias rather than explaining the different behavior of ephemeral materialization.

Contributor

Very helpful! Thank you


To configure an alias for a model, supply a value for the model's `alias` configuration parameter. For example:

3 changes: 2 additions & 1 deletion website/docs/docs/build/materializations.md
@@ -94,7 +94,8 @@ When using the `table` materialization, your model is rebuilt as a <Term id="tab
* Use incremental models when your `dbt run`s are becoming too slow (i.e. don't start with incremental models)

### Ephemeral
`ephemeral` models are not directly built into the database. Instead, dbt will interpolate the code from this model into dependent models as a common <Term id="table" /> expression.
`ephemeral` models are not directly built into the database. Instead, dbt will interpolate the code from an ephemeral model into its dependent models using a <Term id="cte" />. You can control the identifier for this CTE using a [model alias](/docs/build/custom-aliases), but dbt will always prefix the model identifier with `__dbt__cte__`.

* **Pros:**
* You can still write reusable logic
- Ephemeral models can help keep your <Term id="data-warehouse" /> clean by reducing clutter (also consider splitting your models across multiple schemas by [using custom schemas](/docs/build/custom-schemas)).
2 changes: 2 additions & 0 deletions website/docs/reference/resource-configs/alias.md
@@ -116,5 +116,7 @@ The standard behavior of dbt is:
* If a custom alias is _not_ specified, the identifier of the relation is the resource name (i.e. the filename).
* If a custom alias is specified, the identifier of the relation is the `{{ alias }}` value.

In the special case of an [ephemeral model](/docs/build/materializations#ephemeral), dbt will always apply the prefix `__dbt__cte__` to the <Term id="cte" /> identifier. This means that if an alias is set on an ephemeral model, its CTE identifier will be `__dbt__cte__{{ alias }}`; if no alias is set, its identifier will be `__dbt__cte__{{ filename }}`.
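
The naming rules described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not dbt's actual implementation: the function name `resolve_identifier` and its parameters are hypothetical, and the sketch ignores schema and database qualification.

```python
from typing import Optional

# Prefix that dbt applies to CTE identifiers for ephemeral models,
# as stated in the documentation text above.
CTE_PREFIX = "__dbt__cte__"


def resolve_identifier(
    model_name: str,
    alias: Optional[str] = None,
    materialized: str = "view",
) -> str:
    """Return the identifier implied by the documented rules.

    Hypothetical helper for illustration only; not part of dbt.
    `model_name` is the model's filename without the `.sql` extension.
    """
    # A custom alias, when supplied, replaces the filename-based name.
    base = alias if alias is not None else model_name
    # Ephemeral models become CTEs, and the prefix is always applied.
    if materialized == "ephemeral":
        return f"{CTE_PREFIX}{base}"
    return base


print(resolve_identifier("ga_sessions"))
print(resolve_identifier("ga_sessions", alias="sessions"))
print(resolve_identifier("ga_sessions", materialized="ephemeral"))
print(resolve_identifier("ga_sessions", alias="sessions", materialized="ephemeral"))
```

The four calls correspond to the four rows of the table in `custom-aliases.md`: the alias only changes the base name, while the ephemeral materialization only adds the prefix.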

To learn more about changing the way that dbt generates a relation's `identifier`, read [Using Aliases](/docs/build/custom-aliases).
