The Semantic Model Generator
is used to generate a semantic model for use in your Snowflake account.
You can generate semantic models through our Streamlit app, the command line, or directly in your Python code. Please complete the instructions in setup, then proceed to the instructions for your desired approach.
If you want to see what a semantic model looks like, skip to Examples.
We currently leverage credentials saved as environment variables.
A. To find your Account locator, please execute the following sql command in your account.
SELECT CURRENT_ACCOUNT_LOCATOR();
B. To find the SNOWFLAKE_HOST
for your
account, follow these instructions.
- Currently we recommend you to look under the
Account locator (legacy)
method of connection for better compatibility on API. - It typically follows format of:
<accountlocator>.<region>.<cloud>.snowflakecomputing.com
. Ensure that you omit thehttps://
prefix. SNOWFLAKE_HOST
is required if you are using the Streamlit app, but may not be required for the CLI tool depending on your Snowflake deployment. We would recommend setting it regardless.
We recommend setting these environment variables by creating a .env
file in the root directory of this repo. See the
examples in .env.example
for reference and proper syntax for .env
files.
However, if you would like to set these variables directly in your shell/Python environment,
- MacOS/Linux syntax:
export SNOWFLAKE_ROLE="<your-snowflake-role>"
export SNOWFLAKE_WAREHOUSE="<your-snowflake-warehouse>"
export SNOWFLAKE_USER="<your-snowflake-user>"
export SNOWFLAKE_ACCOUNT_LOCATOR="<your-snowflake-account-locator>"
export SNOWFLAKE_HOST="<your-snowflake-host>"
- Windows syntax:
set SNOWFLAKE_ROLE=<your-snowflake-role>
set SNOWFLAKE_WAREHOUSE=<your-snowflake-warehouse>
set SNOWFLAKE_USER=<your-snowflake-user>
set SNOWFLAKE_ACCOUNT_LOCATOR=<your-snowflake-account-locator>
set SNOWFLAKE_HOST=<your-snowflake-host>
- Python syntax:
import os
# Setting environment variables
os.environ['SNOWFLAKE_ROLE'] = '<your-snowflake-role>'
os.environ['SNOWFLAKE_WAREHOUSE'] = '<your-snowflake-warehouse>'
os.environ['SNOWFLAKE_USER'] = '<your-snowflake-user>'
os.environ['SNOWFLAKE_ACCOUNT_LOCATOR'] = '<your-snowflake-account-locator>'
os.environ['SNOWFLAKE_HOST'] = '<your-snowflake-host>'
Our semantic model generators currently support three types of authentication. If no SNOWFLAKE_AUTHENTICATOR
environment variable
is set, the default is snowflake
, which uses standard username/password support.
- Username and Password
Note: If you have MFA enabled, using this default authenticator should send a push notification to your device.
# no SNOWFLAKE_AUTHENTICATOR needed
SNOWFLAKE_PASSWORD="<your-snowflake-password>"
# MacOS/Linux
export SNOWFLAKE_PASSWORD="<your-snowflake-password>"
# Windows
set SNOWFLAKE_PASSWORD=<your-snowflake-password>
# Python
os.environ['SNOWFLAKE_PASSWORD'] = '<your-snowflake-password>'
- Username/Password with MFA passcode
Using a passcode from your authenticator app:
SNOWFLAKE_AUTHENTICATOR="username_password_mfa"
SNOWFLAKE_PASSWORD="<your-snowflake-password>"
SNOWFLAKE_MFA_PASSCODE="<your-snowflake-mfa-passcode>" # if your authenticator app reads "123 456", fill in "123456" (No spaces)
# MacOS/Linux
export SNOWFLAKE_AUTHENTICATOR="username_password_mfa"
export SNOWFLAKE_PASSWORD="<your-snowflake-password>"
export SNOWFLAKE_MFA_PASSCODE="<your-snowflake-mfa-passcode>"
# Windows
set SNOWFLAKE_AUTHENTICATOR=username_password_mfa
set SNOWFLAKE_PASSWORD=<your-snowflake-password>
set SNOWFLAKE_MFA_PASSCODE=<your-snowflake-mfa-passcode>
# Python
os.environ['SNOWFLAKE_AUTHENTICATOR'] = 'username_password_mfa'
os.environ['SNOWFLAKE_PASSWORD'] = '<your-snowflake-password>'
os.environ['SNOWFLAKE_MFA_PASSCODE'] = '<your-snowflake-mfa-passcode>'
Using a passcode embedded in the password:
SNOWFLAKE_AUTHENTICATOR="username_password_mfa"
SNOWFLAKE_PASSWORD="<your-snowflake-password>"
SNOWFLAKE_MFA_PASSCODE_IN_PASSWORD="true"
# MacOS/Linux
export SNOWFLAKE_AUTHENTICATOR="username_password_mfa"
export SNOWFLAKE_PASSWORD="<your-snowflake-password>"
export SNOWFLAKE_MFA_PASSCODE_IN_PASSWORD="true"
# Windows
set SNOWFLAKE_AUTHENTICATOR=username_password_mfa
set SNOWFLAKE_PASSWORD=<your-snowflake-password>
set SNOWFLAKE_MFA_PASSCODE_IN_PASSWORD=true
# Python
os.environ['SNOWFLAKE_AUTHENTICATOR'] = 'username_password_mfa'
os.environ['SNOWFLAKE_PASSWORD'] = '<your-snowflake-password>'
os.environ['SNOWFLAKE_MFA_PASSCODE_IN_PASSWORD'] = 'true'
- Single Sign-On (SSO) with Okta
# no SNOWFLAKE_PASSWORD needed
SNOWFLAKE_AUTHENTICATOR="externalbrowser"
# MacOS/Linux
export SNOWFLAKE_AUTHENTICATOR="externalbrowser"
# Windows
set SNOWFLAKE_AUTHENTICATOR=externalbrowser
# Python
os.environ['SNOWFLAKE_AUTHENTICATOR'] = 'externalbrowser'
We offer a convenient Streamlit app that supports creating semantic models from scratch as well as iterating on existing ones uploaded to a Snowflake stage.
To install dependencies for the Streamlit app, run
make setup_admin_app
Once installed, you can run the app using the provided Makefile target, or with your current version of Python manually specified:
# Make target
make run_admin_app
# directly
python3.11 -m streamlit run admin_apps/app.py
You may also generate a semantic model directly from the CLI. To do this, first install the CLI tool dependencies, which differ from the Streamlit app's dependencies.
Unlike the Streamlit route above, using the CLI assumes that you will manage your environment using poetry
and pyenv
for Python versions.
This has only been tested on MacOS/Linux.
- If you need brew, run
make install-homebrew
. - If you need pyenv,
make install-pyenv
andmake install-python-3.8
. - Run
make setup
to install all external dependencies into your Poetry environment. This will also installpoetry
if needed. - Spawn a shell in the virtual environment using
poetry shell
. This will activate your virtual environment.
You are now ready to generate semantic models via the CLI! The generation command uses the following syntax:
python -m semantic_model_generator.generate_model \
--base_tables "['<your-database-name-1>.<your-schema-name-1>.<your-base-table-or-view-name-1>','<your-database-name-2>.<your-schema-name-2>.<your-base-table-or-view-name-2>']" \
--semantic_model_name "<a-meaningful-semantic-model-name>" \
--snowflake_account="<your-snowflake-account>"
You may generate a semantic model for a given list of fully qualified tables following the {database}.{schema}.{table}
format. Each table in this list should be a physical table or a view present in your database.
All generated semantic models by default are saved either under semantic_model_generator/output_models
if running from
the root of this project or the current directory you're in.
You may also use the CLI tool to validate one of your semantic models. From inside your Poetry shell, run
python -m semantic_model_generator.validate_model \
--yaml_path="/path/to/your/model_yaml.yaml \
--snowflake_account="<your-account-name>"
You may also create/validate your semantic models from directly within your Python code. First, ensure that you have installed the Python package. Note, the version below should be the latest version under the dist/
directory.
pip install dist/*.whl
from semantic_model_generator.generate_model import generate_base_semantic_model_from_snowflake
BASE_TABLES = ['<your-database-name-1>.<your-schema-name-1>.<your-base-table-or-view-name-1>',
'<your-database-name-2>.<your-schema-name-2>.<your-base-table-or-view-name-2>']
SNOWFLAKE_ACCOUNT = "<your-snowflake-account>"
SEMANTIC_MODEL_NAME = "<a-meaningful-semantic-model-name>"
generate_base_semantic_model_from_snowflake(
base_tables=BASE_TABLES,
snowflake_account=SNOWFLAKE_ACCOUNT,
semantic_model_name=SEMANTIC_MODEL_NAME,
)
from semantic_model_generator.validate_model import validate_from_local_path
YAML_PATH = "/path/to/your/model_yaml.yaml"
SNOWFLAKE_ACCOUNT = "<your-snowflake-account>"
validate_from_local_path(
yaml_path=YAML_PATH,
snowflake_account=SNOWFLAKE_ACCOUNT
)
Due to context window as well as quality constraints, we currently limit the size of the generated semantic model to <30,980 tokens (~123,920 characters).
Please note sample values and verified queries is not counted into this token length constraints. You can include as much sample values or verified queries as you'd like with limiting the overall file to <1MB.
If your snowflake tables and comments do not have comments, we currently leverages cortex LLM function to auto-generate description suggestions. Those generation are suffixed with '__' and additional comment to remind you to confirm/modity the descriptions.
IMPORTANT: After generation, your YAML files will have a series of lines with # <FILL-OUT>
. Please take the time
to fill these out with your business context, or else subsequent validation of your model will fail.
By default, the generated semantic model will contain all columns from the provided tables/views. However, it's highly encouraged to only keep relevant columns and drop any unwanted columns from the generated semantic model.
In addition, consider adding the following elements to your semantic model:
- Logical columns for a given table/view that are expressions over physical columns.
- Example:
col1 - col2
could be theexpr
for a logical column.
- Example:
- Synonyms. Any additional synonyms for column names.
- Filters. Additional filters with their relevant
expr
.
If you have an example table in your account with the following DDL statements.
CREATE TABLE sales.public.sd_data
(
id SERIAL PRIMARY KEY,
dt DATETIME,
cat VARCHAR(255),
loc VARCHAR(255),
cntry VARCHAR(255),
chn VARCHAR(50),
amt DECIMAL(10, 2),
unts INT,
cst DECIMAL(10, 2)
);
Here is an example semantic model, with data elements automatically generated from this repo and filled out by a user.
# Name and description of the semantic model.
name: Sales Data
description: This semantic model can be used for asking questions over the sales data.
# A semantic model can contain one or more tables.
tables:
# A logical table on top of the 'sd_data' base table.
- name: sales_data
description: A logical table capturing daily sales information across different store locations and product categories.
# The fully qualified name of the base table.
base_table:
database: sales
schema: public
table: sd_data
# Dimension columns in the logical table.
dimensions:
- name: product_category
synonyms:
- "item_category"
- "product_type"
description: The category of the product sold.
expr: cat
data_type: NUMBER
unique: false
sample_values:
- "501"
- "544"
- name: store_country
description: The country where the sale took place.
expr: cntry
data_type: TEXT
unique: false
sample_values:
- "USA"
- "GBR"
- name: sales_channel
synonyms:
- "channel"
- "distribution_channel"
description: The channel through which the sale was made.
expr: chn
data_type: TEXT
unique: false
sample_values:
- "FB"
- "GOOGLE"
# Time dimension columns in the logical table.
time_dimensions:
- name: sale_timestamp
synonyms:
- "time_of_sale"
- "transaction_time"
description: The time when the sale occurred. In UTC.
expr: dt
data_type: TIMESTAMP
unique: false
# Measure columns in the logical table.
measures:
- name: sales_amount
synonyms:
- "revenue"
- "total_sales"
description: The total amount of money generated from the sale.
expr: amt
data_type: NUMBER
default_aggregation: sum
- name: sales_tax
description: The sales tax paid for this sale.
expr: amt * 0.0975
data_type: NUMBER
default_aggregation: sum
- name: units_sold
synonyms:
- "quantity_sold"
- "number_of_units"
description: The number of units sold in the transaction.
expr: unts
data_type: NUMBER
default_aggregation: sum
- name: cost
description: The cost of the product sold.
expr: cst
data_type: NUMBER
default_aggregation: sum
- name: profit
synonyms:
- "earnings"
- "net income"
description: The profit generated from a sale.
expr: amt - cst
data_type: NUMBER
default_aggregation: sum
# A table can define commonly used filters over it. These filters can then be referenced in user questions directly.
filters:
- name: north_america
synonyms:
- "North America"
- "N.A."
- "NA"
description: "A filter to restrict only to north american countries"
expr: cntry IN ('canada', 'mexico', 'usa')
In order to push a new build and release, follow the steps below. Note, only admins are allowed to push release/v
tags.
You should follow the setup commands from usage-cli to install poetry and create your environment.
- Checkout a new branch from main. You should name this branch
release/vYYYY-MM-DD
. - Bump the poetry:
poetry version patch
- increments0.1.x
to0.1.(x+1)
poetry version minor
- increments0.x.0
to0.(x+1).0
poetry version major
- incrementsx.0.0
to(x+1).0.0
- Update the
CHANGELOG.md
adding a relevant header for your version number along with a description of the changes made. - Run
make build
to create a new .whl file. - Push your files for approval.
- After approval, run
make release
which will cut a new release and attach the .whl file. - Merge in your pr.
- Note: If you
make release
does not trigger the GH action. Please delete the tag and push again.