This directory contains examples of Airflow DAGs that use apache-airflow-providers-databricks.
`load_weather_data_into_dbsql.py` is an example of an Airflow DAG that loads weather data for several cities into a Delta table using a Databricks SQL endpoint. The DAG consists of the following steps (a hedged sketch of such a DAG follows the list):

* `create_table` - creates a Delta table if it doesn't exist using `DatabricksSqlOperator`.
* `get_weather_data` - fetches weather data via REST API calls and saves it to local disk using `PythonOperator`.
* `upload_weather_data` - uploads the data from local disk to Azure Blob Storage using `LocalFilesystemToWasbOperator`.
* `import_weather_data` - imports the uploaded data with the `COPY INTO` SQL command executed via `DatabricksCopyIntoOperator`.
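For orientation, below is a minimal sketch of how such a DAG could be wired together. It is not the contents of `load_weather_data_into_dbsql.py`: the constant values, the local file path, the weather API URL, the table schema, and the file format are placeholders, and the sketch assumes Airflow 2.4+ together with the Databricks and Microsoft Azure providers.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.databricks.operators.databricks_sql import (
    DatabricksCopyIntoOperator,
    DatabricksSqlOperator,
)
from airflow.providers.microsoft.azure.transfers.local_to_wasb import (
    LocalFilesystemToWasbOperator,
)

# Placeholder values; replace them with the constants described below.
DATABRICKS_CONN_ID = "databricks_default"
DATABRICKS_SQL_ENDPOINT_NAME = "my-sql-endpoint"
WASBS_CONN_ID = "wasb_default"
DESTINATION_TABLE_NAME = "default.weather_data"
ADLS_STORAGE_NAME = "mystorageaccount"
ADLS_CONTAINER_NAME = "landing"
LANDING_LOCATION_PREFIX = "weather"

LOCAL_FILE = "/tmp/weather.json"  # hypothetical local staging path


def fetch_weather_data() -> None:
    """Hypothetical callable: fetch weather data from a REST API and save it locally."""
    import json
    import urllib.request

    # The real example queries a weather REST API for several cities;
    # the URL below is only a stand-in.
    with urllib.request.urlopen("https://example.com/weather?city=Amsterdam") as resp:
        data = json.load(resp)
    with open(LOCAL_FILE, "w") as f:
        json.dump(data, f)


with DAG(
    dag_id="load_weather_data_into_dbsql_sketch",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Create the Delta table via the SQL endpoint if it doesn't exist yet.
    create_table = DatabricksSqlOperator(
        task_id="create_table",
        databricks_conn_id=DATABRICKS_CONN_ID,
        sql_endpoint_name=DATABRICKS_SQL_ENDPOINT_NAME,
        sql=[
            f"""CREATE TABLE IF NOT EXISTS {DESTINATION_TABLE_NAME} (
                  city STRING,
                  temperature DOUBLE,
                  observed_at TIMESTAMP
                ) USING DELTA"""
        ],
    )

    # Fetch weather data from the REST API and write it to local disk.
    get_weather_data = PythonOperator(
        task_id="get_weather_data",
        python_callable=fetch_weather_data,
    )

    # Upload the local file into the ADLS container.
    upload_weather_data = LocalFilesystemToWasbOperator(
        task_id="upload_weather_data",
        wasb_conn_id=WASBS_CONN_ID,
        file_path=LOCAL_FILE,
        container_name=ADLS_CONTAINER_NAME,
        blob_name=f"{LANDING_LOCATION_PREFIX}/weather.json",
    )

    # Load the uploaded files into the Delta table with COPY INTO.
    import_weather_data = DatabricksCopyIntoOperator(
        task_id="import_weather_data",
        databricks_conn_id=DATABRICKS_CONN_ID,
        sql_endpoint_name=DATABRICKS_SQL_ENDPOINT_NAME,
        table_name=DESTINATION_TABLE_NAME,
        file_location=(
            f"abfss://{ADLS_CONTAINER_NAME}@{ADLS_STORAGE_NAME}"
            f".dfs.core.windows.net/{LANDING_LOCATION_PREFIX}/"
        ),
        file_format="JSON",
    )

    create_table >> get_weather_data >> upload_weather_data >> import_weather_data
```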
To make it work in your environment, you need to change the following constants (a short illustration of how the ADLS-related values combine follows the list):

* `WASBS_CONN_ID` - name of an Azure Blob Storage connection.
* `DATABRICKS_SQL_ENDPOINT_NAME` - name of the Databricks SQL endpoint that will be used for creating the table and importing the data.
* `DATABRICKS_CONN_ID` - name of the Databricks connection that will be used for authentication to the Databricks workspace.
* `DESTINATION_TABLE_NAME` - name of the Delta table that will be created and loaded with data.
* `LANDING_LOCATION_PREFIX` - name of the directory inside the ADLS container.
* `ADLS_CONTAINER_NAME` - name of the ADLS container.
* `ADLS_STORAGE_NAME` - name of the ADLS storage account (without `.dfs.core.windows.net`).
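As a quick illustration of how the ADLS-related constants fit together, here is a hypothetical set of values and the landing URI that `COPY INTO` would read from; the exact assembly (and URI scheme) in the example file may differ.

```python
# Hypothetical example values; adjust to your environment.
ADLS_STORAGE_NAME = "mystorageaccount"   # note: no ".dfs.core.windows.net" suffix
ADLS_CONTAINER_NAME = "landing"
LANDING_LOCATION_PREFIX = "weather"

# One plausible way the landing location for COPY INTO is assembled:
LANDING_LOCATION = (
    f"abfss://{ADLS_CONTAINER_NAME}@{ADLS_STORAGE_NAME}"
    f".dfs.core.windows.net/{LANDING_LOCATION_PREFIX}"
)
# -> "abfss://landing@mystorageaccount.dfs.core.windows.net/weather"
```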