This repository has been archived by the owner on Mar 3, 2021. It is now read-only.

Merge in Azure DW code (#1)
* Fixed issue jacobm001#6.

This issue was caused by the `package_data` variable being set up incorrectly in `setup.py`. Apparently the file listing is **not** recursive, which caused the `sdist` files to lack several macro overrides needed to function.
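
To illustrate the non-recursive listing, here is a minimal sketch of one way to collect nested files for `package_data`. The helper name `list_package_files` and the directory layout are hypothetical and are not taken from this repository's `setup.py`.

```python
import os


def list_package_files(package_dir, subdir):
    """Collect every file below package_dir/subdir, relative to package_dir.

    Hypothetical helper: a package_data pattern such as 'macros/*.sql' only
    matches one directory level, so nested files must be listed explicitly.
    """
    paths = []
    for current_dir, _dirnames, filenames in os.walk(os.path.join(package_dir, subdir)):
        for filename in filenames:
            full_path = os.path.join(current_dir, filename)
            # Store paths relative to the package root, as package_data expects.
            paths.append(os.path.relpath(full_path, package_dir))
    return sorted(paths)
```

The returned list could then be used as the value for the package's entry in `package_data`, so that macro overrides in subdirectories end up in the `sdist`.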

* Added a note about the ODBC driver

As noted in issue jacobm001#5, the use of the `driver` variable was not particularly clear. I've added some info to try and explain that configuration better.

* altered mssql__create_view_as macros

It looks like this issue is caused by the `mssql__create_view_as` macro. The macro works fine if the SQL it is given does not contain a CTE. If it does, SQL Server treats it as a syntax error.

This commit removes the parentheses wrapping the `{{ sql }}` portion.
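
As a rough illustration of the change, the two renderings differ as sketched below. This is only a sketch: the function `render_create_view` and the sample relation and model SQL are hypothetical, and the real macro is Jinja rather than Python. Per the description above, SQL Server rejects the wrapped form when the model SQL begins with a CTE.

```python
def render_create_view(relation, sql, wrap_in_parentheses=False):
    """Render a CREATE VIEW statement around compiled model SQL (illustrative only)."""
    body = f"(\n{sql}\n)" if wrap_in_parentheses else sql
    return f"create view {relation} as\n{body}"


# Hypothetical model SQL that starts with a CTE.
model_sql = "with orders as (select 1 as id)\nselect id from orders"

# Old behavior: the CTE ends up inside parentheses.
wrapped = render_create_view("dbo.my_view", model_sql, wrap_in_parentheses=True)

# New behavior: the model SQL follows `as` directly.
unwrapped = render_create_view("dbo.my_view", model_sql)
```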

* Fixed CTEs with insert into statements

SQL Server's `insert into` syntax isn't nearly as forgiving as that of other databases. In the previous version I had created a CTE as part of the `into` statement that could later be referenced.

This worked so long as your source model didn't contain a CTE of its own. If it did, that put a CTE declaration inside another CTE, which broke everything. I've taken a cue from the dbt-sqlserver package and am now creating a "temporary" view to handle the issue.

Thanks @mikaelene for the example.
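
The "temporary" view approach described in the CTE fix could be sketched roughly as follows. The function name `build_table_statements`, the view naming, and the exact statements (including the use of `select * into`) are assumptions made for illustration, not the adapter's actual macros.

```python
def build_table_statements(target, temp_view, model_sql):
    """Return the statements for a view-backed table build (illustrative only).

    The model SQL, CTEs included, sits at the top level of the view
    definition, so it is never nested inside another CTE.
    """
    return [
        f"create view {temp_view} as\n{model_sql}",
        f"select * into {target} from {temp_view}",
        f"drop view {temp_view}",
    ]
```

The statements are returned as a list so that a caller could execute them one at a time.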

* incremented version number

* Update README.md

* updated .gitignore

* dealt with empty column names

It appears that issue jacobm001#10 is caused by MSSQL not returning default column names for aggregate functions through the ODBC library. When the `''` column name hits the agate library, an error is thrown.

To handle this behavior, I've overridden the class method `get_result_from_cursor()`. The new method loops through all the column names and replaces any instances of `''` with `unnamed_column-{i}`. This should provide a simple workaround that the user doesn't really see, but that is also very easy for the user to avoid if the behavior is undesired.
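
A minimal sketch of the renaming step, separate from the cursor handling, might look like the following. The function name `name_empty_columns` is hypothetical, and the zero-based index is an assumption; the `unnamed_column-{i}` pattern is the one described above.

```python
def name_empty_columns(column_names, template="unnamed_column-{i}"):
    """Replace empty-string column names with a positional placeholder.

    Assumption: i is the zero-based position of the column in the result set.
    """
    return [
        name if name != "" else template.format(i=i)
        for i, name in enumerate(column_names)
    ]


# Example: two unnamed aggregate columns in a four-column result.
renamed = name_empty_columns(["id", "", "total", ""])
```

A user who does not want the placeholder names can avoid them by aliasing the aggregate in the query itself, for example `count(*) as row_count`.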

* incremented version

* Update README.md

* initial commit

* added catalog

* added work from https://github.com/norton120/dbt-azuredatawarehouse

* moved to explicit varchar size per https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-data-types

* handle empty string column name from scalar

* update README to match dockerfile

* added sample profile.yml

* clean up git conflict noise in readme

* known issues in readme

* added process_results class method and updated sql connection type

Co-authored-by: Jacob Mastel
Co-authored-by: Isaac Chavez
Co-authored-by: Ethan Knox
4 people authored Feb 6, 2020
1 parent 1b91eb9 commit 2bc09cb
Showing 17 changed files with 359 additions and 332 deletions.
8 changes: 7 additions & 1 deletion .gitignore
@@ -1,9 +1,15 @@
# Compiled python modules.
*.pyc

.idea
# Setuptools distribution folder.
/dist/
/build/

# Python egg metadata, regenerated from source files by setuptools.
/*.egg-info
.vscode/settings.json
env/

## editor files
*.swp
.vscode/
84 changes: 49 additions & 35 deletions README.md
@@ -1,47 +1,61 @@
# dbt-mssql
# dbt-azuredw

dbt-mssql is a custom adapter for [dbt](https://github.com/fishtown-analytics/dbt) that adds support for Microsoft SQL Server versions 2008 R2 and later. pyodbc is used as the connection driver as that is what is [suggested by Microsoft](https://docs.microsoft.com/en-us/sql/connect/python/python-driver-for-sql-server). The adapter supports both windows auth, and specified user accounts.
dbt-azuredw is a custom adapter for [dbt](https://github.com/fishtown-analytics/dbt) that adds support for Azure SQL Data Warehouse. pyodbc is used as the connection driver, as that is what is [suggested by Microsoft](https://docs.microsoft.com/en-us/sql/connect/python/python-driver-for-sql-server). The adapter supports both Windows auth and specified user accounts.

dbt-mssql is currently in a beta release. It is passing all of the [dbt integration tests](https://github.com/fishtown-analytics/dbt-integration-tests/) on SQL Server 2008 R2. Considering Microsoft's legendary backwards compatibility, it should work on newer versions, but that testing will come in the near future.
dbt-azuredw is currently in a beta release.

## Connecting to SQL Server
## Connecting to Azure SQL Data Warehouse

Your user profile (located in `~/.dbt/profile`) will need an appropriate entry for your package.
## building your `profiles.yml`
Use the included `profiles.yml` file as a guide, updating it with your credentials. You can find everything you need under _Home > dbname (account/dbname) - Connection strings_ in Azure, along with the username and password for authentication.

Required parameters are:
## Getting Started
1. Run this to keep Git from tracking local changes to your profiles.yml:

- driver
- host
- database
- schema
- one of the login options:
- SQL Server authentication
- username
- password
- Windows Login
- windows_login: true
```
git update-index --skip-worktree profiles.yml
```

2. Update profiles.yml with your actual Azure Data Warehouse creds.
3. Build the docker image. From the repo root:

```
docker build . -t dbt-azure-dw
```

4. Run a bash shell in the container:

```
docker run -v $(pwd):/dbt_development/plugins -it dbt-azure-dw /bin/bash
```

You can then jump into `jaffle_shop (mssql)` and work on making it run against your ADW!

**Example profile:**

The example below configures separate dev and prod environments for the package, _foo_.
**Sample profiles.yml**

```yaml
foo:
target: dev
outputs:
dev:
type: mssql
driver: 'ODBC Driver 17 for SQL Server'
host: sqlserver.mydomain.com
database: dbt_test
schema: foo_dev
windows_login: True
prod:
type: mssql
driver: 'ODBC Driver 17 for SQL Server'
host: sqlserver.mydomain.com
database: dbt_test
schema: foo
username: dbt_user

default:
target: dev
outputs:
dev:
type: azuredw
driver: 'ODBC Driver 17 for SQL Server'
host: account.database.windows.net
database: dbt_test
schema: foo
username: dbt_user
password: super_secret_dbt_password
authentication: ActiveDirectoryPassword
```
## Known Issues
- At this time dbt-azuredw supports only `table`, `view` and `incremental` materializations (no `ephemeral`)
- Only top-level (model) CTEs are supported, i.e., CTEs in macros are not supported (this is a SQL Server limitation)



## Jaffle Shop

Fishtown Analytics' [jaffle shop](https://github.com/fishtown-analytics/jaffle_shop) package is currently unsupported by this adapter. At the time of this writing, jaffle shop uses the `using()` join and `group by [ordinal]` notation, neither of which is supported in T-SQL. An alternative version has been forked by the author of dbt-mssql [here](https://github.com/jacobm001/jaffle_shop_mssql).
12 changes: 12 additions & 0 deletions dbt/adapters/azuredw/__init__.py
@@ -0,0 +1,12 @@
from dbt.adapters.azuredw.connections import AzureDWConnectionManager
from dbt.adapters.azuredw.connections import AzureDWCredentials
from dbt.adapters.azuredw.impl import AzureDWAdapter

from dbt.adapters.base import AdapterPlugin
from dbt.include import azuredw


Plugin = AdapterPlugin(
adapter=AzureDWAdapter,
credentials=AzureDWCredentials,
include_path=azuredw.PACKAGE_PATH)
dbt/adapters/azuredw/connections.py
@@ -5,13 +5,12 @@

import dbt.compat
import dbt.exceptions
from dbt.contracts.connection import Connection
from dbt.adapters.base import Credentials
from dbt.adapters.sql import SQLConnectionManager

from dbt.logger import GLOBAL_LOGGER as logger

MSSQL_CREDENTIALS_CONTRACT = {
AZUREDW_CREDENTIALS_CONTRACT = {
'type': 'object',
'additionalProperties': False,
'properties': {
@@ -33,37 +32,37 @@
'PWD': {
'type': 'string',
},
'windows_login': {
'type': 'boolean'
'authentication': {
'type': 'string',
'enum': ['ActiveDirectoryIntegrated','ActiveDirectoryMSI','ActiveDirectoryPassword','SqlPassword','TrustedConnection']
}
},
'required': ['driver','host', 'database', 'schema'],
'required': ['driver','host', 'database', 'schema','authentication'],
}


class MSSQLCredentials(Credentials):
SCHEMA = MSSQL_CREDENTIALS_CONTRACT
class AzureDWCredentials(Credentials):
SCHEMA = AZUREDW_CREDENTIALS_CONTRACT;
ALIASES = {
'user': 'UID'
, 'username': 'UID'
, 'pass': 'PWD'
, 'password': 'PWD'
, 'server': 'host'
, 'trusted_connection': 'windows_login'
}

@property
def type(self):
return 'mssql'
return 'azuredw'

def _connection_keys(self):
# return an iterator of keys to pretty-print in 'dbt debug'
# raise NotImplementedError
return ('server', 'database', 'schema', 'UID')
return ('server', 'database', 'schema', 'UID', 'authentication',)


class MSSQLConnectionManager(SQLConnectionManager):
TYPE = 'mssql'
class AzureDWConnectionManager(SQLConnectionManager):
TYPE = 'azuredw'

@contextmanager
def exception_handler(self, sql):
@@ -118,6 +117,7 @@ def add_query(self, sql, auto_begin=True, bindings=None,
if bindings is None:
cursor.execute(sql)
else:
logger.debug(f'bindings set as {bindings}')
cursor.execute(sql, bindings)

logger.debug("SQL status: %s in %0.2f seconds",
@@ -133,21 +133,28 @@ def open(cls, connection):
return connection

credentials = connection.credentials


MASKED_PWD=credentials.PWD[0] + ("*" * len(credentials.PWD))[:-2] + credentials.PWD[-1]
try:
con_str = []
con_str.append(f"DRIVER={{{credentials.driver}}}")
con_str.append(f"SERVER={credentials.host}")
con_str.append(f"Database={credentials.database}")

if credentials.windows_login == False:
if credentials.authentication == 'TrustedConnection':
con_str.append("trusted_connection=yes")
else:
con_str.append(f"AUTHENTICATION={credentials.authentication}")
con_str.append(f"UID={credentials.UID}")
con_str.append(f"PWD={credentials.PWD}")
else:
con_str.append(f"trusted_connection=yes")

con_str_concat = ';'.join(con_str)
logger.debug(f'Using connection string: {con_str_concat}')
con_str[-1] = f"PWD={MASKED_PWD}"
con_str_masked = ';'.join(con_str)

logger.debug(f'Using connection string: {con_str_masked}')
del con_str

handle = pyodbc.connect(con_str_concat, autocommit=True)

@@ -194,35 +201,28 @@ def add_commit_query(self):
pass
# return self.add_query('COMMIT', auto_begin=False)

def begin(self):
connection = self.get_thread_connection()

if dbt.flags.STRICT_MODE:
assert isinstance(connection, Connection)

if connection.transaction_open is True:
raise dbt.exceptions.InternalException(
'Tried to begin a new transaction on connection "{}", but '
'it already had one open!'.format(connection.get('name')))

self.add_begin_query()

connection.transaction_open = True
return connection

def commit(self):
connection = self.get_thread_connection()
if dbt.flags.STRICT_MODE:
assert isinstance(connection, Connection)

if connection.transaction_open is False:
raise dbt.exceptions.InternalException(
'Tried to commit transaction on connection "{}", but '
'it does not have one open!'.format(connection.name))

logger.debug('On {}: COMMIT'.format(connection.name))
self.add_commit_query()

connection.transaction_open = False
@classmethod
def get_result_from_cursor(cls, cursor):
data = []
column_names = []

if cursor.description is not None:
column_names = [col[0] for col in cursor.description]
## azure likes to give us empty string column names for scalar queries
for i, col in enumerate(column_names):
if col == '':
column_names[i] = f'Column{i+1}'
logger.debug(f'substituted empty column name in position {i} with `Column{i+1}`')
rows = cursor.fetchall()
data = cls.process_results(column_names, rows)
try:
return dbt.clients.agate_helper.table_from_data(data, column_names)
except Exception as e:
logger.debug(f'failure with rows: {rows}')
logger.debug(f'Failure with data: {data}')
logger.debug(f'Failure with column_names: {column_names}')
raise e

return connection
@classmethod
def process_results(cls, column_names, rows):
return [dict(zip(column_names, row)) for row in rows]
8 changes: 4 additions & 4 deletions dbt/adapters/mssql/impl.py → dbt/adapters/azuredw/impl.py
@@ -1,14 +1,14 @@
from dbt.adapters.sql import SQLAdapter
from dbt.adapters.mssql import MSSQLConnectionManager
from dbt.adapters.azuredw import AzureDWConnectionManager


class MSSQLAdapter(SQLAdapter):
ConnectionManager = MSSQLConnectionManager
class AzureDWAdapter(SQLAdapter):
ConnectionManager = AzureDWConnectionManager

@classmethod
def date_function(cls):
return 'get_date()'

@classmethod
def convert_text_type(cls, agate_table, col_idx):
return 'varchar(max)'
return 'varchar(8000)'
12 changes: 0 additions & 12 deletions dbt/adapters/mssql/__init__.py

This file was deleted.

File renamed without changes.
5 changes: 5 additions & 0 deletions dbt/include/azuredw/dbt_project.yml
@@ -0,0 +1,5 @@

name: dbt_azuredw
version: 0.0.1

macro-paths: ["macros"]