bugfix: respect config options in dbt_project.yml #255

Merged
merged 13 commits into from
Dec 28, 2016
4 changes: 4 additions & 0 deletions .gitignore
@@ -23,6 +23,7 @@ var/
*.egg-info/
.installed.cfg
*.egg
+logs/

# PyInstaller
# Usually these files are written by a python script from a template
@@ -60,3 +61,6 @@ target/

#Ipython Notebook
.ipynb_checkpoints

+#Emacs
+*~
36 changes: 20 additions & 16 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
## dbt 0.6.1 (unreleased)

+#### Bugfixes
+
+- respect `config` options in profiles.yml ([#255](https://github.com/analyst-collective/dbt/pull/255))

#### Changes

- add `--debug` flag, replace calls to `print()` with a global logger ([#256](https://github.com/analyst-collective/dbt/pull/256))
@@ -62,7 +66,7 @@ Use `{{ target }}` to interpolate profile variables into your model definitions.

```sql
-- only use the last week of data in development
-select * from events
+select * from events

{% if target.name == 'dev' %}
where created_at > getdate() - interval '1 week'
@@ -227,7 +231,7 @@ As `dbt` has grown, we found this implementation to be a little unwieldy and har

The additions of automated testing and a more comprehensive manual testing process will go a long way to ensuring the future stability of dbt. We're going to get started on these tasks soon, and you can follow our progress here: https://github.com/analyst-collective/dbt/milestone/16 .

-As always, feel free to [reach out to us on Slack](http://ac-slackin.herokuapp.com/) with any questions or concerns:
+As always, feel free to [reach out to us on Slack](http://ac-slackin.herokuapp.com/) with any questions or concerns:



@@ -244,7 +248,7 @@ See https://github.com/analyst-collective/dbt/releases/tag/v0.5.1

## dbt release 0.5.1

-### 0. tl;dr
+### 0. tl;dr

1. Raiders of the Lost Archive -- version your raw data to make historical queries more accurate
2. Column type resolution for incremental models (no more `Value too long for character type` errors)
@@ -281,15 +285,15 @@ The archived tables will mirror the schema of the source tables they're generate

1. `valid_from`: The timestamp when this archived row was inserted (and first considered valid)
1. `valid_to`: The timestamp when this archived row became invalidated. The first archived record for a given `unique_key` has `valid_to = NULL`. When newer data is archived for that `unique_key`, the `valid_to` field of the old record is set to the `valid_from` field of the new record!
-1. `scd_id`: A unique key generated for each archive record. Scd = [Slowly Changing Dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row).
+1. `scd_id`: A unique key generated for each archive record. Scd = [Slowly Changing Dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row).

dbt models can be built on top of these archived tables. The most recent record for a given `unique_key` is the one where `valid_to` is `null`.
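
The `valid_from` / `valid_to` bookkeeping described above can be illustrated with a minimal plain-Python sketch. This is not how `dbt archive` is implemented; `archive_row` and `current_record` are hypothetical helper names, the timestamps are made up, and `scd_id` is omitted for brevity.

```python
def archive_row(archive, unique_key, row, now):
    """Archive a new version of `row`, closing out the previous current version."""
    for record in archive:
        if record[unique_key] == row[unique_key] and record['valid_to'] is None:
            # the old record's valid_to is set to the new record's valid_from
            record['valid_to'] = now
    new_record = dict(row, valid_from=now, valid_to=None)
    archive.append(new_record)
    return new_record


def current_record(archive, unique_key, value):
    """Return the most recent record for a key: the one whose valid_to is None."""
    for record in archive:
        if record[unique_key] == value and record['valid_to'] is None:
            return record
    return None


# Hypothetical data: the same row is archived twice with different contents.
archive = []
archive_row(archive, 'id', {'id': 1, 'status': 'pending'}, now='2016-12-01 00:00:00')
archive_row(archive, 'id', {'id': 1, 'status': 'shipped'}, now='2016-12-02 00:00:00')
```

After the second call, the first record's `valid_to` equals the second record's `valid_from`, and `current_record(archive, 'id', 1)` returns the `shipped` version.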

To run this archive process, use the command `dbt archive`. After testing and confirming that the archival works, you should schedule this process through cron (or similar).

### 2. Incremental column expansion https://github.com/analyst-collective/dbt/issues/175

-Incremental tables are a powerful dbt feature, but there was at least one edge case which makes working with them difficult. During the first run of an incremental model, Redshift will infer a type for every column in the table. Subsequent runs can insert new data which does not conform to the expected type. One example is a `varchar(16)` field which is inserted into a `varchar(8)` field.
+Incremental tables are a powerful dbt feature, but there was at least one edge case which makes working with them difficult. During the first run of an incremental model, Redshift will infer a type for every column in the table. Subsequent runs can insert new data which does not conform to the expected type. One example is a `varchar(16)` field which is inserted into a `varchar(8)` field.
In practice, this error looks like:

```
@@ -485,7 +489,7 @@ models:
post-hook: "insert into my_audit_table (model_name, run_at) values ({{this.name}}, getdate())"
```

-Hooks are recursively appended, so the `my_model` model will only receive the `grant select...` hook, whereas the `some_model` model will receive _both_ the `grant select...` and `insert into...` hooks.
+Hooks are recursively appended, so the `my_model` model will only receive the `grant select...` hook, whereas the `some_model` model will receive _both_ the `grant select...` and `insert into...` hooks.

Finally, note that the `grant` statement uses the (hopefully familiar) `{{this}}` syntax whereas the `insert` statement uses the `{{this.name}}` syntax. When DBT creates a model:
- A temp table is created
@@ -516,7 +520,7 @@ config:

![windows](https://pbs.twimg.com/profile_images/571398080688181248/57UKydQS.png)

----
+---

dbt v0.4.1 provides improvements to incremental models, performance improvements, and ssh support for db connections.

@@ -540,7 +544,7 @@ pip install -U dbt
# To run models
dbt run # same as before

-# to dry-run models
+# to dry-run models
dbt run --dry # previously dbt test

# to run schema tests
@@ -553,10 +557,10 @@ Previously, dbt calculated "new" incremental records to insert by querying for r

User 1 Session 1 Event 1 @ 12:00
User 1 Session 1 Event 2 @ 12:01
--- dbt run --
+-- dbt run --
User 1 Session 1 Event 3 @ 12:02

-In this scenario, there are two possible outcomes depending on the `sql_where` chosen: 1) Event 3 does not get included in the Session 1 record for User 1 (bad), or 2) Session 1 is duplicated in the sessions table (bad). Both of these outcomes are inadequate!
+In this scenario, there are two possible outcomes depending on the `sql_where` chosen: 1) Event 3 does not get included in the Session 1 record for User 1 (bad), or 2) Session 1 is duplicated in the sessions table (bad). Both of these outcomes are inadequate!

With this release, you can now add a `unique_key` expression to an incremental model config. Records matching the `unique_key` will be `delete`d from the incremental table, then `insert`ed as usual. This makes it possible to maintain data accuracy without recalculating the entire table on every run.
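
The delete-then-insert behavior described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration of the idea, not the SQL that dbt generates; the `sessions` and `new_sessions` tables, the `event_count` column, and the use of `session_id` as the `unique_key` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# State after the first `dbt run`: session 1 was built from two events.
conn.execute('create table sessions (session_id integer, event_count integer)')
conn.execute('insert into sessions values (1, 2)')

# Rows computed by the next run: session 1 now has a third event.
conn.execute('create temp table new_sessions (session_id integer, event_count integer)')
conn.execute('insert into new_sessions values (1, 3)')

# Records matching the unique_key are deleted from the incremental table...
conn.execute(
    'delete from sessions '
    'where session_id in (select session_id from new_sessions)'
)
# ...then the new records are inserted as usual.
conn.execute('insert into sessions select session_id, event_count from new_sessions')

rows = conn.execute('select session_id, event_count from sessions').fetchall()
print(rows)  # [(1, 3)]
```

Session 1 ends up with a single, up-to-date row: Event 3 is included and the session is not duplicated.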

@@ -570,7 +574,7 @@ sessions:

### 3. Run schema validations concurrently https://github.com/analyst-collective/dbt/issues/100

-The `threads` run-target config now applies to schema validations too. Try it with `dbt test`
+The `threads` run-target config now applies to schema validations too. Try it with `dbt test`

### 4. Connect to database over ssh https://github.com/analyst-collective/dbt/issues/93

@@ -588,10 +592,10 @@ warehouse:
dbname: my-db
schema: dbt_dbanin
threads: 8
-ssh-host: ssh-host-name # <------ Add this line
+ssh-host: ssh-host-name # <------ Add this line
run-target: dev
```

### Remove the model-defaults config https://github.com/analyst-collective/dbt/issues/111

The `model-defaults` config doesn't make sense in a dbt world with dependencies. To apply default configs to your package, add the configs immediately under the package definition:
@@ -688,12 +692,12 @@ from users
where email not in (select email from __dbt__CTE__employees)
```

-Ephemeral models play nice with other ephemeral models, incremental models, and regular table/view models. Feel free to mix and match different materialization options to optimize for performance and simplicity.
+Ephemeral models play nice with other ephemeral models, incremental models, and regular table/view models. Feel free to mix and match different materialization options to optimize for performance and simplicity.


### 4. Feature: In-model configs https://github.com/analyst-collective/dbt/issues/88

-Configurations can now be specified directly inside of models. These in-model configs work exactly the same as configs inside of the dbt_project.yml file.
+Configurations can now be specified directly inside of models. These in-model configs work exactly the same as configs inside of the dbt_project.yml file.

An in-model-config looks like this:

@@ -703,7 +707,7 @@ An in-model-config looks like this:
-- python function syntax
{{ config(materialized="incremental", sql_where="id > (select max(id) from {{this}})") }}
-- OR json syntax
-{{
+{{
config({"materialized": "incremental", "sql_where" : "id > (select max(id) from {{this}})"})
}}

14 changes: 10 additions & 4 deletions Makefile
@@ -1,10 +1,16 @@
-.PHONY: test
+.PHONY: test test-unit test-integration

 changed_tests := `git status --porcelain | grep '^\(M\| M\|A\| A\)' | awk '{ print $$2 }' | grep '\/test_[a-zA-Z_\-\.]\+.py'`

-test:
-	@echo "Test run starting..."
-	@docker-compose run test /usr/src/app/test/runner.sh
+test: test-unit test-integration
+
+test-unit:
+	@echo "Unit test run starting..."
+	tox -e unit-py27,unit-py35
+
+test-integration:
+	@echo "Integration test run starting..."
+	@docker-compose run test /usr/src/app/test/integration.sh

 test-new:
 	@echo "Test run starting..."
25 changes: 25 additions & 0 deletions dbt/config.py
@@ -0,0 +1,25 @@
import os.path
import yaml

import dbt.project as project


def read_config(profiles_dir):
    # TODO: validate profiles_dir
    path = os.path.join(profiles_dir, 'profiles.yml')
Contributor:

I think passing None to os.path.join causes a TypeError.

Previously, this file was assumed to be located at ~/.dbt/profiles.yml. We should either remove the default value for profiles_dir and insist that the caller passes a value, or we should coalesce None to the default filepath.

Member Author:

this is a good call. since read_config is part of the private API of this namespace, and calls have to go through send_anonymous_usage_stats, i just removed the default None value.


    if os.path.isfile(path):
        with open(path, 'r') as f:
            profile = yaml.safe_load(f)
            return profile.get('config', {})

    return {}


def send_anonymous_usage_stats(profiles_dir):
    config = read_config(profiles_dir)

    if config is not None and config.get("send_anonymous_usage_stats") == False:
        return False

    return True
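
The review thread above raises two options for a missing `profiles_dir`: require the caller to pass a value (the option taken in this PR) or coalesce `None` to the default location. A minimal sketch of the second option follows, assuming the default directory is `~/.dbt` as mentioned in the review comment. `DEFAULT_PROFILES_DIR` and `resolve_profiles_path` are hypothetical names used only for illustration; they are not part of dbt.

```python
import os.path

# Assumption: the default location mentioned in the review (~/.dbt/profiles.yml).
DEFAULT_PROFILES_DIR = os.path.join(os.path.expanduser('~'), '.dbt')


def resolve_profiles_path(profiles_dir=None):
    """Return the path to profiles.yml, falling back to the default directory."""
    if profiles_dir is None:
        # os.path.join(None, 'profiles.yml') raises an error
        # (TypeError on Python 3), so coalesce None first.
        profiles_dir = DEFAULT_PROFILES_DIR
    return os.path.join(profiles_dir, 'profiles.yml')
```

With this approach, `resolve_profiles_path()` resolves to the default file while `resolve_profiles_path('/some/dir')` still honors an explicit directory. The trade-off is that a caller who forgets to pass the parsed `--profiles-dir` value silently gets the default instead of an error.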
14 changes: 2 additions & 12 deletions dbt/main.py
@@ -18,17 +18,7 @@
import dbt.task.test as test_task
import dbt.task.archive as archive_task
import dbt.tracking


-def is_opted_out(profiles_dir):
-    profiles = project.read_profiles(profiles_dir)
-
-    if profiles is None or profiles.get("config") is None:
-        return False
-    elif profiles['config'].get("send_anonymous_usage_stats") == False:
-        return True
-    else:
-        return False
+import dbt.config as config

 def main(args=None):
     if args is None:
@@ -48,7 +38,7 @@ def handle(args):
     initialize_logger(parsed.debug)

     # this needs to happen after args are parsed so we can determine the correct profiles.yml file
-    if is_opted_out(parsed.profiles_dir):
+    if not config.send_anonymous_usage_stats(parsed.profiles_dir):
         dbt.tracking.do_not_track()

     res = run_from_args(parsed)
7 changes: 7 additions & 0 deletions test/integration.sh
@@ -0,0 +1,7 @@
#!/bin/bash

. /usr/src/app/test/setup.sh
workon dbt

cd /usr/src/app
tox -e integration-py27,integration-py35
14 changes: 0 additions & 14 deletions test/runner.sh

This file was deleted.

48 changes: 48 additions & 0 deletions test/unit/test_config.py
@@ -0,0 +1,48 @@
import os
import unittest
import yaml

import dbt.config

if os.name == 'nt':
    TMPDIR = 'c:/Windows/TEMP'
else:
    TMPDIR = '/tmp'

class ConfigTest(unittest.TestCase):

    def set_up_empty_config(self):
        profiles_path = '{}/profiles.yml'.format(TMPDIR)

        with open(profiles_path, 'w') as f:
            f.write(yaml.dump({}))

    def set_up_config_options(self, send_anonymous_usage_stats=False):
        profiles_path = '{}/profiles.yml'.format(TMPDIR)

        with open(profiles_path, 'w') as f:
            f.write(yaml.dump({
                'config': {
                    'send_anonymous_usage_stats': send_anonymous_usage_stats
                }
            }))

    def tearDown(self):
        profiles_path = '{}/profiles.yml'.format(TMPDIR)

        try:
            os.remove(profiles_path)
        except:
            pass

    def test__implicit_opt_in(self):
        self.set_up_empty_config()
        self.assertTrue(dbt.config.send_anonymous_usage_stats(TMPDIR))

    def test__explicit_opt_out(self):
        self.set_up_config_options(send_anonymous_usage_stats=False)
        self.assertFalse(dbt.config.send_anonymous_usage_stats(TMPDIR))
Contributor:

👍


    def test__explicit_opt_in(self):
        self.set_up_config_options(send_anonymous_usage_stats=True)
        self.assertTrue(dbt.config.send_anonymous_usage_stats(TMPDIR))
32 changes: 24 additions & 8 deletions tox.ini
@@ -1,17 +1,33 @@
-# Tox (http://tox.testrun.org/) is a tool for running tests
-# in multiple virtualenvs. This configuration file will run the
-# test suite on all supported python versions. To use it, "pip install tox"
-# and then run "tox" from this directory.

 [tox]
-envlist = py27, py35
+envlist = unit-py27, unit-py35, integration-py27, integration-py35

+[testenv:unit-py27]
+basepython = python2.7
+commands = /bin/bash -c '$(which nosetests) -v test/unit'
+deps =
+    -rrequirements.txt
+    -rdev_requirements.txt

+[testenv:unit-py35]
+basepython = python3.5
+commands = /bin/bash -c '$(which nosetests) -v test/unit'
+deps =
+    -rrequirements.txt
+    -rdev_requirements.txt

-[testenv]
-commands = /bin/bash -c 'HOME=/root/ DBT_INVOCATION_ENV=ci-circle {envpython} $(which nosetests) -v --with-coverage --cover-branches --cover-html --cover-html-dir=htmlcov test/unit test/integration/*'
+[testenv:integration-py27]
+basepython = python2.7
+commands = /bin/bash -c 'HOME=/root/ DBT_INVOCATION_ENV=ci-circle {envpython} $(which nosetests) -v --with-coverage --cover-branches --cover-html --cover-html-dir=htmlcov test/integration/*'
 deps =
     -rrequirements.txt
     -rdev_requirements.txt

+[testenv:integration-py35]
+basepython = python3.5
+commands = /bin/bash -c 'HOME=/root/ DBT_INVOCATION_ENV=ci-circle {envpython} $(which nosetests) -v --with-coverage --cover-branches --cover-html --cover-html-dir=htmlcov test/integration/*'
+deps =
+    -rrequirements.txt
+    -rdev_requirements.txt

 [testenv:pywin]
 basepython = {env:PYTHON:}\python.exe