Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Project profile combined #973

Merged
merged 6 commits into from
Sep 17, 2018
Merged

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Sep 4, 2018

Merge project and profile dicts (and packages and cli vars) into a single, coherent object.

The changes are naturally pretty extensive!

I do at one point in the code make use of multiple inheritance, to represent an is-a relationship (a full config really is both a project and a profile), but I think it came out pretty well.

I believe this handles dbt deps and dbt debug properly. After a lot of massaging and changes to some behaviors, integration tests pass. Unit tests went really well, almost no changes needed there!

I still am going to write some unit tests for the config (there's really not too much that isn't exercised by the integration tests), but since this PR is going to take so long to digest I figured I would let you have at it while I work on those.

I'm pretty confident there are no changes of substance to dbt's actual behavior. That said, here's some changes that might not be obvious to a reviewer.

Breaking changes:

  • I think I broke the package registry stuff, in a way that is very easy to fix. I'd like to go into more depth on that with you and figure out exactly what we need there... and write some tests accordingly. To my knowledge that code is unused so it didn't seem urgent. Correct me if I'm wrong!
  • Merging this will deprecate the repositories entry in dbt_project.yml. I'm not sorry.

Non-breaking internals changes:

  • if you used to pass a dict around, there's a pretty good chance it's now an object.
  • project/profile/packages all unified, of course
  • assorted adapter connection-related functions now mutate the Connection instead of copying+creating new ones. Basically, the copy model doesn't make a lot of sense with how these are instantiated and accessed. Setters call validate(), so that should be fine!
  • "strict mode" means a lot less because we validate stuff all over the place, all the time. dbt's runtime seems to be enormously dominated by db time, followed by time reading from disk, so I don't feel bad about that.

Breaking integration test changes:

  • I changed the integration tests to now look at the test method name to decide what profile to load. When I was changing how the integration tests load their internal adapters I realized the current framework was going to be very difficult to fit in there. Now, your snowflake tests must have snowflake in the name, same goes for bigquery and redshift with postgres as the default. Surprisingly little had to change on that front. It will error at test-import-time if you have two for some reason.
  • setUp is now pretty-much mandatory
  • tests that want to change quoting in a way that could impact the schema handling must be in a class that overrides project_config with the desired quoting. With the new setUp it's just not very sane to rebuild the adapter, and re-doing the adapter building/schema dropping/schema creating on snowflake (the place where we most often mess with quoting in the tests, of course) was agonizing. Tests that change quoting partway through will still work as designed. You can look at the simple copy tests for examples of the horrible machinations we have to go through.
  • I got rid of runtime-target from the tests - it's long gone from the actual code

@beckjake beckjake force-pushed the project-profile-combined branch 2 times, most recently from 8a08574 to 4fcf0fe Compare September 7, 2018 16:14
Copy link
Member

@cmcarthur cmcarthur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great start. there are 3 or 4 comments that definitely need to be addressed.

a few other random thoughts:

  • love that we are deprecating repositories. however i imagine this will break a lot of projects & makes this 0.12
  • let's talk about what is broken with packages.yml
  • one thing i'd like to double check here is that the error messages are still sane. have you talked with @drewbanin about that?

@@ -42,6 +42,11 @@ def __str__(self):
def __repr__(self):
return '{}(**{})'.format(self.__class__.__name__, self._contents)

def __eq__(self, other):
if not isinstance(other, self.__class__):
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to me that this will behave inconsistently if one is a subclass of the other:

class A(APIObject):
    pass

class B(A):
    pass

a = A()
b = B()

a == b  # False
b == a  # True

is that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. I'll add a check that the reverse isinstance check is true as well.

'maximum': 65535,
},
{
'type': 'string'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that depending on the python version, this can come in as a string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for my record-keeping - allowing strings is handled at connection parse time

dbt/main.py Outdated
except project.DbtProjectError as e:
if parsed.which == 'deps':
# deps doesn't need a profile, so don't require one.
cfg = config.Project.from_current_directory()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, but it hurt my brain a little bit when I first read it. can we change the import:

import dbt.config as config

to:

from dbt.config import Project, RuntimeConfig

i think that will make it more obvious

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit more than just those two, but I will add explicit imports for all the config module entries

dbt/runner.py Outdated

def deserialize_graph(self):
logger.info("Loading dependency graph file.")

base_target_path = self.project['target-path']
base_target_path = self.project.target_path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.config.target_path?

elif 'bigquery' in test_name:
return 'bigquery'
else:
return 'postgres'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is clever but i'm worried that the default postgres is too clever. can we at least add an elif 'postgres' in test_name: return 'postgres' and then warn if none of these are found in the test name?

Copy link
Contributor Author

@beckjake beckjake Sep 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can, but be warned, there are a lot!

> docker-compose run --rm test tox -e explicit-py36 -- -a type=postgres --collect-only test/integration 2> /dev/null | grep '^test_' | grep -v 'postgres' | wc -l
      68

self.assertEqual(config.on_run_end, [])
self.assertEqual(config.archive, [])
self.assertEqual(config.seeds, {})
self.assertEqual(config.packages, PackageConfig(packages=[]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fantastic.

try:
return int(result)
except ValueError:
pass # let the validator or connection handle this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see.

@beckjake
Copy link
Contributor Author

love that we are deprecating repositories. however i imagine this will break a lot of projects & makes this 0.12

Yes, for sure!

let's talk about what is broken with packages.yml

It's not clear to me what exactly kind of data we should be getting back from that URL endpoint, basically. We can do a call about it if you'd like!

one thing i'd like to double check here is that the error messages are still sane. have you talked with @drewbanin about that?

Drew and I have discussed some specific messages around dbt debug and dbt deps. In general I think these validation messages come out pretty nice and it's not too hard to figure out where your config went wrong but I'd love for him to look them over and let me know what he thinks, I'm not sure what kind of config issues are common. We have a lot of flexibility here, it may be better to iterate on that in a separate PR if the messages are at least tolerable for now.

@beckjake beckjake force-pushed the project-profile-combined branch from 3a41752 to 18a5e44 Compare September 17, 2018 15:38
@beckjake beckjake changed the base branch from development to dev/guion-bluford September 17, 2018 15:39
@beckjake beckjake dismissed cmcarthur’s stale review September 17, 2018 16:19

discussed in slack, review issues resolved

@beckjake beckjake merged commit 1c67d19 into dev/guion-bluford Sep 17, 2018
@beckjake beckjake deleted the project-profile-combined branch September 17, 2018 16:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants