Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include the database when deciding if two tables are the same (#1708) #1774

Merged

Conversation

beckjake
Copy link
Contributor

Fixes #1708

The problem there was that dbt didn't use the table database in the catalog as part of deciding if a table was "the same" as one in the manifest. That resulted in us combining both tables' column information into each entry.

While I was in here, I converted it to use type hints and dataclasses and hologram and all that good stuff, so we now can generate an actual json schema for our output and while reading the code you can actually tell what types things are...

I also noticed that we iterated over the whole manifest once per catalog entry, and converted it to generate and use a lookup table, so now we only iterate over the manifest once when linking up IDs to catalog entries. On very large projects this should speed things up a bit.

I am pretty sure I managed to keep the output format the same, so no need to update dbt docs for this.

I made dbt a bit more picky about table/column metadata fields coming from adapter.get_catalog(): they now have an exact list of what's required. Mostly because it makes json schema generation easier, but also mypy is happier this way.

I also changed the unit tests to be a bit less unit-y but also to test what we actually care about (input of catalog dict results -> correct structured json output when we write to disk)

…log generation

Convert catalog intermediate structure into something more useful
Make comparing manifests to catalogs faster by generating an explicit identifier to id mapping
Make the identifier to unique ID mapping include databases
Convert catalog to use dataclasses/hologram types
Fix unit tests to test what we actually care about
No changes to integration tests means no need to change dbt docs, hooray
@cla-bot cla-bot bot added the cla:yes label Sep 20, 2019
@beckjake beckjake force-pushed the fix/include-database-comparing-catalog branch from 6c81562 to a9bb1aa Compare September 20, 2019 16:54
@beckjake beckjake requested a review from drewbanin September 20, 2019 17:51
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Great that no updates are required to the docs site with this change. Leaving the database out of the comparison was a pretty big oversight when we introduced sources, but glad that we're fixing it now!

@beckjake beckjake merged commit f6406c9 into dev/louisa-may-alcott Sep 20, 2019
@beckjake beckjake deleted the fix/include-database-comparing-catalog branch September 20, 2019 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dbt docs has extra, incorrect column names when model name and schema match a source table
2 participants