Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add a transformer that adds tags to all tables created in a job #287

Merged
merged 12 commits into from
Jun 17, 2020

Conversation

jdavidheiser
Copy link
Contributor

Summary of Changes

This was a use case that came up for us several times, where we want all tables loaded from a specific source to have the same tag. This transformer makes it easy to add a tag independent of what the extractor is doing.

Tests

It seems like none of the other transformers have tests, so I was not sure what pattern to follow.

Documentation

No new docs, hopefully the transformer is simple enough to be self explanatory.

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
  • PR passes make test

@jdavidheiser jdavidheiser marked this pull request as draft June 11, 2020 15:38
@jdavidheiser jdavidheiser changed the title Add a transformer that adds tags to all tables created in a job Feat/Add a transformer that adds tags to all tables created in a job Jun 11, 2020
@jdavidheiser jdavidheiser changed the title Feat/Add a transformer that adds tags to all tables created in a job Feat: Add a transformer that adds tags to all tables created in a job Jun 11, 2020
@jdavidheiser jdavidheiser changed the title Feat: Add a transformer that adds tags to all tables created in a job feat: Add a transformer that adds tags to all tables created in a job Jun 11, 2020
@jdavidheiser jdavidheiser marked this pull request as ready for review June 11, 2020 15:42
@codecov-commenter
Copy link

codecov-commenter commented Jun 11, 2020

Codecov Report

Merging #287 into master will increase coverage by 0.10%.
The diff coverage is 96.15%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #287      +/-   ##
==========================================
+ Coverage   73.34%   73.45%   +0.10%     
==========================================
  Files         102      103       +1     
  Lines        4307     4328      +21     
  Branches      401      403       +2     
==========================================
+ Hits         3159     3179      +20     
- Misses       1048     1049       +1     
  Partials      100      100              
Impacted Files Coverage Δ
databuilder/transformer/table_tag_transformer.py 94.44% <94.44%> (ø)
databuilder/models/table_metadata.py 91.82% <100.00%> (+0.11%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eb3e4d3...af94259. Read the comment docs.

Copy link
Member

@feng-tao feng-tao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, put some comments.

databuilder/transformer/table_tag_transformer.py Outdated Show resolved Hide resolved
from databuilder.models.table_metadata import TableMetadata


class TableTagTransformer(Transformer):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to put a doc on https://github.com/lyft/amundsendatabuilder#list-of-transformers as well as some unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some tests and the docs.


def transform(self, record):
if isinstance(record, TableMetadata):
if record.tags:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, saw you did the split in above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this tag splitting logic into a staticmethod on the TableMetadata object, so we can guarantee the same logic is used in both places. I realized I was missing the part where it called lower on the tags in the copied version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg

@feng-tao feng-tao added the keep fresh Disables stalebot from closing an issue label Jun 16, 2020
Copy link
Contributor Author

@jdavidheiser jdavidheiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ready for review again.

tags = [tag.lower().strip() for tag in tags]
self.tags = tags

self.tags = TableMetadata.format_tags(tags)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This small refactor was so the tag formatting logic could be reused, instead of replicated.

from databuilder.models.table_metadata import TableMetadata


class TableTagTransformer(Transformer):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some tests and the docs.


def transform(self, record):
if isinstance(record, TableMetadata):
if record.tags:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this tag splitting logic into a staticmethod on the TableMetadata object, so we can guarantee the same logic is used in both places. I realized I was missing the part where it called lower on the tags in the copied version.

@jdavidheiser jdavidheiser requested a review from feng-tao June 17, 2020 14:01

def transform(self, record):
if isinstance(record, TableMetadata):
if record.tags:
print(record.tags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print line that we should remove.

Copy link
Member

@feng-tao feng-tao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm , we could commit once the print line is removed.

@jdavidheiser
Copy link
Contributor Author

got rid of the print

@feng-tao
Copy link
Member

the CI is failing due to flake8 on the test file.

@jdavidheiser
Copy link
Contributor Author

I am very spoiled by all of our internal tooling that automatically alerts me to things like lint issues! I pushed an update to fix the lint errors, and another to switch from double quotes to single, since that's the norm across the Amundsen code base.

@feng-tao

This comment has been minimized.

@feng-tao
Copy link
Member

        transformer = TableTagTransformer()
        config = ConfigFactory.from_dict({
            TableTagTransformer.TAGS: 'baz',
        })
        transformer.init(conf=config)
    
        result = transformer.transform(TableMetadata(
            database='test_db',
            cluster='test_cluster',
            schema='test_schema',
            name='test_table',
            description='',
            tags='foo,bar',
        ))
>       self.assertEqual(result.tags, ['foo', 'bar', 'baz'])
E       AssertionError: Lists differ: ['foo', 'bar', u'b', u'a', u'z... != ['foo', 'bar', 'baz']
E       
E       First differing element 2:
E       u'b'
E       'baz'
E       
E       First list contains 2 additional elements.
E       First extra element 3:
E       u'a'
E       
E       - ['foo', 'bar', u'b', u'a', u'z']
E       ?                -  ----- -----
E       
E       + ['foo', 'bar', 'baz']
tests/unit/transformer/test_table_tag_transformer.py:59: AssertionError
__________ TestTableTagTransformer.test_multiple_tags_comma_delimited __________
self = <tests.unit.transformer.test_table_tag_transformer.TestTableTagTransformer testMethod=test_multiple_tags_comma_delimited>
    def test_multiple_tags_comma_delimited(self):
        transformer = TableTagTransformer()
        config = ConfigFactory.from_dict({
            TableTagTransformer.TAGS: 'foo,bar',
        })
        transformer.init(conf=config)
    
        result = transformer.transform(TableMetadata(
            database='test_db',
            cluster='test_cluster',
            schema='test_schema',
            name='test_table',
            description='',
        ))
    
>       self.assertEqual(result.tags, ['foo', 'bar'])
E       AssertionError: u'foo,bar' != ['foo', 'bar']
tests/unit/transformer/test_table_tag_transformer.py:42: AssertionError
___________________ TestTableTagTransformer.test_single_tag ____________________
self = <tests.unit.transformer.test_table_tag_transformer.TestTableTagTransformer testMethod=test_single_tag>
    def test_single_tag(self):
        transformer = TableTagTransformer()
        config = ConfigFactory.from_dict({
            TableTagTransformer.TAGS: 'foo',
        })
        transformer.init(conf=config)
    
        result = transformer.transform(TableMetadata(
            database='test_db',
            cluster='test_cluster',
            schema='test_schema',
            name='test_table',
            description='',
        ))
    
>       self.assertEqual(result.tags, ['foo'])
E       AssertionError: u'foo' != ['foo']```

@feng-tao
Copy link
Member

seems to fail on py27, I think we could just remove the py27 unit tests. And once Lyft(I assume only Lyft still does at this point) fully migrates off py2(target end of Q2), we could get rid of py2 entirely.

@jdavidheiser
Copy link
Contributor Author

That looks like the tests are running in Python 2. I thought databuilder required Python 3, so actually removed some Python 2 compatibility stuff I had locally. It would be easy to add back for the sake of simplifying testing for now.

@feng-tao
Copy link
Member

@jdavidheiser that will be great to unlock it :)

@feng-tao
Copy link
Member

thanks @jdavidheiser

@feng-tao feng-tao merged commit d2f4bd3 into amundsen-io:master Jun 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep fresh Disables stalebot from closing an issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants