feat: Add a transformer that adds tags to all tables created in a job #287

jdavidheiser · 2020-06-11T15:37:04Z

Summary of Changes

This use case came up for us several times: we want all tables loaded from a specific source to have the same tag. This transformer makes it easy to add a tag independently of what the extractor is doing.
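
As a rough illustration of the idea, here is a minimal, self-contained sketch. It does not import databuilder: `FakeTable` and `SketchTagTransformer` are hypothetical stand-ins, the config is a plain dict rather than a pyhocon config, and the `'tags'` key name is an assumption. Only the `init`/`transform` shape and the `isinstance` check mirror what the diff in this PR shows.

```python
class FakeTable:
    """Hypothetical stand-in for a table record; not the real TableMetadata."""

    def __init__(self, name, tags=None):
        self.name = name
        self.tags = list(tags or [])


class SketchTagTransformer:
    """Appends the configured tags to every table record passed through it."""

    TAGS = 'tags'  # assumed config key name, for illustration only

    def init(self, conf):
        # conf is a plain dict here; tags are given as a comma-delimited string
        raw = conf.get(self.TAGS, '')
        self.extra_tags = [t.strip().lower() for t in raw.split(',') if t.strip()]

    def transform(self, record):
        # Only table records are tagged; anything else passes through untouched
        if isinstance(record, FakeTable):
            record.tags.extend(self.extra_tags)
        return record


transformer = SketchTagTransformer()
transformer.init({SketchTagTransformer.TAGS: 'baz'})
result = transformer.transform(FakeTable('test_table', tags=['foo', 'bar']))
print(result.tags)
```

Because the tagging happens in a transformer, the extractor does not need to know anything about the extra tags.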

Tests

It seems like none of the other transformers have tests, so I was not sure what pattern to follow.

Documentation

No new docs; hopefully the transformer is simple enough to be self-explanatory.

CheckList

Make sure you have checked all steps below to ensure a timely review.

- PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  - In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
- PR includes a summary of changes.
- PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
- In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain docstrings that explain what it does
- PR passes make test

codecov-commenter · 2020-06-11T15:45:07Z

Codecov Report

Merging #287 into master will increase coverage by 0.10%.
The diff coverage is 96.15%.

@@            Coverage Diff             @@
##           master     #287      +/-   ##
==========================================
+ Coverage   73.34%   73.45%   +0.10%     
==========================================
  Files         102      103       +1     
  Lines        4307     4328      +21     
  Branches      401      403       +2     
==========================================
+ Hits         3159     3179      +20     
- Misses       1048     1049       +1     
  Partials      100      100

| Impacted Files | Coverage Δ | |
|---|---|---|
| databuilder/transformer/table_tag_transformer.py | `94.44% <94.44%> (ø)` | |
| databuilder/models/table_metadata.py | `91.82% <100.00%> (+0.11%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update eb3e4d3...af94259.

feng-tao

thanks, put some comments.

feng-tao · 2020-06-11T17:56:17Z

databuilder/transformer/table_tag_transformer.py

+from databuilder.models.table_metadata import TableMetadata
+
+
+class TableTagTransformer(Transformer):


would be good to put a doc on https://github.com/lyft/amundsendatabuilder#list-of-transformers as well as some unit tests.

feng-tao · 2020-06-11T18:24:43Z

databuilder/transformer/table_tag_transformer.py

+
+    def transform(self, record):
+        if isinstance(record, TableMetadata):
+            if record.tags:


nvm, saw you did the split in above.

jdavidheiser

This should be ready for review again.

jdavidheiser · 2020-06-17T13:59:09Z

databuilder/models/table_metadata.py

-            tags = [tag.lower().strip() for tag in tags]
-        self.tags = tags
+
+        self.tags = TableMetadata.format_tags(tags)


This small refactor was so the tag formatting logic could be reused, instead of replicated.
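
A minimal sketch of what such a shared helper could look like. The class name `SketchTableMetadata` is hypothetical and the handling of `None` and comma-delimited strings is an assumption; only the `lower().strip()` normalization and the `format_tags` static method name come from the diff above.

```python
class SketchTableMetadata:
    """Hypothetical, trimmed-down model; not the real TableMetadata."""

    def __init__(self, name, tags=None):
        self.name = name
        # The constructor and any transformer can share the same formatting logic
        self.tags = SketchTableMetadata.format_tags(tags)

    @staticmethod
    def format_tags(tags):
        """Normalize tags given as None, a comma-delimited string, or a list."""
        if tags is None:
            return []
        if isinstance(tags, str):
            tags = tags.split(',')
        return [tag.lower().strip() for tag in tags]


table = SketchTableMetadata('test_table', tags='Foo, BAR')
print(table.tags)
print(SketchTableMetadata.format_tags(['Baz ']))
```

Keeping the normalization in one place means a caller cannot forget a step such as lowercasing.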

jdavidheiser · 2020-06-17T13:59:30Z

databuilder/transformer/table_tag_transformer.py

+from databuilder.models.table_metadata import TableMetadata
+
+
+class TableTagTransformer(Transformer):


Added some tests and the docs.

jdavidheiser · 2020-06-17T14:00:43Z

databuilder/transformer/table_tag_transformer.py

+
+    def transform(self, record):
+        if isinstance(record, TableMetadata):
+            if record.tags:


I moved this tag splitting logic into a staticmethod on the TableMetadata object, so we can guarantee the same logic is used in both places. I realized I was missing the part where it called lower on the tags in the copied version.

feng-tao · 2020-06-17T17:00:47Z

databuilder/transformer/table_tag_transformer.py


    def transform(self, record):
        if isinstance(record, TableMetadata):
            if record.tags:
+                print(record.tags)


print line that we should remove.

feng-tao

lgtm, we could commit once the print line is removed.

jdavidheiser · 2020-06-17T17:15:11Z

got rid of the print

feng-tao · 2020-06-17T17:43:17Z

the CI is failing due to flake8 on the test file.

jdavidheiser · 2020-06-17T19:53:42Z

I am very spoiled by all of our internal tooling that automatically alerts me to things like lint issues! I pushed an update to fix the lint errors, and another to switch from double quotes to single, since that's the norm across the Amundsen code base.

feng-tao · 2020-06-17T22:06:50Z

        transformer = TableTagTransformer()
        config = ConfigFactory.from_dict({
            TableTagTransformer.TAGS: 'baz',
        })
        transformer.init(conf=config)
    
        result = transformer.transform(TableMetadata(
            database='test_db',
            cluster='test_cluster',
            schema='test_schema',
            name='test_table',
            description='',
            tags='foo,bar',
        ))
>       self.assertEqual(result.tags, ['foo', 'bar', 'baz'])
E       AssertionError: Lists differ: ['foo', 'bar', u'b', u'a', u'z... != ['foo', 'bar', 'baz']
E       
E       First differing element 2:
E       u'b'
E       'baz'
E       
E       First list contains 2 additional elements.
E       First extra element 3:
E       u'a'
E       
E       - ['foo', 'bar', u'b', u'a', u'z']
E       ?                -  ----- -----
E       
E       + ['foo', 'bar', 'baz']
tests/unit/transformer/test_table_tag_transformer.py:59: AssertionError
__________ TestTableTagTransformer.test_multiple_tags_comma_delimited __________
self = <tests.unit.transformer.test_table_tag_transformer.TestTableTagTransformer testMethod=test_multiple_tags_comma_delimited>
    def test_multiple_tags_comma_delimited(self):
        transformer = TableTagTransformer()
        config = ConfigFactory.from_dict({
            TableTagTransformer.TAGS: 'foo,bar',
        })
        transformer.init(conf=config)
    
        result = transformer.transform(TableMetadata(
            database='test_db',
            cluster='test_cluster',
            schema='test_schema',
            name='test_table',
            description='',
        ))
    
>       self.assertEqual(result.tags, ['foo', 'bar'])
E       AssertionError: u'foo,bar' != ['foo', 'bar']
tests/unit/transformer/test_table_tag_transformer.py:42: AssertionError
___________________ TestTableTagTransformer.test_single_tag ____________________
self = <tests.unit.transformer.test_table_tag_transformer.TestTableTagTransformer testMethod=test_single_tag>
    def test_single_tag(self):
        transformer = TableTagTransformer()
        config = ConfigFactory.from_dict({
            TableTagTransformer.TAGS: 'foo',
        })
        transformer.init(conf=config)
    
        result = transformer.transform(TableMetadata(
            database='test_db',
            cluster='test_cluster',
            schema='test_schema',
            name='test_table',
            description='',
        ))
    
>       self.assertEqual(result.tags, ['foo'])
E       AssertionError: u'foo' != ['foo']

feng-tao · 2020-06-17T22:09:57Z

seems to fail on py27. I think we could just remove the py27 unit tests, and once Lyft (I assume only Lyft still uses py2 at this point) fully migrates off py2 (target: end of Q2), we could get rid of py2 entirely.

jdavidheiser · 2020-06-17T22:10:34Z

That looks like the tests are running in Python 2. I thought databuilder required Python 3, so I actually removed some Python 2 compatibility stuff I had locally. It would be easy to add back for the sake of simplifying testing for now.
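
The failing code is not shown in this thread, so the following is only one plausible reading of the traceback: if the string check is something like `isinstance(tags, str)`, it is False for `unicode` values under Python 2, the value is never split, and extending a list with the raw string adds one element per character. The sketch below reproduces that symptom and shows a string check that works on both versions; `split_tags` and `string_types` are hypothetical names.

```python
# Pick a string base type that works on both Python 2 and Python 3
try:
    string_types = (basestring,)  # Python 2: covers both str and unicode
except NameError:
    string_types = (str,)  # Python 3: basestring does not exist


def split_tags(tags):
    """Split a comma-delimited string into tags; pass lists through."""
    if isinstance(tags, string_types):
        tags = tags.split(',')
    return [t.lower().strip() for t in tags]


existing = ['foo', 'bar']

# Symptom: a string treated as an iterable of tags yields single characters
broken = existing + list('baz')

# With the string detected and split first, the tag stays whole
fixed = existing + split_tags('baz')

print(broken)
print(fixed)
```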

feng-tao · 2020-06-17T22:12:03Z

@jdavidheiser that will be great to unlock it :)

feng-tao · 2020-06-17T22:58:25Z

thanks @jdavidheiser

initial commit

cd46698

jdavidheiser marked this pull request as draft June 11, 2020 15:38

imports

0652279

jdavidheiser changed the title ~~Add a transformer that adds tags to all tables created in a job~~ Feat/Add a transformer that adds tags to all tables created in a job Jun 11, 2020

jdavidheiser changed the title ~~Feat/Add a transformer that adds tags to all tables created in a job~~ Feat: Add a transformer that adds tags to all tables created in a job Jun 11, 2020

jdavidheiser changed the title ~~Feat: Add a transformer that adds tags to all tables created in a job~~ feat: Add a transformer that adds tags to all tables created in a job Jun 11, 2020

jdavidheiser marked this pull request as ready for review June 11, 2020 15:42

remove debugging print

4af5d56

imports

d25a46d

feng-tao reviewed Jun 11, 2020

feng-tao added the keep fresh label (Disables stalebot from closing an issue) Jun 16, 2020

jdavidheiser added 2 commits June 17, 2020 09:57

tests

5e7df7c

tests

8bcf9e8

jdavidheiser commented Jun 17, 2020

jdavidheiser requested a review from feng-tao June 17, 2020 14:01

feng-tao reviewed Jun 17, 2020

print

998a0b2

jdavidheiser added 2 commits June 17, 2020 15:50

lint

9ea42a6

standardize on single quotes

2875aa9

Merge branch 'master' into tagtransformer

a51a13b

jdavidheiser added 2 commits June 17, 2020 18:12

fix python 2 compatibility

809179a

Merge branch 'tagtransformer' of github.com:jdavidheiser/amundsendata…

af94259

…builder into tagtransformer

feng-tao approved these changes Jun 17, 2020

feng-tao merged commit d2f4bd3 into amundsen-io:master Jun 17, 2020
