Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AG-838] Support independent transforms #70

Merged
merged 25 commits into from
May 5, 2023

Conversation

BWMac
Copy link
Contributor

@BWMac BWMac commented Apr 25, 2023

This PR refactors the transform module by splitting up transform.py. From transform.py, the apply_custom_transformations function is moved to process.py since it is only used there, and it only applies the transformations already defined in transform. The functions standardize_column_names, standardize_values, rename_columns, and nest_fields are all moved to etl/utils.py to consolidate utility functions for all etl modules, and because they are all used in more than one other module. etl/transform becomes a submodule only containing custom transformation scripts to make contributing new transformations and importing them where needed (in process.py) simpler.

process.py is updated to account for all of these changes.

A tests/transform directory was created and the existing transform tests were split into test_utils.py and transform/test_genes_biodomains.py.

Copy link
Member

@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have an initial few comments. Thanks Brad!

src/agoradatatools/etl/transform/utils.py Outdated Show resolved Hide resolved
src/agoradatatools/etl/transform/utils.py Outdated Show resolved Hide resolved
src/agoradatatools/etl/transform/utils.py Outdated Show resolved Hide resolved
src/agoradatatools/etl/transform/utils.py Outdated Show resolved Hide resolved
@BWMac BWMac changed the title begin transform refactor - split off utils and biodomains [AG-838] Support independent transforms May 2, 2023
@BWMac BWMac requested a review from thomasyu888 May 2, 2023 19:02
@BWMac BWMac marked this pull request as ready for review May 3, 2023 22:14
Copy link
Member

@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 LGTM! Going to pre-approve. Will wait for @JessterB and @jaclynbeck-sage to comment. There are many things that can be improved so that it becomes easier to contribute.

We can iterate on adding a CONTRIBUTING.md section after this gets merged in.

Copy link
Contributor

@JessterB JessterB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran my current data validation process for the new version of the files generated by this PR to make sure nothing went sideways during the refactor, and everything is looking good!

Copy link
Contributor

@jaclynbeck-sage jaclynbeck-sage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good! What a colossal amount of work, thank you Brad!!

@BWMac BWMac merged commit 0684a29 into dev May 5, 2023
@BWMac BWMac deleted the bwmac/ag-838/independent_transforms branch May 5, 2023 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants