[IBCDPE-792] Implement Great Expectations for the `proteomics_distribution_data` Dataset #118

BWMac · 2024-02-02T18:15:07Z

Description:

This PR implements a GX (Great Expectations) expectation suite for the `proteomics_distribution_data` dataset.

Please let me know if there are any additional expectations I should add, or if there are any adjustments to be made to the expectations I have already implemented.

Notes:

The values I chose for expectations such as `expect_column_values_to_be_between` are based on the example dataset on Synapse.
This change has been tested by running `adt test_config.yaml`. Example results can be downloaded from here.
I also fixed a variable naming typo in the `neuropath_corr` notebook; the typo did not affect the notebook's performance or behavior.
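For readers less familiar with GX, the pure-Python sketch below approximates what a bounds check like `expect_column_values_to_be_between` evaluates. It is not the GX implementation and does not import GX. The helper name `column_values_between` is hypothetical; it mirrors only the `min_value`, `max_value`, and `mostly` parameters, with inclusive bounds and missing values skipped, which matches GX column map expectations' default null handling as far as we know.

```python
import math
from typing import Iterable, Optional


def column_values_between(
    values: Iterable[Optional[float]],
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
    mostly: float = 1.0,
) -> dict:
    """Rough approximation of GX's expect_column_values_to_be_between.

    Missing values (None or NaN) are skipped, both bounds are inclusive,
    a bound of None means "unbounded on that side", and the check passes
    when at least `mostly` of the non-missing values are in range.
    """
    # Drop missing values before evaluating the bounds.
    present = [v for v in values if v is not None and not math.isnan(v)]
    # Collect every present value that falls outside [min_value, max_value].
    unexpected = [
        v
        for v in present
        if (min_value is not None and v < min_value)
        or (max_value is not None and v > max_value)
    ]
    # Design choice of this sketch: an all-missing column passes vacuously.
    if not present:
        return {"success": True, "unexpected_list": []}
    fraction_in_range = 1 - len(unexpected) / len(present)
    return {"success": fraction_in_range >= mostly, "unexpected_list": unexpected}


# Hypothetical values for an illustrative numeric column.
print(column_values_between([0.1, 0.5, None, 2.0], min_value=0.0, max_value=1.0))
# {'success': False, 'unexpected_list': [2.0]}
```

Passing `mostly=0.6` to the same call would succeed, because two of the three present values are in range.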

sonarcloud · 2024-02-02T18:15:24Z

Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

thomasyu888

🔥 LGTM! Will wait for @jaclynbeck-sage or @JessterB to comment on the rules themselves

jaclynbeck-sage · 2024-02-06T21:15:21Z

Looks good! I'm a little torn on having min/max values for the numerical columns. On the one hand, generically these values could be anything and may vary based on the type of proteomics data, so it doesn't entirely make sense to bound them. On the other hand, this matches our current data, and if our current data goes outside these bounds, that means something is wrong. Last time this came up we decided to leave it in for that very reason, so it's probably ok?

JessterB

This looks great, thanks @BWMac! I agree with @jaclynbeck-sage about leaving in the min/max checks on the numeric values - we don't expect these to change unless we add new data, and if we add new data we'll likely need to adjust the expectations anyway. Having a check in place that lets us know if things change unexpectedly is a good thing.
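The approach agreed on in this thread (bounds taken from the current data, revisited when new data is added) could be automated roughly as follows. This is a hypothetical sketch, not code from the PR: `bounds_expectations`, `out_of_bounds`, and the column names in the example are illustrative. It emits dicts shaped like the `expectation_type`/`kwargs` entries of a GX 0.x JSON expectation suite and assumes the reference columns are non-empty with no missing values.

```python
from typing import Dict, List, Sequence


def bounds_expectations(reference: Dict[str, Sequence[float]]) -> List[dict]:
    """Build one between-expectation per numeric column from observed min/max.

    Assumes every reference column is non-empty and has no missing values.
    """
    expectations = []
    for column, values in sorted(reference.items()):
        expectations.append(
            {
                "expectation_type": "expect_column_values_to_be_between",
                "kwargs": {
                    "column": column,
                    "min_value": min(values),
                    "max_value": max(values),
                },
            }
        )
    return expectations


def out_of_bounds(
    new_data: Dict[str, Sequence[float]], expectations: List[dict]
) -> Dict[str, List[float]]:
    """Return, per column, the new values outside the recorded bounds."""
    violations: Dict[str, List[float]] = {}
    for expectation in expectations:
        kwargs = expectation["kwargs"]
        bad = [
            v
            for v in new_data.get(kwargs["column"], [])
            if v < kwargs["min_value"] or v > kwargs["max_value"]
        ]
        if bad:
            violations[kwargs["column"]] = bad
    return violations


# Hypothetical column names and values, for illustration only.
reference = {"median": [0.2, 0.8, 1.5], "max": [2.0, 3.1]}
suite = bounds_expectations(reference)
print(out_of_bounds({"median": [0.5, 1.9], "max": [2.5]}, suite))
# {'median': [1.9]}
```

Regenerating the bounds from the reference data whenever new data is intentionally added fits the point that the expectations would need adjusting at that time anyway, while a check like `out_of_bounds` flags unexpected changes in between.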

BWMac added 3 commits February 2, 2024 11:10

fixes neuropath typo

4589bac

adds proteomics_dd expectation suite

71dd284

updates configs

7f683c7

BWMac marked this pull request as ready for review February 5, 2024 16:28

BWMac requested review from thomasyu888, JessterB and jaclynbeck-sage February 5, 2024 16:29

thomasyu888 approved these changes Feb 5, 2024

View reviewed changes

jaclynbeck-sage approved these changes Feb 6, 2024

View reviewed changes

JessterB approved these changes Feb 8, 2024

View reviewed changes

BWMac merged commit 016acc3 into dev Feb 8, 2024
9 checks passed

BWMac deleted the bwmac/IBCDPE-792/proteomics_distribution_data branch February 8, 2024 18:59
