
[AG-1386] Implement Error Catching for GX Data Validation #129

Merged · 10 commits · Mar 7, 2024

Conversation

@BWMac (Contributor) commented Mar 5, 2024

Problem:

Previously, failures in GX data validation did not cause pipeline failures. As a result, visibility into these failures was poor and our data validation pipeline was relatively ineffective.

Solution:

Leverage our existing error-catching strategy to capture data validation failures by introducing a new error type that is raised during GX data validation when validation for a dataset fails.
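The pattern described above can be sketched as follows. The names `ADTDataValidationError` and `process_all_files` come from this PR; the dataset loop, the stand-in validation step, and the summary format are illustrative assumptions, not the repo's actual implementation.

```python
class ADTDataValidationError(Exception):
    """Raised when GX data validation fails for a dataset."""


def validate_dataset(dataset: str) -> None:
    # Stand-in for the real GX validation step, which runs
    # Great Expectations against the processed dataset.
    if dataset.startswith("bad"):
        raise ADTDataValidationError(f"Validation failed for '{dataset}'")


def process_all_files(datasets: list) -> None:
    """Process every dataset, collecting validation failures instead of
    stopping at the first one, then fail loudly when the run concludes."""
    errors = []
    for dataset in datasets:
        try:
            validate_dataset(dataset)
        except ADTDataValidationError as exc:
            errors.append(str(exc))
    if errors:
        # Surface every failure in one formatted message so the
        # pipeline run fails visibly instead of silently succeeding.
        raise RuntimeError("Data validation errors:\n" + "\n".join(errors))
```

Collecting errors and raising once at the end lets every dataset's result reach the summary, rather than aborting on the first failed validation.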

Notes:

  • ADTDataValidationError is raised when data validation fails. This is caught by the try/except logic in process_all_files, and a formatted string is printed when the pipeline concludes. Example.
  • Tests are added for GreatExpectationsRunner.run and the new function GreatExpectationsRunner.get_failed_expectations, which parses out which expectations failed and produces the error message.
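A hypothetical sketch of how `get_failed_expectations` might parse a validation result. The nested `results` / `success` / `expectation_config` keys mirror the JSON shape that Great Expectations emits for a validation run; the exact signature and return format of the repo's method are assumptions.

```python
def get_failed_expectations(validation_result: dict) -> str:
    """Pull the names of failed expectations out of a GX validation
    result and format them into a single error message."""
    failed = [
        r["expectation_config"]["expectation_type"]
        for r in validation_result["results"]
        if not r["success"]
    ]
    return "Failed expectations: " + ", ".join(failed)


# Minimal fixture mimicking a GX validation result with one failure.
sample = {
    "results": [
        {"success": True,
         "expectation_config": {"expectation_type": "expect_column_to_exist"}},
        {"success": False,
         "expectation_config": {"expectation_type": "expect_column_values_to_not_be_null"}},
    ]
}

print(get_failed_expectations(sample))
# Failed expectations: expect_column_values_to_not_be_null
```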

@BWMac BWMac marked this pull request as ready for review March 6, 2024 17:46
@sonarcloud bot commented Mar 6, 2024

Quality Gate passed

Issues: 0 new issues, 0 accepted issues
Measures: 0 security hotspots, no data about coverage, 3.1% duplication on new code

See analysis details on SonarCloud

@jaclynbeck-sage (Contributor) left a comment:

Looks good!

@JessterB (Contributor) left a comment:

This is a more general issue, but I can't align the example CI run you linked with a specific file or manifest version in Synapse, at least not without doing a ton of manual digging around that no one has time for.

The Synapse UI doesn't display full timestamps (just the date portion), there were 6 different "Agora Testing Data" processing runs on 3/5/24, and there is nothing logged in the CI job's output about what version the uploaded files or manifest (and also GE reports, once we collapse the reports into versioned files) end up with.

Would it be an easy update to log the version of each file after it's uploaded?

Review thread on src/agoradatatools/process.py (resolved)
@BWMac (Contributor, Author) commented Mar 7, 2024

@JessterB There wouldn't be a manifest uploaded when that run failed. Nor would there be an output data file for neuropath_corr uploaded because that particular iteration of process_dataset would not have made it to the upload step. Therefore, I think the versioning strategy would only really be useful for GX reports.

@BWMac BWMac merged commit bbc1adc into dev Mar 7, 2024
9 checks passed
@BWMac BWMac deleted the bwmac/AG-1386/GX_CI branch March 7, 2024 16:23
3 participants