[AG-1386] Implement Error Catching for GX Data Validation #129
Conversation
Quality Gate passed.
Looks good!
This is a more general issue, but I can't align the example CI run you linked with a specific file or manifest version in Synapse, at least not without doing a ton of manual digging around that no one has time for.
The Synapse UI doesn't display full timestamps (just the date portion); there were 6 different "Agora Testing Data" processing runs on 3/5/23, and nothing is logged in the CI job's output about what version the uploaded files or manifest (and also GE reports, once we collapse the reports into versioned files) end up with.
Would it be an easy update to log the version of each file after it's uploaded?
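For context, logging the version after upload could be as small as a sketch like the one below. The `log_uploaded_version` helper and its call site are hypothetical; `name` and `versionNumber` are the attributes a Synapse entity carries after `syn.store`:

```python
def log_uploaded_version(entity) -> str:
    # `entity` is the object returned by syn.store(...); Synapse entities
    # expose `name` and `versionNumber` attributes after upload.
    message = f"Uploaded '{entity.name}' as version {entity.versionNumber}"
    print(message)
    return message

# Hypothetical call site in the pipeline:
#   entity = syn.store(synapseclient.File(path, parent=folder_id))
#   log_uploaded_version(entity)
```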
@JessterB There wouldn't be a manifest uploaded when that run failed. Nor would there be an output data file for |
Problem:
Previously, failures in GX data validation were not causing pipeline failures. As a result, visibility into these failures was poor and our data validation pipeline was relatively ineffective.
Solution:
Leverage our existing error-catching strategy to capture data validation failures by introducing a new error type which is triggered during GX data validation if validation for a dataset fails.
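A minimal sketch of the pattern described above. The error class name comes from this PR, but its definition and the `check_validation_result` helper shown here are assumptions:

```python
class ADTDataValidationError(Exception):
    """Raised when Great Expectations validation fails for a dataset."""


def check_validation_result(dataset_name: str, success: bool) -> None:
    # In the real pipeline, `success` would come from the Great Expectations
    # validation result; here it is passed in directly for illustration.
    if not success:
        raise ADTDataValidationError(
            f"Great Expectations validation failed for dataset '{dataset_name}'"
        )
```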
Notes:
`ADTDataValidationError` is raised when data validation fails. This is caught by the try/except logic in `process_all_files`, and a formatted string is printed when the pipeline concludes. Example. The changes are in `GreatExpectationsRunner.run` and the new function `GreatExpectationsRunner.get_failed_expectations`, which is used to parse out which expectations failed and produce the error message.
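A sketch of how these pieces could fit together. The function names follow this PR's description, but the GX result structure and the `process_all_files` loop shown here are simplified assumptions, not the actual implementation:

```python
class ADTDataValidationError(Exception):
    """Raised when GX data validation fails for a dataset."""


def get_failed_expectations(results: list) -> str:
    # Each entry mimics a GX expectation result: a `success` flag plus an
    # `expectation_config` dict naming the expectation type.
    failed = [
        r["expectation_config"]["expectation_type"]
        for r in results
        if not r["success"]
    ]
    return "Failed expectations: " + ", ".join(failed)


def process_all_files(datasets: dict) -> list:
    # Catch validation errors per dataset instead of aborting on the first
    # failure, then report them all when the pipeline concludes.
    errors = []
    for name, results in datasets.items():
        try:
            if any(not r["success"] for r in results):
                raise ADTDataValidationError(
                    f"{name}: {get_failed_expectations(results)}"
                )
        except ADTDataValidationError as err:
            errors.append(str(err))
    return errors
```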