Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jbeck/AG-1145/transform overall scores testing #81

Merged
merged 4 commits into from
Aug 15, 2023

Conversation

jaclynbeck-sage
Copy link
Contributor

@jaclynbeck-sage jaclynbeck-sage commented Jul 1, 2023

This PR addresses 2 tickets:
AG-1116: Removes the "literaturescore" column from the overall_scores transform output.
AG-1145: Adds data-driven tests for transform_overall_scores. I wrote the test before modifying the transform and confirmed that it passed, then modified the transform / expected test output, and confirmed the test passed again.
There is no failure case for this test, just "good" data and "missing" data from key columns that both pass.

The "good" data has rows that meet at least one of these cases:

  • all <X>score values populated, and all isscored_<X> set to "Y"
  • all <X>score values populated, and all isscored_<X> set to "N"
  • all <X>score values missing, and all isscored_<X> set to "N"
  • There are two rows that are duplicates of each other except for different neuropathscores, which happens in the real data too and should be handled by the transform.
  • There are two rows with all <X>score values populated, but some isscored_<X> columns set to "Y" and some set to "N", just to make sure the transform selectively handles "N" cases.

The "missing" data has rows missing the following values:

  • One row each with a blank value in one of the isscored_<X> columns, where the corresponding <X>score value is filled in.
  • One row each with a blank value in one of the <X>score columns, where the corresponding isscored_<X> value is set to "Y".

I also confirmed that removing the literature score from the transform produces output that is identical to the old JSON file, except for the "literature_score: XXX" line is removed from each gene.

@jaclynbeck-sage jaclynbeck-sage requested review from BWMac and JessterB July 1, 2023 00:16
@jaclynbeck-sage
Copy link
Contributor Author

Something weird is happening with the file uploads to synapse (see the error output of this job: https://github.com/Sage-Bionetworks/agora-data-tools/actions/runs/5427967156/jobs/9871986708) where some of the JSON files successfully upload and some fail because of "access restrictions". I ran CI on the dev branch instead of this one and it had the same failure on the same subset of files so I think something must be up with Synapse right now.

Copy link
Contributor

@BWMac BWMac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 🚀

Copy link
Contributor

@JessterB JessterB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! I couldn't find a matching set of test files (from the same processing run) with the modified output - probably due to the synapse upload errors you commented on plus the fact that we don't upload duplicate files anymore. I'll make sure to double check the files when this gets merged, and when we next update Agora Live Data.

@jaclynbeck-sage
Copy link
Contributor Author

I think the file from this branch's CI run is this one: https://www.synapse.org/#!Synapse:syn25740976.146
The timestamp in Synapse is June 30, 6:49 pm, and the timestamp in CI is July 1, 1:49 AM UTC which is 6:49 pm PST.

@jaclynbeck-sage jaclynbeck-sage merged commit 769baae into dev Aug 15, 2023
@jaclynbeck-sage jaclynbeck-sage deleted the jbeck/AG-1145/transform_overall_scores_testing branch August 15, 2023 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants