jbeck/AG-1145/transform overall scores testing #81
This PR addresses 2 tickets:

- AG-1116: Removes the "literaturescore" column from the `overall_scores` transform output.
- AG-1145: Adds data-driven tests for `transform_overall_scores`. I wrote the test before modifying the transform and confirmed that it passed. I then modified the transform and the expected test output, and confirmed that the test passed again.

There is no failure case for this test. It uses only "good" data and "missing" data (rows missing values in key columns), and both pass.
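The data-driven pattern can be sketched in plain Python. Each named case pairs input rows with expected output rows, so adding a case means adding data rather than test code. This is not the repository's implementation, and several parts are assumptions: the function name `transform_overall_scores_sketch`, the `gene_id`, `geneticsscore`, and `isscored_genetics` columns (only `neuropathscore` and the `<X>score` / `isscored_<X>` naming come from this PR), and the rule that an `<X>score` becomes `None` when its `isscored_<X>` flag is "N".

```python
"""Sketch of a data-driven test for an overall-scores transform.

HYPOTHETICAL: the transform below, the gene_id/genetics columns, and the
rule "isscored_<X> == 'N' -> <X>score is None" are illustrative assumptions.
"""

SCORE_TYPES = ["genetics", "neuropath"]  # hypothetical <X> values


def transform_overall_scores_sketch(rows):
    """Assumed behavior: drop literaturescore and null any score flagged "N"."""
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != "literaturescore"}
        for x in SCORE_TYPES:
            # Only an explicit "N" nulls the score; a missing flag leaves it as-is.
            if new_row.get(f"isscored_{x}") == "N":
                new_row[f"{x}score"] = None
        out.append(new_row)
    return out


def row(gene_id, **cols):
    """Build one row keyed by a hypothetical gene_id column."""
    return {"gene_id": gene_id, **cols}


# Each case maps a name to (input rows, expected output rows).
CASES = {
    "good": (
        [
            row("G1", geneticsscore=1.5, isscored_genetics="Y",
                neuropathscore=0.5, isscored_neuropath="Y", literaturescore=2.0),
            row("G2", geneticsscore=1.0, isscored_genetics="N",
                neuropathscore=0.2, isscored_neuropath="N", literaturescore=1.0),
            row("G3", geneticsscore=None, isscored_genetics="N",
                neuropathscore=None, isscored_neuropath="N", literaturescore=0.0),
            # Mixed flags: only the "N" score should be nulled.
            row("G4", geneticsscore=2.5, isscored_genetics="Y",
                neuropathscore=0.7, isscored_neuropath="N", literaturescore=3.0),
        ],
        [
            row("G1", geneticsscore=1.5, isscored_genetics="Y",
                neuropathscore=0.5, isscored_neuropath="Y"),
            row("G2", geneticsscore=None, isscored_genetics="N",
                neuropathscore=None, isscored_neuropath="N"),
            row("G3", geneticsscore=None, isscored_genetics="N",
                neuropathscore=None, isscored_neuropath="N"),
            row("G4", geneticsscore=2.5, isscored_genetics="Y",
                neuropathscore=None, isscored_neuropath="N"),
        ],
    ),
    "missing": (
        [
            # isscored_<X> missing, <X>score filled in
            row("G5", geneticsscore=1.2, neuropathscore=0.3, literaturescore=1.0),
            # <X>score missing, isscored_<X> set to "Y"
            row("G6", isscored_genetics="Y", isscored_neuropath="Y", literaturescore=1.0),
        ],
        [
            row("G5", geneticsscore=1.2, neuropathscore=0.3),
            row("G6", isscored_genetics="Y", isscored_neuropath="Y"),
        ],
    ),
}


def failing_cases(transform, cases):
    """Return the names of cases whose output differs from the expected rows."""
    return [name for name, (rows, expected) in cases.items()
            if transform(rows) != expected]
```

Calling `failing_cases(transform_overall_scores_sketch, CASES)` returns an empty list when every case passes. With pytest, the same `CASES` table could drive `@pytest.mark.parametrize` so that each case is reported separately.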
The "good" data has rows that meet at least one of these cases:

- `<X>score` values populated, and all `isscored_<X>` set to "Y"
- `<X>score` values populated, and all `isscored_<X>` set to "N"
- `<X>score` values missing, and all `isscored_<X>` set to "N"
- `neuropathscore`s, which happens in the real data too and should be handled by the transform.
- `<X>score` values populated, but some `isscored_<X>` columns set to "Y" and some set to "N", just to make sure the transform selectively handles "N" cases.

The "missing" data has rows missing the following values:
- `isscored_<X>` columns, where the corresponding `<X>score` value is filled in.
- `<X>score` columns, where the corresponding `isscored_<X>` value is set to "Y".

I also confirmed that removing the literature score from the transform produces output identical to the old JSON file, except that the "literature_score: XXX" line is removed from each gene.
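The old-versus-new JSON comparison can be sketched as a check that every gene record in the new output equals the old record with `literature_score` removed. This assumes both files hold a JSON array of gene objects in the same order; `only_literature_score_removed` is a hypothetical helper, not part of the repository.

```python
import json


def only_literature_score_removed(old_json, new_json, key="literature_score"):
    """Return True if each new gene record equals the old one minus `key`.

    Assumes both documents are JSON arrays of gene objects in the same order.
    """
    old_genes = json.loads(old_json)
    new_genes = json.loads(new_json)
    if len(old_genes) != len(new_genes):
        return False
    # Every other field must be unchanged; only `key` may disappear.
    return all(
        new == {k: v for k, v in old.items() if k != key}
        for old, new in zip(old_genes, new_genes)
    )
```

To check real output, pass the text of the old and new JSON files (for example, the result of reading each file) as the two arguments.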