AG-1147/AG-1148/AG-1098: Add tests for rna seq data and rna distribution data #80
This PR includes changes for 3 interrelated tickets:

- AG-1098: The `rna_distribution.py` transform contained two transforms: `transform_rna_seq_data` and `transform_rna_distribution_data`. The first is used both on its own and as a prerequisite for the second, so it has been moved to its own file. I also renamed it from `transform_rna_seq_data` to `transform_rnaseq_differential_expression` to match the name in the `config.yaml` file.
- AG-1147: Added a data-driven test for `transform_rnaseq_differential_expression`. There is a "good" input file and an input file with some missing data in each of the important columns; both should pass. There is also a "bad" input file, where the `logfc` field is a string instead of a number; this is an expected failure case.
- AG-1148: Added a data-driven test for `transform_rna_distribution_data`. There is a "good" input file and an input file with some missing data in each of the important columns; both should pass. There are two failure cases:
  - The `logfc` field is a string instead of a number, which throws an error while calling `transform_rnaseq_differential_expression`.
  - All `model` entries are blank or all `tissue` entries are blank, which causes all rows to be dropped while calculating quartiles and throws an error.

I made sure that both tests pass after the refactor, and that the transformed output on the real data is identical to the previous version.