-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for Top Alleles, added more report options #18
base: develop
Are you sure you want to change the base?
Conversation
mjung2019
commented
Jun 13, 2019
- added support for Top Alleles
- added more options for gtc_final_report.py example
- added new example script gtc_final_report_matrix.py which will create matrix reports depending on the Allele type and optional genotype scores. The format is identical to the matrix export option from GenomeStudio.
…final_report script and added a new example script to support building matrix reports`x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments
module/BeadPoolManifest.py
Outdated
Unknown = 0 | ||
TOP = 1 | ||
BOT = 2 | ||
PLUS = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should have different values, otherwise to_string behavior below will be undesirable. If necessary, could generalize get_base_calls generic to take a list of report strands (instead of a single value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed attribute assignments, can now be handled by get_base_calls_generic, see below.
module/BeadPoolManifest.py
Outdated
Raises: | ||
ValueError: Unexpected value for Illumina strand | ||
""" | ||
if ilmn_strand == "U" or ilmn_strand == "": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't empty string also raise a ValueError?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, empty string is now an exception.
Question though:
for class RefStrand it states:
if ref_strand == "U" or ref_strand == "":
return RefStrand.Unknown
Is that the desired behavior for empty string?
module/GenotypeCalls.py
Outdated
The genotype basecalls on the report strand as a list of strings. | ||
The characters are A, C, G, T, or - for a no-call/null. | ||
""" | ||
return self.get_base_calls_generic(snps, ilmn_strand_annotations, IlmnStrand.TOP, RefStrand.Unknown) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g., here could potentially pass
return self.get_base_calls_generic(snps, ilmn_strand_annotations, [IlmnStrand.TOP, IlmnStrand.PLUS], IlmnStrand.Unknown)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, should be passing IlmnStrand.Unknown here instead of RefStrand.Unknown. (get_base_calls_forward strand also passes RefStrand.Unknown, but that is probably not the correct behavior there either)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added list support for get_base_calls_generic. That should solve the ambiguity. The list type for the method is optional in order to leave the other methods untouched. Within the method converting non list types to list before iterating through every allele as list type checks in the inner loop might be something of a performance drain compared to looping through only one element.
Changed to IlmnStrand.Unknown.
@@ -18,7 +18,7 @@ | |||
sys.exit(-1) | |||
|
|||
try: | |||
manifest = BeadPoolManifest(args.manifest) | |||
manifest = BeadPoolManifest.BeadPoolManifest(args.manifest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the reference to this import need to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reference should be fine, checked with running gtf_final_report_matrix.py. There is some ambiguity as the module name and class name are the same.
|
||
args = parser.parse_args() | ||
|
||
if len(sys.argv) != NUM_ARGUMENTS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument parser should be able to handle this error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case yes as only one matrix report is supported per run, also didn't make sense to me to add a default case.
examples/gtc_final_report_matrix.py
Outdated
parser.add_argument("manifest", help="BPM manifest file") | ||
parser.add_argument("gtc_directory", help="Directory containing GTC files") | ||
parser.add_argument("output_file", help="Location to write report") | ||
parser.add_argument("--forward", help="python gtc_final_report_matrix.py <path_to_manifest> <path_to_gtc_directory> <path_to_output_file> --forward 1, print matrix with forward alleles") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be an enumerated format option here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any harm in explicitly spelling out the arguments in the parser as long as we are not expecting many more cases. But if you like I can use an enumerator for the options. I would think the trade-off in better readability not that big though.
examples/gtc_final_report_matrix.py
Outdated
parser.add_argument("manifest", help="BPM manifest file") | ||
parser.add_argument("gtc_directory", help="Directory containing GTC files") | ||
parser.add_argument("output_file", help="Location to write report") | ||
parser.add_argument("--forward", help="python gtc_final_report_matrix.py <path_to_manifest> <path_to_gtc_directory> <path_to_output_file> --forward 1, print matrix with forward alleles") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you have to pass a "1" instead of using something like action='store_true' which would make something like args.forward eval to True? Also, why does the "help" have the entire command written, wouldn't that be redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, you can pass action="store_true", default=False to the argument and then you are fine. Ok, I can make this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, concerning should we write the entire command. Normally I wouldn't bother but I got so many requests how to run commands for folks not familiar with the console, so doesn't hurt to be too obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, added the changes with the latest commit
…into an official release
…into an official release