Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for Top Alleles, added more report options #18

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

mjung2019
Copy link

  • added support for Top Alleles
  • added more options for gtc_final_report.py example
  • added new example script gtc_final_report_matrix.py which will create matrix reports depending on the Allele type and optional genotype scores. The format is identical to the matrix export option from GenomeStudio.

…final_report script and added a new example script to support building matrix reports`x
Copy link
Contributor

@KelleyRyanM KelleyRyanM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

Unknown = 0
TOP = 1
BOT = 2
PLUS = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should have different values, otherwise to_string behavior below will be undesirable. If necessary, could generalize get_base_calls generic to take a list of report strands (instead of a single value)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed attribute assignments, can now be handled by get_base_calls_generic, see below.

Raises:
ValueError: Unexpected value for Illumina strand
"""
if ilmn_strand == "U" or ilmn_strand == "":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't empty string also raise a ValueError?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, empty string is now an exception.

Question though:
for class RefStrand it states:
if ref_strand == "U" or ref_strand == "":
return RefStrand.Unknown

Is that the desired behavior for empty string?

The genotype basecalls on the report strand as a list of strings.
The characters are A, C, G, T, or - for a no-call/null.
"""
return self.get_base_calls_generic(snps, ilmn_strand_annotations, IlmnStrand.TOP, RefStrand.Unknown)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g., here could potentially pass
return self.get_base_calls_generic(snps, ilmn_strand_annotations, [IlmnStrand.TOP, IlmnStrand.PLUS], IlmnStrand.Unknown)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should be passing IlmnStrand.Unknown here instead of RefStrand.Unknown. (get_base_calls_forward strand also passes RefStrand.Unknown, but that is probably not the correct behavior there either)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added list support for get_base_calls_generic. That should solve the ambiguity. The list type for the method is optional in order to leave the other methods untouched. Within the method converting non list types to list before iterating through every allele as list type checks in the inner loop might be something of a performance drain compared to looping through only one element.

Changed to IlmnStrand.Unknown.

@@ -18,7 +18,7 @@
sys.exit(-1)

try:
manifest = BeadPoolManifest(args.manifest)
manifest = BeadPoolManifest.BeadPoolManifest(args.manifest)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the reference to this import need to change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference should be fine, checked with running gtf_final_report_matrix.py. There is some ambiguity as the module name and class name are the same.


args = parser.parse_args()

if len(sys.argv) != NUM_ARGUMENTS:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument parser should be able to handle this error?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case yes as only one matrix report is supported per run, also didn't make sense to me to add a default case.

parser.add_argument("manifest", help="BPM manifest file")
parser.add_argument("gtc_directory", help="Directory containing GTC files")
parser.add_argument("output_file", help="Location to write report")
parser.add_argument("--forward", help="python gtc_final_report_matrix.py <path_to_manifest> <path_to_gtc_directory> <path_to_output_file> --forward 1, print matrix with forward alleles")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be an enumerated format option here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any harm in explicitly spelling out the arguments in the parser as long as we are not expecting many more cases. But if you like I can use an enumerator for the options. I would think the trade-off in better readability not that big though.

parser.add_argument("manifest", help="BPM manifest file")
parser.add_argument("gtc_directory", help="Directory containing GTC files")
parser.add_argument("output_file", help="Location to write report")
parser.add_argument("--forward", help="python gtc_final_report_matrix.py <path_to_manifest> <path_to_gtc_directory> <path_to_output_file> --forward 1, print matrix with forward alleles")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you have to pass a "1" instead of using something like action='store_true' which would make something like args.forward eval to True? Also, why does the "help" have the entire command written, wouldn't that be redundant?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, you can pass action="store_true", default=False to the argument and then you are fine. Ok, I can make this change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, concerning should we write the entire command. Normally I wouldn't bother but I got so many requests how to run commands for folks not familiar with the console, so doesn't hurt to be too obvious.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, added the changes with the latest commit

d-wdj added a commit to d-wdj/BeadArrayFiles that referenced this pull request Aug 25, 2022
d-wdj added a commit to d-wdj/BeadArrayFiles that referenced this pull request Aug 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants