Refactor GATKTool so that more tools can comfortably extend it directly #4341
Comments
@samuelklee Anything else you'd add to this list?
This looks like a good start! I'm not quite sure how you are planning to handle dictionary validation, specifically, but you can take a look at the CNV plotting tools (PlotDenoisedCopyRatios and PlotModeledSegments) to see what level of validation we currently do. We can discuss further in person if you like. (Also, note that those tools take a sequence dictionary as an input to specify which contigs should be plotted; typically, this will be a subset of the full dictionary that excludes alt contigs, etc. Requiring this sequence-dictionary input is somewhat vestigial; previous versions of the pipeline did not include dictionaries in the headers of all CNV data files. Part of making these tools into GATKTools could include switching over to -L to specify regions for plotting.) Finally, are the changes to |
@samuelklee For |
Overriding the defaults would get us most of the way there. Right now, we perform the following check on the IAC and fail if the defaults aren't changed to values that the CNV tools require, which is awkward:
If we override defaults, we'd still perform the check to make sure the user didn't muck with them, but it'd still be nicer than forcing the user to change the original defaults on their own. However, there are still two more awkward points: 1) there is no value for |
Note #4439, which concerns a Picard tool that might also need options for interval merging exposed. Just something to be aware of---I'm guessing that it's probably a bit ambitious to have identical options for all interval inputs to both Picard and GATK tools?
One part of this ticket is done: #4964 added accessors that allow direct descendants of |
Not sure if there's a more relevant open issue, but just making note of #6924 here.
As discussed in #2471 (comment), we need to refactor `GATKTool` so that all non-Spark tools can comfortably extend it rather than extending `CommandLineProgram` directly, as some tools currently do. In particular, we need to:

- Provide a mechanism for subclasses to selectively disable engine-wide arguments such as `-I` completely (and also the ability to override with their own version of an argument).
- Access necessary datasources outside of the engine package.
- Add the ability to register input metadata such as sequence dictionaries, so that standard validation rules can be enforced across the toolkit.
- Add the ability for each tool to change the defaults for engine arguments such as `--interval-merging-rule`
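
To make the first and last items a bit more concrete, here is a rough, self-contained sketch of what such hooks could look like. None of these names are the actual GATK API: `SketchTool`, `SketchCnvTool`, `MergingRule`, `EngineArg`, and all of the hook methods are hypothetical stand-ins invented for illustration, and the choice of `OVERLAPPING_ONLY` as the value a CNV-style tool requires is only an assumption. The idea is that a subclass overrides the default and the set of allowed values, and the base class still checks that the user did not change the value to something the tool cannot handle.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for an engine enum; not the GATK class itself.
enum MergingRule { ALL, OVERLAPPING_ONLY }

// Hypothetical list of engine-wide arguments that a tool may want to hide.
enum EngineArg { INPUT, INTERVALS, INTERVAL_MERGING_RULE }

// Hypothetical base class sketching the hooks requested in this issue.
abstract class SketchTool {
    // Value supplied on the command line; null means the user did not set it.
    private MergingRule userMergingRule = null;

    // Hook: subclasses override this to change the engine default.
    protected MergingRule defaultMergingRule() {
        return MergingRule.ALL;
    }

    // Hook: subclasses override this to restrict which values are acceptable.
    protected Set<MergingRule> allowedMergingRules() {
        return EnumSet.allOf(MergingRule.class);
    }

    // Hook: subclasses override this to hide engine arguments completely.
    protected Set<EngineArg> disabledEngineArgs() {
        return EnumSet.noneOf(EngineArg.class);
    }

    // Simulates the argument parser setting a user-supplied value.
    final void setUserMergingRule(MergingRule rule) {
        if (disabledEngineArgs().contains(EngineArg.INTERVAL_MERGING_RULE)) {
            throw new IllegalArgumentException("interval merging rule is disabled for this tool");
        }
        userMergingRule = rule;
    }

    // Resolves the value in effect, then validates it against the tool's requirements.
    final MergingRule effectiveMergingRule() {
        MergingRule rule = (userMergingRule != null) ? userMergingRule : defaultMergingRule();
        if (!allowedMergingRules().contains(rule)) {
            throw new IllegalArgumentException("unsupported interval merging rule: " + rule);
        }
        return rule;
    }
}

// Hypothetical CNV-style tool: changes the default, restricts values, hides the input argument.
final class SketchCnvTool extends SketchTool {
    @Override
    protected MergingRule defaultMergingRule() {
        return MergingRule.OVERLAPPING_ONLY; // assumed requirement, for illustration only
    }

    @Override
    protected Set<MergingRule> allowedMergingRules() {
        return EnumSet.of(MergingRule.OVERLAPPING_ONLY);
    }

    @Override
    protected Set<EngineArg> disabledEngineArgs() {
        return EnumSet.of(EngineArg.INPUT);
    }
}
```

With something along these lines, a user who leaves the argument alone gets the tool's own default, and the validation only fires if they explicitly pass a value the tool does not support. Whether the real hooks should be overridable methods, annotations, or something on the argument collections themselves is left open here.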