Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cmd line to VCF generated by GATKSparkTool #4981

Merged
merged 5 commits into from
Jul 23, 2018

Conversation

SHuang-Broad
Copy link
Contributor

There's such feature for GATKTool, but not yet for GATKSparkTool.

Much of the code is copied from the GATKTool version
Engine team, please comment if a refactor is needed and how. Thanks!
(Tagging @droazen @lbergelson and @cmnbroad )

@@ -404,6 +410,34 @@ public boolean useVariantAnnotations() {
return Collections.emptyList();
}

// TODO: 7/3/18 the two functions below are copy-pasted from GATKTool, and probably some refactoring can be done
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SHuang-Broad Thanks for doing this. You're right that we definitely don't want to duplicate all of this code. I would factor out a makeDefaultToolVCFHeaderLines method that takes the toolkit short name, toolkit version, the tool class name, and the command line, and returns the set of header lines, and delegate to that from both GATKTool and GATKSparkTool when addOutputVCFCommandLine is true. For now it can probably live in GATKVariantContextUtils with createVCFWriter. (@droazen @lbergelson we might want to create a ToolUtils class with static stuff like this - I think there is a bunch of code we could move into that).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cmnbroad .

I looked at GATKVariantContextUtils.createVCFWriter(...), and where GATKTool.getDefaultToolVCFHeaderLines(...) is used, it seems that the individual tools are

  1. constructing their headers by calling explicitly into GATKTool.getDefaultToolVCFHeaderLines(...) , then
  2. get the writer by calling into GATKVariantContextUtils.createVCFWriter(...), then
  3. let the writer writeHeader()

in order. So I'm not quite understanding the comment:

For now it can probably live in GATKVariantContextUtils with createVCFWriter.

Are you suggesting that I factor out the method makeDefaultToolVCFHeaderLines(...), place it in GATKVariantContextUtils, but not let the code be absorbed into createVCFWriter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, I'm suspecting more could be done, probably extracting out an abstract class like AbstractGATKBaseTool?

@SHuang-Broad SHuang-Broad force-pushed the sh-add-cmd-line-to-spark-tool-vcf branch 3 times, most recently from 39b2487 to 1f66a73 Compare July 10, 2018 00:59
@codecov-io
Copy link

codecov-io commented Jul 10, 2018

Codecov Report

Merging #4981 into master will increase coverage by 26.014%.
The diff coverage is 94.737%.

@@               Coverage Diff                @@
##              master     #4981        +/-   ##
================================================
+ Coverage     60.157%   86.171%   +26.014%     
- Complexity     12768     28906     +16138     
================================================
  Files           1095      1783       +688     
  Lines          64608    134295     +69687     
  Branches       10395     15240      +4845     
================================================
+ Hits           38866    115723     +76857     
+ Misses         21504     13163      -8341     
- Partials        4238      5409      +1171
Impacted Files Coverage Δ Complexity Δ
...lbender/utils/variant/GATKVariantContextUtils.java 85.045% <100%> (+33.365%) 226 <1> (+114) ⬆️
...hellbender/engine/spark/GATKSparkToolUnitTest.java 100% <100%> (ø) 2 <2> (?)
...stitute/hellbender/cmdline/CommandLineProgram.java 85.906% <100%> (+8.879%) 44 <1> (+5) ⬆️
...org/broadinstitute/hellbender/engine/GATKTool.java 89.595% <100%> (+3.232%) 144 <0> (+51) ⬆️
...stitute/hellbender/engine/spark/GATKSparkTool.java 84.298% <66.667%> (-0.92%) 63 <1> (+1)
...der/tools/walkers/variantutils/SelectVariants.java 78.719% <0%> (-0.563%) 149% <0%> (+27%)
...nstitute/hellbender/engine/MultiVariantWalker.java 96.154% <0%> (-0.142%) 24% <0%> (+12%)
...er/engine/ReadWalkerGCSSupportIntegrationTest.java 4.878% <0%> (ø) 2% <0%> (?)
...specific/AlleleSpecificAnnotationDataUnitTest.java 86.667% <0%> (ø) 4% <0%> (?)
...nder/tools/spark/sv/utils/SVVCFWriterUnitTest.java 97.917% <0%> (ø) 3% <0%> (?)
... and 1278 more

@droazen droazen removed their request for review July 10, 2018 15:58
@SHuang-Broad SHuang-Broad force-pushed the sh-add-cmd-line-to-spark-tool-vcf branch from 1f66a73 to 97e32cb Compare July 11, 2018 00:03
@SHuang-Broad SHuang-Broad force-pushed the sh-add-cmd-line-to-spark-tool-vcf branch from 97e32cb to 2ad08e2 Compare July 12, 2018 14:27
Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor changes requested, otherwise looks good.

* @return A set of VCF header lines containing the tool name, version, date and command line.
*/
public static Set<VCFHeaderLine> getDefaultVCFHeaderLines(final String toolkitShortName, final String toolName,
final String versionString, final String dataTime,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataTime -> dateTime

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, javadoc should include the param list with short descriptions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* TODO: This should be refactored and moved up into CommandLineProgram, with this value
* TODO: stored in the jar manifest, like {@link CommandLineProgram#getToolkitName}
*/
protected String getToolkitShortName() { return "GATK"; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we might as well move this method up into CommandLineProgram now, adjacent to getToolkitName, along with a DEFAULT_TOOLKIT_SHORT_NAME static constant, and return that. Then we can remove the two identical implementations. Also, lets keep the second part of the TODO comment with the new code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored, not sure if they are what you mean?

@SHuang-Broad
Copy link
Contributor Author

@cmnbroad , I've done the suggested refactoring and documentation changes.
Can you take a look again please? Thanks!

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SHuang-Broad Two very minor changes requested, then we can merge.

@@ -60,6 +60,8 @@
// abstract, this is fine (as long as no logging has to happen statically in this class).
protected final Logger logger = LogManager.getLogger(this.getClass());

public static final String DEFAULT_TOOLKIT_SHORT_NAME = "GATK";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be private unless there is a compelling reason otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @return An abbreviated name of the toolkit for this tool. Subclasses may override to provide
* a custom toolkit name.
*
* TODO: stored in the jar manifest, like {@link CommandLineProgram#getToolkitName}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this TODO out to a separate comment in the body so it doesn't show up in the javadoc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, but not sure if that's what you'd like?

@cmnbroad cmnbroad merged commit 8ad39d7 into master Jul 23, 2018
@cmnbroad cmnbroad deleted the sh-add-cmd-line-to-spark-tool-vcf branch July 23, 2018 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants