Add cmd line to VCF generated by GATKSparkTool #4981
@@ -404,6 +410,34 @@ public boolean useVariantAnnotations() {
return Collections.emptyList();
}

// TODO: 7/3/18 the two functions below are copy-pasted from GATKTool, and probably some refactoring can be done
@SHuang-Broad Thanks for doing this. You're right that we definitely don't want to duplicate all of this code. I would factor out a makeDefaultToolVCFHeaderLines method that takes the toolkit short name, toolkit version, the tool class name, and the command line, and returns the set of header lines, and delegate to that from both GATKTool and GATKSparkTool when addOutputVCFCommandLine is true. For now it can probably live in GATKVariantContextUtils with createVCFWriter. (@droazen @lbergelson we might want to create a ToolUtils class with static stuff like this - I think there is a bunch of code we could move into that).
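The helper described above could be sketched roughly as follows. This is a minimal illustration, not the merged code: HeaderLine is a hypothetical stand-in for htsjdk's VCFHeaderLine so the example has no dependencies, ToolHeaderLines is an invented class name, and the layout of the rendered values is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for htsjdk's VCFHeaderLine: a plain key/value pair.
final class HeaderLine {
    final String key;
    final String value;

    HeaderLine(final String key, final String value) {
        this.key = Objects.requireNonNull(key);
        this.value = Objects.requireNonNull(value);
    }

    @Override
    public boolean equals(final Object o) {
        if (!(o instanceof HeaderLine)) {
            return false;
        }
        final HeaderLine other = (HeaderLine) o;
        return key.equals(other.key) && value.equals(other.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(key, value);
    }

    @Override
    public String toString() {
        return "##" + key + "=" + value;
    }
}

// Hypothetical utility class; the review suggests GATKVariantContextUtils as a home for now.
final class ToolHeaderLines {
    private ToolHeaderLines() {}

    // Builds the header lines once so that both tool base classes can delegate here.
    static Set<HeaderLine> makeDefaultToolVCFHeaderLines(final String toolkitShortName,
                                                         final String toolkitVersion,
                                                         final String toolClassName,
                                                         final String commandLine) {
        final Set<HeaderLine> lines = new LinkedHashSet<>();
        lines.add(new HeaderLine("source", toolClassName));
        // Assumed value layout: one structured line keyed by "<shortName>CommandLine".
        lines.add(new HeaderLine(toolkitShortName + "CommandLine",
                String.format("<ID=%s,Version=%s,CommandLine=\"%s\">",
                        toolClassName, toolkitVersion, commandLine)));
        return lines;
    }
}
```

Under this sketch, GATKTool and GATKSparkTool would each check addOutputVCFCommandLine and, when it is true, call the shared static method with their own name, version, and command line.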
Thanks @cmnbroad.
I looked at GATKVariantContextUtils.createVCFWriter(...) and at where GATKTool.getDefaultToolVCFHeaderLines(...) is used. It seems that the individual tools
- construct their headers by calling explicitly into GATKTool.getDefaultToolVCFHeaderLines(...), then
- get the writer by calling into GATKVariantContextUtils.createVCFWriter(...), then
- let the writer writeHeader(),
in that order. So I'm not quite understanding the comment:
"For now it can probably live in GATKVariantContextUtils with createVCFWriter."
Are you suggesting that I factor out the method makeDefaultToolVCFHeaderLines(...) and place it in GATKVariantContextUtils, but not let the code be absorbed into createVCFWriter?
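The three-step order described in this comment can be illustrated with a small sketch in which header construction stays separate from writer creation. All names here are hypothetical stand-ins (RecordingVcfWriter is not an htsjdk class, and the header strings are simplified); it shows one reading of the question, not the reviewer's confirmed intent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a VCF writer; it only records what it is asked to write.
final class RecordingVcfWriter {
    final List<String> written = new ArrayList<>();

    void writeHeader(final List<String> headerLines) {
        written.addAll(headerLines);
    }
}

final class HeaderThenWriterSketch {
    private HeaderThenWriterSketch() {}

    // Step 1: build the header lines (stands in for getDefaultToolVCFHeaderLines).
    static List<String> buildHeaderLines(final String toolName, final String commandLine) {
        return Arrays.asList(
                "##source=" + toolName,
                "##CommandLine=" + commandLine);
    }

    // Step 2: create the writer (stands in for createVCFWriter); it knows nothing about headers.
    static RecordingVcfWriter createWriter() {
        return new RecordingVcfWriter();
    }

    // Step 3: hand the header lines to the writer, in that order.
    static RecordingVcfWriter run(final String toolName, final String commandLine) {
        final List<String> headerLines = buildHeaderLines(toolName, commandLine);
        final RecordingVcfWriter writer = createWriter();
        writer.writeHeader(headerLines);
        return writer;
    }
}
```

Keeping the two static helpers side by side but independent means a tool can still add its own header lines between step 1 and step 3.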
In addition, I suspect more could be done, perhaps by extracting an abstract class like AbstractGATKBaseTool?
39b2487 to 1f66a73
Codecov Report
@@              Coverage Diff               @@
##             master     #4981       +/-   ##
================================================
+ Coverage     60.157%   86.171%   +26.014%
- Complexity     12768     28906     +16138
================================================
  Files           1095      1783       +688
  Lines          64608    134295     +69687
  Branches       10395     15240      +4845
================================================
+ Hits           38866    115723     +76857
+ Misses         21504     13163      -8341
- Partials        4238      5409      +1171
1f66a73 to 97e32cb
…TKSparkTool (allow Spark-tool-generated VCF to have cmd line history)
97e32cb to 2ad08e2
A couple of minor changes requested, otherwise looks good.
* @return A set of VCF header lines containing the tool name, version, date and command line.
*/
public static Set<VCFHeaderLine> getDefaultVCFHeaderLines(final String toolkitShortName, final String toolName,
final String versionString, final String dataTime,
dataTime -> dateTime
Also, javadoc should include the param list with short descriptions
done
* TODO: This should be refactored and moved up into CommandLineProgram, with this value
* TODO: stored in the jar manifest, like {@link CommandLineProgram#getToolkitName}
*/
protected String getToolkitShortName() { return "GATK"; }
Actually, we might as well move this method up into CommandLineProgram now, adjacent to getToolkitName, along with a DEFAULT_TOOLKIT_SHORT_NAME static constant, and return that. Then we can remove the two identical implementations. Also, let's keep the second part of the TODO comment with the new code.
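A minimal sketch of that refactoring, using stripped-down hypothetical classes rather than the real CommandLineProgram hierarchy: the constant and the default implementation live in the base class, and a subclass overrides only when it needs a different name. The class names, the "MyKit" value, the private visibility of the constant, and the commandLineHeaderKey helper are all illustrative assumptions.

```java
// Hypothetical, stripped-down stand-in for the shared base class named in the review.
abstract class CommandLineProgramSketch {
    // Single definition of the default; visibility is a design choice in this sketch.
    private static final String DEFAULT_TOOLKIT_SHORT_NAME = "GATK";

    /**
     * @return An abbreviated name of the toolkit for this tool. Subclasses may override
     *         to provide a custom toolkit name.
     */
    protected String getToolkitShortName() {
        // TODO: store this value in the jar manifest, like the full toolkit name
        return DEFAULT_TOOLKIT_SHORT_NAME;
    }

    // Illustrative consumer: the key under which the command line would be recorded.
    String commandLineHeaderKey() {
        return getToolkitShortName() + "CommandLine";
    }
}

// Inherits the default, so no duplicate implementation is needed.
final class WalkerToolSketch extends CommandLineProgramSketch {}

// Overrides the short name for a different (made-up) toolkit.
final class CustomToolkitToolSketch extends CommandLineProgramSketch {
    @Override
    protected String getToolkitShortName() {
        return "MyKit";
    }
}
```

With the default defined once, the two identical getToolkitShortName implementations in the tool base classes can simply be deleted.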
Refactored; not sure if this is what you meant?
@cmnbroad, I've done the suggested refactoring and documentation changes.
@SHuang-Broad Two very minor changes requested, then we can merge.
@@ -60,6 +60,8 @@
// abstract, this is fine (as long as no logging has to happen statically in this class).
protected final Logger logger = LogManager.getLogger(this.getClass());

public static final String DEFAULT_TOOLKIT_SHORT_NAME = "GATK";
This can be private unless there is a compelling reason otherwise.
done
* @return An abbreviated name of the toolkit for this tool. Subclasses may override to provide
* a custom toolkit name.
*
* TODO: stored in the jar manifest, like {@link CommandLineProgram#getToolkitName}
Can you move this TODO out to a separate comment in the body so it doesn't show up in the javadoc?
done, but not sure if that's what you'd like?
This feature exists for GATKTool, but not yet for GATKSparkTool.
Much of the code is copied from the GATKTool version.
Engine team, please comment on whether a refactor is needed and, if so, how. Thanks!
(Tagging @droazen @lbergelson and @cmnbroad)