Cannot read BED file stored in HDFS as interval list file #4852

Closed
SHuang-Broad opened this issue Jun 6, 2018 · 0 comments · Fixed by #4854

Bug Report

Affected tool(s) or class(es)

All Spark tools that take the -L parameter

Affected version(s)

  • Latest public release version [4.0.4.0]
  • Latest master branch as of [2018-06-30]

Description

When running a Spark tool with intervals passed through the standard -L argument, if the interval file is stored in HDFS (only BED files have been tested), we see errors like the one below:

org.broadinstitute.hellbender.exceptions.UserException$MalformedGenomeLoc: Badly formed genome unclippedLoc: Query interval "hdfs://shuang-g94794-chmi-chmi3-wgs1-cram-bam-feature-m:8020/data/merged_commonFPDel.bed" is not valid for this input.
	at org.broadinstitute.hellbender.utils.GenomeLocParser.getUnambiguousInterval(GenomeLocParser.java:350)
	at org.broadinstitute.hellbender.utils.GenomeLocParser.parseGenomeLoc(GenomeLocParser.java:309)
	at org.broadinstitute.hellbender.utils.IntervalUtils.parseIntervalArguments(IntervalUtils.java:300)
	at org.broadinstitute.hellbender.utils.IntervalUtils.loadIntervals(IntervalUtils.java:226)
	at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.parseIntervals(IntervalArgumentCollection.java:174)
	at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getTraversalParameters(IntervalArgumentCollection.java:155)
	at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getIntervals(IntervalArgumentCollection.java:111)
	at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeIntervals(GATKSparkTool.java:514)
	at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeToolInputs(GATKSparkTool.java:451)
	at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:439)
	at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:30)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
	at org.broadinstitute.hellbender.Main.main(Main.java:289)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
ERROR: (gcloud.dataproc.jobs.submit.spark) Job [91a5d7391a4647a89e50717b96eb50e0] entered state [ERROR] while waiting for [DONE].

Steps to reproduce

Run a Spark tool as follows:

gatk ToolNameSpark \
-I hdfs://path/to/bam/test.bam \
-L hdfs://path/to/interval/file/interval.bed \
-O hdfs://path/to/output \
....

Expected behavior

The intervals in the file are parsed correctly.

Actual behavior

The engine tries to interpret the file name itself as a literal interval.
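One plausible way this failure mode arises is sketched below. It is a hypothetical classifier, not GATK's actual code: if the "is this a file?" decision relies on `java.io.File`, which only understands the local file system, then an `hdfs://...` string looks like a nonexistent relative local path, and the value falls through to being parsed as an interval such as `chr1:100-200`. The class and method names are illustrative assumptions.

```java
import java.io.File;

public class NaiveIntervalArgCheck {
    /**
     * Hypothetical check: treat the -L value as a file only if java.io.File can see it.
     * java.io.File has no notion of URI schemes, so "hdfs://host:8020/x.bed" is treated
     * as a relative local path, which almost certainly does not exist.
     */
    static boolean isIntervalFile(String arg) {
        return new File(arg).exists();
    }

    /** Reports how the hypothetical engine would treat the value. */
    static String describe(String arg) {
        return isIntervalFile(arg) ? "file: " + arg : "interval string: " + arg;
    }

    public static void main(String[] args) {
        // The HDFS BED path is misclassified as an interval string, mirroring the error above.
        System.out.println(describe("hdfs://namenode:8020/data/merged_commonFPDel.bed"));
        System.out.println(describe("chr1:100-200"));
    }
}
```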

lbergelson added a commit that referenced this issue Jun 6, 2018
* expand -L support for Feature Files to work with Paths
* previously interval files could be read from Paths, but not feature
files like vcf and bed
* fixes #4852
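The commit message describes resolving feature files given to -L through `java.nio` Paths rather than local files. A minimal sketch of that idea follows; it is not the code from #4854. It assumes a hypothetical extension list for deciding what counts as a feature file, and assumes that an HDFS `FileSystemProvider` is on the classpath at runtime (a plain JVM has none, in which case `Paths.get(URI)` throws `FileSystemNotFoundException`).

```java
import java.net.URI;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class IntervalArgResolver {
    // Hypothetical extension list; GATK's real feature-file detection works differently.
    private static final String[] FEATURE_EXTENSIONS = {".bed", ".vcf", ".vcf.gz"};

    /** True if the -L value names something that should be read as a feature file. */
    static boolean looksLikeFeatureFile(String arg) {
        String lower = arg.toLowerCase(Locale.ROOT);
        for (String ext : FEATURE_EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Converts an -L value to a java.nio Path. Values with a URI scheme ("hdfs://", "file://")
     * are dispatched to whichever installed FileSystemProvider handles that scheme; plain
     * strings become local paths. Throws FileSystemNotFoundException if no provider is installed.
     */
    static Path toPath(String arg) {
        return arg.contains("://") ? Paths.get(URI.create(arg)) : Paths.get(arg);
    }

    /** Summarizes how the value would be treated. */
    static String classify(String arg) {
        if (!looksLikeFeatureFile(arg)) {
            return "interval string";
        }
        try {
            return Files.exists(toPath(arg)) ? "feature file" : "missing feature file";
        } catch (FileSystemNotFoundException e) {
            return "no NIO provider for scheme " + URI.create(arg).getScheme();
        }
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(arg + " -> " + classify(arg));
        }
    }
}
```

The key design choice is that file-versus-interval is decided by the value's shape and then resolved through the NIO provider registry, so any scheme with an installed provider works without special-casing HDFS.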
lbergelson self-assigned this Jun 6, 2018
lbergelson added the bug label Jun 6, 2018
lbergelson added a commit that referenced this issue Jun 7, 2018
* expand -L support for Feature Files to work with Paths
* previously interval files could be read from Paths, but not feature
files like vcf and bed
* fixes #4852
droazen added this to the Engine-2Q2018 milestone Jun 7, 2018