Picard AddOrReplaceReadGroups can be erroneously run without setting PU #1736

Closed
blankenberg opened this issue Feb 21, 2018 · 14 comments
@blankenberg
Member

The Picard AddOrReplaceReadGroups tool can be executed without setting Platform Unit (PU). However, PU is required by Picard, so omitting it results in a runtime error:

ERROR: Option 'RGPU' is required.

USAGE: AddOrReplaceReadGroups [options]

Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#AddOrReplaceReadGroups

Replace read groups in a BAM file.This tool enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file.

For more information about read groups, see the GATK Dictionary entry. (https://www.broadinstitute.org/gatk/guide/article?id=6472) 

 This tool accepts INPUT BAM and SAM files or URLs from the Global Alliance for Genomics and Health (GA4GH) (see http://ga4gh.org/#/documentation).
Usage example:

java -jar picard.jar AddOrReplaceReadGroups \
      I=input.bam \
      O=output.bam \
      RGID=4 \
      RGLB=lib1 \
      RGPL=illumina \
      RGPU=unit1 \
      RGSM=20


Version: 2.7.1-SNAPSHOT

This should be a validation error that prevents the job from being submitted.
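
A minimal sketch of the kind of pre-submission check described above, written in Python purely for illustration. The function name and the dictionary of form values are hypothetical and are not part of Galaxy or the Picard wrapper. The list of required options follows the Picard usage text, which marks RGLB, RGPL, RGPU and RGSM as required and gives RGID a default.

```python
# Read-group options that Picard marks as required (RGID has a default of 1).
REQUIRED_RG_OPTIONS = ("LB", "PL", "PU", "SM")


def missing_read_group_options(params):
    """Return the required read-group options that are absent or blank."""
    return [
        name
        for name in REQUIRED_RG_OPTIONS
        if not str(params.get(name) or "").strip()
    ]


# Hypothetical form values with PU left empty, as in this report.
form_values = {"LB": "lib1", "PL": "illumina", "PU": "", "SM": "20"}
missing = missing_read_group_options(form_values)
if missing:
    print("Cannot submit, missing required options:", ", ".join(missing))
```

With a check like this the job would be rejected at submission time instead of failing later with the Picard error.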

@jennaj
Member

jennaj commented Oct 19, 2018

Including a default value (as for the other required options) is also an option. Or do what Dan states: have the form refuse to submit and highlight the required value.

This came up on Gitter, so it is still impacting users in general (not just Coursera students): https://gitter.im/galaxyproject/Lobby?at=5bca0c1eae7be9401682bebe

Prior ticket for the same issue, from when the tool was under devteam (from 2016, closing it as a duplicate): galaxyproject/tools-devteam#423

@jennaj
Member

jennaj commented Jan 21, 2019

Test in release 19.01

@jennaj
Member

jennaj commented Mar 1, 2019

@davebx The change is not present in version 2.18.2.1 as installed at main or eu. That is the most current tool version in the MTS.

Is this because the tool did not get a revision bump? Or has the MTS not been updated? Or do we just need to install again from the MTS (everywhere, on all servers)?

These smaller changes without revision bumps are hard to track. I'd appreciate help figuring it out!

@jennaj
Member

jennaj commented Mar 1, 2019

I never linked the test histories for the public servers, so here they are. Once this is fixed, we can test org.au too and let them know to update as needed.

https://usegalaxy.org:/u/jen/h/test-ncbi-sra-tools
https://usegalaxy.eu:/u/jenj/h/test-picard-addorreplacereadgroups-21821

@jennaj
Member

jennaj commented May 7, 2019

The MTS has not been updated to include this fix yet. Any ideas about why, or how to move this forward? Ping @davebx @bgruening cc @jmchilton

The MTS still has the old code, not what PR #2211 does.

From read_group_macros.xml, as browsed in the MTS:

    #if $rg_param("PU")
        #set $rg_pu = str($rg_param("PU"))
    #else
        #set $rg_pu = ''

@nsoranzo
Member

nsoranzo commented May 8, 2019

@jennaj #2211 only added the default value "run" for PU, which is present in revision 22:f6ced08779c4 on the MTS. The code you pasted above was not changed by #2211.

You may want to test by starting the tool from scratch instead of re-running?

@jennaj
Member

jennaj commented May 9, 2019

Hmm, I copied that code from the repo "view tip files" in the MTS. Weird. Could you double check? Or is there an MTS problem?

@jennaj
Member

jennaj commented May 9, 2019

Retested on both Main and EU with fresh test data/histories. Fails both places.

AddOrReplaceReadGroups add or replaces read group information (Galaxy Version 2.18.2.1)

https://usegalaxy.org:/u/jen/h/test-picard-addorreplacereadgroups
https://usegalaxy.eu:/u/jenj/h/test-picard-addorreplacereadgroups

The tool form doesn't have an "Auto-assign" toggle and doesn't fill in "run" as the PU value at runtime. The command line doesn't include the PU default either.

Thanks for helping to sort this out!

Error

Dataset Error
An error occured while running the tool toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_AddOrReplaceReadGroups/2.18.2.1.

Tool execution generated the following messages:

Fatal error: Exit code 1 ()
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/galaxy-repl/main/jobdir/023/422/23422629/_job_tmp -Xmx7g -Xms256m
ERROR: Option 'RGPU' is required.

USAGE: AddOrReplaceReadGroups [options]

Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#AddOrReplaceReadGroups

Assigns all the reads in a file to a single new read-group.

This tool accepts INPUT BAM and SAM files or URLs from the Global Alliance for Genomics and Health (GA4GH)
(http://ga4gh.org/#/documentation).

Usage example:

java -jar picard.jar AddOrReplaceReadGroups \
I=input.bam \
O=output.bam \
RGID=4 \
RGLB=lib1 \
RGPL=illumina \
RGPU=unit1 \
RGSM=20


Caveats

The value of the tags must adhere (according to the SAM-spec (https://samtools.github.io/hts-specs/SAMv1.pdf)) with the
regex '^[ -~]+$'</code> (one or more characters from the ASCII range 32 through 126). In particular <Space> is the only
non-printing character allowed.

The program enables only the wholesale assignment of all the reads in the INPUT to a single read-group. If your file
already has reads assigned to multiple read-groups, the original RG value will be lost. 

For more information about read-groups, see the GATK Dictionary entry.
(https://www.broadinstitute.org/gatk/guide/article?id=6472)
Version: 2.18.2-SNAPSHOT


Options:

--help
-h                            Displays options specific to this tool.

--stdhelp
-H                            Displays options specific to this tool AND options common to all Picard command line
                              tools.

--version                     Displays program version.

INPUT=String
I=String                      Input file (BAM or SAM or a GA4GH url).  Required. 

OUTPUT=File
O=File                        Output file (BAM or SAM).  Required. 

SORT_ORDER=SortOrder
SO=SortOrder                  Optional sort order to output in. If not supplied OUTPUT is in the same order as INPUT. 
                              Default value: null. Possible values: {unsorted, queryname, coordinate, duplicate,
                              unknown} 

RGID=String
ID=String                     Read-Group ID  Default value: 1. This option can be set to 'null' to clear the default
                              value. 

RGLB=String
LB=String                     Read-Group library  Required. 

RGPL=String
PL=String                     Read-Group platform (e.g. illumina, solid)  Required. 

RGPU=String
PU=String                     Read-Group platform unit (eg. run barcode)  Required. 

RGSM=String
SM=String                     Read-Group sample name  Required. 

RGCN=String
CN=String                     Read-Group sequencing center name  Default value: null. 

RGDS=String
DS=String                     Read-Group description  Default value: null. 

RGDT=Iso8601Date
DT=Iso8601Date                Read-Group run date  Default value: null. 

RGKS=String
KS=String                     Read-Group key sequence  Default value: null. 

RGFO=String
FO=String                     Read-Group flow order  Default value: null. 

RGPI=Integer
PI=Integer                    Read-Group predicted insert size  Default value: null. 

RGPG=String
PG=String                     Read-Group program group  Default value: null. 

RGPM=String
PM=String                     Read-Group platform model  Default value: null. 

Job info/command line:

[Screenshot: Screen Shot 2019-05-08 at 6 02 44 PM]

@jennaj
Member

jennaj commented May 9, 2019

BTW, this tool might need a test case that triggers the auto-assign and default-input functions. Thoughts on that? I'm not even sure if it's possible.

  <tests>
    <test>
      <param name="inputFile" value="picard_ARRG.bam" />
      <param name="LB" value="tumor-a" />
      <param name="PL" value="ILLUMINA" />
      <param name="PU" value="run-1" />
      <param name="SM" value="sample-a" />
      <param name="ID" value="id-1" />
      <output name="outFile" file="picard_ARRG_test1.bam" ftype="bam" />
    </test>
  </tests>
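
One way to picture such a test, as a standalone Python sketch rather than Galaxy tool XML. Here `build_rg_args` is a hypothetical stand-in for the wrapper's command construction, and the fallback value "run" is the default that #2211 is said to add. The parameter values are taken from the existing test above.

```python
def build_rg_args(params, default_pu="run"):
    """Build Picard read-group arguments, falling back to a default PU."""
    # An absent or blank PU falls back to the default instead of being dropped.
    pu = str(params.get("PU") or "").strip() or default_pu
    return [
        f"RGLB={params['LB']}",
        f"RGPL={params['PL']}",
        f"RGPU={pu}",
        f"RGSM={params['SM']}",
    ]


def test_pu_defaults_when_omitted():
    # Same values as the existing test, but with PU left out.
    args = build_rg_args({"LB": "tumor-a", "PL": "ILLUMINA", "SM": "sample-a"})
    assert "RGPU=run" in args


def test_explicit_pu_is_kept():
    args = build_rg_args(
        {"LB": "tumor-a", "PL": "ILLUMINA", "PU": "run-1", "SM": "sample-a"}
    )
    assert "RGPU=run-1" in args


test_pu_defaults_when_omitted()
test_explicit_pu_is_kept()
```

The first case is the one the current test does not cover: PU is omitted, and the expected command still carries an RGPU value.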

@nsoranzo
Member

nsoranzo commented May 9, 2019

> Retested on both Main and EU with fresh test data/histories. Fails both places.

That just means that both servers need to update the tool. The default for "Platform unit (PU)" is "run" on our Galaxy instance, which is updated to 22:f6ced08779c4.
Ping @martenson @bgruening

@martenson
Member

added to https://github.com/galaxyproject/usegalaxy-playbook/projects/3

@jennaj
Member

jennaj commented May 9, 2019

@nsoranzo OK, going to trust you on that. I don't see the change in the Tool Shed when browsing tip files (specifically read_group_macros.xml). But maybe something else is going on with that.

@jennaj
Member

jennaj commented May 9, 2019

I also added it to our tool update tracking at usegalaxy.org here: https://github.com/galaxyproject/usegalaxy-playbook/projects/3#column-5164217

I pinged EU as well, since they will want to update, too.

@jennaj
Member

jennaj commented May 13, 2019

OK, works now on Main. Thanks everyone!!
