Rename output in workflow fails on paired dataset collection #1675

dmaticzka · 2016-02-04T13:32:01Z

The "Rename dataset" workflow feature fails for paired dataset collections. When trying to rename the output using e.g. #{input_1} or #{library}.bam, the filenames generated when saving to a file contain only an empty string in place of the name, e.g. "Galaxy3-[.bam].bam". The naming of the pairs in the output dataset collection displayed by Galaxy is fine, however.

This happens for fastq-join, bowtie2 and hisat2, so it does not seem to be tool-related. For bowtie2 it was also reported at Biostars: https://biostar.usegalaxy.org/p/14911/
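To make the reported symptom concrete, here is a minimal sketch of how a rename template such as `#{input_1}.bam` could collapse to `.bam`. The helper `expand_rename_template` and the idea that no value resolves for a paired collection input are assumptions for illustration only; this is not Galaxy's actual implementation, and the thread has not established the root cause at this point.

```python
import re

# Hypothetical helper, not Galaxy's code: expands "#{key}" tokens in a
# rename template from a plain dict of replacement values.
TOKEN = re.compile(r"#\{(\w+)\}")


def expand_rename_template(template, values):
    # Keys with no known value expand to an empty string, which
    # reproduces the symptom described in the report.
    return TOKEN.sub(lambda m: values.get(m.group(1), ""), template)


# Single dataset input: a name is available for the token.
single = expand_rename_template("#{input_1}.bam", {"input_1": "sample_A"})

# Paired collection input (assumed failure mode): nothing resolves.
paired = expand_rename_template("#{input_1}.bam", {})

print(repr(single), repr(paired))
```

With an empty result like the second one, a download name of the form "Galaxy3-[.bam].bam" is what one would expect to see.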

bgruening · 2016-02-18T10:51:29Z

I would like to hijack this issue and raise the general dataset naming issue again.
We have discussed several times before how we should name datasets and preserve a useful filename across datasets. For example here: https://trello.com/c/dQA7Y5vS.

The problem gets even more complicated now that we have started to use collections. Lines such as https://github.com/galaxyproject/tools-devteam/blob/master/tools/bowtie2/bowtie2_wrapper.xml#L485 are of only limited use in single-input mode, and nearly useless with collections. In the workflow editor we can rename datasets with #{input_1} or similar constructs; this is doable but not user-friendly, and more importantly we have no such mechanism in analysis mode.

Not to mention downloading datasets. These often end up with unusable filenames. I guess it's time to discuss this issue once and for all and finally fix it. Maybe the Galaxy team can make a start during the retreat and discuss possibilities?

mvdbeek · 2016-02-18T10:56:43Z

https://github.com/galaxyproject/tools-devteam/blob/master/tools/bowtie2/bowtie2_wrapper.xml#L485

~~I guess this should be .element_identifier, instead of .name, which defaults to .name outside of collections.~~

Maybe the resolution of ${on_string} could be improved.

In general, I think element_identifier should be available in the workflow editor.
Though I have to say I haven't used paired dataset collections at all, so perhaps this is more complicated than I am aware :/

bwlang · 2016-02-18T22:20:17Z

Yes yes yes

This is the single largest complaint I get from users.

Please could things be named using both the tool and the original input file names?

Brad

lparsons · 2016-05-24T14:16:58Z

Dataset naming is the single biggest issue I have now. Here is an attempt to collect various related issues. Perhaps someone on the Galaxy team would like to create one large ticket to collect dataset naming issues (and add to the roadmap #1928?) @jmchilton, @martenson?

Enhancements:

Ability to name datasets using the element identifier: Expose element_identifier as a workflow parameter variable. #2006
Ability to define the collection name in a workflow: Enhancement: Ability to name collection in a workflow #2398
Name datasets according to both collection name and element identifier: Feature request - Filenames based on dataset collection identifier #2140, Download files in a list of datasets with the name from the list #2023

Bug Fixes:

Renaming using the input from paired dataset collections: Rename output in workflow fails on paired dataset collection #1675, Rename output file on workflow #1686

lparsons · 2016-08-17T16:17:51Z

+1 to get this fixed in 16.10 (and backported?)

jmchilton · 2016-09-30T14:44:23Z

I'm going to skip the middle comments here - they are serious issues and they need to be addressed - it is just that we don't really know how to address them and there isn't agreement across the team or community on how to. It is too big for this particular issue.

The issue here is that the GUI isn't showing you the "name" of the dataset - it is showing you the element identifier for that element in the collection. I don't consider this to be a bug - in most cases you want the element identifier and the "name" of collection items is irrelevant. If there is a rename post job action on a collection mapping step - the collection itself should probably be renamed usually instead of the items in the collection. There is a feature request issue I created for that - #1680. There should also be a way to see the dataset name in the GUI for people that want to IMO - but I doubt @carlfeberhard agrees and I can see the case against it pretty easily.

tl;dr) The names have changed - we just aren't showing them.
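The distinction drawn above between a dataset's "name" and its element identifier can be sketched as follows. The classes `Dataset` and `CollectionElement` and the function `gui_label` are hypothetical stand-ins, not Galaxy's model classes; the sketch only illustrates why a rename can succeed without the collection view changing.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical stand-in for a history dataset; "name" is what a
    # rename post-job action would change.
    name: str


@dataclass
class CollectionElement:
    # Hypothetical stand-in for one element of a collection.
    element_identifier: str
    dataset: Dataset


def gui_label(element):
    # Assumption taken from the comment above: the collection view
    # shows the element identifier, not the dataset name.
    return element.element_identifier


element = CollectionElement("sample_A", Dataset("Bowtie2 on data 2 and data 1"))
element.dataset.name = "sample_A.bam"  # effect of a rename action

print(gui_label(element), "|", element.dataset.name)
```

Under this model the rename is applied, but a user looking only at the collection never sees it.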

dmaticzka · 2016-10-06T09:15:58Z

I'm not concerned with what is shown by the GUI, my problem is that the rename does not work for paired collections on the file name level. Rename just drops everything resulting in filenames like "Galaxy12-[].bam". With that, there's no way to know what data this was and where it came from.

The current alternative of not doing the rename action results in filenames like "Galaxy9-[Bowtie2_on_data_2_and_data_1__aligned_reads_(sorted_BAM)].bam", here also no association between files and the elements shown by the GUI is possible. Being able to show the dataset name in the GUI would allow this, but it wouldn't be pretty :)

I have no issue with the use of element identifiers by the GUI --- my concern here is being able to identify which set belongs to which input and that works nicely when using only the GUI.
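The download filenames quoted in this comment are consistent with a pattern along the lines of `Galaxy<hid>-[<sanitized name>].<ext>`. The sketch below reproduces them under that assumption. The function `download_filename`, the allowed character set, and the unsanitized dataset name used in the example are inferred from the quoted filenames, not taken from Galaxy's source.

```python
import string

# Assumption: characters outside this set are replaced by underscores.
ALLOWED = set(string.ascii_letters + string.digits + "()-_.")


def download_filename(hid, name, ext):
    # Hypothetical reconstruction of the download naming pattern.
    safe = "".join(c if c in ALLOWED else "_" for c in name)
    return f"Galaxy{hid}-[{safe}].{ext}"


# Without a rename: the default tool-generated name survives (the
# original name here is a guess that matches the quoted filename).
default_name = download_filename(
    9, "Bowtie2 on data 2 and data 1: aligned reads (sorted BAM)", "bam"
)

# With a rename that resolved to an empty string.
renamed = download_filename(12, "", "bam")

print(default_name)
print(renamed)
```

Neither result carries the element identifier, which is why the downloaded files cannot be matched to the elements shown in the GUI.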

…ollections. xref galaxyproject#1675 This is of limited utility since we don't really expose the name - and intentionally so. Related open bugs/enhancements that still need to be addressed are: - Applying rename to the collection (in addition to the elements) - galaxyproject#1680. - Download of collection elements with element identifier instead of the name: galaxyproject#2023 / galaxyproject#2140.

jmchilton · 2017-04-27T18:29:45Z

> The current alternative of not doing the rename action results in filenames like "Galaxy9-[Bowtie2_on_data_2_and_data_1__aligned_reads_(sorted_BAM)].bam", here also no association between files and the elements shown by the GUI is possible.

#3985 fixes the downloaded name so hopefully this whole issue is now moot. As such I guess I'm going to close this as a duplicate of #2140.

(If this proves not quite enough and what is actually desired is for the collection itself to be renamed by the PJA - there is another open issue #1680. Hopefully #3985 is good enough though.)

dpryan79 · 2017-05-09T09:59:15Z

Just to clarify, after #3985, will the post-job action on paired-collections actually work to rename the individual (usually hidden) history elements? My current issue is related to what @dmaticzka reported, though in my case trying to use a post-job rename action results in the following error:

galaxy.workflow.run ERROR 2017-05-09 11:41:40,511 Failed to schedule Workflow[id=1533,name=PE DNA mapping (May 9th 2017)], problem occurred on WorkflowStep[index=2,type=tool].
Traceback (most recent call last):
  File "/galaxy-central/lib/galaxy/workflow/run.py", line 169, in invoke
    jobs = self._invoke_step( step )
  File "/galaxy-central/lib/galaxy/workflow/run.py", line 239, in _invoke_step
    jobs = step.module.execute( self.trans, self.progress, self.workflow_invocation, step )
  File "/galaxy-central/lib/galaxy/workflow/modules.py", line 1110, in execute
    self._handle_post_job_actions( step, job, invocation.replacement_dict )
  File "/galaxy-central/lib/galaxy/workflow/modules.py", line 1153, in _handle_post_job_actions
    ActionBox.execute( self.trans.app, self.trans.sa_session, pja, job, replacement_dict )
  File "/galaxy-central/lib/galaxy/jobs/actions/post.py", line 398, in execute
    ActionBox.actions[pja.action_type].execute(app, sa_session, pja, job, replacement_dict)
  File "/galaxy-central/lib/galaxy/jobs/actions/post.py", line 151, in execute
    replacement = hdca.name
AttributeError: 'DatasetCollectionElement' object has no attribute 'name'

The major annoyance is that without post-job renaming, while everything is still nicely labeled inside of collections, the element identifier isn't actually changed, so feeding a collection of mapped bam files into multiBamSummary (to use an example of a tool that uses element identifiers to label samples) still results in everything being labeled "bowtie2 on data 6 and 2" or something like that.

jmchilton · 2017-05-09T12:56:51Z

@dpryan79 The element identifier shouldn't be "bowtie2 on data 6 and 2" - that would be really odd. The element identifier should be preserved from the beginning of the workflow throughout in most cases. Is this the newest multiBamSummary that includes deeptools/deepTools#500?

dpryan79 · 2017-05-09T13:03:34Z

Yes, this is the most recent version, so that's what the element identifier is actually getting set as (this also matches what the hidden history items are named as). I'm running Galaxy 17.01, so if this is changed in the upcoming 17.05 then consider me already happy :)

chambm · 2017-06-21T19:38:08Z

@dpryan79 I'm running up to date release_17.05 (as of yesterday) and got the same error you posted above when trying a PJA rename on paired collection.

dpryan79 · 2017-06-21T19:45:05Z

@chambm :(

chambm · 2017-06-21T21:27:33Z

Changing post.py:150 from:
replacement = hdca.name
to
replacement = hdca.element_identifier
Fixed it for me.
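A slightly more defensive variant of this one-line change is sketched below: keep using `.name` where it exists and fall back to `.element_identifier` otherwise. The classes and the function `replacement_for` are hypothetical stand-ins written for illustration. This is not the code in post.py, and not necessarily how the problem was resolved upstream.

```python
class HistoryDataset:
    # Hypothetical stand-in for an input that has a .name attribute.
    def __init__(self, name):
        self.name = name


class CollectionElement:
    # Hypothetical stand-in for the object from the traceback above,
    # which has an element identifier but no .name attribute.
    def __init__(self, element_identifier):
        self.element_identifier = element_identifier


def replacement_for(input_obj):
    # Prefer the name when present; otherwise use the element
    # identifier instead of raising AttributeError.
    name = getattr(input_obj, "name", None)
    if name is not None:
        return name
    return input_obj.element_identifier


print(replacement_for(HistoryDataset("reads.fastq")))
print(replacement_for(CollectionElement("sample_A")))
```

The fallback keeps the behaviour for plain datasets unchanged, whereas replacing `.name` outright assumes every input reaching that line has an element identifier.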

nsoranzo added kind/bug area/workflows labels Feb 4, 2016

mvdbeek self-assigned this Feb 18, 2016

mvdbeek added this to the 16.04 milestone Feb 18, 2016

martenson modified the milestones: 16.07, 16.04 Apr 5, 2016

martenson modified the milestones: 16.10, 16.07 Jul 27, 2016

lparsons mentioned this issue Sep 20, 2016

Input naming deficiencies #2752

Open

zipho mentioned this issue Sep 29, 2016

Input naming flexibility - tracking the input name throughout the analysis #2980

Open

jmchilton mentioned this issue Sep 30, 2016

Rename output file on workflow #1686

Open

martenson modified the milestones: 17.01, 16.10 Nov 16, 2016

martenson added the popularity/significant label Jan 12, 2017

martenson assigned jmchilton and mvdbeek and unassigned mvdbeek Jan 12, 2017

martenson modified the milestones: 17.01, 17.05 Jan 12, 2017

This was referenced Apr 27, 2017

Add test case clarifying datasets do get "renamed" by rename PJA in collections. #3983

Merged

Dialog w/options for Downloading Collections #3984

Open

jmchilton closed this as completed Apr 27, 2017

mblue9 mentioned this issue Sep 18, 2017

HISAT2 in workflows: renaming output is broken galaxyproject/tools-iuc#1478

Closed
