Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various fixes for mapping over collections #6278

Merged
merged 6 commits into from
Jun 11, 2018

Conversation

mvdbeek
Copy link
Member

@mvdbeek mvdbeek commented Jun 5, 2018

  • cb97f89 fixes mapping over when multiple collection types, like collection_type="list,paired" are allowed in a input section
  • 9b19d5c fixes wrong collection structure when mapping over
  • 258ccb8 fixes mapping over in conjunction with dynamically discovered output collections

@mvdbeek
Copy link
Member Author

mvdbeek commented Jun 5, 2018

@jmchilton I was hoping we can include this in #6266

@galaxybot galaxybot added this to the 18.09 milestone Jun 5, 2018
@mvdbeek
Copy link
Member Author

mvdbeek commented Jun 5, 2018

This is almost right, the last issue is that we get list:list as collection type when this should be list:paired. Curiously the nested collection type is the correct paired.

@jmchilton
Copy link
Member

My intention was that lists could be reduced by a multi-data parameter but not pairs. So a multi-data parameter could function a list data collection input but not a type="collection" collection_type="list,paired" parameter. Does this change that? I think both are defensible I just want to make sure I understand the intention regarding that.

@mvdbeek
Copy link
Member Author

mvdbeek commented Jun 6, 2018

Yes, it does make it possible to reduce a pair using a multidata input. Like you I'm not sure that's really needed, but I think there were multiple demands for that. (I'd add the issue but I just can't find it ... I think @nekrut wanted to merge paired fastqs at some point ?).
That is something you can easily do with collection operations as well ...

This PR also fixes mapping over list:pair inputs over collection_type="list,paired" inputs. I'll separate that from reducing paired collections.

@jmchilton
Copy link
Member

I've slept on this and given your hesitation also - I'm feeling stronger that we should just not allows pairs to be reduced as lists. Even if it is a sort of borderline thing - it is not something we can undo once we do it right? Is this the issue you are looking for (galaxyproject/tools-iuc#1658 (comment))?

If one expects a tool to have some knowledge of pairing - like one might expect with a QC tool - and it isn't going to exploit the pairing - allowing this is going to yield results that aren't as the user would expect in very subtle ways I assume?

If the user is certain they want to abandon the paired structure information and treat data as a list, they definitely have that option in 18.05 - though it is heavy handed still and we should add a collection operation tool that transforms this directly without defining rules or needing extra options - I'll add it to the list of Tuple-like operations here (#6061).

That said, you are a lot closer to the analysis than me (and smarter) - so I will definitely defer to you. If you still think this is a good idea I'll merge the change.

This PR also fixes mapping over list:pair inputs over collection_type="list,paired" inputs.

Ah - okay I'm working on tool tests today so I might add one for that unless you have WIP somewhere. One thing I want to do is use the new upload API throughout to just upload collections directly with one job and save a lot of time during testing.

@mvdbeek
Copy link
Member Author

mvdbeek commented Jun 6, 2018

Ah - okay I'm working on tool tests today so I might add one for that unless you have WIP somewhere. One thing I want to do is use the new upload API throughout to just upload collections directly with one job and save a lot of time during testing.

Yeah, I'm working on that. Part one is easy, just flip any one of the test tools from collection_type="paired" to collection_type="list,paired" and mapping over will break. The other thing is to make sure that the resulting collection is list:pair and not list:list, but I seem to have forgotten how to get the history contents ...

@jmchilton
Copy link
Member

I'd look at test_zip_list_inputs - it does this check after mapping execution.

@mvdbeek mvdbeek force-pushed the fix_paired_mapping_over branch from 5e6fb5e to b9be148 Compare June 6, 2018 16:48
@mvdbeek
Copy link
Member Author

mvdbeek commented Jun 6, 2018

So my particular choice of test tool uncovered another missing feature -- it looks like dynamically discovering collection output is not compatible with mapping over :/

@mvdbeek
Copy link
Member Author

mvdbeek commented Jun 7, 2018

We do have a test for this though test_map_over_with_discovered_output_collection_elements -- something must be different there. The obvious candidate is that the output type is fixed instead of inferred from the input

@jmchilton
Copy link
Member

The obvious candidate is that the output type is fixed instead of inferred from the input.

I can imagine that being a lot less well supported. The original test has passed in the past but the actual functionality broken since the test wasn't asserting enough about the outputs.

mvdbeek added 2 commits June 8, 2018 10:41
This will break tests trying to map over list:pair collections,
for instance test_extract_workflows_with_dataset_collections.
We'd only try mapping over the first collection type description
otherwise. I suppose we didn't notice because the first type is `list`
and therefore much more common. Noticed this while working on galaxyproject#5640.
@mvdbeek
Copy link
Member Author

mvdbeek commented Jun 11, 2018

I've not been able yet to make the mapping over play nice with dynamically discovering the collection content. It looks like one problem is that we produce the correct output collection structure when setting up the job, where the output child collection is uninitialized. When we collect the data the unpopulated collection that we fetch from the database is then missing the uninitialized child collection, and that causes problems when the CollectionBuilder instance tries to populate the collection

@mvdbeek mvdbeek force-pushed the fix_paired_mapping_over branch from b9be148 to 350c8eb Compare June 11, 2018 08:59
Previously this would have always generated `list:list` structures,
even when the input is a list:pair. I'm not 100% convinced
this is correct, but this seems to work.
@mvdbeek mvdbeek force-pushed the fix_paired_mapping_over branch from 350c8eb to c7e4d66 Compare June 11, 2018 09:35
mvdbeek added 2 commits June 11, 2018 12:20
This doesn't quite work because of the dynamic output collection,
which fails with:

```
galaxy.tools.parameters.output_collect DEBUG 2018-06-06 18:40:38,784 (3) Add dynamic collection datasets to history for output [reverse] (171.970 ms)
galaxy.tools.parameters.output_collect ERROR 2018-06-06 18:40:38,839 Problem gathering output collection.
Traceback (most recent call last):
  File "/Users/mvandenb/src/galaxy/lib/galaxy/tools/parameters/output_collect.py", line 340, in collect_dynamic_outputs
    collection_builder.populate()
  File "/Users/mvandenb/src/galaxy/lib/galaxy/dataset_collections/builder.py", line 88, in populate
    elements = self.build_elements()
  File "/Users/mvandenb/src/galaxy/lib/galaxy/dataset_collections/builder.py", line 59, in build_elements
    new_elements[identifier] = element.build()
AttributeError: 'HistoryDatasetAssociation' object has no attribute 'build'
```

This seems to happen because the inner collection is wrongly detected as
nested. In general I doubt that mapping over colleciton output works
with dynamically discovered output collections.
@mvdbeek mvdbeek force-pushed the fix_paired_mapping_over branch from c7e4d66 to 258ccb8 Compare June 11, 2018 12:10
@mvdbeek
Copy link
Member Author

mvdbeek commented Jun 11, 2018

OK, I think 258ccb8 was the important commit to fix mapping over when dynamically discovering collection output.

@mvdbeek mvdbeek changed the title Fix can_map_over for mapping over paired inputs Various fixes for mapping over collections Jun 11, 2018
@jmchilton
Copy link
Member

Sorry about all the edge cases, thanks for sticking with this - these are important, awesome fixes!

@jmchilton jmchilton merged commit 178486b into galaxyproject:dev Jun 11, 2018
@mvdbeek mvdbeek deleted the fix_paired_mapping_over branch June 12, 2018 12:53
@mvdbeek mvdbeek restored the fix_paired_mapping_over branch June 12, 2018 12:53
@mvdbeek mvdbeek deleted the fix_paired_mapping_over branch June 12, 2018 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants