Various fixes for mapping over collections #6278
@jmchilton I was hoping we can include this in #6266 |
This is almost right, the last issue is that we get |
My intention was that lists could be reduced by a multi-data parameter but not pairs. So a multi-data parameter could function as a list data collection input but not a |
Yes, it does make it possible to reduce a pair using a multidata input. Like you, I'm not sure that's really needed, but I think there were multiple requests for it. (I'd add the issue but I just can't find it ... I think @nekrut wanted to merge paired fastqs at some point?) This PR also fixes mapping over |
I've slept on this and, given your hesitation as well, I'm feeling more strongly that we should just not allow pairs to be reduced as lists. Even if it is a sort of borderline thing - it is not something we can undo once we do it, right? Is this the issue you are looking for (galaxyproject/tools-iuc#1658 (comment))? If one expects a tool to have some knowledge of pairing - as one might with a QC tool - and it isn't going to exploit the pairing, then I assume allowing this will yield results that differ from what the user expects in very subtle ways. If the user is certain they want to abandon the paired structure information and treat the data as a list, they definitely have that option in 18.05 - though it is still heavy-handed, and we should add a collection operation tool that does this transformation directly without defining rules or needing extra options - I'll add it to the list of Tuple-like operations here (#6061). That said, you are a lot closer to the analysis than me (and smarter) - so I will definitely defer to you. If you still think this is a good idea I'll merge the change.
Ah - okay I'm working on tool tests today so I might add one for that unless you have WIP somewhere. One thing I want to do is use the new upload API throughout to just upload collections directly with one job and save a lot of time during testing. |
Yeah, I'm working on that. Part one is easy, just flip any one of the test tools from |
I'd look at test_zip_list_inputs - it does this check after mapping execution. |
5e6fb5e to b9be148
So my particular choice of test tool uncovered another missing feature -- it looks like dynamically discovering collection output is not compatible with mapping over :/ |
We do have a test for this though |
I can imagine that being a lot less well supported. The original test has passed in the past, but the actual functionality was broken, since the test wasn't asserting enough about the outputs. |
This will break tests trying to map over list:pair collections, for instance test_extract_workflows_with_dataset_collections.
We'd only try mapping over the first collection type description otherwise. I suppose we didn't notice because the first type is `list` and therefore much more common. Noticed this while working on galaxyproject#5640.
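To illustrate the commit message above, here is a minimal sketch of why every accepted collection type has to be tried, not just the first. The helper name `find_mapped_over_prefix` and its suffix-matching rule are assumptions made for illustration; this is not Galaxy's actual matching code.

```python
def find_mapped_over_prefix(accepted_types, collection_type):
    """Return the part of collection_type that would be mapped over.

    accepted_types is a comma-separated string such as "list,paired".
    Returns "" for a direct match and None when nothing matches.
    Hypothetical helper, not part of Galaxy.
    """
    for accepted in accepted_types.split(","):
        accepted = accepted.strip()
        if collection_type == accepted:
            return ""  # direct match, nothing left to map over
        suffix = ":" + accepted
        if collection_type.endswith(suffix):
            # The input consumes the trailing part; the rest is mapped over.
            return collection_type[: -len(suffix)]
    return None


# Checking only the first accepted type ("list") finds no match for a
# list:paired collection; trying "paired" as well maps over the outer list.
first_only = find_mapped_over_prefix("list", "list:paired")
all_types = find_mapped_over_prefix("list,paired", "list:paired")
```

In this sketch a `list:paired` collection only matches once `paired` is considered, which is the situation the commit describes: the bug stays hidden as long as the first accepted type happens to fit.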
I have not yet been able to make mapping over play nicely with dynamically discovering the collection content. It looks like one problem is that we produce the correct output collection structure when setting up the job, where the output child collection is uninitialized. When we collect the data, the unpopulated collection that we fetch from the database is then missing the uninitialized child collection, and that causes problems when the CollectionBuilder instance tries to populate the collection |
b9be148 to 350c8eb
Previously this would have always generated `list:list` structures, even when the input is a list:pair. I'm not 100% convinced this is correct, but this seems to work.
350c8eb to c7e4d66
This doesn't quite work because of the dynamic output collection, which fails with:

```
galaxy.tools.parameters.output_collect DEBUG 2018-06-06 18:40:38,784 (3) Add dynamic collection datasets to history for output [reverse] (171.970 ms)
galaxy.tools.parameters.output_collect ERROR 2018-06-06 18:40:38,839 Problem gathering output collection.
Traceback (most recent call last):
  File "/Users/mvandenb/src/galaxy/lib/galaxy/tools/parameters/output_collect.py", line 340, in collect_dynamic_outputs
    collection_builder.populate()
  File "/Users/mvandenb/src/galaxy/lib/galaxy/dataset_collections/builder.py", line 88, in populate
    elements = self.build_elements()
  File "/Users/mvandenb/src/galaxy/lib/galaxy/dataset_collections/builder.py", line 59, in build_elements
    new_elements[identifier] = element.build()
AttributeError: 'HistoryDatasetAssociation' object has no attribute 'build'
```

This seems to happen because the inner collection is wrongly detected as nested. In general I doubt that mapping over collection output works with dynamically discovered output collections.
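The failure mode in the traceback can be sketched in isolation. The classes below are hypothetical stand-ins, not Galaxy's `CollectionBuilder` or `HistoryDatasetAssociation`; they only show how calling `build()` on every element breaks once a plain dataset sits where a nested builder is expected.

```python
class FakeDataset:
    """Stand-in for a dataset object: it has no build() method."""

    def __init__(self, name):
        self.name = name


class FakeBuilder:
    """Stand-in for a collection builder (hypothetical, for illustration)."""

    def __init__(self, nested):
        # nested=True means every element is expected to be a sub-builder.
        self.nested = nested
        self.elements = {}

    def add(self, identifier, element):
        self.elements[identifier] = element

    def build_elements(self):
        if not self.nested:
            # Flat collection: elements are datasets and are used as-is.
            return dict(self.elements)
        # Nested collection: assumes each element has a build() method.
        return {key: value.build() for key, value in self.elements.items()}

    def build(self):
        return self.build_elements()


# A flat inner collection wrongly flagged as nested reproduces the error.
wrong = FakeBuilder(nested=True)
wrong.add("forward", FakeDataset("forward.fastq"))
try:
    wrong.build()
    error = None
except AttributeError as exc:
    error = str(exc)

# The same content flagged as flat builds without problems.
right = FakeBuilder(nested=False)
right.add("forward", FakeDataset("forward.fastq"))
built = right.build()
```

Under these assumptions the error depends only on the nesting flag, not on the data, which is consistent with the suspicion that the inner collection is being misclassified.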
…scovered collections
c7e4d66 to 258ccb8
OK, I think 258ccb8 was the important commit to fix mapping over when dynamically discovering collection output. |
…llection is being mapped over
Sorry about all the edge cases, thanks for sticking with this - these are important, awesome fixes! |
collection_type="list,paired"
are allowed in an input section