-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow drag and drop datasets from collections to the tool form. #5384
Conversation
There are some shortcomings here - the original name will still be used in the tool since we are treating the dataset as a stand-alone dataset and not an element in the collection for instance. This is an issue in the display of the selected dataset and in tools that may want to leverage element_identifier. Still - this prevents the user from having to find the right dataset and unhide it to use it in the tool form - good enough for a step forward?
I can see this being useful, of course, but I'm wondering if we could perhaps create an implicit job that creates a new dataset collection from the selected element(s) ? Not sure if this would be more or less of a hack, but the advantage would be that you can maintain this as a collection, so the element identifier problem would be solved. I guess it really depends on the actual use-case for picking out datasets from a collection. I have the (maybe overly) simplistic view that you want to have everything as a dataset collection, with the exception of a final reduction step, perhaps. So far that has worked well for me and I'm having a hard time coming up with a scenario that wouldn't work, but I know some people have problems with this workflow (#740 (comment), #740 (comment) come to mind), so it would be great to have input from that direction. I think the original issue where this came up was #740. @erasche provided a suggestion that I still think is worth exploring, which is being able to drill down into collections for selecting a subset of a collection. I think on the implementation side this could actually trigger a filter collection job. So then users can either explicitly create the correct collection by hand (probably preferable when you have more than 3 replicates and/or multiple conditions, this would be difficult with drag-and-drop as well), or create a subset of a collection in the tool form (for the simple cases). |
Still sort of untrackable from a workflow extraction perspective but it would be a good way of getting that element identifier working the way one would expect in the tool. To evaluate CWL workflows I do have a concept I called "ephemeral collections" (common-workflow-lab@e8495f5 - grep for ephemeral) - this is required in CWL because one can map over the result of expressions and get mapping that doesn't reflect in-database organization of datasets at all. They allow collected operations over "uncollected" datasets the way you are describing here. These aren't things users would want to see ever I think but I haven't decided if keeping them truly ephemeral (in-memory) is a good idea of if there should be something in the database associated with the history but hidden somehow. That said I'm not sure it is worth the effort at this time. Its still an approximation of how things should be working right and only truly useful for flat lists? I can create a collection around the element identifier for a nested list of pairs for instance and I'd just get "forward" as the identifier - I'm not sure that is better than the name? Especially if there are a bunch of forwards in a list - it is actually worse. It is better to work on writing tools that can naturally exploit collection structure, writing tool framework enhancement to adapt collections for tools, and work on tooling and GUI elements giving users the power to reason and reorganize collections at a high level for the data flow they require IMO - and seems like you agree. Another thing to consider is that I want to end HDCA datasets having names at all - the names should be either the deepest element identifier or some path to the identifier (e.g. __forward). These things having names at all was a huge mistake on my part - it has become very clear to me that the things in collections should be collection dataset associations not history dataset associations (a new third kind of
I'm on it - see comments on #740. Thanks for the feedback @mvdbeek. I've argued against doing the very thing in this PR in the past for the reasons you outlined above about it being better to keep collections in collections - I won't be sad if this PR fails. I just did this work because I understand people are very frustrated they can't just pull apart their collection and use the individual pieces the way they want in an ad-hoc fashion and it seemed like low hanging fruit. And while I wouldn't recommend doing this at scale or as part of a standard operating procedure for a particular analysis - during the exploratory phase maybe just quickly flinging datasets around in an ad-hoc fashion lets one get a sense of the data and how to proceed faster? |
I agree, making it easier to reshape collections is the way forward, and creating collections on the fly has some additional complexities (i.e what structure to use when pulling out elements of a nested list:pair collections is a good example). I also like the fact that you can pull out collection elements to figure out why a tool is failing, so there's nothing wrong with the PR, I guess I'm just sad whenever I see a HDA and think about why it is not in a collection. |
I tried to test it, but DnD didn't seem to work for me. Maybe I'm doing something wrong? I built a list of pairs of text files and then tried to drag the pairs, and individual items from within a pair, onto the toolform and dragging didn't seem to happen. Am I doing it wrong?
indeed, that's great for people who really want to use collections for NGS stuff, but those of us who are trying to balance users who want to just do one-off analyses with users who want to use collections to do batch work... something like this would be helpful.
this is an interesting suggestion, I'd only considered selecting single datasets, but a subset+filter job is a neat idea.
yeah, pity that we don't include the full collection specifier, like |
@erasche Just to make sure - did you build the client before testing this? |
@jmchilton that'd prolly be it. Soz mate. Gone for a month and it's a whole new world. |
(afk for 2-4 days, don't wait this on me/my testing.) |
Moving this to WIP - @guerler wants to take a stab at making the implementation... less... unfortunate. |
Dropping in collections from nested collections would be an important feature as well. |
In addition to being a somewhat hacky implementation - there are some shortcomings here from the end-user perspective I think - the original name will still be used in the tool since we are treating the dataset as a stand-alone dataset and not an element in the collection for instance. This is an issue in the display of the selected dataset and in tools that may want to leverage element_identifier. Still - this prevents the researcher from having to find the right dataset and unhide it in the history panel to use it in the tool form - good enough for a step forward?