Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow drag and drop datasets from collections to the tool form. #5384

Closed
wants to merge 1 commit into from

Conversation

jmchilton
Copy link
Member

In addition to being a somewhat hacky implementation - there are some shortcomings here from the end-user perspective I think - the original name will still be used in the tool since we are treating the dataset as a stand-alone dataset and not an element in the collection for instance. This is an issue in the display of the selected dataset and in tools that may want to leverage element_identifier. Still - this prevents the researcher from having to find the right dataset and unhide it in the history panel to use it in the tool form - good enough for a step forward?

There are some shortcomings here - the original name will still be used in the tool since we are treating the dataset as a stand-alone dataset and not an element in the collection for instance. This is an issue in the display of the selected dataset and in tools that may want to leverage element_identifier. Still - this prevents the user from having to find the right dataset and unhide it to use it in the tool form - good enough for a step forward?
@mvdbeek
Copy link
Member

mvdbeek commented Jan 25, 2018

I can see this being useful, of course, but I'm wondering if we could perhaps create an implicit job that creates a new dataset collection from the selected element(s) ? Not sure if this would be more or less of a hack, but the advantage would be that you can maintain this as a collection, so the element identifier problem would be solved.

I guess it really depends on the actual use-case for picking out datasets from a collection. I have the (maybe overly) simplistic view that you want to have everything as a dataset collection, with the exception of a final reduction step, perhaps. So far that has worked well for me and I'm having a hard time coming up with a scenario that wouldn't work, but I know some people have problems with this workflow (#740 (comment), #740 (comment) come to mind), so it would be great to have input from that direction.

I think the original issue where this came up was #740. @erasche provided a suggestion that I still think is worth exploring, which is being able to drill down into collections for selecting a subset of a collection. I think on the implementation side this could actually trigger a filter collection job. So then users can either explicitly create the correct collection by hand (probably preferable when you have more than 3 replicates and/or multiple conditions, this would be difficult with drag-and-drop as well), or create a subset of a collection in the tool form (for the simple cases).

@jmchilton
Copy link
Member Author

Not sure if this would be more or less of a hack, but the advantage would be that you can maintain this as a collection, so the element identifier problem would be solved.

Still sort of untrackable from a workflow extraction perspective but it would be a good way of getting that element identifier working the way one would expect in the tool. To evaluate CWL workflows I do have a concept I called "ephemeral collections" (common-workflow-lab@e8495f5 - grep for ephemeral) - this is required in CWL because one can map over the result of expressions and get mapping that doesn't reflect in-database organization of datasets at all. They allow collected operations over "uncollected" datasets the way you are describing here. These aren't things users would want to see ever I think but I haven't decided if keeping them truly ephemeral (in-memory) is a good idea of if there should be something in the database associated with the history but hidden somehow.

That said I'm not sure it is worth the effort at this time. Its still an approximation of how things should be working right and only truly useful for flat lists? I can create a collection around the element identifier for a nested list of pairs for instance and I'd just get "forward" as the identifier - I'm not sure that is better than the name? Especially if there are a bunch of forwards in a list - it is actually worse. It is better to work on writing tools that can naturally exploit collection structure, writing tool framework enhancement to adapt collections for tools, and work on tooling and GUI elements giving users the power to reason and reorganize collections at a high level for the data flow they require IMO - and seems like you agree.

Another thing to consider is that I want to end HDCA datasets having names at all - the names should be either the deepest element identifier or some path to the identifier (e.g. __forward). These things having names at all was a huge mistake on my part - it has become very clear to me that the things in collections should be collection dataset associations not history dataset associations (a new third kind of DatasetInstance). I consider this all part of #1810. If the "name" was the element identifier as far as the API and tool evaluator was concerned - we would be building that collection for no reason I think.

@erasche provided a suggestion that I still think is worth exploring, which is being able to drill down into collections for selecting a subset of a collection. I think on the implementation side this could actually trigger a filter collection job.

I'm on it - see comments on #740.

Thanks for the feedback @mvdbeek. I've argued against doing the very thing in this PR in the past for the reasons you outlined above about it being better to keep collections in collections - I won't be sad if this PR fails. I just did this work because I understand people are very frustrated they can't just pull apart their collection and use the individual pieces the way they want in an ad-hoc fashion and it seemed like low hanging fruit. And while I wouldn't recommend doing this at scale or as part of a standard operating procedure for a particular analysis - during the exploratory phase maybe just quickly flinging datasets around in an ad-hoc fashion lets one get a sense of the data and how to proceed faster?

@mvdbeek
Copy link
Member

mvdbeek commented Jan 25, 2018

I agree, making it easier to reshape collections is the way forward, and creating collections on the fly has some additional complexities (i.e what structure to use when pulling out elements of a nested list:pair collections is a good example).

I also like the fact that you can pull out collection elements to figure out why a tool is failing, so there's nothing wrong with the PR, I guess I'm just sad whenever I see a HDA and think about why it is not in a collection.

@jmchilton jmchilton modified the milestones: 18.01, 18.05 Jan 25, 2018
@hexylena
Copy link
Member

this pr

I tried to test it, but DnD didn't seem to work for me. Maybe I'm doing something wrong? I built a list of pairs of text files and then tried to drag the pairs, and individual items from within a pair, onto the toolform and dragging didn't seem to happen. Am I doing it wrong?

I have the (maybe overly) simplistic view that you want to have everything as a dataset collection, with the exception of a final reduction step, perhaps.

indeed, that's great for people who really want to use collections for NGS stuff, but those of us who are trying to balance users who want to just do one-off analyses with users who want to use collections to do batch work... something like this would be helpful.

which is being able to drill down into collections for selecting a subset of a collection

this is an interesting suggestion, I'd only considered selecting single datasets, but a subset+filter job is a neat idea.

I can create a collection around the element identifier for a nested list of pairs for instance and I'd just get "forward" as the identifier

yeah, pity that we don't include the full collection specifier, like collection name / pair name / forward as the used identifier which would provide more information than just a bunch of forwards.

@jmchilton
Copy link
Member Author

@erasche Just to make sure - did you build the client before testing this?

@hexylena
Copy link
Member

@jmchilton that'd prolly be it. Soz mate. Gone for a month and it's a whole new world.

@hexylena
Copy link
Member

(afk for 2-4 days, don't wait this on me/my testing.)

@jmchilton
Copy link
Member Author

Moving this to WIP - @guerler wants to take a stab at making the implementation... less... unfortunate.

@mvdbeek
Copy link
Member

mvdbeek commented Feb 3, 2018

Dropping in collections from nested collections would be an important feature as well.
I was checking out if I could actually use nested collections in #740 (comment), but unfortunately there isn't a way to pull them out in a way that would be usable right now. Drag-and-drop would help, though that is still not workflow compatible.

@hexylena
Copy link
Member

hexylena commented Feb 3, 2018

just a small note @mvdbeek, for most of my users, solving the UX problem in #740 is significantly more important than having it be workflow compatible. they (and I) would happily trade the ability to extract a workflow for the ability to just do their analyses interactively.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants