Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Dependency Resolver Mapping #3444

Merged
merged 12 commits into from
Jan 23, 2017

Conversation

jmchilton
Copy link
Member

  • 495802d: Implements the ability to map requirement names and/or versions into other names and/or versions for specific sites. This is implemented for the Conda, Galaxy Package, and environment module dependency resolver. I will follow up with more documentation if merged - but the unit tests show off the format. Implements framework Ship a package-to-conda mapping file with Galaxy #1927.
  • 81d71d2: Implements specification expansion as outlined in the CWL, but per the discussion in Allow unambiguous specification of requirement names #3117 I have stripped the ability for Galaxy tools to actually leverage this functionality in 2474da0. It is still important library functionality for galaxy-lib though so the guts of dependency resolution still allow it when coming from CWL tools.
  • ca9adfd: Refactors how different resolvers can be configured. It is more modular and a better design (it implements an important TODO block from the code and allows some of these other modifications).
  • 6de74a7: Previously in Resolve Conda Dependencies All at Once #3391 we added the ability for resolvers to resolve all dependencies for a tool together. This introduced some design technical debt because resolve_all took in ToolRequirement objects whereas resolve took in name, version, and type specifically. Mapping and specification handling is a lot cleaner the interface more cohesive when both methods just consume ToolRequirements - so this change is made in 6de74a7.
  • More minor refactorings are found in 5fc5ab1 and 4a544e9 that generally increase the consistency of this package.

This is labelled as 17.05 but if anyone wants to merge it before #3433 is merged and we branch - I wouldn't complain 😉. It will give us greater flexibility to adapt old tools to new Conda packages.

@jmchilton jmchilton added area/tool-dependencies kind/enhancement kind/refactoring cleanup or refactoring of existing code, no functional changes labels Jan 18, 2017
@jmchilton jmchilton added this to the 17.05 milestone Jan 18, 2017
@jxtx
Copy link
Contributor

jxtx commented Jan 18, 2017

image

@mvdbeek
Copy link
Member

mvdbeek commented Jan 18, 2017

This looks excellent! 👍 (also for merging before branching off 17.01)
I also wouldn't mind if we do implement the spec part, this should help with perfectly valid arguments like this, right?

It will give us greater flexibility to adapt old tools to new Conda packages.

For this we'd need to ship a template with galaxy, right?

I think we could even strip out the cached_dependency stuff ... this should become a corner case where a tool defines both toolshed and conda dependencies that differ in name or version. So if people do run into locking issues, all they (or us if we ship a template) have to do is map the redundant dependency onto the dependency that can be resolved. We just need to do a set on the requirements.

@jmchilton
Copy link
Member Author

For this we'd need to ship a template with galaxy, right?

Yeah - but I think the pieces to do that are here in this file. I'd consider it to be a bug fix to ship this file after the release myself. I can add an empty variant of such a file if you'd like.

I also wouldn't mind if we do implement the spec part

Obviously I'm on board for that too but lets keep that out of this PR. I feel like a lot of what happened in the failed #3117 was people stepping up and describing use cases and the desire to implement the mapping stuff (now in this PR) - lets get this in as is and then we can think about use cases driving the specification stuff and make an argument for with the mapping stuff as given.

if people do run into locking issues, all they (or us if we ship a template) have to do is map the redundant dependency onto the dependency that can be resolved. We just need to do a set on the requirements.

That is an excellent point.

@jmchilton jmchilton force-pushed the deps_framework_overhaul branch from 495802d to 039b206 Compare January 18, 2017 19:29
jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jan 18, 2017
@jmchilton
Copy link
Member Author

jmchilton commented Jan 18, 2017

Rebased so all the links above are invalid. But I've added a variant of #3433 and a default Conda mapping file in 039b206 along with a test case that verifies R is mapped to "r-base" when using the Conda resolver.

Update: Actually forget to git add some stuff - maybe a95eebb is a better attempt at the sample and default stuff.

@jmchilton jmchilton force-pushed the deps_framework_overhaul branch 2 times, most recently from a95eebb to 89b920d Compare January 19, 2017 14:57
jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jan 19, 2017
@jmchilton
Copy link
Member Author

I'm fairly confident that failing API test is unrelated.

@bgruening
Copy link
Member

@jmchilton can we also map such things? https://github.com/galaxyproject/galaxy/blob/dev/tools/plotting/bar_chart.xml#L39
it would make a lot of sense to map the tools shipped with Galaxy.

@jmchilton
Copy link
Member Author

@bgruening We can just update those to use the actual conda requirements directly because requirements of type python-module have never done anything functional - there is no worry about breaking backward compatibility.

@bgruening
Copy link
Member

Fair enough, but then we can remove these deps from the tool and add real conda deps? And you could iterate during startup over all internal tools and install deps?

jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jan 19, 2017
…ency resolver.

I'm not sure if we need to update the version since previously these tags were purely for documentation and had no functional purpose.

xref this comment galaxyproject#3444 (comment) from @bgruening
@jmchilton
Copy link
Member Author

@bgruening I suppose - the PR that adds the script that iterates over all the tools and creates missing Conda environments is #3430 - no one really clamored to have it in the release so it remains as WIP. If we want to have it in 17.01 - I can rebase it on top of this PR and keep working on polishing it up. It might take another day or so. It takes so long though and certain things are always going to fail - that I think it would be better to just document that you can run the script (perhaps each time there is a failed dependency resolution). This would still be a significant step forward in terms of getting a working Galaxy running quickly.

I'm open to working on that - it is up to everyone else how much we delay the release for this stuff. Also it would be nice to have a UI - that just showed all the tools and gave a little install button to run through and do this work.

@mvdbeek
Copy link
Member

mvdbeek commented Jan 21, 2017

It takes so long though and certain things are always going to fail - that I think it would be better to just document that you can run the script (perhaps each time there is a failed dependency resolution).

I agree with you here. Maybe we can collapse identical requirements, so that it's not attempting X times to install the thing that fails, but even then I think this needs a UI and shouldn't delay branching.

I'm open to working on that - it is up to everyone else how much we delay the release for this stuff. Also it would be nice to have a UI - that just showed all the tools and gave a little install button to run through and do this work.

I agree on that as well, but for me this PR is good and ready to go -- I would be in favor of merging.

@mvdbeek
Copy link
Member

mvdbeek commented Jan 21, 2017

@galaxybot test this

@bgruening
Copy link
Member

I'm a little bit scared by this large PR, but I'm also in favor of merging it now.
@jmchilton the script that iterates over all tools (maybe datatypes) can wait until the next release I think. We might test this more extensively for all tools and also add features maybe to solve datatype dependencies?

@jmchilton
Copy link
Member Author

I'm also in favor of merging it now.

Merge it then 😉, we can delay the tool installation portion for the next release I'm fine with that.

@bgruening
Copy link
Member

bgruening commented Jan 23, 2017

Sorry, can you merge in dev, I will merge it asap then!

@jmchilton
Copy link
Member Author

Doh - that went very stale, very quickly. I'm working on it.

@jmchilton jmchilton force-pushed the deps_framework_overhaul branch from 89b920d to 47e39b0 Compare January 23, 2017 14:17
jmchilton added a commit to jmchilton/galaxy that referenced this pull request Jan 23, 2017
@jmchilton
Copy link
Member Author

Okay - I rebased everything on top of dev again. There was some oddness when merging with differences in #3461. I've looked over the resulting diff and ran the tool dependency related tests though and things seem fine.

return ToolRequirement( name=name, type=type, version=version, specs=specs )

def __eq__(self, other):
return self.name == other.name and self.type == other.type and self.version == other.version
Copy link
Member

@mvdbeek mvdbeek Jan 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only correct if we assume that specs are the same if name type and version are the same, otherwise we also need to check that the specs are equal (same for __hash__).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it looks like something went wrong (duplication of lines 75-82 41-48?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I messed something up - I had hand copied something are the rebase from around here - let me take a look.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I figured out what happened - it took these lines and added them to RequirementSpecification (or put the class definition inside of ToolRequirement) and I didn't notice. Then the tests were failing so I added the eq, hash, etc... lines back to ToolRequirement not noticing that RequirementSpecification had the wrong lines.

I've pushed a commit to fix this I think and I think I also handled specs correctly in the eq and hash calculations - but let me know if you think they could be improved or need to be fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looking good. I'm gonna merge once the tests pass.

This adapts CWL-style ``SoftwareRequirement`` ``specs`` to solve galaxyproject#1927. Here I'm trying to implement the CWL specification in a way that helps enable the feasibility of Conda packaging in Galaxy. It is a delicate balancing act aimed to upset as many interested parties as I can.

To understand the problem - consider the ``blast+`` requirement found in the Galaxy wrappers. It looks something like this:

```
<requirement type="package" version="2.2.31" name="blast+">
```

Great, that works for Galaxy and Tool Shed packages. It doesn't work for bioconda at all. I think this problem is fairly uncommon - most packages have simple names shared across Debian, Brew, Conda, etc... - but it does happen in some cases that there are inconsistencies. Some people have taken to duplicating the requirement - this is bad and should not be done since they are mutually exclusive and Galaxy will attempt to resolve both.

This introduces the following syntax for tools with profile >= 16.10:

```
<requirement type="package" version="2.2.31" name="blast+">
    <specification uri="https://anaconda.org/bioconda/blast" />
    <specification uri="https://packages.debian.org/sid/ncbi-blast+" version="2.2.31-3" />
</requirement>
```

This allows finer grain resolution of the requirement without sacrificing the abstract name at the top. It allows the name and the version to be adapted by resolvers as needed (hopefully rarely so).

This syntax is the future facing one, but obviously this tool would not work on older Galaxy versions. To remedy this - an alternative syntax can be used for tools targetting Galaxy verions pre-16.10:

```
<requirement type="package" version="2.2" specification_uris="https://anaconda.org/bioconda/[email protected],https://packages.debian.org/jessie/[email protected]">blast+</requirement>
```

This syntax sucks - but it does give newer Galaxies the ability to resolve the specifications without breaking the more simple functionality for older Galaxies.

For more information on the CWL side of this - checkout the discussion on common-workflow-language/cwltool#214. The CWL specification information is defined at http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage.

Conflicts:
	lib/galaxy/tools/deps/__init__.py
	test/functional/tools/samples_tool_conf.xml
Per discussion on galaxyproject#3117 - some people were in favor of the idea but some were opposed. I think it is still important to implement this for CWL tools and useful from a library perspective. This just removes changes to Galaxy tool XML.
The handling of global configuration options for Conda specific options was hacky and a poor design. This is much more modular and allows the elegant cascading option sources for all resolvers.
This follows the existing pattern within the module.
If resolve_all takes in requirements, then so should resolve.
…laxy.

- Construct a DependencyManager according to Galaxy's configuration before Galaxy starts to allow auto installation of Conda.
- Add script for managing tool dependencies outside of a Galaxy environment.
- Refactor to allow this to be done without duplicating logic related to defaults, configuration handling, etc....

Implements galaxyproject#3426.
Everyone gets a Conda when Galaxy starts up.
- Map R to r-base and blast+ to blast for conda dependencies.
- Add sample environment modules file showing off the new mapping feature.
- Add integration test case to verify the default conda mapping (R -> r-base in particular).
This became the default behavior with conda_auto_init defaulting to True.
@jmchilton jmchilton force-pushed the deps_framework_overhaul branch from 47e39b0 to 3e3d41f Compare January 23, 2017 14:47
@mvdbeek mvdbeek merged commit 98ee91f into galaxyproject:dev Jan 23, 2017
@mvdbeek
Copy link
Member

mvdbeek commented Jan 23, 2017

Alright, thanks a lot @jmchilton, I think this is a big step ahead!

@@ -22,10 +25,11 @@ class DependencyResolver(Dictifiable, object):
# resolution.
disabled = False
resolves_simple_dependencies = True
config_options = {}
__metaclass__ = ABCMeta
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this is not Python3-safe, the @six.add_metaclass(ABCMeta) decorator should be used instead. There are a few more instances in this file and elsewhere in lib/galaxy/, I will take care of fixing them all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #3491.

@@ -98,6 +103,11 @@ if [ -z "$GALAXY_CONFIG_FILE" ]; then
export GALAXY_CONFIG_FILE
fi

if [ $INITIALIZE_TOOL_DEPENDENCIES -eq 1 ]; then
# Install Conda environment if needed.
python ./scripts/manage_tool_dependencies.py -c "$GALAXY_CONFIG_FILE" init_if_needed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmchilton @natefoo Should this be run as part of ./scripts/common_startup.sh (called at line 66) instead of only when Galaxy is started through run.sh ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't think so - this is of no use to the reports app or the tool shed for instance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, common_startup.sh is not the right place for the reason you mentioned. But I was concerned about servers started through e.g. supervisor by invoking directly uwsgi or ./scripts/paster.py .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They will need to setup Conda - and it would be best to do that manually before hand but Galaxy will attempt to do it on its own also I think (with the multi-process problems that entails). I guess this becomes a documentation problem in my mind - at least until we get to the point where we have a GUI for managing this stuff (doing it from the GUI when an admin logs in would be ideal).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/tool-dependencies kind/enhancement kind/refactoring cleanup or refactoring of existing code, no functional changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants