Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delay dependency checks #447

Closed
wants to merge 5 commits into from
Closed

Conversation

wm75
Copy link
Contributor

@wm75 wm75 commented May 10, 2022

With these changes:

  • pangolin will not fail on missing dependencies unless these are required
    by the specific command line. In particular:
    • it will not fail on missing scorpio and/or constellations unless scorpio
      has to be run as part of the command
    • it will not fail on missing pangolin-data unless the data is required
      by the command and is not provided via --datadir
  • combining the --datadir and --update options is prohibited because of the
    complicated effects this would need to have on environment- and
    datadir-installed packages
  • the --datadir command line help has been adjusted to match recent changes

With these changes:

- pangolin will not fail on missing dependencies unless these are required
  by the specific command line. In particular:
  - it will not fail on missing scorpio and/or constellations unless scorpio
    has to be run as part of the command
  - it will not fail on missing pangolin-data unless the data is required
    by the command and is not provided via --datadir
- combining the --datadir and --update options is prohibited because of the
  complicated effects this would need to have on environment- and
  datadir-installed packages
- the --datadir command line help has been adjusted to match recent changes
@wm75
Copy link
Contributor Author

wm75 commented May 10, 2022

This is just a bit of extra work on top of what @pvanheus did in #444.
Most interestingly, it makes it possible to run pangolin without pangolin-data in its environment if you can instead provide a version of the package via --datadir.
In theory, the same should be possible for constellations, but that would require more work since scorpio expets constellations to be importable.

@wm75
Copy link
Contributor Author

wm75 commented May 10, 2022

Ok, I've managed to solve the harder constellations part, too.
With that the pangolin code can finally treat the constellations data exactly like everything else in --datadir, and can leave the discovery of constellation files completely to scorpio.

Together these changes will now allow for running pangolin from a really minimal environment with only scorpio, and no "data" repo dependencies - provided that there is a --datadir with all necessary data components.

@pvanheus @AngieHinrichs @aineniamh ready for review :)

@AngieHinrichs
Copy link
Member

While this is really neat and clean...

when people have pangolin v4, but don't have pangolin-data installed in their conda environment, it usually means that they just ran pangolin --update and did not update their conda environment. And not updating the conda environment causes all sorts of problems, not just the lack of pangolin-data. They will have out-of-date usher, ucsc-fatovcf, probably scikit-learn, etc. There have been a lot of issues filed in this repo because of out-of-date conda environments. So I'm not sure that becoming even more tolerant to the conda environment missing things that really should be present is overall a good thing. ☹️ @aineniamh what do you think?

I would rather move in the direction of making the conda environment as up-to-date as possible. For example, if pangolin could detect that the conda environment is outdated, and even update the conda environment, then pangolin --update would actually do what users assume it does! However, I know that a lot of users have installs that they cannot modify, and running conda update might fail. (But then maybe those users can't run pangolin --update anyway?)

I floated the idea of pangolin running conda update in a comment in #439, kind of hoping that someone more knowledgeable would say 'sure, give it a try' or 'oh god no, don't do that!', but it seems that with all things conda, you just have to do your best, release it, wait and see in what ways it breaks in someone else's environment.

@wm75
Copy link
Contributor Author

wm75 commented May 13, 2022

I agree with most of what you are saying about the state of the conda dependency management @AngieHinrichs, but I'm seeing the increased flexibility with the proposed changes here more as a starting point for addressing these issues rather than as a source of new issues.

when people have pangolin v4, but don't have pangolin-data installed in their conda environment, it usually means that they just ran pangolin --update and did not update their conda environment.

That is true, however the proposed changes will not allow pangolin to run in that situation beyond something simple like pangolin all-versions (which I would argue is a good thing to do for diagnosing the issue). Any pangolin analysis run would still require pangolin-data at least in --datadir, and the only ways to get it in there would be for the user to download it into there deliberately. In particular, I don't propose making --update-data work as a way to install things de novo, but it will continue requiring existing data resources.

Comment on lines 128 to +130
for dependency, version in version_dictionary.items():

if version is None:
continue
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AngieHinrichs I introduced this check to make sure you will still not be able to "update" something that isn't there already.

@wm75
Copy link
Contributor Author

wm75 commented May 13, 2022

To explain a bit my overall idea: I've come to think of pangolin v4 as a piece of software that orchestrates lineage assignment tools, and should do this one thing well, rather than an analysis software. As such, I envision it to work from a "pangolin-core" package/environment where starting from there it would be able to discover additional available components and (and, yes, that's important!) do compatibility checks and inform users of important updates for those components.
Of course, all of this still requires a lot more work, but I think such work will be much easier to start from a flexible code base.

@wm75
Copy link
Contributor Author

wm75 commented May 13, 2022

@aineniamh I hope this general idea of pangolin matches yours more or less :)
If so, then I'd be willing to commit a bit more of my time into making dependency management and checks more robust and modular.

@wm75
Copy link
Contributor Author

wm75 commented May 13, 2022

I would rather move in the direction of making the conda environment as up-to-date as possible. For example, if pangolin could detect that the conda environment is outdated, and even update the conda environment, then pangolin --update would actually do what users assume it does! However, I know that a lot of users have installs that they cannot modify, and running conda update might fail. (But then maybe those users can't run pangolin --update anyway?)

I'm very much biased here by my Galaxy perspective (where we would never want to change an existing environment of a tool), but pangolin --update is not something I would ever desire to run. Package management, if required, should be left to conda, otherwise I would rather install what I need with pip or manually.
What I think would be good to have is pangolin reporting compatibility issues clearly and instructing users what to do. Something like: "It seems you're running {component} version {x.y}, but your {data source} is at {a.b}. Please update {component} by running ...".

@aineniamh
Copy link
Member

Hi @wm75 and @AngieHinrichs, this has gone very stale - I was away on extended leave when it was originally flagged and was not checking notifications. I'm not sure it's worth all the extra development time to do this- I think the system that we have works pretty well for now. I'm going to close the PR but happy to have more discussion about it if there are very strong feelings!

@aineniamh aineniamh closed this Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants