Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing conda in the setup page #50

Open
tbooth opened this issue Sep 21, 2023 · 13 comments
Open

Introducing conda in the setup page #50

tbooth opened this issue Sep 21, 2023 · 13 comments
Labels
reviewer Issues arising from comments on https://github.com/carpentries-lab/reviews/issues/17

Comments

@tbooth
Copy link
Collaborator

tbooth commented Sep 21, 2023

from @tkphd

Conda is a common and useful tool, but it is simply invoked, not
introduced. Explain what it is (a Python distribution with virtual environment
isolation), how it helps (simplifies dependency management), and how to use it.

The instructions as written appear to update an existing environment, not
create a new one.
The environment is named "snakemake_dash". Why?
The conda_env.yaml file contains a whole lot of specific packages.
Consider filtering this to specify just those packages you would install
manually: snakemake, fastqc, kallisto, etc. Let conda fill in the full
dependency graph.

@tbooth tbooth added the reviewer Issues arising from comments on https://github.com/carpentries-lab/reviews/issues/17 label Sep 21, 2023
@tbooth
Copy link
Collaborator Author

tbooth commented Sep 21, 2023

Conda is a common and useful tool, but it is simply invoked, not
introduced. Explain what it is (a Python distribution with virtual environment
isolation), how it helps (simplifies dependency management), and how to use it.

I'm not sure the setup page is the place for a Conda tutorial, but I've linked to Ep 10 which has this info.

@tbooth
Copy link
Collaborator Author

tbooth commented Sep 21, 2023

The instructions as written appear to update an existing environment, not
create a new one.

I've emphasised that conda env update really does create the environment. I agree that many people would assume this command would only update an existing environment. Conda has many quirks.

@tbooth
Copy link
Collaborator Author

tbooth commented Sep 21, 2023

The environment is named "snakemake_dash". Why?

Good point. A relic of where the material originated. I've renamed it to `snakemake_capentry' and modified/removed some other references to the DaSH project.

@tbooth
Copy link
Collaborator Author

tbooth commented Sep 21, 2023

The conda_env.yaml file contains a whole lot of specific packages.
Consider filtering this to specify just those packages you would install
manually: snakemake, fastqc, kallisto, etc. Let conda fill in the full
dependency graph.

I did this and made files/conda_env_min.yaml, but I believe that, on Linux at least, using the full manifest is more likely to work and get the desired environment. My experience of Conda is it tends to assume that upgrades of dependent packages are always compatible, but in fact breaking changes are common. The tbb dependency mentioned in ep 10 is a case in point.

@tbooth
Copy link
Collaborator Author

tbooth commented Jul 17, 2024

From @cmeesters:

the environment file is frighteningly complex: too many dependencies are defined explicitly - conda (and its implementations) are defined to resolve dependencies automatically. Please restrict yourself to the necessary applications.

versions are pinned for Snakemake and other dependencies, rather than defining minimum versions >=. The way it is, becoming outdated is only a question of time.

new users should be introduced into creating separate environments, a conda environment can be created with the required files in one command

As Snakemake recommends mamba, mamba should be mentioned, too.

@tbooth
Copy link
Collaborator Author

tbooth commented Jul 17, 2024

the environment file is frighteningly complex: too many dependencies are defined explicitly - conda
(and its implementations) are defined to resolve dependencies automatically.
Please restrict yourself to the necessary applications.

This is addressed in my comment above. It's not a stylistic choice, but a practical one borne out of experience. Having a frozen Conda environment avoids random breakage as available dependencies change - as long as the packages are on the server it will always work. If the file looks "frightening" it's not my fault - this is just what the files created by conda env export always look like.

@tbooth
Copy link
Collaborator Author

tbooth commented Jul 17, 2024

versions are pinned for Snakemake and other dependencies, rather than defining minimum versions >=.
The way it is, becoming outdated is only a question of time.

Pinning the Snakemake version is a deliberate choice. Snakemake has been known to change command syntax even outside of major version changes (see snakemake/snakemake#283) and last time I allowed a Snakemake upgrade (earlier this year) I had to go through and make several changes to the course text. We absolutely cannot allow conda to install the latest version of Snakemake - it has to be done as part of manual lesson maintenance, and tested.

@tbooth
Copy link
Collaborator Author

tbooth commented Jul 17, 2024

As Snakemake recommends mamba, mamba should be mentioned, too.

My understanding is that the explicit mamba command is obsolete, so if the Snakemake documentation is still recommending running mamba I think this is what should be updated.

This is already stated in the callout: "Setting solver: libmamba is nowadays preferred to explicitly running the mamba command."

@tbooth
Copy link
Collaborator Author

tbooth commented Jul 17, 2024

new users should be introduced into creating separate environments, a conda environment
can be created with the required files in one command

This option is addressed in episode 10, which I already link to from the setup page. I don't wan't to start a Conda tutorial within the setup page - the idea here is to provide commands that will work reliably, not present all possible options.

@tbooth
Copy link
Collaborator Author

tbooth commented Jul 17, 2024

Additionally from @cmeesters

the directory for conda/mamba/micromamba does not need to be created, deviating from the
standard setup is not necessary.

I've removed the mkdir command. I guess we can reasonably assume the user does not have a file named installer.sh already in their current directory. If they do, these instructions now break.

@cmeesters
Copy link

Ah, but conda env export is specific to recreate one environment known to be working. This assumption might or might not hold.

Occasionally, scientific software sees bug fixes. When you then need to update one application, and you are facing too many pinned libraries, conda will cowardly refuse to comply as you narrowed the options of the solver. Consider the setup for the "snakemake-tutorial-data" - it only defines the necessary application / libraries. Notice the fallback to pip for pysaml? That is because the library dependencies in conda are temporarily broken.

In your specific case, the opposite happened: I had to downgrade kallisto because on my AMD Ryzen, the current version always crashed with "illegal instruction".

So for better maintainability, I recommend boiling the environment specs to the minimum.

However, if you wish to have this enormous environment file, I will not argue further. After all, the community of developers for this review ought to decide, not a particularly grumpy reviewer. ;-)

The pinning of Snakemake is a different matter: If you strive for long-term maintainability with this course, you should consider keeping up to date with the main tool you are teaching.

@tbooth
Copy link
Collaborator Author

tbooth commented Jul 17, 2024

As far as the minimal environment specs are concerned, they are here:

https://carpentries-incubator.github.io/snakemake-novice-bioinformatics/files/conda_env_min.yaml

This file isn't new, but its not currently linked in the setup page because what I did was create a new environment from this file and then freeze it to make conda_env.yaml. If you use the minimal file directly, you'll get something a bit different as package versions change over time, and you'll quite possibly find that things are then out of step with the examples in the episodes.

I don't see that using a minimal environment would help with your Kallisto problem. In fact, the fix here is to re-generate the frozen environment using the more compatible version of Kallisto. Then it will work for everyone. Can you let me know what version works for you on Ryzen?

If you strive for long-term maintainability with this course, you should consider keeping up to date with the
main tool you are teaching.

I think we're in full agreement about this. In maintaining the course I have an ongoing job to test the material with the newest Snakemake and make changes as necessary. The process is:

  • Build a new conda environment using conda_env_min.yaml
  • Test everything (commands, solutions, sample output)
  • Fix the expisodes as necessary
  • Tweak the working conda env as necessary
  • Freeze that conda env to make a new conda_env.yaml
  • Release updated course

But this isn't covered in the setup.md because it's not the tutor's job, and certainly not the learners job. It's the course maintainers job. So I'm going to add this explanation into CONTRIBUTING.md and put a link in setup.md for anyone who is curious to see.

@tbooth
Copy link
Collaborator Author

tbooth commented Jul 26, 2024

I have fixed the frozen Conda environment to use a more compatible version of Kallisto. Fortunately I was able to do this without changing any of the other software versions, aside from the hdf5 library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reviewer Issues arising from comments on https://github.com/carpentries-lab/reviews/issues/17
Projects
None yet
Development

No branches or pull requests

2 participants