Improve the setup instructions #42

Closed
tbooth opened this issue Sep 18, 2023 · 4 comments
Labels
reviewer Issues arising from comments on https://github.com/carpentries-lab/reviews/issues/17

Comments

Collaborator

tbooth commented Sep 18, 2023

From @jdblischak

I think the Setup would benefit from a few improvements, especially if self-learners are going to follow the instructions alone.

For installing the data, the wget command is provided explicitly, but then users are left to remember the tar flags on their own. Best to remove this early barrier and provide the explicit commands to prepare the data. Something like below:

wget --content-disposition https://ndownloader.figshare.com/files/35058796
tar xJf data-for-snakemake-novice-bioinformatics.tar.xz
ls -R data/

In lesson 10 on conda integration, it states:

We’ll not talk about installing Conda, since it is already set up on the systems we are using.

But you provide a link to Miniconda in the setup instructions. I recommend replacing the above text with a link back to the Setup instructions.

Also, I wasn't able to run conda env update --file conda_env.yaml with the recommended setting of channel_priority: strict. I had to temporarily disable it in my .condarc. As Snakemake strongly encourages users to set strict channel priority when using --use-conda, this could potentially trip up false beginners that have already started using Snakemake. Another suggestion is to use conda list --explicit > conda_env.frozen.yaml to bypass the conda solver altogether, and allow users to immediately install the exact packages that you used (though this would only work for Linux).
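
For reference, the frozen-environment suggestion could look roughly like the sketch below. The function names and the file name conda_env.frozen.txt are made up for illustration; the environment name snakemake_dash is the one mentioned elsewhere in this thread. Note that conda list --explicit writes a plain-text spec file rather than YAML, and the file is only valid on the platform where it was made.

```shell
#!/usr/bin/env bash
# Sketch only: nothing runs until you call one of these functions.

# Show the channel priority that conda is currently using.
show_channel_priority() {
    conda config --show channel_priority
}

# Export the exact package URLs of an existing environment.
# $1: environment name, $2: output spec file (plain text, not YAML).
freeze_env() {
    conda list --name "$1" --explicit > "$2"
}

# Recreate an environment from a spec file without running the solver.
# $1: new environment name, $2: spec file written by freeze_env.
restore_env() {
    conda create --name "$1" --file "$2"
}

# Example usage (hypothetical file name):
#   freeze_env snakemake_dash conda_env.frozen.txt
#   restore_env snakemake_dash conda_env.frozen.txt
```

Because the spec file pins full package URLs, conda does not need to solve anything at install time, which is what makes this route independent of the channel_priority setting.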

@tbooth tbooth added the reviewer Issues arising from comments on https://github.com/carpentries-lab/reviews/issues/17 label Sep 18, 2023
Collaborator Author

tbooth commented Sep 18, 2023

From @tkphd

Suggest breaking this up into (at least) 3 sections: Software, Data, and
Editor.

The instructor may need to show this page at the beginning of the lesson for
those who did not already work through the setup. Recommend printing links in
their entirety rather than using anchor tags, e.g.,
Download and unpack the sample dataset tarball from https://ndownloader.figshare.com/files/35058796

Software
Conda is a common and useful tool, but it is simply invoked, not
introduced. Explain what it is (a Python distribution with virtual environment
isolation), how it helps (simplifies dependency management), and how to use it.

The instructions as written appear to update an existing environment, not
create a new one.
The environment is named "snakemake_dash". Why?
The conda_env.yaml file contains a whole lot of specific packages.
Consider filtering this to specify just those packages you would install
manually: snakemake, fastqc, kallisto, etc. Let conda fill in the full
dependency graph.
Data
Specify where to download (home directory? Desktop?) and how to extract this
file. Tarballs are unfamiliar to most Windows users. The linked file is
also an xz-compressed tar archive, which may require extra packages on
some Linux distributions.
The provided wget command results in "403: Forbidden" on a current Debian
system. With the updated URL, this worked: wget https://figshare.com/ndownloader/files/35058796 -O data.tar.xz
The contents of this file are nested two directories deep: the top-level
"data" folder is extraneous.
Package this slice of a dataset with a README explaining its provenance and
intended usage, with citations and attribution to the original authors.
(Aspire to FAIR principles.)
It is unclear whether CC BY-SA applies to a pure dataset, which is not
typically eligible for copyright protection: this is not a creative work.
Was the source dataset released under a license agreement?
Editor
Two editors are mentioned here, but no editor is invoked in the lesson material.
Throughout, when showing changes to a file, preface it with the command the
instructor should use to launch the editor.
Provide installation instructions or suggest a framework (like
gitforwindows) that provides an editor.
"Setup" is meant to be run by the learner hours or days ahead of the
workshop. Any alias they set will be lost by the time they need it.
Recommend editing ~/.nanorc to set appropriate flags instead, or editing
~/.bashrc to retain the alias, and revisiting this at the beginning of the
lesson to make sure everyone has a consistent editing environment.
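
A minimal sketch of the ~/.nanorc approach, assuming the aim is consistent indentation when editing Snakefiles. The three settings below are assumptions chosen for illustration (the thread does not say which flags the alias sets), and add_line_once is a hypothetical helper, not part of the lesson.

```shell
#!/usr/bin/env bash
# Append a line to a file only if that exact line is not already present.
# $1: file, $2: line
add_line_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Settings file to edit; override NANORC to try this on a scratch file first.
NANORC="${NANORC:-$HOME/.nanorc}"

# Assumed settings, not taken from the lesson: indent like the previous line,
# insert spaces when Tab is pressed, and use a tab width of 4.
add_line_once "$NANORC" "set autoindent"
add_line_once "$NANORC" "set tabstospaces"
add_line_once "$NANORC" "set tabsize 4"
```

Unlike an alias typed into a shell session, these settings persist between logins, and running the script a second time leaves the file unchanged.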

Collaborator Author

tbooth commented Sep 26, 2023

In [lesson 10 on conda integration](https://carpentries-incubator.github.io/snakemake-novice-bioinformatics/10-conda_integration/index.html), it states:

We’ll not talk about installing Conda, since it is already set up on the systems we are using.

But you provide a link to Miniconda in the setup instructions. I recommend replacing the above text with a link back to the Setup instructions.

Done.

Collaborator Author

tbooth commented Sep 28, 2023

Also, I wasn't able to run conda env update --file conda_env.yaml with the recommended setting of channel_priority:
strict. I had to temporarily disable it in my .condarc. As Snakemake strongly encourages users to set strict channel
priority when using --use-conda, this could potentially trip up false beginners that have already started using
Snakemake. Another suggestion is to use conda list --explicit > conda_env.frozen.yaml to bypass the conda solver
altogether, and allow users to immediately install the exact packages that you used (though this would only work for
Linux).

I've re-made the conda_env.yaml file to work with strict channel priority. The trick was to only use packages from bioconda and conda-forge and disable the 'defaults' channel.

This suggests that maybe I should be using Miniforge (https://github.com/conda-forge/miniforge/#download) instead of Miniconda as the base installer, but I'm not sure if this might cause other problems. I'll leave this for now as I seem to have it working.
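
To illustrate the channel setup described above, here is a rough sketch of a short environment file. It is not the actual conda_env.yaml from the lesson: the file name conda_env.minimal.yaml is hypothetical, and the package list is just the three tools named earlier in this thread, left unpinned so conda resolves the rest.

```shell
#!/usr/bin/env bash
# Write a short environment file that lists only top-level tools.
# The file name and package list are illustrative, not the lesson's real file.
cat > conda_env.minimal.yaml <<'EOF'
name: snakemake_dash
channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  - snakemake
  - fastqc
  - kallisto
EOF

# Then, with strict channel priority in place, create the environment:
#   conda config --set channel_priority strict
#   conda env create --file conda_env.minimal.yaml
```

The nodefaults entry tells conda env to leave out the 'defaults' channel for this environment, so every package has to come from conda-forge or bioconda.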

Collaborator Author

tbooth commented Sep 28, 2023

I've broken out the comments above from @tkphd into other issues - #52, #50, #49, so I'm going to close this issue as I believe I've addressed all points from @jdblischak, and also most from @tkphd.

@tbooth tbooth closed this as completed Sep 28, 2023