Add more detailed conda setup instructions to the GATK README #9001

Merged
droazen merged 2 commits into master from dr_conda_readme_updates
Oct 16, 2024

droazen
Contributor

@droazen droazen commented Oct 15, 2024

No description provided.

@droazen
Contributor Author

droazen commented Oct 15, 2024

@lbergelson / @KevinCLydon , please review for accuracy / completeness of the revised conda setup instructions

Member

@KevinCLydon KevinCLydon left a comment

I think this is pretty clear. I have a couple quick questions, but I'm totally happy with this being merged regardless.

README.md Outdated
@@ -81,13 +81,29 @@ releases of the toolkit.
* GATK4 uses the [Conda](https://conda.io/docs/index.html) package manager to establish and manage the
Python environment and dependencies required by GATK tools that have a Python dependency. This environment also
includes the R dependencies used for plotting in some of the tools. The ```gatk``` environment
- requires hardware with AVX support for tools that depend on TensorFlow (e.g. CNNScoreVariant). The GATK Docker image
+ requires hardware with AVX support for tools that depend on TensorFlow. The GATK Docker image
Member

Are there other tools in GATK that use TensorFlow?

Contributor Author

Yeah, you're right, the mention of TensorFlow will be obsolete once the CNN tools are removed. PyTorch, however, apparently depends on the Intel MKL, which uses CPU-specific instruction-set features (such as AVX) for acceleration:

https://www.reddit.com/r/MachineLearning/comments/iap6yo/discussion_pytorch_favors_intel_against_amds/
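
For readers who want to check the AVX requirement discussed here on their own machine, the following is a minimal Python sketch. It assumes an x86 Linux host, where the kernel lists CPU features on the `flags` lines of `/proc/cpuinfo`; the function names are illustrative and are not part of GATK.

```python
from pathlib import Path


def cpu_flags_from_cpuinfo(cpuinfo_text):
    """Collect CPU feature flags from /proc/cpuinfo-style text (Linux format)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux kernels list features on lines whose key is "flags".
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """Return True if the plain "avx" flag appears among the CPU flags."""
    return "avx" in cpu_flags_from_cpuinfo(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # assumption: Linux only; absent on macOS and Windows
    if cpuinfo.exists():
        print("AVX supported:", has_avx(cpuinfo.read_text()))
    else:
        print("No /proc/cpuinfo here; check CPU features another way.")
```

The parsing is kept separate from the file read so that it can be exercised on sample text from any machine.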

* To establish the environment when not using the Docker image, a conda environment must first be "created", and
then "activated":
- * First, make sure [Miniconda or Conda](https://conda.io/docs/index.html) is installed (Miniconda is sufficient).
+ * First, make sure [Miniconda or Conda](https://conda.io/docs/index.html) is installed. We recommend installing
Member

Should we add a parenthetical note saying that you might run into issues if you try to use a different conda version?

Contributor Author

Yes, we definitely should!
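
One way to act on this suggestion is to compare the installed conda version against whichever version the README recommends. The sketch below assumes that `conda --version` prints a line such as `conda 23.10.0`. `EXPECTED_VERSION` is a hypothetical placeholder, because the recommended version is not visible in this thread, and the function names are illustrative only.

```python
import re
import shutil
import subprocess


def parse_conda_version(version_output):
    """Extract (major, minor, patch) from output such as "conda 23.10.0"."""
    match = re.search(r"conda\s+(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_conda_version():
    """Return the version of the conda found on PATH, or None if there is none."""
    conda_path = shutil.which("conda")
    if conda_path is None:
        return None
    result = subprocess.run(
        [conda_path, "--version"], capture_output=True, text=True, check=False
    )
    # Look at both streams in case the version is not printed on stdout.
    return parse_conda_version(result.stdout + result.stderr)


if __name__ == "__main__":
    EXPECTED_VERSION = None  # hypothetical: set to the version the README recommends
    found = installed_conda_version()
    if found is None:
        print("conda not found on PATH")
    elif EXPECTED_VERSION is not None and found != EXPECTED_VERSION:
        print("conda", found, "differs from the expected", EXPECTED_VERSION)
    else:
        print("conda version:", found)
```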

Member

What is the /opt folder, and why do we install there? I've never understood Linux folder structures...

Contributor Author

According to one of my AI friends:

The /opt directory in Linux is a reserved space for installing optional or add-on software packages. The name /opt stands for "optional".

The /opt directory is important for several reasons, including:

  • System management
    • The /opt directory helps keep the Linux system organized and modular, which makes it easier to manage and maintain.
  • Security
    • By separating optional software from the core system, the /opt directory reduces the risk of accidentally modifying or deleting critical system files.
  • Software storage
    • The /opt directory is often used to store manually compiled software, which is built from source code instead of being installed from distribution repositories.

Within /opt, each add-on package conventionally gets its own subdirectory, /opt/<package> (for example /opt/miniconda), with its executables in /opt/<package>/bin and its libraries in /opt/<package>/lib.
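
Because the README text under review asks for miniconda to be on your PATH, a quick sanity check is whether the install's executable directory actually appears there. This sketch assumes an install prefix of `/opt/miniconda` with executables under its `bin` subdirectory; both the prefix and the helper name `dir_on_path` are illustrative assumptions.

```python
import os


def dir_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Compare entries without trailing slashes so "/x/bin/" matches "/x/bin".
    target = directory.rstrip("/")
    return any(entry.rstrip("/") == target for entry in path_value.split(sep) if entry)


if __name__ == "__main__":
    # Assumed location: a miniconda install under /opt/miniconda.
    conda_bin = "/opt/miniconda/bin"
    print(conda_bin, "on PATH:", dir_on_path(conda_bin))
```

Restart the shell before checking, since changes to PATH made by an installer only take effect in new shells.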

and rely on Mac OS's built-in x86 emulation.
* Set up miniconda:
* Install miniconda to a location on your PATH such as ```/opt/miniconda```, and then restart your shell:
```
Member

At some point I ended up running `conda init`. I'm not clear on what that did or whether it was necessary, but maybe it was?

Contributor Author

hmm.... @KevinCLydon did you have to do that too?

Contributor Author

@KevinCLydon doesn't see this command in the history of commands he ran to set up his environment, so I'm going to assume it's not actually necessary.
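
As background on the question above: `conda init` edits a shell startup file and adds a block delimited by `# >>> conda initialize >>>` and `# <<< conda initialize <<<`, which sets up the shell so that `conda activate` works in new sessions. The sketch below checks whether such a block is present. The list of startup files is an assumption about common bash and zsh setups, and the function name is illustrative.

```python
from pathlib import Path

START_MARKER = "# >>> conda initialize >>>"
END_MARKER = "# <<< conda initialize <<<"


def find_conda_init_block(rc_text):
    """Return the block written by `conda init`, or None if it is not present."""
    lines = rc_text.splitlines()
    try:
        start = next(i for i, line in enumerate(lines) if line.strip() == START_MARKER)
        end = next(
            i for i, line in enumerate(lines)
            if i > start and line.strip() == END_MARKER
        )
    except StopIteration:
        # One or both markers are missing.
        return None
    return "\n".join(lines[start:end + 1])


if __name__ == "__main__":
    # Assumed candidates; the file conda edits depends on the shell and platform.
    for name in (".bashrc", ".bash_profile", ".zshrc"):
        rc_file = Path.home() / name
        if rc_file.is_file():
            block = find_conda_init_block(rc_file.read_text(errors="replace"))
            print(rc_file, "has conda init block:", block is not None)
```

If none of the files contains the block, `conda init` has probably not been run for that shell, which matches the observation that a working setup was reached without it.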

@droazen droazen merged commit a070efc into master Oct 16, 2024
20 checks passed
@droazen droazen deleted the dr_conda_readme_updates branch October 16, 2024 18:11
3 participants