
Work on creating the Singularity container and CD #2

Merged
merged 27 commits into master from feat_singularity_CD on Oct 13, 2020

Conversation

halirutan
Contributor

@halirutan halirutan commented Sep 25, 2020

TODO:

@halirutan halirutan mentioned this pull request Sep 25, 2020
4 tasks
@bernt-matthias
Contributor

Ping @halirutan @mai00fti : Could you create a release of the repos? Maybe v0.1? Maybe set the pre-release checkbox?

I would like to create a conda package.

@bernt-matthias
Contributor

Potentially stupid question: Why is the tensorflow-gpu conda package installed? Is this really needed in the tensorflow container? Intuitively, I would guess that all functionality should already be in the container?

@bernt-matthias
Contributor

One more question: in tests/ there are a bunch of python scripts. How are they called and in which order are they executed?

@halirutan
Contributor Author

Potentially stupid question: Why is the tensorflow-gpu conda package installed? Is this really needed in the tensorflow container?

Simple answers: 1. Because we need it in the environment.yml if someone wants to create the conda env. 2. I was too lazy to test whether it works in the Singularity container without it. After all, we're setting up a new conda env, and I wasn't totally sure whether it actually reuses the packages that are already available. Since the Singularity build takes ages, I simply haven't investigated yet.

One more question: in tests/ there are a bunch of python scripts. How are they called and in which order are they executed?

The run... files are not for unit testing, and they cannot be run automatically because they rely on /data, which is not available in the repo. Therefore, only the test_... files are executed with pytest.

Could you create a release of the repos?

Not sure if I'm allowed to do this. I need to check what privileges I have.

@halirutan
Contributor Author

halirutan commented Sep 25, 2020

Ping @halirutan @mai00fti : Could you create a release of the repos? Maybe v0.1? Maybe set the pre-release checkbox?

Done: https://github.com/yigbt/deepFPlearn/releases/tag/v0.1

@bernt-matthias Damn, I guess you needed the release from this branch, right?

@bernt-matthias
Contributor

I guess you needed the release from this branch, right?

Should be fine.

@bernt-matthias
Contributor

One step further: I successfully built an image using pip only: https://github.com/bernt-matthias/deepFPlearn/runs/1166177083?check_suite_focus=true (in my original branch https://github.com/bernt-matthias/deepFPlearn/tree/topic/singularity-CD)

In contrast to the conda-based container, the image is 'only' 1.5GB (the other one is nearly 4GB). One problem might be that I need to install rdkit via apt-get, which seems to install it for Python 2.7. Not sure if this leads to problems. Tests would be nice (and, in general, adding CI for the project).

Not sure yet why the conda image gets that large. I will play around with multi-stage builds in the style of https://pythonspeed.com/articles/conda-docker-image-size/
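
For context, the conda-pack trick from that article, transferred to a Singularity multi-stage definition, might look roughly like this (base images, stage names, and the env name are illustrative assumptions, not the actual multi.def):

    Bootstrap: docker
    From: continuumio/miniconda3
    Stage: build

    %files
        environment.yml /environment.yml

    %post
        # Create the env from the repo's environment.yml and pack it into a
        # relocatable tree with conda-pack (env name is a placeholder)
        conda install -y -n base -c conda-forge conda-pack
        conda env create -f /environment.yml -n dfpl_env
        conda-pack -n dfpl_env -o /tmp/env.tar.gz
        mkdir -p /opt/env && tar -xzf /tmp/env.tar.gz -C /opt/env
        /opt/env/bin/conda-unpack

    Bootstrap: docker
    From: debian:buster-slim
    Stage: final

    %files from build
        /opt/env /opt/env

    %environment
        export PATH=/opt/env/bin:$PATH

The idea is that all the conda machinery stays in the build stage, and only the packed environment is copied into the small final image.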

@bernt-matthias
Contributor

If I'm not mistaken, the single-stage build now works because I use the slim version of quay.io/singularity/singularity.

Nevertheless, I also got multi-stage builds to work:

  • multi-stage build without tensorflow-gpu, tensorboard, scipy, and scikit-learn: 3.1GB
  • multi-stage build without scipy and scikit-learn: 4.1GB
  • single-stage build without scipy and scikit-learn: 5.4GB

Sizes are from building on my machine; here I have seen different sizes.

I have the three container definitions in my branch https://github.com/bernt-matthias/deepFPlearn/tree/topic/singularity-CD

  • conda_rdkit2019.def (single-stage conda)
  • dfpl.def (pip-based)
  • multi.def (multi-stage conda)

Not sure if the commented-out packages in the environment are really unneeded, but at least I could not find them used with grep.

If the pip version works, I guess this may be the preferred way.

@halirutan
Contributor Author

@bernt-matthias I need to look at this closer.

Tests would be nice

The rdkit can be tested with tests/test_fingerprint.py since it uses rdkit to build fingerprints. A call to pytest from the deepFPlearn directory (with activated env) is sufficient.
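
For reference, a minimal invocation might look like this (paths as described above, conda env already activated):

    cd deepFPlearn
    pytest                               # discovers and runs all tests/test_*.py
    pytest tests/test_fingerprint.py     # or only the rdkit/fingerprint test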

Not sure yet why the conda image gets that large.

My current understanding is that the conda dependency resolution is superior to that of pip. I mean, it's not only important what packages we use but also which packages the dependencies depend on. I don't know what magic conda uses but it takes a very long time to resolve all dependencies.

A definite test is if we re-run the training. A small subset of our cases should be sufficient. But I need to test this.

Not sure if the commented-out packages in the environment are really unneeded, but at least I could not find them used with grep.

The whole thing grew historically out of Jana's first attempts, so there is a good chance that not all packages are needed.

Can you push the commits from your branch to this one?

On a side note: We really need to fix the naming of the environment :) This is also a historical thing and conda_rdkit2019 does not really fit anymore 😂

@bernt-matthias
Contributor

The rdkit can be tested with tests/test_fingerprint.py since it uses rdkit to build fingerprints. A call to pytest from the deepFPlearn directory (with activated env) is sufficient.

Then we should add a %test section to the final stage
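
A minimal sketch of such a %test section, assuming the repo ends up under /deepFPlearn inside the container and pytest is available in the env:

    %test
        cd /deepFPlearn
        python -m pytest tests/

This runs automatically at the end of the build and can also be triggered later with `singularity test <image>.sif`.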

My current understanding is that the conda dependency resolution is superior to that of pip.

True

I mean, it's not only important what packages we use but also which packages the dependencies depend on. I don't know what magic conda uses but it takes a very long time to resolve all dependencies.

Conda computes the complete dependency graph of the packages and their dependencies and finds an optimal solution by solving a SAT problem; that's why it sometimes takes a long time. The problem is known; there are some recommendations in bioconda/bioconda-recipes#13774 (e.g. fixing the Python/R versions to a reasonable range, for instance py>=3.5).

A definite test is if we re-run the training. A small subset of our cases should be sufficient. But I need to test this.

Would be a cool addition. Wondering if the GPU container will run on the GitHub Actions virtual machines, which probably do not have GPUs.

Can you push the commits from your branch to this one?

Done

On a side note: We really need to fix the naming of the environment :) This is also a historical thing and conda_rdkit2019 does not really fit anymore 😂

Yep. That's an easy one. We just need to decide which name to take.

@halirutan
Contributor Author

halirutan commented Sep 30, 2020

@bernt-matthias

Not sure if the commented-out packages in the environment are really unneeded, but at least I could not find them used with grep.

I was having a look at this while working on another issue with Java Jana (have to remember her name properly). We definitely use sklearn as a direct import, but I was able to scrap a lot of the packages. I also pinned the versions to the values I'm seeing here because, in the bioconda issue you linked, unpinned versions were mentioned as one major slow-down in dependency resolution. I'm currently running the whole analysis on my machine with the new environment and will do the same on the HPC. Right now, the list of packages looks like this:

  - python=3.8.2
  - pip=20.1.1
  - numpy=1.19.1
  - conda=4.8.4
  - conda-build=3.20.3
  - tensorflow-gpu=2.2.0
  - scikit-learn=0.23.1
  - pandas=1.0.5
  - keras=2.4.3
  - rdkit=2020.03.4
  - matplotlib=3.2.2
  - pytest=5.4.3
  - jsonpickle=1.4.1
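
Wrapped into a complete environment.yml this would look roughly as follows (the env name and the channels are assumptions; the real file may pull rdkit and tensorflow from different channels):

    name: dfpl_env              # placeholder; the naming is still to be decided
    channels:
      - conda-forge             # assumption; rdkit/tensorflow channels may differ
      - defaults
    dependencies:
      - python=3.8.2
      # ... remaining pinned packages from the list above ...
      - rdkit=2020.03.4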

Then we should add a %test section to the final stage

Agreed! It would be really nice if we had some sort of test-data that we could use for training. Just to be sure. Will talk to Jana about it.

Would be a cool addition. Wondering if the GPU container will run on the GitHub Actions virtual machines, which probably do not have GPUs.

Should work, because we basically have the same situation on your UFZ HPC, where we can't access the GPU at the moment. It just falls back to the CPU.
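
A quick way to verify the fallback inside the container (standard TensorFlow 2.x API; nothing project-specific):

    import tensorflow as tf

    # Expect an empty list on the GitHub Actions runners (no GPU);
    # TensorFlow 2.x then silently places all ops on the CPU.
    print("Visible GPUs:", tf.config.list_physical_devices("GPU"))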

@halirutan
Contributor Author

I tested the adjusted environment locally and ran some tests. You can see that it still builds here. I'd say that once the analysis I'm starting now on the HPC runs through with the changed environment, we should discuss what we need to change in the GitHub Action to push the container automatically on each new tagged release on master.

It's really awesome that we made such progress and I highly appreciate your help in all this! 😀

@halirutan
Contributor Author

@bernt-matthias Surprisingly, we now run into the same "no space left on device" error we had before :(

@bernt-matthias
Contributor

Hrm. I restarted the job...

Yesterday I briefly looked over this with @mai00fti. We think it would be good if the def file(s) could be run independently of CI. For that, one would basically only need to check out the sources via git.

@bernt-matthias bernt-matthias force-pushed the feat_singularity_CD branch 3 times, most recently from 28d7c2c to daf009e on October 8, 2020 12:03
@bernt-matthias
Contributor

Attention: I force-pushed to this branch :(

@halirutan
Contributor Author

@bernt-matthias I took some time tonight to work on this, and I got everything running:

  • using a simple Ubuntu 20.04 runner and installing all necessary packages for...
  • ...building Singularity from source, which takes much less disk space than the Docker approach
  • building our Singularity container works just fine
  • I use an encrypted env variable to store the token that gives me access to my cloud.sylabs.io account
  • I upload the container to my sylabs account, and it's available here (a rough sketch of the build-and-push steps is shown below)
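
A rough sketch of what the build-and-push steps could look like (the secret name and library URI are placeholders, not the actual private workflow):

    - name: Build container
      run: sudo singularity build dfpl.sif singularity_container/conda_dfpl.def

    - name: Push to cloud.sylabs.io
      env:
        SYLABS_TOKEN: ${{ secrets.SYLABS_TOKEN }}
      run: |
        # Authenticate against the default Sylabs Cloud remote with the encrypted token
        echo "$SYLABS_TOKEN" > sylabs-token
        singularity remote login --tokenfile sylabs-token
        singularity push -U dfpl.sif library://<user>/<collection>/dfpl:latest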

I have this on a private repo for now, but I will work with Jana on it tomorrow. We need to set up a sylabs account for the group and add access tokens to this repo (which I'm not allowed to do). Then we need to fix the description, collection, etc. of the container so that it has the right information on it.

I'll also force-push to this branch when everything is running.

@halirutan
Contributor Author

halirutan commented Oct 9, 2020

@bernt-matthias @mai00fti Yeah, works. Right now, we decided to push the containers to Jana's sylabs account until a better solution with more quota is available. I'll push some more documentation to this branch and then it should be ready for merging.

We'll build the container on each release (and not on each push), and after merging this branch, I'll add the latest changes we made to the dfpl package to master and create a first real release.
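
In workflow terms, the trigger would look roughly like this (whether to react to published or created releases is still to be decided):

    on:
      release:
        types: [published]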

@bernt-matthias bernt-matthias left a comment
Contributor

With a test in the container definition it would be perfect :)

@@ -15,51 +15,22 @@ All other steps are as pointed out in the documentation linked above.

Building the container using the provided `conda_rdkit2019.def` definition file requires only a few steps:
Contributor

container name should also be changed here

Contributor Author

Yep, thanks. This is one of the places I need to change. And the main readme should also get an appropriate entry now.

@@ -0,0 +1,20 @@
Bootstrap: docker
Contributor

I guess the other def files can be removed


- name: Build Container
env:
SINGULARITY_RECIPE: singularity_container/conda_dfpl.def
Contributor

So the single-stage definition file after all?

Contributor Author

I haven't decided yet. I'll build both locally and look at build times and sizes. Now that you've gone to the trouble, I don't want to simply throw it away. I think I'll adapt it and take your multi-stage version.

@halirutan
Contributor Author

halirutan commented Oct 9, 2020

@bernt-matthias On my local machine, both the dfpl.def and multi.def run through smoothly. However, there is a huge difference in size:

[screenshot: sizes of the built dfpl and multi .sif files]

Have you tried building the multi.def yourself and do you get other results?

Edit:

Since Jana mentioned that this wasn't clear: the Singularity build works out of the box on every machine, and you don't have to set up anything special. Just clone the whole repo, go into the singularity_container folder, and call

SING=$(command -v singularity)
sudo $SING build dfpl-multi.sif multi.def
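
Afterwards the container can be used like any other .sif; for example (assuming the dfpl package ends up importable inside the container env):

    singularity exec dfpl-multi.sif python -c "import dfpl; print(dfpl.__file__)"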

@bernt-matthias
Contributor

For me the local build of the multi_conda def file is 4.1G.

Questions:

  • Is this line in the definition files copying the .git dir as well? If so, this adds 1.2G on a normal checkout. For the CI it should not be a problem, because the checkout action should only check out the state of the latest commit and not the complete git history (depth=1, I think; see the sketch below).
  • When you build locally, do you remove potentially existing .sif files beforehand? Otherwise, the copying from the previous point would lead to the inclusion of complete containers in the new container...
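
For the CI side, the shallow checkout from the first point would look like this (actions/checkout@v2 defaults to a depth of 1, shown explicitly here):

    - uses: actions/checkout@v2
      with:
        fetch-depth: 1   # only the latest commit, no .git history ballast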

@bernt-matthias
Contributor

Maybe try building locally on a fresh clone using git clone -b feat_singularity_CD --single-branch --depth 1

@halirutan
Contributor Author

@bernt-matthias You are right. I wasn't paying attention to the fact that the whole file tree is copied into the container, and it happily included the .sif files I had already built. Keeping this in mind, I suggest taking your multi.def to build the container. I will make all necessary changes. Thanks for looking into it.

@halirutan halirutan merged commit daa2658 into master Oct 13, 2020