Work on creating the Singularity container and CD #2
Conversation
Ping @halirutan @mai00fti : Could you create a release of the repo? Maybe v0.1? Maybe set the pre-release checkbox? I would like to create a conda package.
Potentially stupid question: Why is the […]
One more question: in […]
Simple answers: 1. Because in the […]
The […]
Not sure if I'm allowed to do this. I need to check what privileges I have.
Done: https://github.com/yigbt/deepFPlearn/releases/tag/v0.1 @bernt-matthias Damn, I guess you needed the release from this branch, right?
Should be fine.
One step further: I successfully built an image using pip only: https://github.com/bernt-matthias/deepFPlearn/runs/1166177083?check_suite_focus=true (in my original branch https://github.com/bernt-matthias/deepFPlearn/tree/topic/singularity-CD). In contrast to the conda-based container, the image is 'only' 1.5GB (the other one is nearly 4GB). One problem might be that I need to install rdkit via apt-get, which seems to install it for Python 2.7; not sure if this leads to problems. Tests would be nice (as would adding CI for the project in general). Not sure yet why the conda image gets that large. I will play around with multi-stage builds in the style of https://pythonspeed.com/articles/conda-docker-image-size/
If I'm not mistaken, the single-stage build now works because I use the slim version of quay.io/singularity/singularity. Nevertheless, I also got multi-stage builds to work:
The sizes are from building on my machine; I have seen different sizes elsewhere. I have the three container definitions in my branch https://github.com/bernt-matthias/deepFPlearn/tree/topic/singularity-CD
Not sure whether the commented-out packages in the environment are really unneeded, but at least I could not find them used with grep. If the pip version runs, I guess that may be the preferred way.
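For reference, a minimal sketch of what such a two-stage Singularity definition could look like, in the spirit of the pythonspeed article linked above. The environment file name, paths, and base images here are assumptions, not the actual def files from the branch:

```
# Sketch of a two-stage build: resolve the conda environment in a fat
# builder image, then copy only the resolved environment into a slim
# runtime image (Singularity >= 3.2 supports multi-stage builds).
Bootstrap: docker
From: continuumio/miniconda3
Stage: build

%files
    environment.yml /environment.yml

%post
    conda env create -f /environment.yml -p /opt/env
    # Strip caches and bytecode to keep the copied environment small
    conda clean --all --yes
    find /opt/env -name '__pycache__' -type d -prune -exec rm -rf {} +

Bootstrap: docker
From: debian:buster-slim
Stage: final

%files from build
    /opt/env /opt/env

%environment
    export PATH=/opt/env/bin:$PATH
```

The size win comes from the final stage never seeing conda's package cache or the build-time tooling.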
@bernt-matthias I need to look at this closer.
The rdkit can be tested with […]
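The comment is truncated here; a typical minimal smoke test for an rdkit install (an assumption, not necessarily the one meant) is to round-trip a SMILES string:

```bash
# Round-trip benzene through rdkit's C++ core; prints 'c1ccccc1' on success.
python3 -c "from rdkit import Chem; print(Chem.MolToSmiles(Chem.MolFromSmiles('c1ccccc1')))"
```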
My current understanding is that conda's dependency resolution is superior to pip's. I mean, it's not only important which packages we use but also which packages the dependencies depend on. I don't know what magic conda uses, but it takes a very long time to resolve all dependencies. A definitive test would be re-running the training; a small subset of our cases should be sufficient. But I need to test this.
The whole thing has grown historically out of Jana's first tries, so there is a good chance that not all packages are needed. Can you push the commits from your branch to this one? On a side note: we really need to fix the naming of the environment :) This is also a historical thing and […]
Then we should add a […]
True
Conda computes the complete dependency graph of the packages and their dependencies and finds an optimal solution by solving a SAT problem; that's why it sometimes takes a long time. The problem is known; some recommendations are in bioconda/bioconda-recipes#13774 (e.g. pinning Python/R versions to a reasonable range, for instance py>=3.5).
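To illustrate the pinning recommendation (package names, ranges, and the environment name are examples only, not the project's actual environment):

```yaml
# environment.yml sketch: pinning the interpreter and heavy packages to a
# reasonable range shrinks the SAT solver's search space considerably.
name: dfpl_env
channels:
  - conda-forge
dependencies:
  - python>=3.5,<3.9
  - rdkit>=2019.09
  - pip
```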
Would be a cool addition. Wondering if the GPU container will run on the GitHub Actions virtual machines, which probably do not have GPUs.
Done
Jep. That's an easy one. We just need to decide which to take.
I was having a look at this while working on another issue with […]
Agreed! It would be really nice if we had some sort of test-data that we could use for training. Just to be sure. Will talk to Jana about it.
Should work because we basically have the same situation on your UFZ HPC, where we can't access the GPU atm. It just falls back to CPU.
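A quick way to verify that fallback, assuming a TensorFlow 2.x backend (an assumption on my part): an empty device list means TF sees no GPU and transparently places ops on the CPU.

```bash
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```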
I tested the adjusted environment locally and ran some tests. You can see that it still builds here. I'd say once the analysis I'm starting now on the HPC runs through with the changed environment, we should discuss what we need to change in the GitHub action to push the container automatically on each new tagged release on master. It's really awesome that we made such progress, and I highly appreciate your help in all this! 😀
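A sketch of what that trigger change could look like (workflow and step names are assumptions; the `on: release` part is the point):

```yaml
name: Build and push Singularity container
on:
  release:
    types: [published]   # fire on tagged releases, not on every push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # ... build the .sif and push it to the container library here
```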
@bernt-matthias Surprisingly, we now run into the same "no space left on device" error we had before :(
Hrm. I've restarted the job... Yesterday I briefly went over this with @mai00fti. We think it would be good if the def file(s) worked independently of CI. For that, one would basically only need to check out the sources via git.
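The idea being that instead of relying on CI to copy files in, the def file fetches the sources itself; a sketch (the branch/tag and the install step are assumptions):

```
%post
    apt-get update && apt-get install -y git
    # Check out a fixed tag so the build is reproducible with or without CI
    git clone --depth 1 --branch v0.1 https://github.com/yigbt/deepFPlearn.git /opt/deepFPlearn
    pip install /opt/deepFPlearn
```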
Attention: I did force push to this branch :( (compare 28d7c2c to daf009e)
@bernt-matthias I took some time tonight to work on this, and I got everything running:
I have this on a private repo for now, but I will work with Jana on it tomorrow. We need to set up a sylabs account for the group and add access tokens to this repo (which I'm not allowed to do). Then we need to fix the description, collection, etc. of the container so that it has the right information on it. I'll also force push to this branch when everything is running.
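For the record, the usual flow for pushing to the Sylabs library with an access token looks roughly like this (account and collection names are placeholders):

```bash
singularity remote login SylabsCloud        # paste the access token when prompted
singularity sign dfpl.sif                   # optional, but avoids unsigned-push warnings
singularity push dfpl.sif library://<account>/deepfplearn/dfpl:v0.1
```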
@bernt-matthias @mai00fti Yeah, it works. For now, we decided to push the containers to Jana's sylabs account until a better solution with more quota is available. I'll push some more documentation to this branch and then it should be ready for merging. We'll build the container on each release (and not on each push), and after merging this branch, I'll add the latest changes we made to the dfpl package to master and create a first real release.
With a test in the container definition it would be perfect :)
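Something along these lines, perhaps; a `%test` section runs after the build and via `singularity test` (the `dfpl` import assumes the package installs under that name):

```
%test
    # Fails the build if rdkit or the dfpl package are broken
    python3 -c "from rdkit import Chem; assert Chem.MolFromSmiles('c1ccccc1') is not None"
    python3 -c "import dfpl"
```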
@@ -15,51 +15,22 @@ All other steps are as pointed out in the documentation linked above.
Building the container using the provided `conda_rdkit2019.def` definition file requires only a few steps:
container name should also be changed here
Yep, thanks. This is one of the places I need to change. And the main readme should also get an appropriate entry now.
@@ -0,0 +1,20 @@
Bootstrap: docker
I guess the other def files can be removed
.github/workflows/release.yml (Outdated)
- name: Build Container
  env:
    SINGULARITY_RECIPE: singularity_container/conda_dfpl.def
So the single-stage definition file after all?
I haven't decided yet. I'll build both locally and compare runtimes and sizes. Now that you've gone to the trouble, I don't want to just throw it away. I think I'll adapt it and take your multi-stage version.
@bernt-matthias On my local machine, both the […] Have you tried building the […]? Edit: Since Jana mentioned that this wasn't clear: the singularity build works out of the box on every machine, and you don't have to set up anything special. Just clone the whole repo, go into the […]
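Spelled out, the out-of-the-box build presumably looks like this (the def file path is taken from this PR; the rest is an assumption, since the comment is cut off):

```bash
git clone https://github.com/yigbt/deepFPlearn.git
cd deepFPlearn
sudo singularity build dfpl.sif singularity_container/conda_dfpl.def
```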
For me, the local build of the multi_conda def file is 4.1G. Questions: […]
Maybe try building locally on a fresh clone using […]
@bernt-matthias You are right. I wasn't paying attention to the fact that the whole file tree is copied into the container, and it happily included the sifs I had already built. Keeping this in mind, I suggest taking your […]
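An alternative to always building from a fresh clone would be a `%files` section that copies only what the build needs, so stray .sif files in the working tree can't inflate the image (a sketch; the exact paths are assumptions):

```
%files
    environment.yml /environment.yml
    dfpl /opt/deepFPlearn/dfpl
```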
…s the Sylabs repo so that we have disk space again.
TODO: