A repository of base templates for new Renku projects. The next sections outline what different files in the template are used for.
Dockerfile
- File for building a docker image that you can launch from renkulab,
to work on your project in the cloud. Template-supplied contents will allow you to
launch an interactive environment from renkulab, with pre-installed renku CLI and
software dependencies that you put into your requirements.txt
, environment.yml
,
or install.R
. You can and should add to this Dockerfile
if libraries you install
require linux software installations as well; for more information see:
https://github.com/SwissDataScienceCenter/renkulab-docker.
.gitlab-ci.yml
- Configuration for running gitlab CI, which builds a docker image
out of the project on git push
to renkulab so that you can launch your interactive
environment (don't remove, but you can modify to add extra CI functionality).
.dockerignore
- Files and directories to be excluded from docker build (you can
append to this list); https://docs.docker.com/engine/reference/builder/#dockerignore-file.
The default version of the renku CLI used in the interactive environment is specified in the Dockerfile in a line similar to this:
ARG RENKU_VERSION={{ __renku_version__ | default("0.16.0") }}
The client creating the project (either via the UI in RenkuLab or the renku CLI) can override this default setting. The version is set as follows:
- if the client (the renku core service or the renku CLI) is using a released version, then pass this to the project template
- if the client is on a development version, use the default provided by the template
requirements.txt
- Required by template's Dockerfile; add your python pip
dependencies here.
environment.yml
- Required by template's Dockerfile; add your python conda
dependencies here.
install.R
- Required by template's Dockerfile (for r-based projects only).
README.md
- Edit this file to provide information about your own project.
Initial contents explain how to use a renku project.
.renku
- Directory containing renku metadata that renku commands update
(caution: don't update this manually).
.renkulfsignore
- File similar to .gitignore
for telling renku to NOT store listed files in git LFS. Use in conjunction with
renku config lfs_threshold <[size]kb>
to tell renku to NOT store files above a
threshold size in LFS. Initial configuration is set to 100kb.
By default, renku
commands (like renku run
and renku dataset
) store all output
files above a configurable threshold size of 100KB in git LFS
to prevent accidentally committing large files to git. It's bad to git commit
large files (e.g. datasets, graphics, videos, audio samples) without being tracked
by git LFS, because they slow down git commands (and thus renku commands). However,
sometimes:
- an imported dataset will come with markdown (
*.md
) and/or code (e.g.*.py
). - a code file (like
*.ipynb
) will be generated from arenku run
(e.g. with papermill). - generated or imported data could be small (e.g. <100kb)
Tracking files with LFS is good, but limits your ability to use commands like
git diff
to view changes, and to see the contents of the files in the project's
page on renkulab.
Thus, you can edit .renkulfsignore
to add files with particular paths or extensions
that are relevant for your project. renku
commands will consult .renkulfsignore
and not track those files with git LFS.
Note: When you start a new interactive environment, by default the LFS-tracked
files (e.g. files above the configured threshold AND not on this list) are in
their "pointer" form. Run renku storage pull <filepath>
to pull the real content
into each file, or git lfs pull
to replace all pointers with real content all
at once. Since these are large files, you might be better off pulling them one at
a time.
data
- Initially empty directory where renku dataset
creates subdirectories
for your named datasets and the files you add to those datasets (if you haven't
or will not be creating renku datasets, you can remove this directory).
notebooks
- Initially empty directory to help you organize jupyter notebooks
(not a requirement, you can remove this directory).
.gitignore
- Files and directories to be excluded from git repository (this
template only requires the #renku section, but the others are nice-to-haves
for common paths to ignore).