# NNF Deployment

To clone this project, use the additional --recurse-submodules option to retrieve its submodules:

git clone --recurse-submodules git@github.com:NearNodeFlash/nnf-deploy
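If the project was cloned without --recurse-submodules, the submodule directories will be empty. A minimal sketch for populating them afterwards with standard git commands; the helper name init_submodules is hypothetical and not part of this repository:

```shell
# Populate submodules in a clone that was made without --recurse-submodules.
# init_submodules is a hypothetical helper name; the git command is standard.
init_submodules() {
    # $1: path to the nnf-deploy work area (defaults to the current directory)
    git -C "${1:-.}" submodule update --init --recursive
}
```

Call it with the path to the work area, or with no argument from inside the work area.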

Updating the Submodules

To update the submodules in this work area, run the update.sh script. Use this to pick up recent changes in any of the submodules.

Warning: If the current submodules have already been deployed to a K8s system, tear down and delete any workflows, then run nnf-deploy undeploy to remove the old CRDs and pods before updating the submodules. An update may pull in CRD changes that are incompatible with resources already on the K8s system.

The update.sh command will update each submodule directory to the head of its master branch.

tools/update.sh

Submodule Versions

Any submodule can be set to a specific revision and it will be used by the nnf-deploy command. Note the warning above prior to setting a submodule to a specific revision.

To set a submodule to a specific revision, change into that submodule's directory and switch to that revision or branch:

cd nnf-sos
git switch branch-with-my-fixes
cd ..
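Before running nnf-deploy, it can help to confirm which submodules are no longer at the commit recorded in the nnf-deploy work area. A small sketch, assuming it is run from the top of the work area; git submodule status prefixes such entries with "+", and the helper name changed_submodules is hypothetical:

```shell
# List submodules whose checked-out commit differs from the commit recorded
# in the superproject ("+" prefix in `git submodule status` output).
# changed_submodules is a hypothetical helper name.
changed_submodules() {
    git submodule status | awk '/^[+]/ { print $2 }'
}
```

Any name it prints is a submodule that nnf-deploy will use at a revision other than the recorded one.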

The update.sh command will switch that submodule back to the head of its master branch.

nnf-deploy

nnf-deploy is a Go executable that builds components of the Rabbit software stack locally and deploys or undeploys those components on the K8s cluster specified by the current kubeconfig.

Build

Build using: make

Before running nnf-deploy, ensure that the correct NNF systems are listed in ./config/systems.yaml and that the correct ghcr repositories are defined in ./config/repositories.yaml.
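A quick pre-flight check can catch a missing config file before any command runs. This is only a sketch: check_nnf_config is a hypothetical helper, it assumes the current directory is the top of the work area, and it checks only that the files exist, not that their contents are correct:

```shell
# Verify that the default config files exist before running nnf-deploy.
# check_nnf_config is a hypothetical helper; it does not validate contents.
check_nnf_config() {
    rc=0
    for f in config/systems.yaml config/repositories.yaml; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

It returns non-zero and names each missing file on stderr, so it can gate a later command, for example: check_nnf_config && ./nnf-deploy deploy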

Options

./nnf-deploy --help
Usage: nnf-deploy <command>

Flags:
  -h, --help       Show context-sensitive help.
      --debug      Enable debug mode.
      --dry-run    Show what would be run.
      --systems="config/systems.yaml"
                   path to the systems config file
      --repos="config/repositories.yaml"
                   path to the repositories config file
      --daemons="config/daemons.yaml"
                   path to the daemons config file

Commands:
  deploy [<only> ...]
    Deploy to current context.

  undeploy [<only> ...]
    Undeploy from current context.

  make <command> [<only> ...]
    Run make [COMMAND] in every repository.

  install [<node> ...]
    Install daemons (EXPERIMENTAL).

  init
    Initialize cluster.

Run "nnf-deploy <command> --help" for more information on a command.

Init

The init subcommand installs ArgoCD via helm, so the helm CLI must be installed. Run init only once on a new cluster.

./nnf-deploy init
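Because init depends on the helm CLI, a wrapper can check for it first and fail with a clear message. A minimal sketch; init_cluster and the NNF_DEPLOY variable are hypothetical names introduced here for illustration:

```shell
# Fail early if the helm CLI is not on PATH, then run init.
# init_cluster and NNF_DEPLOY are hypothetical names, not part of nnf-deploy.
init_cluster() {
    if ! command -v helm >/dev/null 2>&1; then
        echo "helm CLI not found; install it before running init" >&2
        return 1
    fi
    "${NNF_DEPLOY:-./nnf-deploy}" init
}
```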

To restore the legacy init behavior, in which init installs cert manager, mpi-operator, lustre-csi-driver, and lustre-fs-operator, copy the config/overlay-legacy.yaml-template file to ./overlay-legacy.yaml. With the overlay in place, init needs to be run only once on a new cluster, or again when one of those components changes.

cp config/overlay-legacy.yaml-template overlay-legacy.yaml
./nnf-deploy init

Deploy

Deploy all the submodules using the deploy command:

./nnf-deploy deploy

To deploy only specific repositories, list the desired modules after the deploy command. For example, to deploy only the dws and nnf-sos repositories, use:

./nnf-deploy deploy dws nnf-sos

Undeploy

WARNING! Before you undeploy, delete any user-created or administrator-created resources, such as lustrefilesystems and workflows, using kubectl commands:

kubectl delete workflows.dws.cray.hpe.com --all
kubectl delete lustrefilesystems.cray.hpe.com --all

Undeploy all the submodules using the undeploy command.

./nnf-deploy undeploy

As with deploy, you may undeploy specific repositories by listing the desired modules after the undeploy command. For example, to undeploy only dws and nnf-sos, use:

./nnf-deploy undeploy dws nnf-sos
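Since the resource cleanup must happen before the undeploy, the two steps can be chained so that undeploy runs only if both deletes succeed. A sketch under that assumption; clean_undeploy and the NNF_DEPLOY variable are hypothetical names, while the kubectl and nnf-deploy invocations are the ones from this section:

```shell
# Delete user-created resources, then undeploy; stop at the first failure.
# clean_undeploy and NNF_DEPLOY are hypothetical names for illustration.
clean_undeploy() {
    kubectl delete workflows.dws.cray.hpe.com --all &&
    kubectl delete lustrefilesystems.cray.hpe.com --all &&
    "${NNF_DEPLOY:-./nnf-deploy}" undeploy "$@"
}
```

Arguments are passed through to undeploy, so clean_undeploy dws nnf-sos undeploys only those two repositories after the cleanup.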

Make

The make subcommand provides direct access to makefile targets by running make <command> within each submodule in nnf-deploy. For example, the following command performs a docker-build within each submodule:

./nnf-deploy make docker-build

Kind cluster

Kind clusters are built and deployed using locally compiled images. The following commands:

  • Create a kind cluster
  • Build all docker images for Rabbit modules
  • Push those images into the Kind cluster
  • Deploy those images onto the Kind cluster nodes

./tools/kind.sh reset
./nnf-deploy make docker-build
./nnf-deploy make kind-push
./nnf-deploy deploy

Install

The install subcommand compiles and installs the daemons on the compute nodes, along with the proper certs and tokens. Systemd files are used to manage and start the daemons. These daemons are necessary for data movement.

./nnf-deploy install