Topo workflows are run on an AWS EKS cluster using Argo Workflows. The detailed configuration is available in this repo.
To get set up, you need access to the Argo user role inside the EKS cluster. Contact someone from Topo Data Engineering to get access; all Imagery maintainers already have access.
If you are creating your own workflow, or are interested in the details of a current workflow, please also read CONFIGURATION.md.
You will need
Ensure you have kubectl aliased to k:
alias k=kubectl
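Before going further, it can help to confirm that the command-line tools this guide uses (kubectl, argo, aws and aws-azure-login) are on your PATH. The following is a minimal sketch; the `missing_tools` helper is hypothetical and not part of this repo.

```shell
# Hypothetical helper: print each named tool that cannot be found on PATH.
# Prints nothing when every tool is available.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}
```

For example, `missing_tools kubectl argo aws aws-azure-login` lists whichever of those tools still need to be installed.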
To connect to the EKS cluster you need to be logged into AWS:
aws-azure-login
Then, the first time you use the cluster only, set up the cluster configuration by running:
aws --region=ap-southeast-2 eks update-kubeconfig --name=Workflows
To validate that the cluster is connected, run:
k get nodes
NAME STATUS ROLES AGE VERSION
ip-255-100-38-100.ap-southeast-2.compute.internal Ready <none> 7d v1.21.12-eks-5308cf7
ip-255-100-39-100.ap-southeast-2.compute.internal Ready <none> 7d v1.21.12-eks-5308cf7
To make CLI access easier, you can set the default namespace to argo:
k config set-context --current --namespace=argo
Once the cluster connection is set up, a job can be submitted with the CLI or accessed via the running argo-server:
argo submit --watch workflows/raster/standardising.yaml
To open the web interface:
# Create a connection to the Argo server
k port-forward deployment/argo-workflows-server 2746:2746
xdg-open http://localhost:2746
In the Workflows page:
SUBMIT NEW WORKFLOW
Edit using full workflow options
UPLOAD FILE
- (Locate File -> Open)
+ CREATE
Elasticsearch is an analytics engine that allows us to store, search and analyse AWS logs.
Elasticsearch can be accessed through https://myapplications.microsoft.com/.
Select the workflow data view and set the correct time filter.
All Logs for a Workflow:
kubernetes.labels.workflows.argoproj.io/workflow : "imagery-standardising-v0.2.0-60-9b7dq"
All Logs for a pod:
Click on the pod in the Argo UI and scroll through the summary table to find the pod name.
kubernetes.annotations.workflows.argoproj.io/node-name.keyword : "imagery-standardising-v0.2.0-60-9b7dq.create-config"
List Failed STAC Validation Logs:
kubernetes.labels.workflows.argoproj.io/workflow : "imagery-standardising-v0.2.0-60-9b7dq" and data.valid : False
Find a Basemaps URL:
kubernetes.labels.workflows.argoproj.io/workflow : "imagery-standardising-v0.2.0-60-9b7dq" and data.url : *
or
data.title : "Wellington Urban Aerial Photos (1987-1988) SN8790" and data.url : *
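The queries above differ only in the workflow name, so they can be generated rather than edited by hand. This is a minimal sketch; the function names are hypothetical, and the field names are copied from the queries above.

```shell
# Hypothetical helpers that print Elasticsearch queries for a given workflow name.

# All logs for a workflow.
workflow_logs_query() {
  printf 'kubernetes.labels.workflows.argoproj.io/workflow : "%s"\n' "$1"
}

# Failed STAC validation logs for a workflow.
failed_stac_query() {
  printf '%s and data.valid : False\n' "$(workflow_logs_query "$1")"
}

# Logs containing a Basemaps URL for a workflow.
basemaps_url_query() {
  printf '%s and data.url : *\n' "$(workflow_logs_query "$1")"
}
```

The printed query can be pasted into the search bar, for example the output of `failed_stac_query imagery-standardising-v0.2.0-60-9b7dq`.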
The kubernetes.container_hash field, available in Elasticsearch, gives the hash of the container that was used to run the task. It can be used to look up the version in the container registry for further investigation.
All workflow outputs and logs are stored in the linz-workflow-artifacts bucket on the li-topo-prod account.
All outputs follow the same naming convention:
s3://linz-workflow-artifacts/YYYY-mm/dd-workflow.name/pod.name/
For each pod, the logs are saved as a main.log file within the related pod.name prefix.
Unless a different location is specified within the workflow code, output files will be uploaded to the corresponding pod.name prefix.
Note: This bucket has a 90 day expiration lifecycle.
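The naming convention can be turned into a small helper for locating a pod's outputs. This is a sketch only: the `artifact_prefix` function is hypothetical, and it assumes the date in the prefix is supplied by the caller in YYYY-mm-dd form.

```shell
# Hypothetical helper: build the artifact prefix for a pod.
# Arguments: date as YYYY-mm-dd, workflow name, pod name.
artifact_prefix() {
  date_arg=$1
  year_month=${date_arg%-*}   # "2023-05-17" -> "2023-05"
  day=${date_arg##*-}         # "2023-05-17" -> "17"
  printf 's3://linz-workflow-artifacts/%s/%s-%s/%s/\n' "$year_month" "$day" "$2" "$3"
}
```

With the prefix in hand, `aws s3 ls "$(artifact_prefix 2023-05-17 workflow.name pod.name)"` lists the pod's files, and `aws s3 cp "$(artifact_prefix 2023-05-17 workflow.name pod.name)main.log" -` prints its log to the terminal. The date and names here are placeholders.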
List pods:
k get pods --namespace=argo
# note: if the default namespace is set to argo, `--namespace=argo` is not required.
In the output, next to the NAME of the pod, the READY column indicates how many Docker containers are running inside the pod. For example, 1/1 indicates there is one Docker container.
The output of the following command includes a Containers section. The first line in this section is the container name, for example, argo-server.
k describe pods *pod_name* --namespace=argo
To access a container in a pod run:
k exec --namespace=argo --stdin=true --tty=true *pod_name* -- bash
Once inside the container you can run a number of commands. For example, if troubleshooting network issues, you could run the following:
mtr linz-workflow-artifacts.s3.ap-southeast-2.amazonaws.com
mtr sts.ap-southeast-2.amazonaws.com
watch --errexit nslookup linz-workflow-artifacts.s3.ap-southeast-2.amazonaws.com
See Concurrency for details on how to set limits on how many workflow instances can be run concurrently.
error: exec plugin: invalid apiVersion "client.authentication.k8s.io/v1alpha1"
To fix this, upgrade the AWS CLI to > 2.7.x
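To check whether an installed AWS CLI is new enough, the version can be parsed out of the `aws --version` output, which begins with `aws-cli/X.Y.Z`. This is a minimal sketch; both helper functions are hypothetical, and the threshold passed to `version_at_least` is left to the caller.

```shell
# Hypothetical helper: extract "X.Y.Z" from `aws --version` output,
# which starts with "aws-cli/X.Y.Z".
aws_cli_version() {
  printf '%s\n' "$1" | sed -n 's|^aws-cli/\([0-9][0-9.]*\).*|\1|p'
}

# Hypothetical helper: succeed when version $1 is at least major $2, minor $3.
version_at_least() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}
```

For example, `version_at_least "$(aws_cli_version "$(aws --version 2>&1)")" 2 8` succeeds when the installed CLI is 2.8 or later. That threshold reads "> 2.7.x" literally, which is an assumption; adjust it if a 2.7 release already resolves the error for you.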
Some tasks in the Workflows or WorkflowTemplates run in a container. These containers are built from other repositories, such as https://github.com/linz/topo-imagery, https://github.com/linz/argo-tasks or https://github.com/linz/basemaps.
Different tags are published for each of these containers:
latest
vX.Y.Z
vX.Y
vX
The container version is managed by a workflow parameter that needs to be specified when submitting the workflow. The default value is the latest major version of the container.
Using the major version tag (vX) with imagePullPolicy: Always ensures that all minor versions are included when running a workflow using these containers.
The latest tag should never be used in production, as it points to the latest build of the container, which could be an unstable version. We reserve this tag for testing purposes.
The vX.Y.Z, vX.Y and vX tags are intended to be used in production, as they are published for each stable release of the container.
:vX.Y will change dynamically as Z is incremented.
:vX will change dynamically as Y and Z are incremented.
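To make the tag scheme concrete, the following sketch derives the version tags that would point at a given stable release. The `container_tags` function is hypothetical and only illustrates the naming; it does not query any registry.

```shell
# Hypothetical helper: print the version tags that refer to a release vX.Y.Z,
# from most specific to least specific.
container_tags() {
  version=${1#v}          # "v1.4.2" -> "1.4.2"
  major=${version%%.*}    # "1"
  rest=${version#*.}      # "4.2"
  minor=${rest%%.*}       # "4"
  printf 'v%s\nv%s.%s\nv%s\n' "$version" "$major" "$minor" "$major"
}
```

For a release v1.4.2 this prints v1.4.2, v1.4 and v1. Once v1.4.3 is released, v1.4 and v1 move to it, while v1.4.2 stays fixed.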
For each Workflow and WorkflowTemplate, there is a version_* parameter that allows you to specify the version of the LINZ container to use.
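A version parameter can be passed on the command line with the `--parameter` option of `argo submit`. The sketch below only prints the command so it can be reviewed before running; the `submit_command` function is hypothetical, and the parameter name in the usage example is a placeholder for whichever version_* parameter the workflow defines.

```shell
# Hypothetical helper: print an argo submit command that pins a container version.
# Arguments: workflow file, parameter name, container tag.
submit_command() {
  printf 'argo submit --watch %s --parameter %s=%s\n' "$1" "$2" "$3"
}
```

For example, `submit_command workflows/raster/standardising.yaml version_example v1` prints a command that pins the placeholder parameter version_example to the v1 tag.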