Test kubernetes in CI #3482

Draft
wants to merge 76 commits into
base: master
Changes from 25 commits
Commits
8e3d7cf
Added options to set annotations and a service account in the Kuberne…
shishichen Jun 7, 2024
45269ed
Correct punctuation in debug message. hack out tests that won't fail-…
benclifford Jun 7, 2024
7ceec7a
Fix a couple of docstrings
benclifford Jun 7, 2024
0cdece2
a bit of name sanitization for default pod names
benclifford Jun 7, 2024
87d3454
fiddle with markings to deal with no shared fs and no staging
benclifford Jun 8, 2024
954bad7
add config file i've been using
benclifford Jun 10, 2024
d3e3828
Merge remote-tracking branch 'shishichen/add-k8s-pod-options'
benclifford Jun 10, 2024
17a00dd
add the dockerfile i've been using
benclifford Jun 10, 2024
62e0e36
beginning of kubernetes-in-CI
benclifford Jun 10, 2024
9086e19
push docker image? upgrade ubuntu
benclifford Jun 10, 2024
26869e6
fiddle with default name
benclifford Jun 10, 2024
16c0a49
Add kubernetes, needed for submitting from inside a cluster
benclifford Jun 10, 2024
b03615a
Add more bits for running everything in a kubernetes cluster
benclifford Jun 10, 2024
e122d19
fix syntax error in github workflow definition
benclifford Jun 10, 2024
ee14f6e
Tighten timeout, add some debugging info at the end
benclifford Jun 10, 2024
120cf78
Correct pod name from my test
benclifford Jun 10, 2024
5c55fe6
try to stop Job from recreating pod on failure, but instead abort fast
benclifford Jun 10, 2024
b56dfd9
Randomise test order to see if a test failure is specific to a partic…
benclifford Jun 10, 2024
dd0f66c
Merge branch 'master' into benc-k8s-kind-ci
benclifford Jun 10, 2024
f8f5a27
Add some memory logging
benclifford Jun 10, 2024
f4a7300
Allocate more memory to workers
benclifford Jun 10, 2024
ffdb021
Add a staging_required marker that apparently wasn't breaking things …
benclifford Jun 10, 2024
21711e9
messing with backoff limits and restart policy
benclifford Jun 10, 2024
fb1733e
remove apparently invalid restart policy
benclifford Jun 10, 2024
73b3e1d
Flush out some more staging_required tests (by setting storage_access…
benclifford Jun 10, 2024
31bc958
Switch the Kubernetes client call to read_namespaced_pod_status() to …
shishichen Jun 12, 2024
8b39024
Fixed Kubernetes worker container launch command to remove trailing s…
shishichen Jun 13, 2024
4538763
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Jun 14, 2024
5f43aeb
Merge remote-tracking branch 'shishichen/fix-k8s-launch-cmd' into ben…
benclifford Jun 14, 2024
299de99
Merge remote-tracking branch 'shishichen/swap-k8s-pod-status' into be…
benclifford Jun 14, 2024
68e3a5d
Merge branch 'master' into benc-k8s-kind-ci
benclifford Jun 18, 2024
fe3c55e
Merge branch 'master' into benc-k8s-kind-ci
benclifford Jun 24, 2024
064b833
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Jul 2, 2024
9c6a04e
Merge remote-tracking branch 'origin/benc-k8s-kind-ci' into benc-k8s-…
benclifford Jul 2, 2024
ba5f047
Merge branch 'master' into benc-k8s-kind-ci
benclifford Jul 2, 2024
69fbf03
Merge branch 'master' into benc-k8s-kind-ci
benclifford Jul 7, 2024
75b7c02
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Jul 31, 2024
780fbb0
Merge remote-tracking branch 'origin/benc-k8s-kind-ci' into benc-k8s-…
benclifford Jul 31, 2024
2324744
Merge branch 'master' into benc-k8s-kind-ci
benclifford Aug 5, 2024
b75a3ae
function data in temp
colinthomas-z80 Aug 19, 2024
2c18d6c
use getpass for username
colinthomas-z80 Aug 19, 2024
c201ec1
use tempfile module
colinthomas-z80 Aug 20, 2024
9f6b037
flake etc
colinthomas-z80 Aug 20, 2024
5ec7cdb
Merge branch 'master' into tmp_function_data
benclifford Aug 21, 2024
c3f6d45
Merge branch 'master' into benc-k8s-kind-ci
benclifford Aug 22, 2024
edf870f
Merge branch 'master' into benc-k8s-kind-ci
benclifford Aug 26, 2024
811b8e5
Merge branch 'master' into benc-k8s-kind-ci
benclifford Sep 3, 2024
7347f64
Merge remote-tracking branch 'refs/remotes/origin/master' into benc-k…
benclifford Sep 4, 2024
cd7229f
Merge branch 'master' into tmp_function_data
benclifford Sep 4, 2024
08f8ce9
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Sep 5, 2024
5967f01
Merge remote-tracking branch 'refs/remotes/origin/benc-k8s-kind-ci' i…
benclifford Sep 5, 2024
4938dbf
Build cctools and run a probably-broken taskvine vs kubernetes test c…
benclifford Sep 5, 2024
98d7693
fix repr in taskvine
benclifford Sep 5, 2024
dfc94a8
install cloudpickle explicitly for taskvine
benclifford Sep 5, 2024
47378f3
Add more time onto job timeout, because more is happening in job with…
benclifford Sep 5, 2024
43af8ef
revert to 180s test time
benclifford Sep 5, 2024
6a32f0f
Log more to the console, kubernetes style
benclifford Sep 5, 2024
d4fab6a
Note a (documentation?) bug in taskvine address selection
benclifford Sep 5, 2024
21dcae6
force hostname based address config, in line with comment in previous…
benclifford Sep 5, 2024
e1cce03
now we're starting taskvine test successfully, give it time to complete
benclifford Sep 5, 2024
4d4b4ba
Make taskvine shutdown scale-in more like htex shutdown scale-in
benclifford Sep 5, 2024
2e42e5c
enable staging_required tests in taskvine, because taskvine might be …
benclifford Sep 5, 2024
3ba7e12
Output timestamps in kubernetes log to help diagnose hangs
benclifford Sep 5, 2024
084d797
failed to get non-staging tests working, made a note in comments
benclifford Sep 5, 2024
f34f2b8
correct duplicated 'and' in pytest -k option
benclifford Sep 5, 2024
60a8611
Merge remote-tracking branch 'colinthomas-z80/tmp_function_data' into…
benclifford Sep 6, 2024
1f09e5c
Add utils to sanitize strings for DNS compliance
rjmello Oct 14, 2024
83278c2
Ensure k8s pod names/labels are RFC 1123 compliant
rjmello Oct 15, 2024
86ade32
Use hex value for k8s job ID instead of pod name
rjmello Oct 15, 2024
0c4d541
Add tests for KubernetesProvider submit
rjmello Oct 17, 2024
08693ab
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Oct 21, 2024
415f780
Merge remote-tracking branch 'origin/rjmello-kube-pod-names' into ben…
benclifford Oct 21, 2024
c78defa
Fix some bad merge
benclifford Oct 21, 2024
54ea143
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Oct 21, 2024
535289f
Merge remote-tracking branch 'origin/master' into benc-k8s-kind-ci
benclifford Oct 31, 2024
fd26ddd
Merge branch 'master' into benc-k8s-kind-ci
benclifford Nov 1, 2024
55 changes: 55 additions & 0 deletions .github/workflows/ci-k8s.yaml
@@ -0,0 +1,55 @@
name: Parsl

on:
  pull_request:
    types:
      - opened
      - synchronize

jobs:
  k8s-kind-suite:
    runs-on: ubuntu-24.04
    timeout-minutes: 60

    steps:
      - uses: actions/checkout@master

      - name: Create k8s Kind Cluster
        uses: helm/kind-action@v1
        with:
          # kind tooling uses this name by default, but kind-action uses
          # a different default name
          cluster_name: kind

      - name: Build docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          tags: parsl:ci

      - name: Push docker image into kubernetes cluster
        run: |
          kind load docker-image parsl:ci

      - name: set liberal permissions
        run: |
          kubectl create clusterrolebinding serviceaccounts-cluster-admin --clusterrole=cluster-admin --group=system:serviceaccounts

      - name: launch pytest Job
        run: |
          free -h
          kubectl create -f ./pytest-task.yaml

      - name: wait for pytest Job
        run: |
          # this pytest should take around 30 seconds to run, so 180 seconds
          # should be plenty...
          kubectl wait --timeout=180s --for=condition=Complete Job pytest

      - name: report some info
        if: ${{ always() }}
        run: |
          free -h
          kubectl describe pods
          kubectl describe jobs
          kubectl logs Job/pytest
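The wait step in this workflow only watches for `condition=Complete`, so a Job that ends up `Failed` would, as far as I can tell, sit there until the 180s timeout expires. A minimal sketch of classifying the Job outcome so a caller could abort early, assuming the Job object has been fetched as JSON (for example with `kubectl get job pytest -o json`). The helper name `job_outcome` is hypothetical and not part of this PR.

```python
import json
from typing import Optional


def job_outcome(job_json: str) -> Optional[str]:
    """Return "complete", "failed", or None if the Job has not finished.

    Assumes the standard Job status layout: a list of conditions, each
    with a "type" ("Complete" or "Failed") and a "status" string.
    """
    status = json.loads(job_json).get("status", {})
    for cond in status.get("conditions") or []:
        # only conditions whose status is the string "True" are in effect
        if cond.get("status") != "True":
            continue
        if cond.get("type") == "Complete":
            return "complete"
        if cond.get("type") == "Failed":
            return "failed"
    return None
```

A polling loop could call this every few seconds and exit non-zero as soon as it returns `"failed"`, rather than waiting out the full timeout.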
35 changes: 35 additions & 0 deletions Dockerfile
@@ -0,0 +1,35 @@
FROM debian:trixie

RUN apt-get update && apt-get upgrade -y
rjmello marked this conversation as resolved.

RUN apt-get update && apt-get install -y sudo openssh-server

RUN apt-get update && apt-get install -y curl less vim

# git is needed for parsl to figure out its own repo-specific
# version string
RUN apt-get update && apt-get install -y git

# useful stuff to have around
RUN apt-get update && apt-get install -y procps

# for building documentation
RUN apt-get update && apt-get install -y pandoc

# for monitoring visualization
RUN apt-get update && apt-get install -y graphviz wget

# for commandline access to monitoring database
RUN apt-get update && apt-get install -y sqlite3

RUN apt-get update && apt-get install -y python3.12 python3.12-dev
RUN apt-get update && apt-get install -y python3.12-venv
Comment on lines +25 to +26
Member

I suggest we make the Python version configurable.

E.g.,

Suggested change
-RUN apt-get update && apt-get install -y python3.12 python3.12-dev
-RUN apt-get update && apt-get install -y python3.12-venv
+ARG PYTHON_VERSION="3.12"
+RUN apt-get install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-dev
+RUN apt-get install -y python${PYTHON_VERSION}-venv

Collaborator Author

this is a bit weird (something to do with how python is packaged in trixie?). Normally in debian there's a single OS-level python3 available (which changes with each new code-named release), and it seems unusual that trixie happens to have two. debian certainly isn't traditionally set up to let you choose a python version from the OS.

there are a couple of things that could happen: i) always use the OS-level default python3, or ii) use something like Conda to provide a much richer Python environment. Some dependencies, like ndcctools, recommend being installed using conda anyway, so maybe that's the way to go here. I don't think there's any particular reason to stick with the OS-level Python, as this is "an image where Parsl works" rather than "an image that looks like a particular debian version".


RUN apt-get update && apt-get install -y gcc build-essential make pkg-config mpich

RUN python3.12 -m venv /venv

ADD . /src
WORKDIR /src

RUN . /venv/bin/activate && pip3 install '.[kubernetes]' -r test-requirements.txt
26 changes: 26 additions & 0 deletions htex_k8s_kind.py
@@ -0,0 +1,26 @@
from parsl.channels import LocalChannel
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SimpleLauncher
from parsl.providers import KubernetesProvider


def fresh_config():
    return Config(
        executors=[
            HighThroughputExecutor(
                label="executorname",
                storage_access=[],
                worker_debug=True,
                cores_per_worker=1,
                encrypted=False,  # needs certificate fs to be mounted in same place...
                provider=KubernetesProvider(
                    worker_init=". /venv/bin/activate",
                    # pod_name="override-pod-name", # can't use default name because of dots, without own bugfix
                    image="parsl:ci",
                    max_mem="2048Gi"  # was getting OOM-killing of workers with default... maybe this will help.
                ),
            )
        ],
        strategy='none',
    )
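For scale: Kubernetes memory quantities with the `Ki`/`Mi`/`Gi`/`Ti` suffixes are powers of 1024, so the `max_mem="2048Gi"` in this config works out to a 2 TiB limit. A small sketch of that arithmetic; the helper `quantity_to_bytes` is hypothetical, and it handles only these binary suffixes and plain integers, not the full Kubernetes quantity grammar.

```python
# Binary (power-of-two) suffixes used by Kubernetes resource quantities.
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def quantity_to_bytes(quantity: str) -> int:
    """Convert strings such as "2048Gi" or "512" to a byte count."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    # no recognised suffix: treat the value as a plain byte count
    return int(quantity)


# 2048Gi is the same amount as 2Ti; 2048Mi would be 2Gi
assert quantity_to_bytes("2048Gi") == quantity_to_bytes("2Ti")
assert quantity_to_bytes("2048Mi") == quantity_to_bytes("2Gi")
```

If the intent was roughly 2 GiB per worker, `"2048Mi"` would be the matching spelling; whether that is what was meant here is a question for the author.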
30 changes: 27 additions & 3 deletions parsl/providers/kubernetes/kube.py
@@ -1,4 +1,5 @@
 import logging
+import re
 import time

 from parsl.providers.kubernetes.template import template_string
@@ -168,10 +169,9 @@ def submit(self, cmd_string, tasks_per_node, job_name="parsl"):
              - tasks_per_node (int) : command invocations to be launched per node

         Kwargs:
-             - job_name (String): Name for job, must be unique
+             - job_name (String): Name for job

         Returns:
-             - None: At capacity, cannot provision more
              - job_id: (string) Identifier for the job
         """

@@ -184,10 +184,12 @@ def submit(self, cmd_string, tasks_per_node, job_name="parsl"):
             pod_name = '{}-{}'.format(self.pod_name,
                                       cur_timestamp)

+        pod_name = _sanitizeDNS1123(pod_name)
+
         formatted_cmd = template_string.format(command=cmd_string,
                                                worker_init=self.worker_init)

-        logger.debug("Pod name :{}".format(pod_name))
+        logger.debug("Pod name: {}".format(pod_name))
         self._create_pod(image=self.image,
                          pod_name=pod_name,
                          job_name=job_name,
@@ -350,3 +352,25 @@ def label(self):
     @property
     def status_polling_interval(self):
         return 60
+
+
+# this is based on:
+# https://github.com/kubernetes/apimachinery/blob/703232ea6da48aed7ac22260dabc6eac01aab896/pkg/util/validation/validation.go#L177C32-L177C62
+DNS_LABEL_REGEXP = "^[a-z0-9]([-a-z0-9]*[a-z0-9])?$"
+
+
+def _sanitizeDNS1123(raw: str) -> str:
+    """Rewrite input string to be a valid RFC1123 DNS label.
+    This is required for Kubernetes pod names.
+    """
+
+    # label must be lowercase
+    raw = raw.lower()
+
+    # label can only contain [-a-z0-9] characters - replace everything
+    # else with -
+    raw = re.sub("[^-a-z0-9]", "-", raw)
+
+    # TODO: sanitize against first and last symbols (no - at start or end?)
+    assert re.match(DNS_LABEL_REGEXP, raw), "sanitized DNS1123 label has not been properly sanitized: " + raw
+    return raw
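The TODO in `_sanitizeDNS1123` leaves leading and trailing `-` to the assertion. One possible way to close that gap, as a sketch only: strip hyphens from both ends and cap the length at 63 characters, the RFC 1123 label limit. The name `sanitize_dns_label`, the length constant, and the choice to raise `ValueError` instead of asserting are my assumptions, not what this PR does.

```python
import re

DNS_LABEL_REGEXP = "^[a-z0-9]([-a-z0-9]*[a-z0-9])?$"
DNS_LABEL_MAX_LEN = 63  # RFC 1123 label length limit


def sanitize_dns_label(raw: str) -> str:
    """Rewrite raw into a valid RFC 1123 DNS label, or raise ValueError."""
    # lowercase, then replace every character outside [-a-z0-9] with '-'
    label = re.sub("[^-a-z0-9]", "-", raw.lower())

    # truncate before stripping, so truncation cannot leave a trailing '-'
    label = label[:DNS_LABEL_MAX_LEN].strip("-")

    # nothing usable left (for example, input was only punctuation)
    if not re.match(DNS_LABEL_REGEXP, label):
        raise ValueError(f"cannot derive a DNS label from {raw!r}")
    return label
```

Raising rather than asserting keeps the check active under `python -O`, which strips `assert` statements.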
3 changes: 3 additions & 0 deletions parsl/tests/test_bash_apps/test_basic.py
@@ -24,6 +24,7 @@ def foo(x, y, z=10, stdout=None, label=None):
     return f"echo {x} {y} {z}"


+@pytest.mark.shared_fs
 def test_command_format_1(tmpd_cwd):
     """Testing command format for BashApps"""

@@ -38,6 +39,7 @@ def test_command_format_1(tmpd_cwd):
     assert so_content == "1 4 10"


+@pytest.mark.shared_fs
 def test_auto_log_filename_format(caplog):
     """Testing auto log filename format for BashApps
     """
@@ -66,6 +68,7 @@ def test_auto_log_filename_format(caplog):
         assert record.levelno < logging.ERROR


+@pytest.mark.shared_fs
 def test_parallel_for(tmpd_cwd, n=3):
     """Testing a simple parallel for loop"""
     outdir = tmpd_cwd / "outputs/test_parallel"
4 changes: 4 additions & 0 deletions parsl/tests/test_bash_apps/test_error_codes.py
@@ -58,6 +58,7 @@ def bad_format(stderr='std.err', stdout='std.out'):
 whitelist = os.path.join(os.path.dirname(os.path.dirname(__file__)), 'configs', '*threads*')


+@pytest.mark.shared_fs
 def test_div_0(test_fn=div_0):
     err_code = test_matrix[test_fn]['exit_code']
     f = test_fn()
@@ -73,6 +74,7 @@ def test_div_0(test_fn=div_0):
     os.remove('std.out')


+@pytest.mark.shared_fs
 def test_bash_misuse(test_fn=bash_misuse):
     err_code = test_matrix[test_fn]['exit_code']
     f = test_fn()
@@ -87,6 +89,7 @@ def test_bash_misuse(test_fn=bash_misuse):
     os.remove('std.out')


+@pytest.mark.shared_fs
 def test_command_not_found(test_fn=command_not_found):
     err_code = test_matrix[test_fn]['exit_code']
     f = test_fn()
@@ -103,6 +106,7 @@ def test_command_not_found(test_fn=command_not_found):
     return True


+@pytest.mark.shared_fs
 def test_not_executable(test_fn=not_executable):
     err_code = test_matrix[test_fn]['exit_code']
     f = test_fn()
1 change: 1 addition & 0 deletions parsl/tests/test_bash_apps/test_kwarg_storage.py
@@ -8,6 +8,7 @@ def foo(z=2, stdout=None):
     return f"echo {z}"


+@pytest.mark.shared_fs
 def test_command_format_1(tmpd_cwd):
     """Testing command format for BashApps
     """
8 changes: 2 additions & 6 deletions parsl/tests/test_bash_apps/test_memoize.py
@@ -9,9 +9,7 @@ def fail_on_presence(outputs=()):
     return 'if [ -f {0} ] ; then exit 1 ; else touch {0}; fi'.format(outputs[0])


-# This test is an oddity that requires a shared-FS and simply
-# won't work if there's a staging provider.
-# @pytest.mark.sharedFS_required
+@pytest.mark.shared_fs
 def test_bash_memoization(tmpd_cwd, n=2):
     """Testing bash memoization
     """
@@ -29,9 +27,7 @@ def fail_on_presence_kw(outputs=(), foo=None):
     return 'if [ -f {0} ] ; then exit 1 ; else touch {0}; fi'.format(outputs[0])


-# This test is an oddity that requires a shared-FS and simply
-# won't work if there's a staging provider.
-# @pytest.mark.sharedFS_required
+@pytest.mark.shared_fs
 def test_bash_memoization_keywords(tmpd_cwd, n=2):
     """Testing bash memoization
     """
1 change: 1 addition & 0 deletions parsl/tests/test_bash_apps/test_memoize_ignore_args.py
@@ -23,6 +23,7 @@ def no_checkpoint_stdout_app_ignore_args(stdout=None):
     return "echo X"


+@pytest.mark.shared_fs
 def test_memo_stdout():

     # this should run and create a file named after path_x
@@ -30,6 +30,7 @@ def no_checkpoint_stdout_app(stdout=None):
     return "echo X"


+@pytest.mark.shared_fs
 def test_memo_stdout():

     assert const_list_x == const_list_x_arg
1 change: 1 addition & 0 deletions parsl/tests/test_bash_apps/test_multiline.py
@@ -14,6 +14,7 @@ def multiline(inputs=(), outputs=(), stderr=None, stdout=None):
     """.format(inputs=inputs, outputs=outputs)


+@pytest.mark.shared_fs
 def test_multiline(tmpd_cwd):
     so, se = tmpd_cwd / "std.out", tmpd_cwd / "std.err"
     f = multiline(
6 changes: 4 additions & 2 deletions parsl/tests/test_bash_apps/test_stdout.py
@@ -16,7 +16,7 @@ def echo_to_streams(msg, stderr=None, stdout=None):
 whitelist = os.path.join(os.path.dirname(os.path.dirname(__file__)), 'configs', '*threads*')

 speclist = (
-    '/bad/dir/t.out',
+    # '/bad/dir/t.out', - isn't bad if we're root - should be tagged issue3328 too...
     ['t3.out', 'w'],
     ('t4.out', None),
     (42, 'w'),
@@ -26,7 +26,7 @@ def echo_to_streams(msg, stderr=None, stdout=None):
 )

 testids = [
-    'nonexistent_dir',
+    # 'nonexistent_dir', - goes with above /bad/dir/t.out
     'list_not_tuple',
     'null_mode',
     'not_a_string',
@@ -73,6 +73,7 @@ def test_bad_stderr_file():


 @pytest.mark.executor_supports_std_stream_tuples
+@pytest.mark.shared_fs
 def test_stdout_truncate(tmpd_cwd, caplog):
     """Testing truncation of prior content of stdout"""

@@ -92,6 +93,7 @@ def test_stdout_truncate(tmpd_cwd, caplog):
         assert record.levelno < logging.ERROR


+@pytest.mark.shared_fs
 def test_stdout_append(tmpd_cwd, caplog):
     """Testing appending to prior content of stdout (default open() mode)"""

3 changes: 3 additions & 0 deletions parsl/tests/test_docs/test_from_slides.py
@@ -1,5 +1,7 @@
 import os

+import pytest
+
 from parsl.app.app import bash_app, python_app
 from parsl.data_provider.files import File

@@ -15,6 +17,7 @@ def cat(inputs=[]):
         return f.readlines()


+@pytest.mark.staging_required
 def test_slides():
     """Testing code snippet from slides """

3 changes: 3 additions & 0 deletions parsl/tests/test_docs/test_kwargs.py
@@ -1,6 +1,8 @@
 """Functions used to explain kwargs"""
 from pathlib import Path

+import pytest
+
 from parsl import File, python_app


@@ -19,6 +21,7 @@ def reduce_app(inputs=()):
     assert reduce_future.result() == 6


+@pytest.mark.shared_fs
 def test_outputs(tmpd_cwd):
     @python_app()
     def write_app(message, outputs=()):
1 change: 1 addition & 0 deletions parsl/tests/test_python_apps/test_outputs.py
@@ -16,6 +16,7 @@ def double(x, outputs=[]):
 whitelist = os.path.join(os.path.dirname(os.path.dirname(__file__)), 'configs', '*threads*')


+@pytest.mark.shared_fs
 def test_launch_apps(tmpd_cwd, n=2):
     outdir = tmpd_cwd / "outputs"
     outdir.mkdir()
1 change: 1 addition & 0 deletions parsl/tests/test_regression/test_226.py
@@ -53,6 +53,7 @@ def test_get_dataframe():
     assert res.equals(data), 'Unexpected dataframe'


+@pytest.mark.shared_fs
 def test_bash_default_arg():
     if os.path.exists('std.out'):
         os.remove('std.out')
1 change: 1 addition & 0 deletions parsl/tests/test_staging/test_docs_1.py
@@ -12,6 +12,7 @@ def convert(inputs=[], outputs=[]):


 @pytest.mark.cleannet
+@pytest.mark.staging_required
 def test():
     # create an remote Parsl file
     inp = File('ftp://ftp.iana.org/pub/mirror/rirstats/arin/ARIN-STATS-FORMAT-CHANGE.txt')
3 changes: 3 additions & 0 deletions parsl/tests/test_staging/test_output_chain_filenames.py
@@ -1,5 +1,7 @@
 from concurrent.futures import Future

+import pytest
+
 from parsl import File
 from parsl.app.app import bash_app

@@ -14,6 +16,7 @@ def app2(inputs=(), outputs=(), stdout=None, stderr=None, mock=False):
     return f"echo '{inputs[0]}' > {outputs[0]}"


+@pytest.mark.shared_fs
 def test_behavior(tmpd_cwd):
     expected_path = str(tmpd_cwd / "simple-out.txt")
     app1_future = app1(
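The `shared_fs` and `staging_required` markers added across these tests let a run leave out tests that the chosen config cannot satisfy, such as the kind config in this PR with `storage_access=[]` and no filesystem shared with the pods. A rough sketch of how that decision could be expressed. This is an assumption about the mechanism, not a copy of Parsl's `conftest.py`; the function and parameter names are hypothetical, and I am assuming `staging_required` means "needs a staging provider unless files are already shared".

```python
from typing import Iterable, Optional


def skip_reason(markers: Iterable[str], *, has_shared_fs: bool,
                has_staging: bool) -> Optional[str]:
    """Return why a test should be skipped, or None if it can run."""
    names = set(markers)

    # tests that read or write files directly on the submit side
    if "shared_fs" in names and not has_shared_fs:
        return "requires a filesystem shared between submit side and workers"

    # tests that move files: fine with a shared fs, otherwise need staging
    if "staging_required" in names and not (has_shared_fs or has_staging):
        return "requires a staging provider when there is no shared filesystem"

    return None
```

A function like this could be called from a `pytest_collection_modifyitems` hook, using the marker names found on each collected item; selecting tests on the command line with `pytest -k` or `-m` expressions is the other obvious route.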