Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MPICH support #562

Merged
merged 16 commits into from
Jun 16, 2023
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ BASE_IMAGE_SSH_PORT?=2222
IMG_BUILDER=docker
PLATFORMS ?= linux/amd64
INTEL_PLATFORMS ?= linux/amd64
MPICH_PLATFORMS ?= linux/amd64
sheevy marked this conversation as resolved.
Show resolved Hide resolved
LD_FLAGS_V2=" \
-X '${REPO_PATH}/pkg/version.GitSHA=${GitSHA}' \
-X '${REPO_PATH}/pkg/version.Built=${Date}' \
Expand Down Expand Up @@ -71,6 +72,7 @@ test: bin/envtest scheduler-plugins-crd
test_e2e: export TEST_MPI_OPERATOR_IMAGE=${IMAGE_NAME}:${RELEASE_VERSION}
test_e2e: export TEST_OPENMPI_IMAGE=mpioperator/mpi-pi:${RELEASE_VERSION}-openmpi
test_e2e: export TEST_INTELMPI_IMAGE=mpioperator/mpi-pi:${RELEASE_VERSION}-intel
test_e2e: export TEST_MPICH_IMAGE=mpioperator/mpi-pi:${RELEASE_VERSION}-mpich
test_e2e: bin/kubectl kind helm images test_images dev_manifest scheduler-plugins-chart
go test -v ./test/e2e/...

Expand Down Expand Up @@ -108,6 +110,9 @@ test_images:
${IMG_BUILDER} build $(BUILD_ARGS) --platform $(INTEL_PLATFORMS) --build-arg BASE_LABEL=${RELEASE_VERSION} -t mpioperator/intel:${RELEASE_VERSION} build/base -f build/base/intel.Dockerfile
${IMG_BUILDER} build $(BUILD_ARGS) --platform $(INTEL_PLATFORMS) -t mpioperator/intel-builder:${RELEASE_VERSION} build/base -f build/base/intel-builder.Dockerfile
${IMG_BUILDER} build $(BUILD_ARGS) --platform $(INTEL_PLATFORMS) --build-arg BASE_LABEL=${RELEASE_VERSION} -t mpioperator/mpi-pi:${RELEASE_VERSION}-intel examples/v2beta1/pi -f examples/v2beta1/pi/intel.Dockerfile
${IMG_BUILDER} build $(BUILD_ARGS) --platform $(MPICH_PLATFORMS) --build-arg BASE_LABEL=${RELEASE_VERSION} -t mpioperator/mpich:${RELEASE_VERSION} build/base -f build/base/mpich.Dockerfile
${IMG_BUILDER} build $(BUILD_ARGS) --platform $(MPICH_PLATFORMS) -t mpioperator/mpich-builder:${RELEASE_VERSION} build/base -f build/base/mpich-builder.Dockerfile
${IMG_BUILDER} build $(BUILD_ARGS) --platform $(MPICH_PLATFORMS) --build-arg BASE_LABEL=${RELEASE_VERSION} -t mpioperator/mpi-pi:${RELEASE_VERSION}-mpich examples/v2beta1/pi -f examples/v2beta1/pi/mpich.Dockerfile

.PHONY: tidy
tidy:
Expand Down
6 changes: 6 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -218,6 +218,12 @@ For a sample that uses Intel MPI, see:
cat examples/pi/pi-intel.yaml
```

For a sample that uses MPICH, see:

```bash
cat examples/pi/pi-mpich.yaml
```

## Exposed Metrics

| Metric name | Metric type | Description | Labels |
Expand Down
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this necessary?

In OpenMPI, we don't need this file, because OpenMPI is failure tolerant and implements retries when a host is not found.

IntelMPI just fails. What happens with MPICH?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall having tested it last year, during the previous PR. And it was required. This makes sense, since IntelMPI is based on MPICH.

File renamed without changes.
2 changes: 1 addition & 1 deletion build/base/intel.Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -22,5 +22,5 @@ RUN apt update \
intel-oneapi-mpi \
&& rm -rf /var/lib/apt/lists/*

COPY intel-entrypoint.sh /entrypoint.sh
COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
7 changes: 7 additions & 0 deletions build/base/mpich-builder.Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
FROM debian:bullseye as builder
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we should be using a newer version, but given that mpioperator/base still uses bullseye, this makes sense.

If you have the chance, I would welcome a PR to update the base image everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. Happy to do it in a separate PR.


RUN apt update \
&& apt install -y --no-install-recommends \
g++ \
libmpich-dev \
&& rm -rf /var/lib/apt/lists/*
12 changes: 12 additions & 0 deletions build/base/mpich.Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
ARG BASE_LABEL

FROM mpioperator/base:${BASE_LABEL}

RUN apt update \
&& apt install -y --no-install-recommends \
dnsutils \
mpich \
&& rm -rf /var/lib/apt/lists/*

COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
3 changes: 2 additions & 1 deletion deploy/v2beta1/mpi-operator.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -58,10 +58,11 @@ spec:
mpiImplementation:
default: OpenMPI
description: MPIImplementation is the MPI implementation. Options
are "OpenMPI" (default) and "Intel".
are "OpenMPI" (default), "Intel" and "MPICH".
enum:
- OpenMPI
- Intel
- MPICH
type: string
mpiReplicaSpecs:
additionalProperties:
Expand Down
3 changes: 1 addition & 2 deletions examples/v2beta1/pi/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@ FROM mpioperator/openmpi-builder:${BASE_LABEL} as builder
COPY pi.cc /src/pi.cc
RUN mpic++ /src/pi.cc -o /pi


FROM mpioperator/openmpi:${BASE_LABEL}

COPY --from=builder /pi /home/mpiuser/pi
COPY --from=builder /pi /home/mpiuser/pi
8 changes: 7 additions & 1 deletion examples/v2beta1/pi/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,15 @@ For Intel MPI:
docker build -t mpi-pi . -f intel.Dockerfile
```

For MPICH:

```bash
docker build -t mpi-pi . -f mpich.Dockerfile
```

## Create MPIJob

Modify `pi.yaml` (for OpenMPI) or `pi-intel.yaml` (for Intel MPI) to set up the
Modify `pi.yaml` (for OpenMPI), `pi-intel.yaml` (for Intel MPI) or `pi-mpich.yaml` (for MPICH) to set up the
image name from your own registry.

Then, run:
Expand Down
2 changes: 1 addition & 1 deletion examples/v2beta1/pi/intel.Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -7,4 +7,4 @@ RUN bash -c "source /opt/intel/oneapi/setvars.sh && mpicxx /src/pi.cc -o /pi"

FROM mpioperator/intel:${BASE_LABEL}

COPY --from=builder /pi /home/mpiuser/pi
COPY --from=builder /pi /home/mpiuser/pi
10 changes: 10 additions & 0 deletions examples/v2beta1/pi/mpich.Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
ARG BASE_LABEL

FROM mpioperator/mpich-builder:${BASE_LABEL} as builder

COPY pi.cc /src/pi.cc
sheevy marked this conversation as resolved.
Show resolved Hide resolved
RUN mpic++ /src/pi.cc -o /pi

FROM mpioperator/mpich:${BASE_LABEL}

COPY --from=builder /pi /home/mpiuser/pi
54 changes: 54 additions & 0 deletions examples/v2beta1/pi/pi-mpich.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
name: pi
spec:
slotsPerWorker: 1
runPolicy:
cleanPodPolicy: Running
sshAuthMountPath: /home/mpiuser/.ssh
mpiImplementation: MPICH
mpiReplicaSpecs:
Launcher:
replicas: 1
template:
spec:
containers:
- image: mpioperator/mpi-pi:mpich
imagePullPolicy: Always
name: mpi-launcher
securityContext:
runAsUser: 1000
args:
- mpirun
- -n
- "2"
- /home/mpiuser/pi
resources:
limits:
cpu: 1
memory: 1Gi
Worker:
replicas: 2
template:
spec:
containers:
- image: mpioperator/mpi-pi:mpich
imagePullPolicy: Always
name: mpi-worker
securityContext:
runAsUser: 1000
command:
args:
- /usr/sbin/sshd
- -De
- -f
- /home/mpiuser/.sshd_config
readinessProbe:
tcpSocket:
port: 2222
initialDelaySeconds: 2
resources:
limits:
cpu: 1
memory: 1Gi
3 changes: 2 additions & 1 deletion manifests/base/kubeflow.org_mpijobs.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -35,10 +35,11 @@ spec:
mpiImplementation:
default: OpenMPI
description: MPIImplementation is the MPI implementation. Options
are "OpenMPI" (default) and "Intel".
are "OpenMPI" (default), "Intel" and "MPICH".
enum:
- OpenMPI
- Intel
- MPICH
type: string
mpiReplicaSpecs:
additionalProperties:
Expand Down
30 changes: 29 additions & 1 deletion pkg/apis/kubeflow/v2beta1/default_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ func TestSetDefaults_MPIJob(t *testing.T) {
},
},
},
"base defaults overridden": {
"base defaults overridden (intel)": {
job: MPIJob{
Spec: MPIJobSpec{
SlotsPerWorker: newInt32(10),
Expand Down Expand Up @@ -66,6 +66,34 @@ func TestSetDefaults_MPIJob(t *testing.T) {
},
},
},
"base defaults overridden (mpich)": {
job: MPIJob{
Spec: MPIJobSpec{
SlotsPerWorker: newInt32(10),
RunPolicy: RunPolicy{
CleanPodPolicy: NewCleanPodPolicy(CleanPodPolicyRunning),
TTLSecondsAfterFinished: newInt32(2),
ActiveDeadlineSeconds: newInt64(3),
BackoffLimit: newInt32(4),
},
SSHAuthMountPath: "/home/mpiuser/.ssh",
MPIImplementation: MPIImplementationMPICH,
},
},
want: MPIJob{
Spec: MPIJobSpec{
SlotsPerWorker: newInt32(10),
RunPolicy: RunPolicy{
CleanPodPolicy: NewCleanPodPolicy(CleanPodPolicyRunning),
TTLSecondsAfterFinished: newInt32(2),
ActiveDeadlineSeconds: newInt64(3),
BackoffLimit: newInt32(4),
},
SSHAuthMountPath: "/home/mpiuser/.ssh",
MPIImplementation: MPIImplementationMPICH,
},
},
},
"launcher defaults": {
job: MPIJob{
Spec: MPIJobSpec{
Expand Down
2 changes: 1 addition & 1 deletion pkg/apis/kubeflow/v2beta1/openapi_generated.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion pkg/apis/kubeflow/v2beta1/swagger.json
Original file line number Diff line number Diff line change
Expand Up @@ -322,7 +322,7 @@
],
"properties": {
"mpiImplementation": {
"description": "MPIImplementation is the MPI implementation. Options are \"OpenMPI\" (default) and \"Intel\".",
"description": "MPIImplementation is the MPI implementation. Options are \"OpenMPI\" (default), \"Intel\" and \"MPICH\".",
"type": "string"
},
"mpiReplicaSpecs": {
Expand Down
5 changes: 3 additions & 2 deletions pkg/apis/kubeflow/v2beta1/types.go
Original file line number Diff line number Diff line change
Expand Up @@ -155,8 +155,8 @@ type MPIJobSpec struct {
SSHAuthMountPath string `json:"sshAuthMountPath,omitempty"`

// MPIImplementation is the MPI implementation.
// Options are "OpenMPI" (default) and "Intel".
// +kubebuilder:validation:Enum:=OpenMPI;Intel
// Options are "OpenMPI" (default), "Intel" and "MPICH".
// +kubebuilder:validation:Enum:=OpenMPI;Intel;MPICH
// +kubebuilder:default:=OpenMPI
MPIImplementation MPIImplementation `json:"mpiImplementation,omitempty"`
}
Expand All @@ -177,6 +177,7 @@ type MPIImplementation string
const (
MPIImplementationOpenMPI MPIImplementation = "OpenMPI"
MPIImplementationIntel MPIImplementation = "Intel"
MPIImplementationMPICH MPIImplementation = "MPICH"
)

// JobStatus represents the current observed state of the training Job.
Expand Down
3 changes: 2 additions & 1 deletion pkg/apis/kubeflow/validation/validation.go
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,8 @@ var (

validMPIImplementations = sets.NewString(
string(kubeflow.MPIImplementationOpenMPI),
string(kubeflow.MPIImplementationIntel))
string(kubeflow.MPIImplementationIntel),
string(kubeflow.MPIImplementationMPICH))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm yet waiting for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in now. I tried to refactor into a common function, so both Intel and MPICH tests just call it. As the part of that refactoring "valid with worker" tests uses has different restart policy, then it had before. However, I'm fine with that as it's not what's being tested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Let me check.


validRestartPolicies = sets.NewString(
string(common.RestartPolicyNever),
Expand Down
19 changes: 16 additions & 3 deletions pkg/controller/mpi_job_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,16 @@ var (
Value: "-o ConnectionAttempts=10",
},
}
mpichEnvVars = []corev1.EnvVar{
{
Name: "HYDRA_HOST_FILE",
Value: fmt.Sprintf("%s/%s", configMountPath, hostfileName),
},
{
Name: "HYDRA_LAUNCH_EXTRA_ARGS",
Value: "-o ConnectionAttempts=10",
sheevy marked this conversation as resolved.
Show resolved Hide resolved
},
}
nvidiaDisableEnvVars = []corev1.EnvVar{
{Name: "NVIDIA_VISIBLE_DEVICES"},
{Name: "NVIDIA_DRIVER_CAPABILITIES"},
Expand Down Expand Up @@ -588,8 +598,9 @@ func (c *MPIJobController) syncHandler(key string) error {
return err
}
}
if mpiJob.Spec.MPIImplementation == kubeflow.MPIImplementationIntel {
// The Intel implementation requires workers to communicate with the
if mpiJob.Spec.MPIImplementation == kubeflow.MPIImplementationIntel ||
mpiJob.Spec.MPIImplementation == kubeflow.MPIImplementationMPICH {
// The Intel and MPICH implementations require workers to communicate with the
// launcher through its hostname. For that, we create a Service which
// has the same name as the launcher's hostname.
_, err := c.getOrCreateService(mpiJob, newLauncherService(mpiJob))
Expand Down Expand Up @@ -1201,7 +1212,7 @@ func newConfigMap(mpiJob *kubeflow.MPIJob, workerReplicas int32) *corev1.ConfigM
switch mpiJob.Spec.MPIImplementation {
case kubeflow.MPIImplementationOpenMPI:
buffer.WriteString(fmt.Sprintf("%s%s-%d.%s.%s.svc slots=%d\n", mpiJob.Name, workerSuffix, i, workersService, mpiJob.Namespace, slots))
case kubeflow.MPIImplementationIntel:
case kubeflow.MPIImplementationIntel, kubeflow.MPIImplementationMPICH:
buffer.WriteString(fmt.Sprintf("%s%s-%d.%s.%s.svc:%d\n", mpiJob.Name, workerSuffix, i, workersService, mpiJob.Namespace, slots))
}
}
Expand Down Expand Up @@ -1429,6 +1440,8 @@ func (c *MPIJobController) newLauncherPodTemplate(mpiJob *kubeflow.MPIJob) corev
Name: intelMPISlotsEnv,
Value: slotsStr,
})
case kubeflow.MPIImplementationMPICH:
container.Env = append(container.Env, mpichEnvVars...)
}

container.Env = append(container.Env,
Expand Down
Loading