This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

docs: update sgx doc and sgx-test tag #3349

Merged
merged 15 commits into from
May 29, 2020
198 changes: 14 additions & 184 deletions docs/topics/sgx.md
@@ -3,12 +3,9 @@
<!-- TOC -->
- [Using SGX with Kubernetes](#using-sgx-with-kubernetes)
- [Deploy a Kubernetes Cluster](#deploy-a-kubernetes-cluster)
- [Running a SGX-enabled container](#running-a-sgx-enabled-container)
- [Deploying the SGX device plugin](#deploying-the-sgx-device-plugin)
- [Device plugin installation](#device-plugin-installation)
- [Running on Azure](#running-on-azure)
- [Running outside Azure](#running-outside-azure)
- [Scheduling Pods to TEE enabled Hardware](#scheduling-pods-to-tee-enabled-hardware)
<!-- /TOC -->

[Intel&reg; Software Guard Extensions](https://software.intel.com/en-us/sgx) (Intel&reg; SGX) is an architecture extension designed to increase the security of application code and data.
@@ -20,8 +17,8 @@ Azure supports provisioning of SGX-enabled VMs under the umbrella of Azure Confi
Refer to the [Quickstart Guide](../tutorials/quickstart.md) for details on how to provision a cluster using AKS-Engine. To use SGX-enabled hardware, we suggest updating the cluster model to include an additional agent pool with a supported operating system and virtual machine size. See below for further detail.


| OS | distro |
| ------------ | ------------------- |
| Ubuntu 18.04 | `ubuntu-18.04-gen2` |

The following example is a fragment of a cluster definition (apimodel) file declaring two ACC agent pools, one running `Ubuntu 18.04` image on `2 vCPU` nodes, and another running on `4 vCPU` nodes:
@@ -43,177 +40,19 @@ The following example is a fragment of a cluster definition (apimodel) file decl
],
```
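For reference, a complete ACC agent pool entry in an apimodel might look like the following sketch. The pool name, count, and availability profile here are illustrative placeholders, not values taken from this PR:

```json
"agentPoolProfiles": [
  {
    "name": "agentpool1",
    "count": 2,
    "vmSize": "Standard_DC2s_v2",
    "distro": "ubuntu-18.04-gen2",
    "availabilityProfile": "VirtualMachineScaleSets"
  }
]
```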

The SGX driver is automatically installed on every ACC node in your cluster, so you don't need to do that manually.

## Running a SGX-enabled container

When running an SGX container, you will need to mount the drivers from the host (the Kubernetes node) into the container.

On the host, the drivers are installed under `/dev/sgx`.

Here is an example Pod YAML template:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: <POD NAME>
spec:
  containers:
  - name: <NAME>
    image: <IMAGE>
    command: <COMMAND>
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: dev-sgx
      mountPath: /dev/sgx
    securityContext:
      privileged: true
  volumes:
  - name: dev-sgx
    hostPath:
      path: /dev/sgx
      type: CharDevice
```
Note: ACC Gen2 images have the Intel DCAP driver v1.2.6 installed.

## Deploying the SGX device plugin

You can install the SGX device plugin, which surfaces Intel SGX's Enclave Page Cache (EPC) memory as a schedulable resource for Kubernetes. This allows you to schedule pods that use the [Open Enclave SDK](https://github.com/openenclave/openenclave) or [Intel SGX SDK](https://github.com/intel/linux-sgx) onto hardware which supports Trusted Execution Environments.

### Device plugin installation

#### Running on Azure

*NOTE: For Kubernetes versions before v1.17, replace:*
1. `node.kubernetes.io/instance-type` -> `beta.kubernetes.io/instance-type`
2. `kubernetes.io/os` -> `beta.kubernetes.io/os`

If you are deploying your cluster on Azure, you can leverage the `node.kubernetes.io/instance-type` label in your node selector rules to target only the [DCsv2-series](https://docs.microsoft.com/en-us/azure/virtual-machines/dcv2-series) nodes:

```yaml
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: node.kubernetes.io/instance-type
        operator: In
        values:
        - Standard_DC2s
        - Standard_DC4s
        - Standard_DC1s_v2
        - Standard_DC2s_v2
        - Standard_DC4s_v2
        - Standard_DC8_v2
```
Using kubectl, install the device plugin DaemonSet:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sgx-device-plugin
  namespace: kube-system
  labels:
    app: sgx-device-plugin
spec:
  selector:
    matchLabels:
      app: sgx-device-plugin
  template:
    metadata:
      labels:
        app: sgx-device-plugin
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                - Standard_DC2s
                - Standard_DC4s
                - Standard_DC1s_v2
                - Standard_DC2s_v2
                - Standard_DC4s_v2
                - Standard_DC8_v2
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      containers:
      - name: sgx-device-plugin
        image: mcr.microsoft.com/aks/acc/sgx-device-plugin:1.0
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev-sgx
          mountPath: /dev/sgx
        securityContext:
          privileged: true
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev-sgx
        hostPath:
          path: /dev/sgx
```
#### Running outside Azure
We recommend labelling the nodes so that a `nodeSelector` can be used to run the device plugin only on the nodes that support Intel SGX-based Trusted Execution Environments. Use the following command to apply the appropriate label to the Intel SGX enabled nodes:

`kubectl label nodes <node-name> tee=sgx`

We also recommend tainting the nodes so that only pods that tolerate that taint are scheduled to those nodes. Apply the following taint to all Intel SGX enabled nodes using the following command:

`kubectl taint nodes <node-name> kubernetes.azure.com/sgx_epc_mem_in_MiB=true:NoSchedule`
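Pods that should still be scheduled onto the tainted nodes must then carry a matching toleration; for example, an application pod would include this fragment in its spec:

```yaml
tolerations:
- key: kubernetes.azure.com/sgx_epc_mem_in_MiB
  operator: Exists
  effect: NoSchedule
```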

Using kubectl, install the device plugin DaemonSet:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sgx-device-plugin
  namespace: kube-system
  labels:
    app: sgx-device-plugin
spec:
  selector:
    matchLabels:
      app: sgx-device-plugin
  template:
    metadata:
      labels:
        app: sgx-device-plugin
    spec:
      tolerations:
      - key: kubernetes.azure.com/sgx_epc_mem_in_MiB
        operator: Exists
        effect: NoSchedule
      containers:
      - name: sgx-device-plugin
        image: "mcr.microsoft.com/aks/acc/sgx-device-plugin:1.0"
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev-sgx
          mountPath: /dev/sgx
        securityContext:
          privileged: true
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev-sgx
        hostPath:
          path: /dev/sgx
      nodeSelector:
        tee: sgx
```
Using kubectl, deploy the device plugin DaemonSet:
1. For Kubernetes versions before v1.17, use: <br>
`kubectl create -f` [device-plugin-before-k8s-1-17.yaml](sgx/device-plugin-before-k8s-1-17.yaml)
2. For Kubernetes v1.17 and onwards, use: <br>
`kubectl create -f` [device-plugin.yaml](sgx/device-plugin.yaml)

Confirm that the DaemonSet pods are running on each Intel SGX enabled node as follows:
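For example, assuming the `app: sgx-device-plugin` label used in the DaemonSet above:

```shell
kubectl get pods -n kube-system -l app=sgx-device-plugin -o wide
```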

@@ -242,9 +81,9 @@ status:
<snip>
```

## Scheduling Pods to TEE enabled Hardware

The following pod specification demonstrates how you would schedule a pod to have access to a TEE by defining a limit on the specific EPC memory that is advertised to the Kubernetes scheduler by the device plugin:

```yaml
apiVersion: apps/v1
@@ -274,7 +113,7 @@ spec:
kubernetes.azure.com/sgx_epc_mem_in_MiB: 10
```
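The fragment above is truncated in this diff view. A minimal complete sketch of such a spec might look like the following; the deployment name and image are illustrative placeholders, not values from this PR:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sgx-app            # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sgx-app
  template:
    metadata:
      labels:
        app: sgx-app
    spec:
      containers:
      - name: sgx-app
        image: <YOUR SGX IMAGE>   # illustrative placeholder
        resources:
          limits:
            kubernetes.azure.com/sgx_epc_mem_in_MiB: 10
```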

You can use the following test workload to confirm that your cluster is correctly configured ([Dockerfile](https://github.com/microsoft/openenclave-aks/blob/master/k8s-sgxtest/Dockerfile) for sgx-test):

```yaml
apiVersion: batch/v1
@@ -291,21 +130,12 @@ spec:
    spec:
      containers:
      - name: sgxtest
        image: oeciteam/sgx-test:1.0
        securityContext:
          privileged: true
        resources:
          limits:
            kubernetes.azure.com/sgx_epc_mem_in_MiB: 10
      restartPolicy: Never
  backoffLimit: 0
```
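To run the test workload and check its outcome, you could apply it and inspect the Job; this is a sketch that assumes the manifest above is saved as `sgx-test.yaml`:

```shell
kubectl apply -f sgx-test.yaml
kubectl wait --for=condition=complete job/sgx-test --timeout=120s
kubectl logs job/sgx-test
```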
52 changes: 52 additions & 0 deletions docs/topics/sgx/device-plugin-before-k8s-1-17.yaml
@@ -0,0 +1,52 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sgx-device-plugin
  namespace: kube-system
  labels:
    app: sgx-device-plugin
spec:
  selector:
    matchLabels:
      app: sgx-device-plugin
  template:
    metadata:
      labels:
        app: sgx-device-plugin
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/instance-type
                operator: In
                values:
                - Standard_DC2s
                - Standard_DC4s
                - Standard_DC1s_v2
                - Standard_DC2s_v2
                - Standard_DC4s_v2
                - Standard_DC8_v2
              - key: beta.kubernetes.io/os
                operator: In
                values:
                - linux
      containers:
      - name: sgx-device-plugin
        image: mcr.microsoft.com/aks/acc/sgx-device-plugin:1.0
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev-sgx
          mountPath: /dev/sgx
        securityContext:
          privileged: true
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev-sgx
        hostPath:
          path: /dev/sgx
52 changes: 52 additions & 0 deletions docs/topics/sgx/device-plugin.yaml
@@ -0,0 +1,52 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sgx-device-plugin
  namespace: kube-system
  labels:
    app: sgx-device-plugin
spec:
  selector:
    matchLabels:
      app: sgx-device-plugin
  template:
    metadata:
      labels:
        app: sgx-device-plugin
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                - Standard_DC2s
                - Standard_DC4s
                - Standard_DC1s_v2
                - Standard_DC2s_v2
                - Standard_DC4s_v2
                - Standard_DC8_v2
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      containers:
      - name: sgx-device-plugin
        image: mcr.microsoft.com/aks/acc/sgx-device-plugin:1.0
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev-sgx
          mountPath: /dev/sgx
        securityContext:
          privileged: true
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev-sgx
        hostPath:
          path: /dev/sgx
18 changes: 9 additions & 9 deletions test/e2e/kubernetes/workloads/sgx-test.yaml
@@ -2,21 +2,21 @@ apiVersion: batch/v1
kind: Job
metadata:
  name: sgx-test
  labels:
    app: sgx-test
spec:
  template:
    metadata:
      labels:
        app: sgx-test
    spec:
      containers:
      - name: sgxtest
        image: oeciteam/sgx-test:1.0
**Member:** Shall I assume this image has the equivalent "run something to validate basic SGX functionality and exit 0 if success"?

**Contributor Author:** Yep! It has the same helloworld application, just moved the executing command to the image itself.

**Member:** I'm running tests to validate this now, then will merge, thanks!

Btw, do we know why we are only testing Kubernetes v1.15 and v1.16? Does the sgx-device-plugin not work w/ v1.17+ for some reason (that would be weird), or do we need to enable tests for those versions of Kubernetes?

**Contributor Author:** Where exactly is this test happening? There's a different version of the yaml file for k8s v1.17 onwards because of changes from beta mode.

**Member:** We use this config as input to the Jenkinsfile in the root of the project:

https://github.com/Azure/aks-engine/blob/master/test/e2e/test_cluster_configs/sgx.json

Essentially, by building an 18.04-LTS node pool w/ the Standard_DC2s VM SKU, we expect that the drivers will be installed and that this sgx-test.yaml will be properly scheduled and executed on a node in the cluster.

I'm building a 1.18 cluster now w/ a 18.04 + Standard_DC2s node pool, I'll report back...

**Contributor Author:** The sgx-test.yml mounts /dev/sgx so it essentially doesn't depend on the plugin...

        securityContext:
          privileged: true
        resources:
          limits:
            kubernetes.azure.com/sgx_epc_mem_in_MiB: 10
      restartPolicy: Never
  backoffLimit: 0