When using the Kubernetes Python client library, we must first load authentication and cluster information.
First, you need to setup the required service account and roles.
kubectl create -f k8s/rbac.yaml
This command creates a new service account named python-client-sa
, a new role with the needed permissions in the
spark-jobs
namespace and then binds the new role to the newly created service account.
WARNING: The python-client-sa
is the service account that will provide the identity for the Kubernetes Python
Client in the spark_client
library. Do not confuse this service account with the driver-sa
service account for
driver pods.
In this method, we can use an helper utility to load authentication and cluster information from a kubeconfig
file and
store them in kubernetes.client.configuration
.
from kubernetes import config, client
config.load_kube_config("path/to/kubeconfig_file")
v1 = client.CoreV1Api()
print("Listing pods with their IPs:")
ret = v1.list_namespaced_pod(namespace="spark-jobs")
for i in ret.items:
print("%s\t%s\t%s" % (i.status.pod_ip, i.metadata.namespace, i.metadata.name))
But we DO NOT want to rely on the default kubeconfig
file, denoted by the environment variable KUBECONFIG
or
, failing that, in ~/.kube/config
. This kubeconfig
file is yours, as user of the kubectl
command. Concretely
, with this kubeconfig
file, you have the right to do almost everything in the K8s cluster, and in all namespaces
. Instead, we're going to generate one especially for the service account created above, with the help of the script
kubeconfig-gen.sh
.
The kubeconfig-gen.sh
script effectively uses the default kubeconfig
file, but its purpose is to generate another
kubeconfig
file that configures access to the cluster for the python-client-sa
service account, with only the
rights needed for the spark_client
Python library in the single namespace spark-jobs
("principle of least
privilege").
Here, we're going to configure the Python client in the most programmatic way possible.
First, we need to fetch the credentials to access the Kubernetes cluster. We’ll store these in python environmental
variables.
export APISERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
SECRET_NAME=$(kubectl get serviceaccount python-client-sa -o jsonpath='{.secrets[0].name}')
export TOKEN=$(kubectl get secret ${SECRET_NAME} -o jsonpath='{.data.token}' | base64 --decode)
export CACERT=$(kubectl get secret ${SECRET_NAME} -o jsonpath="{['data']['ca\.crt']}")
Note that environment variables are captured the first time the os
module is imported, typically during IDE/Python
startup. Changes to the environment made after this time are not reflected in os.environ
(except for changes made by
modifying os.environ directly).
import base64
import os
from tempfile import NamedTemporaryFile
from kubernetes import client
api_server = os.environ["APISERVER"]
cacert = os.environ["CACERT"]
token = os.environ["TOKEN"]
# Set the configuration
configuration = client.Configuration()
with NamedTemporaryFile(delete=False) as cert:
cert.write(base64.b64decode(cacert))
configuration.ssl_ca_cert = cert.name
configuration.host = api_server
configuration.verify_ssl = True
configuration.debug = False
configuration.api_key = {"authorization": "Bearer " + token}
client.Configuration.set_default(configuration)
v1 = client.CoreV1Api()
print("Listing pods with their IPs:")
ret = v1.list_namespaced_pod(namespace="spark-jobs")
for i in ret.items:
print("%s\t%s\t%s" % (i.status.pod_ip, i.metadata.namespace, i.metadata.name))