- Transforming BOSH concepts to Kubernetes
- How do we rename things going from one version to the next?
- Canary support in QuarksStatefulSets
- Missing support for the `allow_executions` flag in bpm configs
- releases are defined in the usual way (a `releases` block), but the information given is used to build a reference for a docker image
- each instance group is transformed to a `QuarksStatefulSet` or a `QuarksJob`
- each BOSH Job corresponds to one or more containers in the `Pod` template defined in the `QuarksStatefulSet` or the `QuarksJob`; there's one container for each process defined in the BPM information of each BOSH Job
- "explicit" `variables` are generated using `QuarksSecrets`
- for rendering of BOSH Job Templates, please read this document
- we have a concept of Desired Manifests
- all communication happens through Kubernetes `Services`, which have deterministic DNS addresses; you can read more about these here
Please read the documentation for the `BOSHDeployment` controller.
---
# The name of the deployment. Replace the name with the name of the BOSHDeployment resource.
# It's used to namespace resources created for this deployment.
# Based on docs [1], names should be less than 253 characters. We should limit this to
# characters in the operator, to make sure that with any suffix, we won't go beyond the limit.
name: "foo"
# Not used by the cf-operator.
# A warning is printed in the logs if this is present.
director_uuid: "bar"
# A hash of director features. We could use this to control operator features as well.
features:
  # Enable variables to be regenerated by the config server (e.g. CredHub) when the variable options change. Default false.
  # In the cf-operator, if a QuarksSecret is changed (e.g. a new domain is added to a cert),
  # the value will be automatically updated.
  # The operator won't be able to control this behavior.
  # A warning is printed in the logs if this is present.
  converge_variables: true
  # Randomizes AZs for left over instances that cannot be distributed equally between AZs.
  # Not currently used. It's likely that we'll be able to support this.
  randomize_az_placement: false
  # Enables or disables returning of DNS addresses in links.
  # In Kubernetes we always use DNS addresses.
  # An error should be returned if this value is set to false.
  use_dns_addresses: true
# A list of all releases used in this deployment.
# Required.
# Each release's image reference is constructed from this information like this:
# <url>/<name>:<stemcell.os>-<stemcell.version>-<version>
releases:
  # Name of a release used in the deployment.
  - name: "capi-release"
    # The version of the release to be used.
    # "latest" is not supported by the cf-operator. An error is thrown if "latest" is used.
    version: "1.0"
    # Required for the operator. Link to the registry and organization containing the image.
    url: "docker.io/cloudfoundry"
    # Not used by the cf-operator.
    # Integrity of the image itself is handled by the container runtime and the image registry.
    sha1: "332ac15609b220a3fdf5efad0e0aa069d8235788"
    # Required by the operator.
    stemcell:
      # OS of the stemcell used by the release. Used to construct the image name.
      os: "opensuse"
      # Version of the OS of the stemcell used by the release.
      version: "42.3"
    # Only used by the cf-operator.
    # A secret is created with the credentials [2], used by the pods
    # that reference this release.
    credentials:
      username: "foo"
      password: "secret"
# Not used by the cf-operator.
# A warning is logged if this is set.
stemcells: []
# Specifies how updates are handled.
# The cf-operator uses some of these settings.
update:
  # The number of pods to deploy in the new version of a QuarksStatefulSet.
  # Once canaries are running, deployment can continue.
  # TODO: Support for canaries needs implementation in QuarksStatefulSet.
  canaries: 2
  # Time to wait for canary pods to be ready in a new version of a QuarksStatefulSet.
  canary_watch_time: 100
  # The maximum number of non-canary instances to update in parallel for a QuarksStatefulSet.
  # TODO: Support for this needs to be implemented in the controller.
  max_in_flight: 2
  # TODO: is there a need for this in QuarksStatefulSet (in a readiness probe?)
  update_watch_time: 0
  # Not used in cf-operator.
  # All instance groups are deployed at the same time.
  # If set to true, a warning is logged.
  serial: false
  # Not used in cf-operator.
  # If set, a warning is logged.
  vm_strategy: ""
# Each instance group is converted into a QuarksStatefulSet (or a QuarksJob, depending on its lifecycle).
instance_groups:
  # Used to name the QuarksStatefulSet or QuarksJob.
  - name: "api-az1"
    # Support for AZs is implemented in the QuarksStatefulSet.
    azs: ["az1"]
    # Number of replicas for the StatefulSets in a QuarksStatefulSet.
    # If this instance group defines a QuarksJob, this value must be 1. An error is thrown otherwise.
    instances: 3
    # Each job results in a rendered bpm.yml file.
    # BPM information is required - the deployment fails if it's missing.
    # Each job has one or more processes (defined in bpm.yml), and each corresponds to a container of a pod in a StatefulSet or Job.
    jobs:
      # It's used to name the container.
      - name: "cloud_controller_ng"
        # The name of a release that must exist in the releases block.
        # If it doesn't exist in the releases block, an error is thrown.
        # The docker image used for the container is resolved using this release name.
        release: "capi-release"
        # Used by the cf-operator to calculate links before rendering templates.
        # All resources in the cf-operator are deterministic (IP addresses are not used),
        # so they can be calculated before template rendering occurs.
        consumes: {}
        # Same as the consumes block above.
        provides: {}
        # Defines all properties, used to render job templates.
        # Job templates are rendered as Secrets, and then mounted into pod containers.
        # If a property is changed, the operator runs rendering in a QuarksJob, and the
        # template's secret is (re)generated.
        # All properties are input to this QuarksJob that does rendering.
        # Some properties can reference variables, which can be generated. The cf-operator
        # collects values for all properties before starting the rendering process.
        properties:
          domain: "mycf.com"
          admin_password: "((adminPass))"
          # Extra information specific to the cf-operator.
          quarks:
            run:
              # Hints for pod replica count.
              scaling:
                min: 1
                max: 3
                ha: 2
              # Extra capabilities required by the containers of this job.
              capabilities: []
              # Memory used by each container. Overrides info from vm_resources.
              memory: 128
              # Number of vCPUs used by each container. Overrides info from vm_resources.
              virtual-cpus: 2
              # Healthcheck information for the containers in this job.
              healthcheck:
                some_process_name:
                  readiness:
                    exec:
                      # Exec probes are not run in a shell, so a shell is invoked
                      # explicitly to expand ${HOSTNAME}.
                      command:
                        - "/bin/sh"
                        - "-c"
                        - "curl --silent --fail --head http://${HOSTNAME}:8080/health"
            # List of ports to be opened up for this job.
            ports:
              - name: "health-port"
                protocol: "TCP"
                internal: 8080
    # Not used by the cf-operator.
    # A warning is logged if this is set.
    vm_type: ""
    # Not used by the cf-operator.
    # A warning is logged if this is set.
    vm_extensions: []
    # Used by the cf-operator to limit the resources used by a container in a pod.
    vm_resources:
      # Number of vCPUs used by a container.
      cpu: 4
      # Memory used by a container.
      ram: 1024
      # Used for PVC sizes if `ephemeralAsPVC` is set to true.
      ephemeral_disk_size: 4096
    # Not used by the cf-operator.
    # A warning is logged if this is set.
    stemcell: ""
    # Size of the volume attached to a pod container.
    persistent_disk: 4096
    # This must be the name of a StorageClass used by the cf-operator to create volumes.
    persistent_disk_type: "default"
    # Not used by the cf-operator.
    # A warning is logged if this key is set.
    networks:
      # Not used by the cf-operator.
      - name: "foo"
        # Not used by the cf-operator.
        static_ips: []
        # Not used by the cf-operator.
        default: []
    # Specific update settings for this instance group. Use this to override global job update settings on a per-instance-group basis.
    update: {}
    # TODO: understand how instance group renames can occur in a QuarksStatefulSet or QuarksJob
    migrated_from:
      - cloud_controller
    # This is the key that controls how an instance group is treated by the cf-operator.
    # If lifecycle is "service", a QuarksStatefulSet is created for the instance group.
    # Otherwise, if it's "errand", a QuarksJob is created. As with normal BOSH, errands have a
    # manual trigger, so QuarksJobs have to support this (manual triggers).
    # In Kubernetes we also need errands that can run on a trigger. These are not supported by BOSH.
    # The lifecycle for such a QuarksJob is "auto-errand".
    # Manual triggers are supported by QuarksJobs.
    lifecycle: "service"
    # Deprecated - the cf-operator does not support this key.
    # An error is thrown if this is set.
    properties: {}
    # Usually used for BOSH Agent configuration.
    # We can use this hash to control how the operator generates resources, however
    # none of the settings used by the Agent are supported by the operator.
    env:
      # Not used by the cf-operator.
      # A warning is logged if this is set.
      persistent_disk_fs: "ext4"
      # Not used by the cf-operator.
      # A warning is logged if this is set.
      persistent_disk_mount_options: []
      # Not used by the cf-operator.
      # A warning is logged if this is set.
      bosh:
        # Not used by the cf-operator.
        # A warning is logged if this is set.
        password: "foo"
        # Not used by the cf-operator.
        # A warning is logged if this is set.
        keep_root_password: vcap
        # Not used by the cf-operator.
        # A warning is logged if this is set.
        remove_dev_tools: false
        # Not used by the cf-operator.
        # A warning is logged if this is set.
        remove_static_libraries: false
        # Not used by the cf-operator.
        # A warning is logged if this is set.
        swap_size: 100
        # Not used by the cf-operator.
        # A warning is logged if this is set.
        ipv6:
          # Not used by the cf-operator.
          # A warning is logged if this is set.
          enable: false
        # Not used by the cf-operator.
        # A warning is logged if this is set.
        job_dir:
          # Not used by the cf-operator.
          # A warning is logged if this is set.
          tmpfs: false
          # Not used by the cf-operator.
          # A warning is logged if this is set.
          tmpfs_size: "0m"
        agent:
          # Not used by the cf-operator.
          # A warning is logged if this is set.
          tmpfs: false
          # Used by the cf-operator to set Kubernetes-specific information
          # for the resources representing this instance group.
          settings:
            # Affinity information for this instance group's pod.
            # These definitions are merged directly into the pod's definition.
            # The structure is the same as the one used by Kube [3].
            affinity: {}
            # Labels to add to the resources representing the instance group.
            labels: {}
            # Annotations to add to the resources representing the instance group.
            annotations: {}
            # Set to true to disable the log sidecar.
            disable_log_sidecar: false
            # serviceAccountName is the name of the ServiceAccount to use to run this pod.
            serviceAccountName: kubecf
            # automountServiceAccountToken indicates whether a service account token should be automatically mounted.
            automountServiceAccountToken: false
            # ImagePullSecrets is an optional list of references to secrets to use for pulling any of the images.
            # This field in PodSpec can be automated by setting the imagePullSecrets in a serviceAccount.
            imagePullSecrets: []
            # Tolerations and taints are a concept defined in Kubernetes to repel pods from nodes [4].
            tolerations: []
            # If this is set to true, the operator will define a PersistentVolumeClaim template
            # on the QuarksStatefulSet of the instance group, and it will use that PVC for all volume
            # mounts for ephemeral disks.
            ephemeralAsPVC: false
            # This sets the backoffLimit for the jobs running errands. If not set, it will use the Kube default, which is 6.
            # https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#handling-pod-and-container-failures
            jobBackoffLimit: 6
            # An array of disks to be mounted on the containers.
            disks:
              # A PersistentVolumeClaim to be used as a template in the StatefulSet of the instance group.
              - pvc:
                  name: foo
                  storageClassName: persistent
                # Volume definition to be included in the pod.
                volume:
                  name: extravolume
                  emptyDir: {}
                # Volume mounts to be set on the containers that match the job and process set in "filters".
                volumeMount:
                  name: extravolume
                  mountPath: /var/vcap/data/rep
                # Filters to identify on which containers to apply the volume mounts.
                filters:
                  job_name: "cflinuxfs3-rootfs-setup"
                  process_name: "test-server"
# Each addon job is added to the desired manifest before it's persisted.
# Not all placement rules are supported, see below for more details.
addons:
  # The name of the addon is not used by the operator.
  # TODO: investigate whether it's useful to set this in an annotation of the instance group sts/pod
  - name: foo
    # All jobs are added to instance groups based on placement rules before the desired manifest is persisted.
    jobs:
      - name: metron
        release: loggregator-release
        properties:
          loggregator:
            metron:
              log_level: debug
    include:
      # Supported
      stemcell:
        - os: opensuse
      # Not supported, addons are used per-deployment
      deployments: []
      # Supported
      jobs:
        - name: cloud_controller_ng
          release: capi-release
      # Supported
      instance_groups:
        - api
        - diego-cell
      # Not supported
      networks: []
      # Not supported
      teams: []
    # The same matchers are supported as the "include" key.
    exclude: {}
# Deprecated - the cf-operator does not support this key.
# An error is thrown if this is set.
properties: {}
# For each variable, the cf-operator creates a QuarksSecret.
# As with normal BOSH, variables are referenced by job properties.
# Each variable's generated secret is mounted in the container that renders each job's
# templates. They are then used by the rendering process.
# This means that the operator needs to look at the job's properties, and parse any references
# to variables, so it knows what it needs to mount.
variables:
  # Unique name used to identify a variable. Used to name the QuarksSecret.
  - name: "adminPass"
    # As with normal BOSH, supported types are certificate, password, rsa, and ssh.
    type: "password"
    # Specifies generation options.
    options: {is_ca: true, common_name: "some-ca"}
# Tags are transformed into annotations for the resources created
# by this deployment.
tags:
  maintainer: "Philip J. Fry"
- [1] https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
- [2] https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
- [3] https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
- [4] https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
In a BOSH release some jobs have BPM configuration in `templates/bpm.yml.erb`. Each process specified in the BPM configuration is run in a single Kubernetes `Container` as part of a `Pod`.
The following subsections describe the mapping of BPM configuration into containers.
Bosh | Kube Pod Container |
---|---|
`executable` | `command` |
`args` | `args` |
`env` | `env` |
Bosh | Kube Pod Container |
---|---|
`workdir` | `workingDir`. Not implemented yet. |
`hooks` | `initContainers` and container hooks. Not implemented yet. |
`process.capabilities` | `container.SecurityContext.Capabilities` |
`limits` | `container.Resources.Limits`. Not implemented yet. |
`ephemeral_disk` | `emptyDir` volumes by default, but can be `PersistentVolumeClaims` if `ephemeralAsPVC` is set in `bosh.agent.settings`. |
`persistent_disk` | `PersistentVolumeClaims`. Not yet implemented. |
`additional_volumes` | `emptyDir`. Paths under `/var/vcap/store` are currently ignored. |
`unsafe.unrestricted_volumes` | `emptyDir`. Paths under `/var/vcap/store` are currently ignored. |
`unsafe.privileged` | `container.SecurityContext.Privileged` |
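To make the mapping tables concrete, here is a minimal sketch that turns one parsed BPM process into a Kubernetes container spec held as a plain dict. The function name `bpm_process_to_container` and the `<job>-<process>` container naming are hypothetical and only illustrate the one-container-per-process rule; the field names follow the Kubernetes container spec (`securityContext` is the YAML spelling of the `SecurityContext` used in the table). Rows marked "Not implemented yet" are deliberately left out.

```python
from typing import Any


def bpm_process_to_container(job_name: str, process: dict[str, Any], image: str) -> dict[str, Any]:
    """Translate one BPM process into a Kubernetes container spec (plain dict)."""
    container: dict[str, Any] = {
        # Hypothetical naming scheme: one container per process, named <job>-<process>.
        "name": f"{job_name}-{process['name']}",
        "image": image,
        # BPM `executable` -> container `command`.
        "command": [process["executable"]],
        # BPM `args` -> container `args`.
        "args": list(process.get("args") or []),
        # BPM `env` is a map; Kubernetes expects a list of name/value pairs.
        # Sorting keeps the generated spec deterministic.
        "env": [
            {"name": key, "value": str(value)}
            for key, value in sorted((process.get("env") or {}).items())
        ],
    }
    security: dict[str, Any] = {}
    # process.capabilities -> container.securityContext.capabilities
    if process.get("capabilities"):
        security["capabilities"] = {"add": list(process["capabilities"])}
    # unsafe.privileged -> container.securityContext.privileged
    if (process.get("unsafe") or {}).get("privileged"):
        security["privileged"] = True
    if security:
        container["securityContext"] = security
    return container
```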
BPM doesn't provide information for health checks and relies on monit instead. The cf-operator provides health checks via the `quarks` property key in the deployment manifest.
In Kubernetes, we use liveness and readiness probes for health checks.
BPM supports `pre_start` hooks. The cf-operator converts those to additional init containers.
In addition, there are configuration variables that are not available in BOSH but are required for scaling in a Kubernetes environment.
Job spec in Manifest | Kube Pod Container | Description |
---|---|---|
`properties.quarks.bpm.processes[n].requests.cpu` | `container.Resources.Requests.cpu` | Guaranteed CPU |
`properties.quarks.bpm.processes[n].requests.memory` | `container.Resources.Requests.memory` | Guaranteed memory |
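A minimal sketch of how the two `requests` settings could be applied, assuming the caller has already paired each `quarks.bpm.processes[n]` entry with the container generated for the same process. The helper name `apply_process_requests` is hypothetical; `resources.requests` is the standard Kubernetes container field.

```python
from typing import Any


def apply_process_requests(container: dict[str, Any], quarks_process: dict[str, Any]) -> dict[str, Any]:
    """Copy quarks.bpm.processes[n].requests into the container's resources.requests."""
    requests = quarks_process.get("requests") or {}
    # Only cpu and memory are listed in the table above.
    picked = {key: requests[key] for key in ("cpu", "memory") if key in requests}
    if picked:
        container.setdefault("resources", {}).setdefault("requests", {}).update(picked)
    return container
```

Leaving `resources` untouched when no requests are given means the pod keeps whatever defaults the cluster applies.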
Release image tags are immutable. A release image location is composed of multiple elements:
- docker registry URL
- organization and repository
- stemcell name and version
- fissile version
- the release name and version
Release image locations always have to be resolved in the context of an instance group/job, because they depend on the stemcell that is being used.
A typical release image location could look like `hub.docker.com/cfcontainerization/cflinuxfs3-release:opensuse-15.0-28.g837c5b3-30.263-7.0.0_233.gde0accd0-0.62.0`.
The different elements are taken from different places in the manifest. Given this excerpt from a BOSH deployment manifest:
stemcells:
- alias: default
  os: opensuse-42.3
  version: 28.g837c5b3-30.263-7.0.0_234.gcd7d1132
instance_groups:
- name: diego-cell
  stemcell: default
  jobs:
  - name: cflinuxfs3-rootfs-setup
    release: cflinuxfs3
releases:
- name: cflinuxfs3
  version: 0.62.0
  url: hub.docker.com/cfcontainerization
  sha1: 6466c44827c3493645ca34b084e7c21de23272b4
  stemcell:
    os: opensuse-15.0
    version: 28.g837c5b3-30.263-7.0.0_233.gde0accd0
The stemcell information (name, and stemcell and fissile version) is taken from the `stemcells` entry that matches the instance group's stemcell alias. The registry URL including the organization, the release name, and the version come from the `releases` entry that's referenced from the job.
Note: Releases can optionally specify a separate `stemcell` section, in which case the information from the instance group stemcell is overridden.
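The resolution rules can be sketched in a few lines of Python, operating on the manifest excerpt as parsed dicts. The helpers `find_stemcell` and `release_image` are hypothetical. They follow the `<url>/<name>:<stemcell.os>-<stemcell.version>-<version>` scheme from the manifest comments. With the excerpt's values, the tag part matches the example location above. The repository part there (`cflinuxfs3-release`) differs from the bare release name, so treat repository naming as an assumption of this sketch.

```python
from typing import Any, Optional


def find_stemcell(stemcells: list[dict[str, str]], alias: str) -> dict[str, str]:
    """Return the top-level `stemcells` entry matching an instance group's alias."""
    for stemcell in stemcells:
        if stemcell.get("alias") == alias:
            return stemcell
    raise KeyError(f"no stemcell with alias {alias!r}")


def release_image(release: dict[str, Any], ig_stemcell: Optional[dict[str, str]] = None) -> str:
    """Build <url>/<name>:<stemcell.os>-<stemcell.version>-<version>."""
    if release.get("version") == "latest":
        raise ValueError('"latest" is not supported as a release version')
    # A `stemcell` section on the release overrides the instance group's stemcell.
    stemcell = release.get("stemcell") or ig_stemcell
    if not stemcell:
        raise ValueError(f"no stemcell information for release {release['name']!r}")
    tag = f"{stemcell['os']}-{stemcell['version']}-{release['version']}"
    return f"{release['url']}/{release['name']}:{tag}"
```

For the excerpt, `release_image(release, find_stemcell(stemcells, "default"))` picks the release's own `opensuse-15.0` stemcell, because the release-level section takes precedence.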
For each explicit BOSH variable (with a definition in the `variables` section of the deployment manifest), the cf-operator creates a `QuarksSecret`.
The `QuarksSecret` is meant to generate the value required by the variable.
The name of the `QuarksSecret` is calculated like this: `var-<VARIABLE_NAME>`.
The name of the final generated `Secret` (the `secretName` key of the `QuarksSecret`) is calculated the same way.
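As a rough sketch, the naming rule could produce a `QuarksSecret` object like the one below, represented as a dict. The `apiVersion` and the `spec.type`/`spec.secretName` layout are assumptions based on QuarksSecret examples and may not match every operator version. Translating the variable's `options` (for example certificate fields) into the spec is intentionally left out.

```python
from typing import Any

# Variable types listed in the manifest comments for explicit variables.
SUPPORTED_TYPES = {"certificate", "password", "rsa", "ssh"}


def quarks_secret_for(variable: dict[str, Any]) -> dict[str, Any]:
    """Sketch of the QuarksSecret created for one explicit BOSH variable."""
    if variable["type"] not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported variable type {variable['type']!r}")
    name = f"var-{variable['name']}"
    return {
        # Assumed apiVersion and field layout, based on QuarksSecret examples.
        "apiVersion": "quarks.cloudfoundry.org/v1alpha1",
        "kind": "QuarksSecret",
        "metadata": {"name": name},
        # The generated Secret is named the same way as the QuarksSecret.
        "spec": {"type": variable["type"], "secretName": name},
    }
```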
The user can also specify overrides for generated secrets using the `vars` key in the `BOSHDeployment` spec.
These map explicit variable names to secret names.
Each secret must contain the usual keys used in explicit variables (see here for more details).
You can find an example here.
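A small sketch of how an override changes which secret supplies an explicit variable. It assumes, for illustration only, that `vars` is a list of `{"name": ..., "secret": ...}` entries. Check the linked example for the exact shape.

```python
from typing import Optional


def variable_secret_name(var_name: str, overrides: Optional[list[dict[str, str]]] = None) -> str:
    """Return the Secret that holds an explicit variable's value."""
    # Assumed shape of the BOSHDeployment `vars` key: [{"name": ..., "secret": ...}].
    for entry in overrides or []:
        if entry["name"] == var_name:
            return entry["secret"]
    # No override: fall back to the generated secret, var-<VARIABLE_NAME>.
    return f"var-{var_name}"
```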
BOSH Services are converted to `QuarksStatefulSets` and `Services`.
BOSH Errands are converted to `QuarksJobs` with `trigger.strategy: manually`.
BOSH Auto-Errands (supported only by the operator) are converted to `QuarksJobs` with `trigger.strategy: once`.
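The lifecycle dispatch can be summarized with a hypothetical helper, `workload_for`. It returns a simplified description of the resource an instance group becomes. The flat `replicas` key is a simplification of this sketch, not the real QuarksStatefulSet layout. The single-instance check for QuarksJobs comes from the `instances` comment in the manifest reference.

```python
from typing import Any

# Trigger strategies for QuarksJobs, keyed by instance group lifecycle.
JOB_TRIGGERS = {"errand": "manually", "auto-errand": "once"}


def workload_for(instance_group: dict[str, Any]) -> dict[str, Any]:
    """Pick the Quarks resource kind for an instance group based on its lifecycle."""
    lifecycle = instance_group.get("lifecycle", "service")  # BOSH defaults to "service"
    if lifecycle == "service":
        # Simplified: replicas come from `instances`.
        return {"kind": "QuarksStatefulSet", "replicas": instance_group.get("instances", 1)}
    if lifecycle not in JOB_TRIGGERS:
        raise ValueError(f"unknown lifecycle {lifecycle!r}")
    # Instance groups backed by a QuarksJob must have exactly one instance.
    if instance_group.get("instances", 1) != 1:
        raise ValueError("instance groups backed by a QuarksJob must set instances: 1")
    return {"kind": "QuarksJob", "trigger": {"strategy": JOB_TRIGGERS[lifecycle]}}
```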
`QuarksStatefulSets` support AZs. You can learn more about this in the docs.
`QuarksStatefulSets` support active/passive pod replicas. You can learn more about this in the docs.
We use an `emptyDir` for ephemeral disks. You can learn more from the official docs.
If the setting `bosh.agent.settings.ephemeralAsPVC` is set to `true`, the operator uses `PersistentVolumeClaims` instead.
This option should be used for jobs that make assumptions about ephemeral disk mounts (like this garden job), or when the size limit for the disk is critical.
If `vm_resources.ephemeral_disk_size` is set, the PVC size is set to this value. If it's not set, the operator tries to use `persistent_disk` as the size. If this is not set either, the operator uses a default of 10GB.
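The size fallback chain can be expressed directly. This sketch assumes sizes are in MB, as in BOSH manifests, and renders them as Kubernetes quantities. Writing the documented 10GB default as `10Gi` is also an assumption. The helper name is hypothetical.

```python
from typing import Any


def ephemeral_pvc_size(instance_group: dict[str, Any]) -> str:
    """Size of the ephemeral-disk PVC used when ephemeralAsPVC is true."""
    vm_resources = instance_group.get("vm_resources") or {}
    # 1. vm_resources.ephemeral_disk_size, 2. persistent_disk (unset or 0 falls through).
    size = vm_resources.get("ephemeral_disk_size") or instance_group.get("persistent_disk")
    if size:
        return f"{size}Mi"  # assumption: BOSH sizes are in MB
    # 3. Documented fallback of 10GB, written as a Kubernetes quantity.
    return "10Gi"
```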
Providing credentials for private registries is supported by Kubernetes. Please read the official docs.
BOSH makes use of errands, which are manually triggered. We support manual triggers - you can learn more in the QuarksJob docs.
When the deployment manifest declares health check information for jobs via the `quarks` section, we configure those checks in Kubernetes.
The probes are defined per BPM process.
Example:
instance_groups:
- name: "api-az1"
  jobs:
  - name: "cloud_controller_ng"
    properties:
      quarks:
        run:
          healthcheck:
            bpm-process-name:
              # Probe definitions (e.g. exec, httpGet), as in a Kubernetes container spec.
              readiness:
              liveness:
Both keys contain information that is used as-is for the container that matches the process name.
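A minimal sketch of the "used as-is" behaviour: each `healthcheck` entry is looked up by BPM process name and copied into the standard `readinessProbe`/`livenessProbe` container fields. The helper `attach_probes` is hypothetical, and so is the assumption that containers are indexed by process name. Entries without a matching process are skipped here. The operator may handle them differently.

```python
from typing import Any

# healthcheck keys -> Kubernetes container fields
PROBE_FIELDS = {"readiness": "readinessProbe", "liveness": "livenessProbe"}


def attach_probes(containers: dict[str, dict[str, Any]], healthcheck: dict[str, dict[str, Any]]) -> None:
    """Copy quarks.run.healthcheck entries onto the containers of the matching BPM processes.

    `containers` maps BPM process name -> container dict (an assumption of this sketch).
    """
    for process_name, checks in healthcheck.items():
        container = containers.get(process_name)
        if container is None:
            # This sketch skips entries without a matching process.
            continue
        for key, field in PROBE_FIELDS.items():
            if checks.get(key) is not None:
                # Used as-is: the value is already a Kubernetes probe definition.
                container[field] = checks[key]
```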
When a BOSH deployment manifest declares persistent disks on instance groups, we provide a persistent volume to the containers of a pod in `/var/vcap/store`. You can learn more about BOSH Persistent Disks in the BOSH Official Docs.
These volumes are mounted on each container that's part of the instance group.
The implementation uses the default storage class if none is specified with the `persistent_disk_type` key in the manifest.
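The sketch below builds a PersistentVolumeClaim template and the matching `/var/vcap/store` mount from the instance group settings. The volume name `store`, the `ReadWriteOnce` access mode, and the MB unit for `persistent_disk` are assumptions of this sketch. Leaving out `storageClassName` is the standard Kubernetes way to get the cluster's default StorageClass, provided one is configured.

```python
from typing import Any, Optional


def persistent_disk_claim(instance_group: dict[str, Any], name: str = "store") -> Optional[dict[str, Any]]:
    """Sketch of a PVC template for an instance group's persistent disk (None if no disk)."""
    size = instance_group.get("persistent_disk")
    if not size:
        return None
    spec: dict[str, Any] = {
        "accessModes": ["ReadWriteOnce"],  # assumption: one pod per claim
        "resources": {"requests": {"storage": f"{size}Mi"}},  # assumption: size in MB
    }
    # Omitting storageClassName lets Kubernetes use the cluster's default StorageClass.
    if instance_group.get("persistent_disk_type"):
        spec["storageClassName"] = instance_group["persistent_disk_type"]
    return {"metadata": {"name": name}, "spec": spec}


def store_mount(name: str = "store") -> dict[str, str]:
    """Volume mount added to every container of the instance group."""
    return {"name": name, "mountPath": "/var/vcap/store"}
```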
BOSH deployment manifests support two different types of variables, implicit and explicit ones.
"Explicit" variables are declared in the `variables` section of the manifest and are generated automatically before the interpolation step.
"Implicit" variables just appear in the document within double parentheses without any declaration. These variables have to be provided by the user as a secret before creating the BOSH deployment. The secret name has to follow the scheme `var-<variable-name>`.
By default the variable content is expected in the `value` key, e.g. for `((system-domain))`:
---
apiVersion: v1
kind: Secret
metadata:
  name: var-system-domain
type: Opaque
stringData:
  value: example.com
It is also possible to specify the key name after a `/` separator, e.g. for `((ssl/ca))`:
---
apiVersion: v1
kind: Secret
metadata:
  name: var-ssl
type: Opaque
stringData:
  ca: ...
  cert: ...
  key: ...
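To check which secrets must exist before deploying, a small scanner can list the `(secret name, key)` pairs that implicit variables resolve to. This is a sketch with a hypothetical helper name. It only understands the plain `((name))` and `((name/key))` forms described here and skips names declared in the `variables` section, since those are generated.

```python
import re
from typing import Iterable

# Matches ((name)) and ((name/key)) placeholders.
_PLACEHOLDER = re.compile(r"\(\(\s*([^()\s]+)\s*\)\)")


def implicit_variable_refs(text: str, declared: Iterable[str] = ()) -> list[tuple[str, str]]:
    """Return (secret name, key) pairs the user must provide for implicit variables."""
    explicit = set(declared)
    refs: list[tuple[str, str]] = []
    for match in _PLACEHOLDER.finditer(text):
        name, _, key = match.group(1).partition("/")
        if name in explicit:
            continue  # explicit variables are generated, not user-provided
        ref = (f"var-{name}", key or "value")  # default key is `value`
        if ref not in refs:
            refs.append(ref)
    return refs
```

For a manifest containing `((system-domain))` and `((ssl/ca))`, this yields `var-system-domain`/`value` and `var-ssl`/`ca`, matching the two secrets shown above.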
Similar to the patch scripts in SCF v1, the cf-operator supports patching at runtime. It allows the user to execute custom scripts during runtime of the job container for a specific `instance_group`. Since runtime patching is useful for a variety of reasons, users can specify these scripts via the `quarks.pre_render_scripts` key.
Keep in mind that each script should belong to a type, to avoid running all scripts as a whole. Currently supported types are:
- `quarks.pre_render_scripts.bpm`
- `quarks.pre_render_scripts.ig_resolver`
- `quarks.pre_render_scripts.jobs`
This allows you to run anything, by specifying a list of commands/scripts to execute. For example:
instance_groups:
- name: redis-slave
  instances: 2
  lifecycle: errand
  azs: [z1, z2]
  jobs:
  - name: redis-server
    release: redis
    properties:
      quarks:
        pre_render_scripts:
          bpm:
          - |
            touch /tmp
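The per-type grouping can be illustrated with a hypothetical helper that picks the scripts a job declares for one type. It does not show how or where the operator executes them.

```python
from typing import Any

# Script types listed above.
SCRIPT_TYPES = ("bpm", "ig_resolver", "jobs")


def pre_render_scripts(job: dict[str, Any], script_type: str) -> list[str]:
    """Return the scripts a job declares for one pre-render script type."""
    if script_type not in SCRIPT_TYPES:
        raise ValueError(f"unsupported pre_render_scripts type {script_type!r}")
    quarks = (job.get("properties") or {}).get("quarks") or {}
    scripts = (quarks.get("pre_render_scripts") or {}).get(script_type) or []
    return list(scripts)
```

For the `redis-server` job in the example, asking for `bpm` returns the single `touch /tmp` script, while `jobs` and `ig_resolver` return empty lists.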
The BOSH DNS addon is implemented using a separate DNS server (CoreDNS). For each `BOSHDeployment` that enables this addon, an additional DNS server is created within the namespace.
This DNS server rewrites all BOSH DNS requests to standard Kubernetes queries (e.g. `api.service.cf.internal` -> `api.<namespace>.svc.cluster.local`) and forwards them to the Kubernetes DNS server.
All pods created from the `BOSHDeployment` are configured to use this DNS server.
Additionally, headless services are created based on the specified aliases. The following alias:
- domain: blobstore.service.cf.internal
  targets:
  - deployment: cf
    domain: bosh
    instance_group: singleton-blobstore
    network: default
    query: '*'
will create a headless service named `blobstore` instead of `singleton-blobstore`.
For migration purposes, the DNS service also rewrites all previous headless service names (e.g. `singleton-blobstore` is rewritten to `blobstore.<namespace>.svc.cluster.local`).
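The rewrite behaviour can be pictured as a lookup table built from the aliases. This sketch assumes, from the `blobstore` example, that the headless service takes its name from the first label of the alias domain. It does not show the actual CoreDNS configuration the operator generates.

```python
from typing import Any


def alias_rewrites(aliases: list[dict[str, Any]], namespace: str) -> dict[str, str]:
    """Map BOSH DNS alias domains and old instance-group service names to Kubernetes names."""
    rewrites: dict[str, str] = {}
    for alias in aliases:
        # Assumption: the headless service is named after the first label of the alias domain.
        service = alias["domain"].split(".", 1)[0]
        target = f"{service}.{namespace}.svc.cluster.local"
        rewrites[alias["domain"]] = target
        # Migration: the previous headless service name (the instance group) maps to the same target.
        for t in alias.get("targets", []):
            rewrites[t["instance_group"]] = target
    return rewrites
```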
After creating a `BOSHDeployment` named `nats-deployment` with one instance group, the following resources should exist:
- BOSHDeployment: `nats-deployment`
- QuarksJob: `ig`, `dm`
- QuarksSecret: `var-nats-password`
- QuarksStatefulSet: `nats`
- Secrets: `bpm.nats-v1`, `ig-resolved.nats-v1`, `var-nats-password`, `with-ops`, `desired-manifest-v1`
- StatefulSets: `nats`
- Pods: `nats-0`, `nats-1`
- Services: `nats`, `nats-0`, `nats-1`