Kubescape Storage

An Aggregated APIServer for the Kubescape internal storage services.

Note: go-get or vendor this package as k8s.io/sample-apiserver.

Purpose

The Kubescape Storage APIServer serves custom resources that Kubescape defines for its operation. These custom resources might store internal Kubescape configuration, scan artifacts, computed snapshots, and other data that help the entire Kubescape in-cluster solution operate.

Fetch sample-apiserver and its dependencies

Like the rest of Kubernetes, sample-apiserver has used godep and $GOPATH for years and is now adopting Go 1.11 modules. While upstream mentions two alternative ways to fetch the sample repository and its dependencies, we recommend and primarily use only one: native Go 1.11 modules with vendoring.

When using native Go 1.11 modules

When using Go 1.11 modules (GO111MODULE=on), first create the appropriate working directory:

mkdir -p ~/github.com/kubescape
cd ~/github.com/kubescape

Warning

Due to the specifics of the code generation script hack/update-codegen.sh, your working directory should always match your module path. That is, ~/github.com/kubescape/storage for this specific repo. If the directories don’t match and you store code in some other directory, you will write code in whatever directory you chose, but once you run codegen, it will generate the code only in ~/github.com/kubescape/storage.

Once you have a working directory set up, issue the following commands in your working directory.

git clone https://github.com/kubescape/storage.git
cd storage

When you need to generate code, you will also need the code-generator repo to exist in an old-style location. One easy way to do this is to run go mod vendor, which creates and populates the vendor directory.

A Note on kubernetes/kubernetes

If you are developing Kubernetes according to https://github.com/kubernetes/community/blob/master/contributors/guide/github-workflow.md then you already have a copy of this demo in kubernetes/staging/src/k8s.io/sample-apiserver and its dependencies --- including the code generator --- are in usable locations.

Design changes

Compared to the upstream repository, the following changes have been made:

  • metadata are stored in a SQLite database
  • payloads are stored in a virtual filesystem (afero) mapping to a directory

Database schema

erDiagram
   METADATA {
        string kind
        string namespace
        string name
        JSON metadata
    }
    RESOURCE only one to one METADATA : has

Metadata are stored in JSON format, and should be unmarshalled to the appropriate struct when needed.

GetList() operations should support pagination. We use ROWID to sort the results and limit the number of rows returned; on subsequent calls, the client provides the last ROWID it received to get the next page.
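
The ROWID-based pagination described here can be sketched as a keyset query. This is only an illustration: the table name `metadata`, the helper `buildListQuery`, and its parameters are assumptions, not the repository's actual code. Only the column names come from the schema above.

```go
package main

import "fmt"

// buildListQuery returns a SQL statement and its arguments for one page of a
// list operation. The table name "metadata" is an assumption; the columns
// kind, namespace, name and metadata come from the schema above.
// lastRowID is the ROWID of the last row of the previous page (0 for the
// first page); limit caps the number of rows returned.
func buildListQuery(kind, namespace string, lastRowID int64, limit int) (string, []any) {
	query := "SELECT rowid, name, metadata FROM metadata " +
		"WHERE kind = ? AND namespace = ? AND rowid > ? " +
		"ORDER BY rowid LIMIT ?"
	return query, []any{kind, namespace, lastRowID, limit}
}

func main() {
	// First page: start before any automatically assigned ROWID.
	q, args := buildListQuery("vulnerabilitymanifests", "namespace1", 0, 100)
	fmt.Println(q, args)

	// Next page: pass the largest ROWID seen on the previous page.
	q, args = buildListQuery("vulnerabilitymanifests", "namespace1", 100, 100)
	fmt.Println(q, args)
}
```

A caller would typically pass the result to `db.Query(query, args...)` from `database/sql`, with whichever SQLite driver is in use. Filtering on `rowid > ?` instead of using `OFFSET` keeps the cost of each page independent of how many pages came before it.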

Filesystem layout

The filesystem contains both the metadata database and the payload files.

graph LR
    root["/data/"] --> metadata.sq3
    root --> metadata.sq3-shm
    root --> metadata.sq3-wal
    root --> spdx["spdx.softwarecomposition.kubescape.io/"]
    spdx --> ap["applicationprofiles/"]
    ap --> aps["..."]
    spdx --> sbom["sbomsyft/"]
    sbom --> sboms["..."]
    spdx --> sbomf["sbomsyftfiltered/"]
    sbomf --> sbomfs["..."]
    spdx --> vuln["vulnerabilitymanifests/"]
    vuln --> ns1["namespace1/"]
    ns1 --> dp1["deployment-deployment1.g"]
    ns1 --> dp2["deployment-deployment2.g"]
    ns1 --> pod1["pod-pod1.g"]
    ns1 --> st1["statefulset-statefulset1.g"]

Payloads are stored in Gob format. Since they can be quite big, we use direct I/O when possible to reduce memory allocations when unmarshalling them.

Normal Build and Deploy

Changes to the Types

If you change the API object type definitions in any of the pkg/apis/.../types.go files then you will need to update the files generated from the type definitions. To do this, first pull the dependencies and create the vendor directory. Once you have vendored the dependencies, the code generation scripts will be in your vendor directory. Before running code generation, make them executable:

chmod +x vendor/k8s.io/code-generator/*.sh

Now you’re all set to generate the code for changed types. Do this with:

hack/update-codegen.sh

If you see any errors regarding GOPATH, just provide it manually:

GOPATH=$(go env GOPATH) hack/update-codegen.sh

The code generation script will print warnings about API rule violations. You can ignore them. If you want to silence these warnings, add them to the exclusion list as shown in the updated upstream repo.

Now it's time to generate the protobuf code:

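# Build the protoc helper image and start an interactive shell in it.
docker buildx build --file build/protoc.Dockerfile --platform linux/amd64 --tag protoc --load .
docker run --rm -it -v "$(pwd):/work" protoc
# The remaining commands run inside the container.
mkdir -p github.com/kubescape/storage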
ln -sf /work/pkg github.com/kubescape/storage/
/go/bin/go-to-protobuf --packages=github.com/kubescape/storage/pkg/apis/softwarecomposition/v1beta1 --go-header-file=./hack/boilerplate.go.txt --apimachinery-packages='-k8s.io/apimachinery/pkg/util/intstr,-k8s.io/apimachinery/pkg/api/resource,-k8s.io/apimachinery/pkg/runtime/schema,-k8s.io/apimachinery/pkg/runtime,-k8s.io/apimachinery/pkg/apis/meta/v1,-k8s.io/apimachinery/pkg/apis/meta/v1beta1,-k8s.io/api/core/v1,-k8s.io/api/rbac/v1' --proto-import=/go/src/k8s.io/kubernetes/staging/src/ --proto-import=/go/src/k8s.io/kubernetes/vendor

Once the code generation finishes successfully, you should be able to run tests and build the binary with no errors:

go build -v ./...
go test -v -failfast -count=1 ./...

Storage operations

During storage operations there are several opportunities to either reject the request or modify the stored object before it is written.

Each type of operation (Create/Update/Delete) has its own set of functions that will run in the lifecycle of the request.

These functions are declared in pkg/registry/softwarecomposition/<type>/strategy.go

Read more about each function and its use here
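
To make the lifecycle concrete, here is a deliberately simplified, self-contained sketch of a create flow. The `object` type, the `createStrategy` interface, and the `create` helper are hypothetical stand-ins; the real strategies implement interfaces from k8s.io/apiserver with richer signatures. The sketch only illustrates the idea that one hook may modify the object and another may reject the request before anything is written.

```go
package main

import (
	"errors"
	"fmt"
)

// object is a hypothetical, simplified stand-in for a stored resource.
type object struct {
	Name   string
	Status string
}

// createStrategy is a simplified, hypothetical interface. It borrows the
// hook names PrepareForCreate and Validate from Kubernetes, but not their
// real signatures.
type createStrategy interface {
	PrepareForCreate(obj *object) // may modify the object before it is written
	Validate(obj *object) error   // may reject the request
}

type exampleStrategy struct{}

// PrepareForCreate clears fields that clients may not set on creation.
func (exampleStrategy) PrepareForCreate(obj *object) {
	obj.Status = ""
}

// Validate rejects objects without a name.
func (exampleStrategy) Validate(obj *object) error {
	if obj.Name == "" {
		return errors.New("name is required")
	}
	return nil
}

// create runs the hooks in order and stores the object only if validation passes.
func create(s createStrategy, obj *object, store map[string]*object) error {
	s.PrepareForCreate(obj)
	if err := s.Validate(obj); err != nil {
		return fmt.Errorf("create rejected: %w", err)
	}
	store[obj.Name] = obj
	return nil
}

func main() {
	store := map[string]*object{}
	fmt.Println(create(exampleStrategy{}, &object{Name: "sbom-1", Status: "Ready"}, store))
	fmt.Println(create(exampleStrategy{}, &object{}, store))
}
```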

Authentication plugins

The normal build supports only a very sparse selection of authentication methods. There is a much larger set available in https://github.com/kubernetes/client-go/tree/master/plugin/pkg/client/auth . If you want your server to support one of those, such as oidc, then add an import of the appropriate package to sample-apiserver/main.go. Here is an example:

import _ "k8s.io/client-go/plugin/pkg/client/auth/oidc"

Alternatively you could add support for all of them, with an import like this:

import _ "k8s.io/client-go/plugin/pkg/client/auth"

Build the Binary

With storage as your current working directory, issue the following command:

make build

Build the Container Image

With storage as your current working directory, issue the following commands:

TAG=v1.2.3 make docker-build && TAG=v1.2.3 make docker-push

Note that the Makefile targets use default values for the image tag and the Dockerfile path. Override them with environment variables as needed:

TAG=v1.2.3 IMAGE=quay.io/kubescape/storage make docker-build
TAG=v1.2.3 IMAGE=quay.io/kubescape/storage make docker-push

Deploy into a Kubernetes Cluster

Edit artifacts/example/deployment.yaml, updating the pod template's image reference to match what you pushed and setting the imagePullPolicy to something suitable.

If you’re running a Minikube cluster locally, build and tag a container image, reference it in artifacts/example/deployment.yaml, and set imagePullPolicy: Never. This lets you use a local container image without having to push it to a container registry.

Then, make sure the appropriate namespace for the APIServer components exists:

kubectl apply -f artifacts/example/ns.yaml

Finally, create all the other Aggregated APIServer components:

kubectl apply -f artifacts/example

Running it stand-alone

During development it is helpful to run the Storage APIServer stand-alone, i.e. without a Kubernetes API server for authn/authz and without aggregation. This is possible, but needs a couple of flags, keys and certs as described below. You will still need some kubeconfig, e.g. ~/.kube/config, but the Kubernetes cluster is not used for authn/z. A minikube or hack/local-up-cluster.sh cluster will work.

Instead of trusting the aggregator inside kube-apiserver, the described setup uses local client certificate based X.509 authentication and authorization. This means that the client certificate is signed by a trusted CA and carries membership in the system:masters group. As we disable delegated authorization with --authorization-skip-lookup, only this superuser group is authorized.

  1. First we need a CA to later sign the client certificate:

    openssl req -nodes -new -x509 -keyout ca.key -out ca.crt
  2. Then we create a client cert signed by this CA for the user development in the superuser group system:masters:

    openssl req -out client.csr -new -newkey rsa:4096 -nodes -keyout client.key -subj "/CN=development/O=system:masters"
    openssl x509 -req -days 365 -in client.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out client.crt
  3. As curl requires client certificates in p12 format with password, do the conversion:

    openssl pkcs12 -export -in ./client.crt -inkey ./client.key -out client.p12 -passout pass:password
  4. With these keys and certs in-place, we start the server:

    etcd &
    sample-apiserver --secure-port 8443 --etcd-servers http://127.0.0.1:2379 --v=7 \
       --client-ca-file ca.crt \
       --kubeconfig ~/.kube/config \
       --authentication-kubeconfig ~/.kube/config \
       --authorization-kubeconfig ~/.kube/config

    The first kubeconfig is used for the shared informers to access Kubernetes resources. The second kubeconfig, passed to --authentication-kubeconfig, is used to satisfy the delegated authenticator. The third kubeconfig, passed to --authorization-kubeconfig, is used to satisfy the delegated authorizer. Neither the authenticator nor the authorizer will actually be used: due to --client-ca-file, our development X.509 certificate is accepted and authenticates us as a member of system:masters. Because system:masters is the superuser group, delegated authorization is skipped.

  5. Use curl to access the server using the client certificate in p12 format for authentication:

    curl -fv -k --cert-type P12 --cert client.p12:password \
       https://localhost:8443/apis/wardle.example.com/v1alpha1/namespaces/default/flunders

    Or use wget:

    wget -O- --no-check-certificate \
       --certificate client.crt --private-key client.key \
       https://localhost:8443/apis/wardle.example.com/v1alpha1/namespaces/default/flunders

    Note: Recent OSX versions broke client certs with curl. On Mac try brew install httpie and then:

    http --verify=no --cert client.crt --cert-key client.key \
       https://localhost:8443/apis/wardle.example.com/v1alpha1/namespaces/default/flunders
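
For a programmatic client, the same request can be sketched in Go with the standard library. This is a minimal sketch, assuming client.crt and client.key from the steps above are in the current directory; the helper name `newClientCertHTTPClient` is hypothetical. Like curl -k, it skips server certificate verification, which is only acceptable for local development.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
)

// newClientCertHTTPClient returns an HTTP client that presents the given
// PEM-encoded client certificate and key during the TLS handshake.
func newClientCertHTTPClient(certFile, keyFile string) (*http.Client, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, fmt.Errorf("loading client certificate: %w", err)
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				Certificates: []tls.Certificate{cert},
				// Development only: equivalent to curl -k.
				InsecureSkipVerify: true,
			},
		},
	}, nil
}

// run performs the same GET request as the curl example above.
func run() error {
	client, err := newClientCertHTTPClient("client.crt", "client.key")
	if err != nil {
		return err
	}
	resp, err := client.Get("https://localhost:8443/apis/wardle.example.com/v1alpha1/namespaces/default/flunders")
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Unlike curl, Go reads the PEM files directly, so the p12 conversion from step 3 is not needed here.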

Changelog

Kubescape Storage changes are tracked on the release page.

Profiling

To profile the Storage APIServer, you can use the --profiling flag (enabled by default). This will expose the profiling endpoints on the /debug/pprof path.

To access the profiling endpoints, you have to port-forward to the Storage APIServer service and generate a token:

kubectl port-forward -n kubescape svc/storage 8443:443
kubectl create serviceaccount k8sadmin -n kube-system
kubectl create clusterrolebinding k8sadmin --clusterrole=cluster-admin --serviceaccount=kube-system:k8sadmin
TOKEN=$(kubectl create token -n kube-system k8sadmin)
curl -k https://localhost:8443/debug/pprof/heap -H "Authorization: Bearer $TOKEN" > heap.out

You can also use the following script to generate a heap dump every second:

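#!/usr/bin/env bash
# TOKEN must be exported in the calling shell before running this script.
: "${TOKEN:?export TOKEN before running this script}"
while true; do
  timestamp=$(date '+%Y-%m-%d_%H-%M-%S')
  curl -k https://localhost:8443/debug/pprof/heap -H "Authorization: Bearer $TOKEN" > "${timestamp}_heap.out"
  sleep 1
done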
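
The same download can be sketched in Go, for example to embed it in a test helper. This is a minimal sketch, assuming the port-forward above is active and the token is available in the TOKEN environment variable; the helper name `newProfileRequest` is hypothetical. TLS verification is skipped as with curl -k, which is suitable for local debugging only.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"os"
)

// newProfileRequest builds a GET request for one pprof profile (for example
// "heap") and attaches the bearer token.
func newProfileRequest(baseURL, profile, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/debug/pprof/"+profile, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

// run downloads the heap profile and writes it to heap.out.
func run() error {
	req, err := newProfileRequest("https://localhost:8443", "heap", os.Getenv("TOKEN"))
	if err != nil {
		return err
	}
	client := &http.Client{Transport: &http.Transport{
		// Development only: equivalent to curl -k.
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	out, err := os.Create("heap.out")
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, resp.Body)
	return err
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The resulting file can then be inspected with `go tool pprof heap.out`.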