BROADSoftware/pvdf

pvdf

TODO

  • Fix 'No topolvm storage class' (Update topolvm annotation token)
  • Remove the test on pvscanner (Must be in pvdf-system namespace). Instead add some timestamp to information
  • Or refactor the logic. (Also, insert in some monitoring stuff)

Overview

pvdf stands for 'PersistentVolume Disk Free'. The idea is to provide a quick and useful view of all disk usage on a Kubernetes cluster hosting PersistentVolumes.

Here is a sample output:

$ pvdf pv
NAMESPACE	NODE	PV NAME			POD NAME		REQ.	STORAGE CLASS	SIZE	FREE	%USED
gha-1			datalake1					50Gi			???	???	???
gha-1			datalake1-pv		gha2posix-161220..nj	50Gi			49Gi	25Gi	48%
kluster1	s3	pvc-335eb1b9-965..be	kluster1-kafka-0	20Gi	topolvm-ssd	19Gi	18Gi	6%
kluster1	s2	pvc-97967d6f-393..08	kluster1-kafka-1	20Gi	topolvm-ssd	19Gi	18Gi	6%
kluster1	s1	pvc-8ac08ff1-ef7..f4	kluster1-kafka-2	20Gi	topolvm-ssd	19Gi	18Gi	6%
kluster1	s3	pvc-e5e02f78-dc0..ea	kluster1-zookeeper-0	2Gi	topolvm-ssd	2036Mi	2003Mi	1%
kluster1	s2	pvc-07dbe115-864..91	kluster1-zookeeper-1	2Gi	topolvm-ssd	2036Mi	2003Mi	1%
kluster1	s1	pvc-c5f92adb-f0e..af	kluster1-zookeeper-2	2Gi	topolvm-ssd	2036Mi	2003Mi	1%
minio2		s3	pvc-15bd9536-2f1..91	minio2-zone-0-0		10Gi	topolvm-hdd	10220Mi	2473Mi	75%
minio2		s3	pvc-b9ac7ad4-ca3..18	minio2-zone-0-0		10Gi	topolvm-hdd	10220Mi	2473Mi	75%
minio2		s2	pvc-1960abb5-25e..2b	minio2-zone-0-1		10Gi	topolvm-hdd	10220Mi	2473Mi	75%
minio2		s2	pvc-254d7df7-75e..87	minio2-zone-0-1		10Gi	topolvm-hdd	10220Mi	2473Mi	75%
minio2		s1	pvc-231f2a2f-119..fd	minio2-zone-0-2		10Gi	topolvm-hdd	10220Mi	2473Mi	75%
minio2		s1	pvc-7f1a5c31-ae2..32	minio2-zone-0-2		10Gi	topolvm-hdd	10220Mi	2473Mi	75%
minio2		s3	pvc-33b98b30-8f6..c9	minio2-zone-0-3		10Gi	topolvm-hdd	10220Mi	2473Mi	75%
minio2		s3	pvc-b820301f-167..3b	minio2-zone-0-3		10Gi	topolvm-hdd	10220Mi	2473Mi	75%
prometheus	s1	pvc-c97bf74c-b7b..c6	prometheus-prome..-0	10Gi	topolvm-hdd	10220Mi	0	100%

pvdf looks up the information associated with each PersistentVolume by first finding the linked PVC (PersistentVolumeClaim). From there, it can find the associated namespace and Pod, if any.

The meaning of most columns is obvious. Just a word on REQ.: it is the requested size of the PVC, while the SIZE column is the size actually allocated to the volume.

In this sample, the last line shows that the prometheus storage is full (0 free, 100% used).
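The relation between the SIZE, FREE and %USED columns can be sketched as follows. This is only an illustration: the function name `used_percent` is hypothetical, and both rounding down and returning -1 for an unknown size are assumptions inferred from the sample outputs in this README, not taken from the pvdf source code.

```python
def used_percent(size_bytes: int, free_bytes: int) -> int:
    """Return the used share of a volume as a whole percentage, rounded down.

    Assumption: an unknown or zero size is reported as -1, mirroring the -1
    values visible in the JSON output for volumes without usage data.
    """
    if size_bytes <= 0:
        return -1
    return (size_bytes - free_bytes) * 100 // size_bytes


MIB = 1024 * 1024

# A minio2 volume from the sample: 10220Mi in size, 2473Mi free.
print(used_percent(10220 * MIB, 2473 * MIB))

# The prometheus volume from the sample: 10220Mi in size, nothing free.
print(used_percent(10220 * MIB, 0))
```

With the sample values above, the two calls give 75 and 100, which matches the %USED column of the corresponding lines.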

On top of that, pvdf includes some special handling for volumes created by the Topolvm CSI plugin. More on that below.

Installation

pvdf is made of two components: a DaemonSet, which regularly scans disk usage and stores the results as annotations, and a client, which collects this information and presents it in a user-friendly way.

Daemonset installation

To install the pvdf daemonset (pvscanner), you need an account with full admin privileges. Then:

kubectl apply -f https://github.com/BROADSoftware/pvdf/releases/download/v0.2.0/deploy.yaml

Here, v0.2.0 may be replaced by a later version.

After a short while, you can check that the deployment succeeded:

kubectl -n pvdf-system get pods
NAME              READY   STATUS    RESTARTS   AGE
pvscanner-hgpww   1/1     Running   0          87m
pvscanner-lrrfd   1/1     Running   0          87m
pvscanner-n7tng   1/1     Running   0          87m
pvscanner-phbnf   1/1     Running   0          87m
pvscanner-xmhc7   1/1     Running   0          87m

You should have one pod per node (except for the Control Plane nodes).

Client installation

Several client builds are provided, depending on your architecture. Below is an installation example for Linux:

$ cd /tmp
$ wget https://github.com/BROADSoftware/pvdf/releases/download/v0.2.0/pvdf_0.2.0_Linux_x86_64.tar.gz
$ tar xvzf pvdf_0.2.0_Linux_x86_64.tar.gz
$ sudo mv pvdf /usr/local/bin

pvdf can also be installed as a kubectl extension. Just replace the last command with:

$ sudo mv pvdf /usr/local/bin/kubectl-pvdf

You will then be able to invoke it as:

kubectl pvdf ...

Usage

The pvdf command has several subcommands. The one used to produce the PV report shown at the beginning is pv:

$ pvdf pv
NAMESPACE	NODE	PV NAME			POD NAME		REQ.	STORAGE CLASS	SIZE	FREE	%USED
gha-1			datalake1					50Gi			???	???	???
gha-1			datalake1-pv		gha2posix-161220..dj	50Gi			49Gi	25Gi	48%
kluster1	s1	pvc-8ac08ff1-ef7..f4	kluster1-kafka-2	20Gi	topolvm-ssd	19Gi	19Gi	0%
....

You can list all available commands and options with:

$ pvdf help
A PV usage display tool

Usage:
pvdf [command]

Available Commands:
help        Help about any command
pv          List persistentVolumes and associated usage
topolvm     List Topolvm deviceClass per node
version     Display current version

Flags:
-f, --format string       Output format (text or json) (default "text")
-h, --help                help for pvdf
-k, --kubeconfig string   kubeconfig file
-j, --logJson             Logs in JSON
-l, --logLevel string     Log level (default "INFO")
-u, --unit string         Unit for storage values display (default "A")

Use "pvdf [command] --help" for more information about a command.

The other main command is topolvm, to be used if the Topolvm Kubernetes CSI plugin is installed. More info here

$ pvdf topolvm --unit Gi
STORAGE CLASS	DEVICE CLASS	FSTYPE	NODE	SIZE	FREE	%USED
topolvm-hdd	hdd		xfs	s1	49Gi	19Gi	60%
topolvm-hdd	hdd		xfs	s2	49Gi	29Gi	40%
topolvm-hdd	hdd		xfs	s3	49Gi	9Gi	80%
topolvm-nvme	nvme		xfs	s1	99Gi	99Gi	0%
topolvm-nvme	nvme		xfs	s2	99Gi	99Gi	0%
topolvm-nvme	nvme		xfs	s3	99Gi	99Gi	0%
topolvm-ssd	ssd		xfs	s1	79Gi	47Gi	40%
topolvm-ssd	ssd		xfs	s2	79Gi	47Gi	40%
topolvm-ssd	ssd		xfs	s3	79Gi	37Gi	52%

Topolvm relies on Linux LVM. It creates one LVM VolumeGroup per StorageClass/DeviceClass on each node.

It then allocates one LVM LogicalVolume per Kubernetes PersistentVolume.

This command displays the remaining space per StorageClass (or VolumeGroup) for each node. This value is taken from a specific annotation set by Topolvm on the Kubernetes Node resource (Topolvm uses this value for scheduling).

kubeconfig

Like most well-behaved Kubernetes commands, pvdf looks for a kubeconfig file in the following order:

  • The --kubeconfig option value.
  • The $KUBECONFIG environment variable
  • The ~/.kube/config file
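The lookup order above can be sketched as a small function. This is an illustration only: `resolve_kubeconfig` and its parameters are hypothetical names, and the sketch treats $KUBECONFIG as a single path, whereas other Kubernetes tools may accept a list of paths in that variable.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_kubeconfig(flag_value: Optional[str],
                       env: Mapping[str, str],
                       home: Path) -> Path:
    """Pick the kubeconfig path using the three-step order listed above."""
    # 1. An explicit --kubeconfig value wins.
    if flag_value:
        return Path(flag_value)
    # 2. Otherwise fall back to the KUBECONFIG environment variable.
    env_value = env.get("KUBECONFIG")
    if env_value:
        return Path(env_value)
    # 3. Finally use the conventional default location.
    return home / ".kube" / "config"


# Hypothetical inputs: no flag, no environment variable, home in /home/alice.
print(resolve_kubeconfig(None, {}, Path("/home/alice")))
```

Passing the environment and home directory as arguments, rather than reading `os.environ` and `Path.home()` inside the function, keeps the sketch easy to test.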

Unit displaying

By default, pvdf adjusts the unit of each storage value to make it 'human readable'.

The unit can also be specified explicitly:

$ pvdf pv --unit G
NAMESPACE	NODE	PV NAME			POD NAME		REQ.	STORAGE CLASS	SIZE	FREE	%USED
gha-1			datalake1					50Gi			???	???	???
gha-1			datalake1-pv		gha2posix-161220..k2	50Gi			53G	27G	48%
kluster1	s3	pvc-335eb1b9-965..be	kluster1-kafka-0	20Gi	topolvm-ssd	21G	20G	6%
kluster1	s2	pvc-97967d6f-393..08	kluster1-kafka-1	20Gi	topolvm-ssd	21G	20G	6%
kluster1	s1	pvc-8ac08ff1-ef7..f4	kluster1-kafka-2	20Gi	topolvm-ssd	21G	20G	6%
kluster1	s3	pvc-e5e02f78-dc0..ea	kluster1-zookeeper-0	2Gi	topolvm-ssd	2G	2G	1%
kluster1	s2	pvc-07dbe115-864..91	kluster1-zookeeper-1	2Gi	topolvm-ssd	2G	2G	1%

or:

$ pvdf pv --unit Mi
NAMESPACE	NODE	PV NAME			POD NAME		REQ.	STORAGE CLASS	SIZE	FREE	%USED
gha-1			datalake1					50Gi			???	???	???
gha-1			datalake1-pv		gha2posix-161220..k2	50Gi			51123Mi	26206Mi	48%
kluster1	s3	pvc-335eb1b9-965..be	kluster1-kafka-0	20Gi	topolvm-ssd	20450Mi	19094Mi	6%
kluster1	s2	pvc-97967d6f-393..08	kluster1-kafka-1	20Gi	topolvm-ssd	20450Mi	19094Mi	6%
kluster1	s1	pvc-8ac08ff1-ef7..f4	kluster1-kafka-2	20Gi	topolvm-ssd	20450Mi	19094Mi	6%
kluster1	s3	pvc-e5e02f78-dc0..ea	kluster1-zookeeper-0	2Gi	topolvm-ssd	2036Mi	2003Mi	1%
kluster1	s2	pvc-07dbe115-864..91	kluster1-zookeeper-1	2Gi	topolvm-ssd	2036Mi	2003Mi	1%

The unit can be one of B, K, M, G, T, P, Ki, Mi, Gi, Ti, Pi.

Refer to Wikipedia for more information.

Note that the REQ. column is not affected by the unit option, as it is the literal value from the PersistentVolumeClaim manifest.
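The difference between the decimal units (K, M, G, ...) and the binary ones (Ki, Mi, Gi, ...) can be sketched as follows. The `format_size` helper is hypothetical, and rounding down to a whole number is an assumption inferred from the sample outputs (53606350848 bytes is shown as 53G, 51123Mi and 49Gi), not taken from the pvdf source code.

```python
# Multipliers for the units accepted by the --unit option.
UNITS = {
    "B": 1,
    "K": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4, "P": 1000**5,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4, "Pi": 1024**5,
}


def format_size(size_bytes: int, unit: str) -> str:
    """Express a byte count in the given unit, rounded down (assumption)."""
    if unit not in UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return f"{size_bytes // UNITS[unit]}{unit}"


# Size of the datalake1-pv volume, as reported in bytes by the JSON output.
size = 53606350848
for unit in ("G", "Mi", "Gi"):
    print(format_size(size, unit))
```

The same byte count therefore reads as 53G in decimal units but 49Gi in binary units, which explains why the SIZE column changes between the `--unit G` listing and the default one.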

json output

The output can also be produced in JSON form, using the --format json option.

As this output is not pretty-printed, you can pipe it into a tool like jq:

$ pvdf pv --format json | jq
[
  {
    "name": "datalake1",
    "namespace": "gha-1",
    "node": "",
    "capacity": "50Gi",
    "pod": "",
    "storageclass": "",
    "free": -1,
    "size": -1,
    "usedpercent": -1
  },
  {
    "name": "datalake1-pv",
    "namespace": "gha-1",
    "node": "",
    "capacity": "50Gi",
    "pod": "gha2posix-1612206000-n6ggq",
    "storageclass": "",
    "free": 27478982656,
    "size": 53606350848,
    "usedpercent": 48
  },
  .....

Note that all storage values are expressed in bytes in this JSON form.
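The JSON form lends itself to scripting. Below is a minimal sketch that flags volumes above a usage threshold and lists those without usage data. It works on an embedded copy of the two sample records shown above; the function names are hypothetical, and treating -1 as "no data available" is an assumption based on that sample.

```python
import json

# The two records from the sample output above, embedded for illustration.
SAMPLE = """
[
  {"name": "datalake1", "namespace": "gha-1", "node": "", "capacity": "50Gi",
   "pod": "", "storageclass": "", "free": -1, "size": -1, "usedpercent": -1},
  {"name": "datalake1-pv", "namespace": "gha-1", "node": "", "capacity": "50Gi",
   "pod": "gha2posix-1612206000-n6ggq", "storageclass": "",
   "free": 27478982656, "size": 53606350848, "usedpercent": 48}
]
"""


def volumes_over(records, threshold):
    """Return namespace/name of volumes whose usage is at or above threshold."""
    return [f"{r['namespace']}/{r['name']}"
            for r in records if r["usedpercent"] >= threshold]


def volumes_without_data(records):
    """Return namespace/name of volumes with no usage data (size is -1)."""
    return [f"{r['namespace']}/{r['name']}" for r in records if r["size"] < 0]


records = json.loads(SAMPLE)
print("over 40%:", volumes_over(records, 40))
print("no data:", volumes_without_data(records))
```

On a real cluster, the `SAMPLE` string would be replaced by the standard output of `pvdf pv --format json`, captured for example with Python's `subprocess.run`.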

Precision

Please note the following regarding the accuracy of storage values:

  • The pvdf command reads values that were collected by the last pvscanner scan. By default, this scan runs every 60s, so the displayed values may be up to 60 seconds old.
  • Although values may be displayed in bytes, their precision is 1MiB. See the Architecture section below for more information.

Architecture

pvdf is made of two components: pvscanner and the pvdf command

pvscanner

This is a DaemonSet that looks up every mounted volume related to a PersistentVolume and performs an fstats Unix system call on each of them. It then stores the resulting values in these annotations:

  • pvscanner.pvdf.broadsoftware.com/free_mib
  • pvscanner.pvdf.broadsoftware.com/size_mib

These annotations are set on the corresponding PersistentVolume.

To avoid overloading the API server, these annotations are only updated when their value changes. Values are also expressed in MiB to reduce the number of update operations.
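The "update only on change" behaviour, combined with MiB granularity, can be sketched as follows. This is not the pvscanner implementation: the function names are hypothetical, Python's `os.statvfs` is used as a stand-in for the system call mentioned above, and counting the blocks available to unprivileged users (`f_bavail`) as free space is an assumption. Only the two annotation keys come from this README.

```python
import os

MIB = 1024 * 1024
SIZE_KEY = "pvscanner.pvdf.broadsoftware.com/size_mib"
FREE_KEY = "pvscanner.pvdf.broadsoftware.com/free_mib"


def scan_mount(path):
    """Return (size_mib, free_mib) for the file system mounted at path (Unix only)."""
    st = os.statvfs(path)
    size_mib = st.f_blocks * st.f_frsize // MIB
    free_mib = st.f_bavail * st.f_frsize // MIB  # assumption: f_bavail, not f_bfree
    return size_mib, free_mib


def annotation_changes(existing, size_mib, free_mib):
    """Return only the annotations whose value differs from the existing ones.

    An empty result means no call to the API server is needed.
    """
    wanted = {SIZE_KEY: str(size_mib), FREE_KEY: str(free_mib)}
    return {k: v for k, v in wanted.items() if existing.get(k) != v}


# Annotations currently on a (hypothetical) PersistentVolume.
current = {SIZE_KEY: "10220", FREE_KEY: "2473"}

# A few hundred KiB were written: the MiB values are unchanged, nothing to send.
free_bytes = 2473 * MIB + 500_000
print(annotation_changes(current, 10220, free_bytes // MIB))

# 3 MiB were written: only the free_mib annotation needs an update.
print(annotation_changes(current, 10220, 2470))
```

Because values are truncated to whole MiB before the comparison, small writes do not trigger an update, which is also why the precision of the reported values is 1MiB.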

pvscanner also looks up the nodes hosting Topolvm storage in order to provide information about the size of their VolumeGroups. This value is stored in the size.topolvm.pvscanner.pvdf.broadsoftware.com/<deviceClass> annotations of each node.

A deviceClass is a Topolvm concept, which maps to a VolumeGroup on each node.
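As an illustration, the per-deviceClass annotations of a node could be collected as shown below. The function name, the sample node annotations and their values are all hypothetical; in particular, the format of the annotation value is not specified in this README, so the sketch keeps it as an opaque string.

```python
SIZE_PREFIX = "size.topolvm.pvscanner.pvdf.broadsoftware.com/"


def device_class_sizes(node_annotations):
    """Map each deviceClass name to the raw value of its size annotation."""
    return {
        key[len(SIZE_PREFIX):]: value
        for key, value in node_annotations.items()
        if key.startswith(SIZE_PREFIX)
    }


# Hypothetical annotations of a node; the values are made up for the example.
annotations = {
    "size.topolvm.pvscanner.pvdf.broadsoftware.com/hdd": "<size of hdd VG>",
    "size.topolvm.pvscanner.pvdf.broadsoftware.com/ssd": "<size of ssd VG>",
    "some.other/annotation": "ignored",
}
print(device_class_sizes(annotations))
```

Annotations that do not start with the prefix are ignored, so the function can be applied directly to the full annotation map of a Node.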

pvdf command

This is the user command, which looks up all the annotations described above to produce a user-friendly output (or a JSON output that can be fed into other subsystems).