Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add number of NIC replicas to telemetry data #5245

Merged
merged 3 commits into from
Mar 14, 2024
Merged

Conversation

jjngx
Copy link
Contributor

@jjngx jjngx commented Mar 13, 2024

Proposed changes

This PR adds NIC replica count information to the telemetry data.

commands

13187  k scale -n nginx-ingress deployment nginx-ingress --replicas=1
13188  k scale -n nginx-ingress deployment nginx-ingress --replicas=3
13189  k scale -n nginx-ingress deployment nginx-ingress --replicas=6
13190  k scale -n nginx-ingress deployment nginx-ingress --replicas=9
13191  k scale -n nginx-ingress deployment nginx-ingress --replicas=1

NIC log:

I0313 17:37:20.342002       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:1}}
I0313 17:37:25.537657       1 collector.go:91] Collecting telemetry data
I0313 17:37:25.555606       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:3}}
I0313 17:37:30.664850       1 collector.go:91] Collecting telemetry data
I0313 17:37:30.686694       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:3}}
I0313 17:37:35.749217       1 collector.go:91] Collecting telemetry data
I0313 17:37:35.780244       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:3}}
I0313 17:37:41.250974       1 collector.go:91] Collecting telemetry data
I0313 17:37:41.281874       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:6}}
I0313 17:37:46.482953       1 collector.go:91] Collecting telemetry data
I0313 17:37:46.520021       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:6}}
I0313 17:37:51.660169       1 collector.go:91] Collecting telemetry data
I0313 17:37:51.696565       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:9}}
I0313 17:37:56.975177       1 collector.go:91] Collecting telemetry data
I0313 17:37:56.980994       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:c232007d-76bf-48a5-8dc2-60e017467f0f ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:1}}

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

@jjngx jjngx requested a review from a team as a code owner March 13, 2024 17:50
@github-actions github-actions bot added enhancement Pull requests for new features/feature enhancements helm_chart Pull requests that update the Helm Chart labels Mar 13, 2024
@jjngx jjngx added this to the v3.5.0 milestone Mar 13, 2024
@jjngx jjngx linked an issue Mar 13, 2024 that may be closed by this pull request
Copy link
Contributor

@shaun-nx shaun-nx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a test with this branch using Daemonset as the deployment kind.
In this case, Replicas is 0 since we don't have any ReplicaSets to pull from.
So we will need some way to know if NIC is deployed as a Daemonset or a Deployment for this.

Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:a5419913-6bca-47b0-b7b8-e6f28ff5a1be ClusterVersion:v1.27.4+k3s1 ClusterPlatform:k3s InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:1 VirtualServerRoutes:0 TransportServers:0 Replicas:0}}

One approach we can take is to check the CurrentNumberScheduled field for a Daemonset

Here is a code snippet of what I tried:

// ReplicaCount returns a number of running NIC replicas.
func (c *Collector) ReplicaCount(ctx context.Context) (int, error) {
	var rs *appsV1.ReplicaSet
	var ds *appsV1.DaemonSet
	var replicas int
	pod, err := c.Config.K8sClientReader.CoreV1().Pods(c.Config.PodNSName.Namespace).Get(ctx, c.Config.PodNSName.Name, metaV1.GetOptions{})
	if err != nil {
		return 0, err
	}
	podRef := pod.GetOwnerReferences()
	if len(podRef) != 1 {
		return 0, fmt.Errorf("expected pod owner reference to be 1, got %d", len(podRef))
	}

	// Switch based on the owner reference
	switch podRef[0].Kind {
	case "ReplicaSet":
		rs, err = c.Config.K8sClientReader.AppsV1().ReplicaSets(c.Config.PodNSName.Namespace).Get(ctx, podRef[0].Name, metaV1.GetOptions{})
		if err != nil {
			return 0, err
		}
		replicas = int(*rs.Spec.Replicas)
	case "DaemonSet":
		ds, err = c.Config.K8sClientReader.AppsV1().DaemonSets(c.Config.PodNSName.Namespace).Get(ctx, podRef[0].Name, metaV1.GetOptions{})
		if err != nil {
			return 0, err
		}
		replicas = int(ds.Status.CurrentNumberScheduled)
	default:
		return 0, fmt.Errorf("expected pod owner reference to be either ReplicaSet or DaemonSet, got %s", pod.OwnerReferences[0].Kind)
	}
	return replicas, nil
}

If we do go with this approach, we will need to add more permissions to our RBAC:

E0313 18:44:02.647830       1 collector.go:94] Error collecting telemetry data: daemonsets.apps "test-nginx-ingress-controller" is forbidden: User "system:serviceaccount:default:test-nginx-ingress" cannot get resource "daemonsets" in API group "apps" in the namespace "default"

@jjngx
Copy link
Contributor Author

jjngx commented Mar 13, 2024

NIC logs on multinode cluster, DaemonSet

I0313 20:43:27.718603       1 collector.go:91] Collecting telemetry data
I0313 20:43:27.732759       1 collector.go:119] Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:2bbabd21-3a0f-48cd-9e02-eb3f1fa2febd ClusterVersion:v1.29.2 ClusterPlatform:kind InstallationID: ClusterNodeCount:3} NICResourceCounts:{VirtualServers:0 VirtualServerRoutes:0 TransportServers:0 Replicas:2}}

@jjngx jjngx requested a review from shaun-nx March 13, 2024 20:51
@jjngx
Copy link
Contributor Author

jjngx commented Mar 13, 2024

I did a test with this branch using Daemonset as the deployment kind. In this case, Replicas is 0 since we don't have any ReplicaSets to pull from. So we will need some way to know if NIC is deployed as a Daemonset or a Deployment for this.

Exported telemetry data: {Data:{ProjectName:NIC ProjectVersion:3.5.0-SNAPSHOT ProjectArchitecture:amd64 ClusterID:a5419913-6bca-47b0-b7b8-e6f28ff5a1be ClusterVersion:v1.27.4+k3s1 ClusterPlatform:k3s InstallationID: ClusterNodeCount:1} NICResourceCounts:{VirtualServers:1 VirtualServerRoutes:0 TransportServers:0 Replicas:0}}

One approach we can take is to check the CurrentNumberScheduled field for a Daemonset

Here is a code snippet of what I tried:

// ReplicaCount returns a number of running NIC replicas.
func (c *Collector) ReplicaCount(ctx context.Context) (int, error) {
	var rs *appsV1.ReplicaSet
	var ds *appsV1.DaemonSet
	var replicas int
	pod, err := c.Config.K8sClientReader.CoreV1().Pods(c.Config.PodNSName.Namespace).Get(ctx, c.Config.PodNSName.Name, metaV1.GetOptions{})
	if err != nil {
		return 0, err
	}
	podRef := pod.GetOwnerReferences()
	if len(podRef) != 1 {
		return 0, fmt.Errorf("expected pod owner reference to be 1, got %d", len(podRef))
	}

	// Switch based on the owner reference
	switch podRef[0].Kind {
	case "ReplicaSet":
		rs, err = c.Config.K8sClientReader.AppsV1().ReplicaSets(c.Config.PodNSName.Namespace).Get(ctx, podRef[0].Name, metaV1.GetOptions{})
		if err != nil {
			return 0, err
		}
		replicas = int(*rs.Spec.Replicas)
	case "DaemonSet":
		ds, err = c.Config.K8sClientReader.AppsV1().DaemonSets(c.Config.PodNSName.Namespace).Get(ctx, podRef[0].Name, metaV1.GetOptions{})
		if err != nil {
			return 0, err
		}
		replicas = int(ds.Status.CurrentNumberScheduled)
	default:
		return 0, fmt.Errorf("expected pod owner reference to be either ReplicaSet or DaemonSet, got %s", pod.OwnerReferences[0].Kind)
	}
	return replicas, nil
}

If we do go with this approach, we will need to add more permissions to our RBAC:

E0313 18:44:02.647830       1 collector.go:94] Error collecting telemetry data: daemonsets.apps "test-nginx-ingress-controller" is forbidden: User "system:serviceaccount:default:test-nginx-ingress" cannot get resource "daemonsets" in API group "apps" in the namespace "default"

Thanks for catching this @shaun-nx ! Added DaemonSet to the telemetry.

@jjngx jjngx merged commit 4533e23 into main Mar 14, 2024
77 checks passed
@jjngx jjngx deleted the feat/tel-nic-repolicas branch March 14, 2024 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Pull requests for new features/feature enhancements helm_chart Pull requests that update the Helm Chart
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Collect number of deployment/daemonset replicas of NIC
3 participants