feat: status check for config-connector #6766

gsquared94 · 2021-10-25T16:58:50Z

Description

This PR implements skaffold's status checker for config connector resources.

Background:
Status check currently works only for deployments and standalone pods. For either type, we poll for the latest status (running kubectl rollout status deployment <foo> for deployments, and kubectl get pods ... for standalone pods). We also check the status of individual pods and containers in the deployment.

For config connector resources:

Config connector resources are created from custom resource definitions (CRDs) and there's no direct command to invoke to get their status. However, they all expose a Ready condition that can be waited on.
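
As a rough illustration of what "waiting on a Ready condition" means, here is a minimal sketch that reads the condition from a resource's JSON. It assumes the conventional status.conditions layout; the readyCondition helper and the struct types are hypothetical and are not the code in this PR, which goes through kstatus and the dynamic client.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition mirrors the conventional Kubernetes status-condition shape.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// resource holds only the fields this sketch needs from a custom resource.
type resource struct {
	Status struct {
		Conditions []condition `json:"conditions"`
	} `json:"status"`
}

// readyCondition (hypothetical helper) parses a resource's JSON and reports
// whether its Ready condition is "True", along with the condition message.
func readyCondition(raw []byte) (bool, string, error) {
	var r resource
	if err := json.Unmarshal(raw, &r); err != nil {
		return false, "", fmt.Errorf("parsing resource: %w", err)
	}
	for _, c := range r.Status.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True", c.Message, nil
		}
	}
	// No Ready condition reported yet: treat the resource as not ready.
	return false, "", nil
}

func main() {
	raw := []byte(`{"status":{"conditions":[{"type":"Ready","status":"False","reason":"UpdateFailed","message":"Update call failed"}]}}`)
	ready, msg, err := readyCondition(raw)
	fmt.Println(ready, msg, err)
}
```

A status checker would call something like this on each poll and stop once the condition turns "True" or a deadline passes.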

Testing instructions

Set up Config Connector following its guide.
Clone https://github.com/gsquared94/config-connector-with-skaffold
In the pubsub directory, run skaffold dev against the k8s cluster that has Config Connector set up.
You should see the status check happening for the deployed resources:

Starting deploy...
 - service.serviceusage.cnrm.cloud.google.com/pubsub.googleapis.com created
 - pubsubtopic.pubsub.cnrm.cloud.google.com/random-xxljslierjlj-sample created
Waiting for deployments to stabilize...
 - config-connector-resource/pubsub.cnrm.cloud.google.com/v1beta1, Kind=PubSubTopic, Name=random-xxljslierjlj-sample is ready. [1/2 deployment(s) still pending]
 - config-connector-resource/serviceusage.cnrm.cloud.google.com/v1beta1, Kind=Service, Name=pubsub.googleapis.com is ready.
Deployments stabilized in 2.576 seconds
Press Ctrl+C to exit

In the failing directory, run skaffold dev. This should cause the status check to fail:

Starting deploy...
 - storagebucket.storage.cnrm.cloud.google.com/random-xxljslierjlj-sample created
Waiting for deployments to stabilize...
 - config-control:config-connector-resource/storage.cnrm.cloud.google.com/v1beta1, Kind=StorageBucket, Name=random-xxljslierjlj-sample: Update call failed: error applying desired state: summary: googleapi: Error 400: The storage class you specified is not valid., invalid
    - config-control:StorageBucket/random-xxljslierjlj-sample: Update call failed: error applying desired state: summary: googleapi: Error 400: The storage class you specified is not valid., invalid
 - config-control:config-connector-resource/storage.cnrm.cloud.google.com/v1beta1, Kind=StorageBucket, Name=random-xxljslierjlj-sample failed. Error: Update call failed: error applying desired state: summary: googleapi: Error 400: The storage class you specified is not valid., invalid.
Cleaning up...
 - storagebucket.storage.cnrm.cloud.google.com "random-xxljslierjlj-sample" deleted
1/1 deployment(s) failed

gsquared94 · 2021-10-25T17:24:42Z

pkg/skaffold/kubernetes/manifest/selector.go

+// See https://cloud.google.com/config-connector/docs/overview
+var ConfigConnectorResourceSelector = []GroupKindSelector{
+	// add preliminary support for config connector services; group name is currently in flux
+	&wildcardGroupKind{Group: regexp.MustCompile(`([[:alpha:]]+\.)+cnrm\.cloud\.google\.com`)},


@briandealwis for the purpose of testing I've removed the restriction of only allowing Service kinds. We can add that back, although if this PR is merged, IMO we can start allowing all resource kinds. WDYT?

Sounds good.
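
To make the effect of the wildcard concrete, the sketch below runs the pattern from the diff against a few API groups. The matchesConfigConnector helper is hypothetical, and it assumes an unanchored MatchString; the PR's wildcardGroupKind may apply the pattern differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern copied from the diff above.
var cnrmGroup = regexp.MustCompile(`([[:alpha:]]+\.)+cnrm\.cloud\.google\.com`)

// matchesConfigConnector (hypothetical helper) reports whether an API group
// matches the pattern. MatchString is unanchored, so a group that merely
// contains a match is accepted as well.
func matchesConfigConnector(group string) bool {
	return cnrmGroup.MatchString(group)
}

func main() {
	for _, g := range []string{
		"pubsub.cnrm.cloud.google.com",
		"serviceusage.cnrm.cloud.google.com",
		"storage.cnrm.cloud.google.com",
		"cnrm.cloud.google.com", // no service prefix, so the pattern does not match
		"apps",
	} {
		fmt.Printf("%-40s %v\n", g, matchesConfigConnector(g))
	}
}
```

Because the pattern requires at least one alphabetic label before cnrm.cloud.google.com, the bare suffix on its own is rejected, while every per-service group is accepted regardless of kind.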

codecov · 2021-10-25T17:29:12Z

Codecov Report

Merging #6766 (70dafd2) into main (290280e) will decrease coverage by 1.10%.
The diff coverage is 60.86%.

@@            Coverage Diff             @@
##             main    #6766      +/-   ##
==========================================
- Coverage   70.48%   69.38%   -1.11%     
==========================================
  Files         515      540      +25     
  Lines       23150    24567    +1417     
==========================================
+ Hits        16317    17045     +728     
- Misses       5776     6389     +613     
- Partials     1057     1133      +76

Impacted Files	Coverage Δ
cmd/skaffold/app/cmd/flags.go	`89.00% <ø> (-1.82%)`	⬇️
cmd/skaffold/skaffold.go	`0.00% <ø> (ø)`
cmd/skaffold/app/cmd/lint.go	`52.94% <52.94%> (ø)`
cmd/skaffold/app/cmd/cmd.go	`70.49% <75.00%> (-0.57%)`	⬇️
cmd/skaffold/app/cmd/debug.go	`100.00% <100.00%> (ø)`
cmd/skaffold/app/cmd/runner.go	`64.17% <100.00%> (ø)`
.../skaffold/kubernetes/status/resource/deployment.go	`65.48% <0.00%> (-20.32%)`	⬇️
pkg/skaffold/initializer/build/builders.go	`42.85% <0.00%> (-17.15%)`	⬇️
pkg/skaffold/build/cluster/logs.go	`0.00% <0.00%> (-16.67%)`	⬇️
pkg/skaffold/event/v2/status_check.go	`85.45% <0.00%> (-14.55%)`	⬇️
... and 132 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

briandealwis · 2021-10-25T17:36:04Z

pkg/diag/validator/config_connector.go

+		if result.Message == "" {
+			status.updateAE(proto.StatusCode_STATUSCHECK_CONFIG_CONNECTOR_IN_PROGRESS, result.Message)
+		} else {
+			// config connector status doesn't always correctly parse to failed, but shows InProgress with an error message


Is there a bug that can be referenced here?

briandealwis · 2021-10-26T15:17:15Z

pkg/diag/validator/config_connector.go

+	if recentEvent == nil || recentEvent.Type == v1.EventTypeNormal {
+		return
+	}
+	// TODO: Add unique error codes for reasons


Why is it necessary to walk the events instead of just using the resource conditions?

This is something implemented for deployment type resources. I didn't think there's any harm continuing that pattern here. WDYT?

From my understanding, the events are there for debugging and diagnostics purposes, but the object status should be considered the truth.

I think processing the events risks reporting false positives and should not be used for this purpose:

The list of event types is not fixed and may have more added, including diagnostic or informational events. This code would mis-report such events as errors.

What if the last event is at odds with the actual resource status?

Is there a particular situation you've seen where the events were at odds with the resource status? If there's a specific sequence that you've seen that is a precursor for an error, that might make sense to report.

We could log something at debug level if the event type is not normal. Or even an info message and ask that they report an issue to the Skaffold issue tracker.
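
The suggestion above could look roughly like the following sketch: the outcome is decided from the resource state alone, and a non-normal event is only logged. The event and resourceState types and the checkStatus function are simplified, hypothetical stand-ins, not the types used in the PR.

```go
package main

import (
	"fmt"
	"log"
)

// event is a simplified, hypothetical stand-in for a Kubernetes Event.
type event struct {
	Type    string // "Normal", "Warning", or a type added in the future
	Reason  string
	Message string
}

// resourceState is a hypothetical summary of the resource's parsed status.
type resourceState struct {
	Ready   bool
	Message string
}

// checkStatus decides the outcome from the resource state alone. A non-Normal
// most-recent event is logged for diagnostics and never changes the result.
func checkStatus(state resourceState, recent *event) error {
	if recent != nil && recent.Type != "Normal" {
		log.Printf("debug: event %s (%s): %s", recent.Type, recent.Reason, recent.Message)
	}
	if !state.Ready {
		return fmt.Errorf("resource not ready: %s", state.Message)
	}
	return nil
}

func main() {
	warn := &event{Type: "Warning", Reason: "UpdateFailed", Message: "transient error"}
	// A ready resource is reported as healthy even though its last event was a warning.
	fmt.Println(checkStatus(resourceState{Ready: true}, warn))
	fmt.Println(checkStatus(resourceState{Ready: false, Message: "Update call failed"}, nil))
}
```

With this shape, an unknown or informational event type added later cannot be mis-reported as a failure.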

briandealwis · 2021-10-26T15:29:58Z

pkg/diag/validator/config_connector.go

+type configConnectorSelector struct {
+	client    kubernetes.Interface
+	dynClient dynamic.Interface
+	uResource unstructured.Unstructured
+}
+
+func NewConfigConnectorSelector(client kubernetes.Interface, dynClient dynamic.Interface, uResource unstructured.Unstructured) CustomResourceSelector {
+	return &configConnectorSelector{client: client, dynClient: dynClient, uResource: uResource}
+}
+
+func (c *configConnectorSelector) Select(ctx context.Context, namespace string, opts metav1.ListOptions) (*unstructured.UnstructuredList, error) {
+	_, r, err := util.GroupVersionResource(c.client.Discovery(), c.uResource.GroupVersionKind())
+	if err != nil {
+		return nil, fmt.Errorf("failed to query config connector resources: %w", err)
+	}
+	resList, err := c.dynClient.Resource(r).Namespace(namespace).List(ctx, opts)
+	if err != nil {
+		return nil, fmt.Errorf("failed to query config connector resources: %w", err)
+	}
+	return resList, nil
+}


I don't understand the purpose of this code: we create a selector for a resource that returns the list of other resources with this same type and name — which should be uResource, right?

Is this to fetch the latest copy of the uResource?

Yes, it's to fetch the latest version. We also poll on this selector so that any new resources of this type that show up later are checked on that iteration (as opposed to fetching the list once and checking the status of known resources individually).

yuwenma · 2021-10-26T17:14:09Z

I have a general question:

skaffold does not specifically tie itself to kubernetes, which gives it more room to grow as a standalone CI/CD tool. Do we really want it to deal with the kubernetes objects (GVK, unstructured) and mechanisms? If so, is it possible to reuse some existing functions to handle GVK and selector operations? I'm worried it may eventually hit many problems that other kubernetes tools have hit or have already resolved. @tejal29 @briandealwis

briandealwis · 2021-10-26T19:44:31Z

We're happy for pointers @yuwenma!

tejal29 · 2021-11-01T19:10:59Z

Thanks @yuwenma. Going to take a look at this in depth today.

gsquared94 · 2021-11-02T09:54:11Z

> I have a general question:
>
> skaffold does not specifically tie itself to kubernetes, which gives it more room to grow as a standalone CI/CD tool.
> Do we really want it to deal with the kubernetes objects (GVK, unstructured) and mechanisms?

This has become the design philosophy now, with the implementation of new deployers like the docker deployer. However, the codebase already has references to kubernetes internals throughout the deploy and status-check phases. We're already using the following packages in our go.mod file:

	k8s.io/api v0.21.3
	k8s.io/apimachinery v0.22.2
	k8s.io/client-go v0.21.3
	k8s.io/kubectl v0.21.3
	k8s.io/utils v0.0.0-20201110183641-67b214c5f920
	knative.dev/pkg v0.0.0-20201119170152-e5e30edc364a // indirect
	sigs.k8s.io/kustomize/kyaml v0.10.17
	sigs.k8s.io/yaml v1.2.0

The status-check phase currently works only for selected kubernetes resources. The Deployer interface, however, allows any deployer implementation to define its own implementation of a Monitor.

> If so, is it possible to reuse some existing functions to handle GVK and selector operations? I'm worried it may eventually hit many problems that other kubernetes tools have hit or have already resolved. @tejal29 @briandealwis

We are using the kstatus library to parse the resource status, along with other client libraries and constructs. We're happy for pointers to other packages that you think could simplify our status-checker implementation.

briandealwis · 2021-11-03T15:07:32Z

pkg/diag/validator/config_connector.go

 	case kstatus.NotFoundStatus:
-		status.updateAE(proto.StatusCode_STATUSCHECK_CONFIG_CONNECTOR_NOT_FOUND, result.Message)
+		ae = proto.ActionableErr{ErrCode: proto.StatusCode_STATUSCHECK_CONFIG_CONNECTOR_NOT_FOUND, Message: result.Message}


I wonder how this would happen!

I don't actually think that we'd run into this, but I added it for completeness' sake.

briandealwis

This is looking good. A few more comments.

briandealwis · 2021-11-03T15:54:20Z

pkg/skaffold/kubernetes/manifest/filter.go

+)
+
+// Filter returns the manifest list filtered by the given selectors
+func (l *ManifestList) Filter(selectors []GroupKindSelector) (ManifestList, error) {


I like these functions, but they need tests.

Working on unit tests.

briandealwis · 2021-11-03T15:55:58Z

pkg/skaffold/kubernetes/manifest/filter.go

+			return nil, fmt.Errorf("unmarshaling config: %w", err)
+		}
+		gvk := obj.GroupVersionKind()
+		for _, w := range selectors {


I do wonder if we should have separate interfaces for allowing resource selectors (taking an unstructured.Unstructured), or GVK, or group+kind. That can be done separately.

feat: status check for config-connector

534fa35

gsquared94 requested review from briandealwis, tejal29 and nkubala October 25, 2021 16:58

gsquared94 requested review from yuwenma and a team as code owners October 25, 2021 16:58

pull-request-size bot added the size/XXL label Oct 25, 2021

google-cla bot added the cla: yes label Oct 25, 2021

gsquared94 added the kokoro:force-run label Oct 26, 2021

kokoro-team removed the kokoro:force-run label Oct 26, 2021

briandealwis reviewed Oct 26, 2021

yuwenma reviewed Oct 26, 2021

gsquared94 mentioned this pull request Oct 27, 2021

Skaffold should wait for Config Connector resources to be up to date before reporting success #6709

Closed

tejal29 assigned briandealwis and tejal29 Nov 1, 2021

explicitly specify interface

a312e63

gsquared94 added 2 commits November 2, 2021 15:25

address PR feedback

1be4c63

delete example folder

3c45275

gsquared94 requested review from briandealwis and yuwenma November 2, 2021 11:35

simplify configConnectorSelector

16e5f2a

briandealwis reviewed Nov 2, 2021

rename ConfigConnectorSelector to CustomResourceSelector

a4638dd

briandealwis reviewed Nov 2, 2021

gsquared94 added 3 commits November 2, 2021 21:38

update config_connector.go

12348cf

split long message

d1b4e77

add comment to newly added proto enums

1293769

gsquared94 requested a review from briandealwis November 3, 2021 14:57

briandealwis reviewed Nov 3, 2021

address PR review

2932758

tejal29 approved these changes Nov 3, 2021

briandealwis approved these changes Nov 3, 2021

add tests

70dafd2

gsquared94 enabled auto-merge (squash) November 4, 2021 15:40

gsquared94 merged commit 4a2dd8e into GoogleContainerTools:main Nov 4, 2021

gsquared94 added the area/status-check label Feb 25, 2022

jsok mentioned this pull request Mar 21, 2022

ConfigConnector CRD status checking #7207

Open
