feat: status check for config-connector #6766
// See https://cloud.google.com/config-connector/docs/overview
var ConfigConnectorResourceSelector = []GroupKindSelector{
	// add preliminary support for config connector services; group name is currently in flux
	&wildcardGroupKind{Group: regexp.MustCompile(`([[:alpha:]]+\.)+cnrm\.cloud\.google\.com`)},
@briandealwis for testing purposes I've removed the restriction that only allowed `Service` kinds. We can add that back, although if this PR is merged, IMO we can start allowing all resource kinds. WDYT?
Sounds good.
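To make the selector in this diff concrete, here is a minimal, self-contained sketch of how a regex-based group/kind selector might be evaluated against a manifest's API group and kind. The `groupKindSelector` interface, its `Matches` method, and `isConfigConnector` are hypothetical stand-ins written for illustration; they are not necessarily how Skaffold defines `GroupKindSelector` or `wildcardGroupKind`.

```go
package main

import "regexp"

// groupKindSelector is a hypothetical stand-in for GroupKindSelector: it
// reports whether a manifest's API group and kind are selected.
type groupKindSelector interface {
	Matches(group, kind string) bool
}

// wildcardGroupKind matches on regular expressions; a nil pattern matches anything.
type wildcardGroupKind struct {
	Group *regexp.Regexp
	Kind  *regexp.Regexp
}

func (w *wildcardGroupKind) Matches(group, kind string) bool {
	return (w.Group == nil || w.Group.MatchString(group)) &&
		(w.Kind == nil || w.Kind.MatchString(kind))
}

// configConnectorSelectors mirrors the list in the diff: any kind in a group such
// as pubsub.cnrm.cloud.google.com (one or more alphabetic labels before cnrm...).
var configConnectorSelectors = []groupKindSelector{
	&wildcardGroupKind{Group: regexp.MustCompile(`([[:alpha:]]+\.)+cnrm\.cloud\.google\.com`)},
}

// isConfigConnector reports whether any selector matches the given group and kind.
func isConfigConnector(group, kind string) bool {
	for _, s := range configConnectorSelectors {
		if s.Matches(group, kind) {
			return true
		}
	}
	return false
}
```

Note that Go's `MatchString` is unanchored, so this pattern also matches any group that merely contains a `<service>.cnrm.cloud.google.com` substring; wrapping it in `^...$` would be the stricter choice if exact group matching is intended.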
Codecov Report
@@            Coverage Diff             @@
##             main    #6766      +/-   ##
==========================================
- Coverage   70.48%   69.38%   -1.11%
==========================================
  Files         515      540      +25
  Lines       23150    24567    +1417
==========================================
+ Hits        16317    17045     +728
- Misses       5776     6389     +613
- Partials     1057     1133      +76
if result.Message == "" {
	status.updateAE(proto.StatusCode_STATUSCHECK_CONFIG_CONNECTOR_IN_PROGRESS, result.Message)
} else {
	// config connector status doesn't always correctly parse to failed, but shows InProgress with an error message
Is there a bug that can be referenced here?
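The branching in this excerpt (and the `NotFound` case further down) amounts to mapping a kstatus-style status plus message to a status code. A minimal sketch follows; the `status` and `code` types and their values are hypothetical local stand-ins for kstatus and the proto codes. The `else` branch is not shown in full, so the sketch assumes it reports an `InProgress` result with a non-empty message as a failure, as its comment suggests.

```go
package main

// status mirrors a subset of the kstatus status values; it is redefined
// locally so the sketch has no external dependencies.
type status string

const (
	inProgress status = "InProgress"
	failed     status = "Failed"
	current    status = "Current"
	notFound   status = "NotFound"
)

// code is a hypothetical stand-in for the proto status codes used in the PR.
type code string

const (
	codeInProgress code = "IN_PROGRESS"
	codeFailed     code = "FAILED"
	codeSuccess    code = "SUCCESS"
	codeNotFound   code = "NOT_FOUND"
	codeUnknown    code = "UNKNOWN"
)

// classify maps a status and message to a code. InProgress with a non-empty
// message is treated as a failure, on the assumption (taken from the comment
// in the diff) that Config Connector sometimes reports errors this way.
func classify(s status, message string) code {
	switch s {
	case current:
		return codeSuccess
	case inProgress:
		if message == "" {
			return codeInProgress
		}
		return codeFailed
	case failed:
		return codeFailed
	case notFound:
		return codeNotFound
	default:
		return codeUnknown
	}
}
```

Whether an `InProgress` result with a message should end the status check immediately or keep polling is a design choice this excerpt does not settle.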
if recentEvent == nil || recentEvent.Type == v1.EventTypeNormal {
	return
}
// TODO: Add unique error codes for reasons
Why is it necessary to walk the events instead of just using the resource conditions?
This pattern is already implemented for deployment-type resources, and I don't think there's any harm in continuing it here. WDYT?
As I understand it, the events are there for debugging and diagnostic purposes, but the object status should be considered the source of truth.
I think processing the events risks reporting false positives and should not be used for this purpose:
- The list of event types is not fixed and more may be added, including diagnostic or informational events. This code would misreport such events as errors.
- What if the last event is at odds with the actual resource status?
Have you seen a particular situation where the events were at odds with the resource status? If there's a specific sequence you've seen that is a precursor to an error, it might make sense to report that.
We could log something at debug level if the event type is not `Normal`, or even log an info message asking users to report an issue on the Skaffold issue tracker.
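The reviewer's suggestion can be sketched as follows: derive the result only from the resource's own status, and treat non-`Normal` events purely as diagnostics. The `event` type, `reportStatus`, and the `logf` callback are hypothetical and simplified; a real implementation would read readiness through kstatus and the events through the Kubernetes client, and `logf` would typically be a debug-level logger.

```go
package main

import "fmt"

// event is a hypothetical, trimmed-down stand-in for a Kubernetes Event.
type event struct {
	Type    string // "Normal" or "Warning" today; more types may be added later
	Reason  string
	Message string
}

// reportStatus derives the result only from the resource's own readiness and
// status message. Events never change the result: any event whose type is not
// "Normal" is passed to logf as a diagnostic.
func reportStatus(ready bool, statusMessage string, events []event, logf func(format string, args ...interface{})) error {
	for _, e := range events {
		if e.Type != "Normal" {
			logf("non-Normal event %s (%s): %s", e.Type, e.Reason, e.Message)
		}
	}
	if ready {
		return nil
	}
	return fmt.Errorf("resource not ready: %s", statusMessage)
}
```

With this shape, a stale or informational event can at most produce a log line; it can never turn a ready resource into a reported failure.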
type configConnectorSelector struct {
	client    kubernetes.Interface
	dynClient dynamic.Interface
	uResource unstructured.Unstructured
}

// NewConfigConnectorSelector returns a CustomResourceSelector that lists resources
// with the same GroupVersionKind as uResource.
func NewConfigConnectorSelector(client kubernetes.Interface, dynClient dynamic.Interface, uResource unstructured.Unstructured) CustomResourceSelector {
	return &configConnectorSelector{client: client, dynClient: dynClient, uResource: uResource}
}

// Select lists all resources in namespace that share uResource's GroupVersionKind.
// It is called on every poll, so it returns the latest copies, including resources
// that were created after the first poll.
func (c *configConnectorSelector) Select(ctx context.Context, namespace string, opts metav1.ListOptions) (*unstructured.UnstructuredList, error) {
	// Resolve the GroupVersionResource for uResource's kind using the discovery client.
	_, r, err := util.GroupVersionResource(c.client.Discovery(), c.uResource.GroupVersionKind())
	if err != nil {
		return nil, fmt.Errorf("failed to query config connector resources: %w", err)
	}
	resList, err := c.dynClient.Resource(r).Namespace(namespace).List(ctx, opts)
	if err != nil {
		return nil, fmt.Errorf("failed to query config connector resources: %w", err)
	}
	return resList, nil
}
I don't understand the purpose of this code: we create a selector for a resource that returns the list of other resources with the same type and name, which should be just `uResource`, right?
Is this to fetch the latest copy of `uResource`?
Yes, it's to fetch the latest version. We also poll on this selector so that any new resources of this type that show up later are checked on that iteration (as opposed to fetching the list once and checking the status of the known resources individually).
I have a general question: Skaffold does not specifically tie itself to Kubernetes, which gives it more room to grow as a standalone CI/CD tool. Do we really want it to deal with Kubernetes objects (GVK, unstructured) and mechanisms? If so, is it possible to reuse existing functions to handle GVK and selector operations? I'm worried it may eventually hit many of the problems that other Kubernetes tools have run into or have already resolved. @tejal29 @briandealwis
We're happy for pointers @yuwenma!
Thanks @yuwenma. Going to take a look at this in depth today.
This has become the design philosophy now with the implementation of new deployers like
The status-check phase currently only works for selected Kubernetes resources. The
we are using the kstatus library to parse the resource status and other client libraries and constructs. We're happy for pointers to other packages that you think could simplify our status-checker implementation.
case kstatus.NotFoundStatus:
	status.updateAE(proto.StatusCode_STATUSCHECK_CONFIG_CONNECTOR_NOT_FOUND, result.Message)
	ae = proto.ActionableErr{ErrCode: proto.StatusCode_STATUSCHECK_CONFIG_CONNECTOR_NOT_FOUND, Message: result.Message}
I wonder how this would happen!
I don't actually think we'd run into this, but I added it for completeness.
This is looking good. A few more comments.
)

// Filter returns the manifest list filtered by the given selectors
func (l *ManifestList) Filter(selectors []GroupKindSelector) (ManifestList, error) {
I like these functions, but they need tests.
working on UTs
		return nil, fmt.Errorf("unmarshaling config: %w", err)
	}
	gvk := obj.GroupVersionKind()
	for _, w := range selectors {
I do wonder if we should have separate interfaces for selecting by resource (taking an `unstructured.Unstructured`), by GVK, or by group+kind. That can be done separately.
Fixes: #6709
Description
This PR implements skaffold's status checker for config connector resources.
Background:
Status check currently works only for deployments and standalone pods. For either type, we poll for the latest status (running `kubectl rollout status deployment <foo>` for deployments, and `kubectl get pods ...` for standalone pods). We also check the status of individual pods and containers in the deployment.
For config connector resources:
Config connector resources are created from custom resource definitions (CRDs) and there's no direct command to invoke to get their status. However, they all expose a Ready condition that can be waited on.
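Because every Config Connector resource exposes a `Ready` condition, the core of the check is reading `status.conditions`. A minimal sketch, assuming the conditions follow the usual Kubernetes shape (`type`, `status`, `reason`, `message`); the `readyCondition` helper is hypothetical and parses JSON with the standard library, whereas the PR works with `unstructured.Unstructured` and kstatus.

```go
package main

import "encoding/json"

// condition is the subset of a status condition used here.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// resourceStatus captures only status.conditions from a resource.
type resourceStatus struct {
	Status struct {
		Conditions []condition `json:"conditions"`
	} `json:"status"`
}

// readyCondition reports whether the resource's Ready condition is "True",
// along with that condition's message. found is false when the resource has
// no Ready condition yet (for example, right after it was created).
func readyCondition(obj []byte) (ready, found bool, message string, err error) {
	var r resourceStatus
	if err := json.Unmarshal(obj, &r); err != nil {
		return false, false, "", err
	}
	for _, c := range r.Status.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True", true, c.Message, nil
		}
	}
	return false, false, "", nil
}
```

Distinguishing "no Ready condition yet" from "Ready is False" lets a poller keep waiting in the first case while still surfacing the condition's message in the second.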
Testing instructions
pubsub, run `skaffold dev` against the k8s cluster with Config Connector set up. You should see the status check happening for the deployed resources:
failing, run `skaffold dev`. This should cause the status check to fail.