Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add events for config daemon #519

Merged
merged 1 commit into from
Oct 17, 2023
Merged

Conversation

rollandf
Copy link
Contributor

K8s Events on SriovNetworkNodeStates objects
will be created on following stages:

  • Status change of SriovNetworkNodeStates
  • Start of config daemon
  • Node reboot started
  • Node drain started/finished

Fixes #510

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@rollandf
Copy link
Contributor Author

Example:

Events:
  Type    Reason             Age                    From           Message
  ----    ------             ----                   ----           -------
  Normal  ConfigDaemonStart  8m48s                  config-daemon  Config Daemon starting
  Normal  SyncStatusChanged  8m43s                  config-daemon  Status changed from: Unknown to: Succeeded
  Normal  SyncStatusChanged  6m42s                  config-daemon  Status changed from: InProgress to: Succeeded
  Normal  SyncStatusChanged  6m41s (x2 over 7m27s)  config-daemon  Status changed from: Succeeded to: InProgress
  Normal  DrainNode          6m11s                  config-daemon  Drain node has been initiated
  Normal  DrainNode          6m11s                  config-daemon  Drain node completed
  Normal  RebootNode         6m11s                  config-daemon  Reboot node has been initiated
  Normal  ConfigDaemonStart  65s                    config-daemon  Config Daemon starting
  Normal  SyncStatusChanged  65s                    config-daemon  Status changed from: InProgress to: Unknown
  Normal  SyncStatusChanged  60s                    config-daemon  Status changed from: Unknown to: InProgress
  Normal  SyncStatusChanged  9s                     config-daemon  Status changed from: InProgress to: Succeeded

@rollandf
Copy link
Contributor Author

@zeeke @adrianchiris please take a look

Copy link
Member

@zeeke zeeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few minor comments but the code is straightforward. great work!

Do you mind adding a basic test check to test/conformance/tests/_test_sriov_operator.go#L168 ?

Something like


				Eventually(func() bool {
					events, err := clients.Events(operatorNamespace).List(context.Background(), metav1.ListOptions{})
					Expect(err).ToNot(HaveOccurred())

					for _, e := range events.Items {
						if strings.Contains(e.Message, "Config Daemon starting") {
							return true
						}
					}
					return false
				}, 30*time.Second, 5*time.Second).Should(BeTrue(), "Config Daemon should record an event when starting")

I can propose it in a different pull request if you have trouble with that suite

pkg/daemon/event_recorder.go Outdated Show resolved Hide resolved
func NewEventRecorder(c snclientset.Interface, n string, kubeclient kubernetes.Interface) *EventRecorder {
eventBroadcaster := record.NewBroadcaster()
eventBroadcaster.StartStructuredLogging(4)
eventBroadcaster.StartRecordingToSink(&typedv1core.EventSinkImpl{Interface: kubeclient.CoreV1().Events("")})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use operator namespace for the Events? I think that way we can get a list of all events happened in the cluster via

kubectl -n sriov-network-operator get events

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The events are on sriovnetworknodestate which is a namespaced object. so they will be available.
Note that the events will disappear after 1 hour. (kubernetes/kubernetes#52521)

$ kubectl get events -n network-operator 
LAST SEEN   TYPE      REASON              OBJECT                                                     MESSAGE
8m35s       Normal    SyncStatusChanged   sriovnetworknodestate/cloud-dev-10                         Status changed from: Succeeded to: InProgress
2m7s        Normal    DrainNode           sriovnetworknodestate/cloud-dev-10                         Drain node has been initiated
2m7s        Normal    DrainNode           sriovnetworknodestate/cloud-dev-10                         Drain node completed
2m7s        Normal    RebootNode          sriovnetworknodestate/cloud-dev-10                         Reboot node has been initiated
8m35s       Normal    SyncStatusChanged   sriovnetworknodestate/cloud-dev-11                         Status changed from: Succeeded to: InProgress
8m4s        Normal    DrainNode           sriovnetworknodestate/cloud-dev-11                         Drain node has been initiated
8m4s        Normal    DrainNode           sriovnetworknodestate/cloud-dev-11                         Drain node completed
8m4s        Normal    RebootNode          sriovnetworknodestate/cloud-dev-11                         Reboot node has been initiated
2m59s       Normal    ConfigDaemonStart   sriovnetworknodestate/cloud-dev-11                         Config Daemon starting
2m59s       Normal    SyncStatusChanged   sriovnetworknodestate/cloud-dev-11                         Status changed from: InProgress to: Unknown
2m53s       Normal    SyncStatusChanged   sriovnetworknodestate/cloud-dev-11                         Status changed from: Unknown to: InProgress
2m8s        Normal    SyncStatusChanged   sriovnetworknodestate/cloud-dev-11                         Status changed from: InProgress to: Succeeded
87s         Warning   NodeNotReady        pod/cni-plugins-ds-259wh                                   Node is not ready
7m32s       Warning   NodeNotReady        pod/cni-plugins-ds-jtpkk                                   Node is not ready

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's exactly what I was referring to. Thanks for explaining.

About the deletion of the events, it looks like it can be tuned via a kubeapi-server argument:

--event-ttl duration     Default: 1h0m0s

IMO it's OK that is on the cluster administrator's side

K8s Events on SriovNetworkNodeStates objects
will be created on following stages:

- Status change of SriovNetworkNodeStates
- Start of config daemon
- Node reboot started
- Node drain started/finished

Signed-off-by: Fred Rolland <[email protected]>
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@rollandf
Copy link
Contributor Author

I can propose it in a different pull request if you have trouble with that suite

Yeah, it would be great. I tried to run the suite on my env and it failed.

Copy link
Member

@zeeke zeeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@e0ne e0ne merged commit bf595e4 into k8snetworkplumbingwg:master Oct 17, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve the config-daemon observability with k8s v1.Events
3 participants