Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drain controller #488

Closed
wants to merge 5 commits into from
Closed

Conversation

e0ne
Copy link
Collaborator

@e0ne e0ne commented Aug 9, 2023

No description provided.

@github-actions
Copy link

github-actions bot commented Aug 9, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Copy link
Collaborator

@SchSeba SchSeba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @e0ne,

we are not going with the other solution to add the max number into the sriovPoolConfig?

@e0ne
Copy link
Collaborator Author

e0ne commented Aug 9, 2023

hi @e0ne,

we are not going with the other solution to add the max number into the sriovPoolConfig?

@SchSeba this PR doesn't change user-facing API. The goal of this PR is to move drain logic to controller and use SriovNetworkNodeState to store drain status. it's still does drain only on one node at the same time.

Once we agree on API changes proposed in #479 next PR will add changes to sriovPoolConfig and implement multiple draining nodes

@e0ne e0ne force-pushed the drain-controller branch from dcc81ce to 893ca36 Compare August 14, 2023 20:42
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@e0ne e0ne force-pushed the drain-controller branch from 893ca36 to d75c669 Compare August 16, 2023 12:35
@e0ne e0ne marked this pull request as ready for review August 16, 2023 12:35
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

main.go Fixed Show fixed Hide fixed
main.go Fixed Show fixed Hide fixed
@e0ne e0ne force-pushed the drain-controller branch from d75c669 to 350e604 Compare August 22, 2023 15:46
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@e0ne e0ne force-pushed the drain-controller branch from 350e604 to 17d0662 Compare August 22, 2023 15:46
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

github-actions bot commented Sep 4, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

github-actions bot commented Sep 5, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

github-actions bot commented Sep 5, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

github-actions bot commented Sep 5, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@e0ne
Copy link
Collaborator Author

e0ne commented Sep 5, 2023

/test-e2e-all

controllers/drain_controller.go Show resolved Hide resolved
controllers/drain_controller.go Show resolved Hide resolved
controllers/drain_controller.go Show resolved Hide resolved
}

func (dr *DrainReconciler) drainNode(node *corev1.Node) error {
glog.Info("drainNode(): Update prepared")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't combine different logger

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return reconcile.Result{}, err
}
nodeState.Status.DrainStatus = anno
mr.ClientSet.SriovnetworkV1().SriovNetworkNodeStates(namespace).UpdateStatus(context.TODO(), nodeState, metav1.UpdateOptions{})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to be careful here..
because today there is a clear split between the daemon and the operator.
meaning the operator updates ONLY the spec and the daemon updates the status

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke
Copy link
Member

zeeke commented Oct 19, 2023

@e0ne can you please rebase this PR? I'm interested in running e2e test lane on this

main.go Outdated
glog.Info(fmt.Sprintf("%s pod from Node %s/%s", verbStr, pod.Namespace, pod.Name))
},
//Out: writer{glog.Info},
//ErrOut: writer{glog.Error},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a panic while running conformance tests:

E1020 10:20:44.423046       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 237 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2a473e0?, 0x40d9d30})
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000544900?})
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x2a473e0, 0x40d9d30})
	/usr/lib/golang/src/runtime/panic.go:884 +0x213
fmt.Fprintf({0x0, 0x0}, {0x2d42314, 0xc}, {0xc000766898, 0x1, 0x1})
	/usr/lib/golang/src/fmt/print.go:225 +0x7a
k8s.io/kubectl/pkg/drain.RunNodeDrain(0xc0008c2dd0, {0xc00067cdb0?, 0x0?})
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/kubectl/pkg/drain/default.go:40 +0xd4
github.com/k8snetworkplumbingwg/sriov-network-operator/controllers.(*DrainReconciler).drainNode.func1()
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/controllers/drain_controller.go:168 +0x156
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x0?)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:145 +0x4d
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x2540be400, 0x4000000000000000, 0x0, 0x5, 0x0}, 0x0?)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:461 +0x5f
github.com/k8snetworkplumbingwg/sriov-network-operator/controllers.(*DrainReconciler).drainNode(0xc000275b00, {0x308f540?, 0xc000513560?}, 0xc000428dc0)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/controllers/drain_controller.go:161 +0x31d
github.com/k8snetworkplumbingwg/sriov-network-operator/controllers.(*DrainReconciler).Reconcile(0xc000275b00, {0x308f540, 0xc000513560}, {{{0xc0000000fa, 0x20}, {0x2d606eb, 0x1c}}})
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/controllers/drain_controller.go:86 +0x51b
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x308f540?, {0x308f540?, 0xc000513560?}, {{{0xc0000000fa?, 0x292db00?}, {0x2d606eb?, 0x3079ed8?}}})
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00039a960, {0x308f498, 0xc0006929b0}, {0x2b27140?, 0xc0005198a0?})
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314 +0x377
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00039a960, {0x308f498, 0xc0006929b0})
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222 +0x587
2023-10-20T10:20:44.423098134Z	INFO	Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference	{"controller": "sriovoperatorconfig", "controllerGroup": "sriovnetwork.openshift.io", "controllerKind": "SriovOperatorConfig", "SriovOperatorConfig": {"name":"drain-upgrade-reconcile-name","namespace":"openshift-sriov-network-operator"}, "namespace": "openshift-sriov-network-operator", "name": "drain-upgrade-reconcile-name", "reconcileID": "a469baca-6d9c-46fc-a2ff-1d81e65d4402"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1048b3a]

@zeeke zeeke mentioned this pull request Oct 20, 2023
return false
}

return oldAnno == newAnno
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the doc:

// Update returns true if the Update event should be processed

So I'd change it to oldAnno != newAnno, as the event should be processed if a value change has happened. Right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// sort nodeList to iterate in the same order each reconcile loop
sort.Slice(nodeList.Items, func(i, j int) bool {
return strings.Compare(nodeList.Items[i].Name, nodeList.Items[j].Name) == -1
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not sort.Strings?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we sort slice of objects here

@e0ne e0ne force-pushed the drain-controller branch from d666eff to 20a617f Compare November 6, 2023 07:32
Copy link

github-actions bot commented Nov 6, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@e0ne e0ne force-pushed the drain-controller branch from 20a617f to a2f0a5c Compare November 6, 2023 08:34
Copy link

github-actions bot commented Nov 6, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@e0ne e0ne force-pushed the drain-controller branch from a2f0a5c to 651df2e Compare November 6, 2023 08:40
Copy link

github-actions bot commented Nov 6, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls
Copy link

Pull Request Test Coverage Report for Build 6768257558

  • 127 of 199 (63.82%) changed or added relevant lines in 4 files are covered.
  • 12 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.8%) to 26.29%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/utils/cluster.go 29 41 70.73%
controllers/helper.go 16 29 55.17%
pkg/daemon/daemon.go 5 27 18.52%
controllers/drain_controller.go 77 102 75.49%
Files with Coverage Reduction New Missed Lines %
controllers/sriovibnetwork_controller.go 2 70.45%
controllers/sriovnetwork_controller.go 2 70.68%
pkg/daemon/daemon.go 2 40.95%
api/v1/helper.go 6 42.36%
Totals Coverage Status
Change from base Build 6744863425: 0.8%
Covered Lines: 2420
Relevant Lines: 9205

💛 - Coveralls

@SchSeba SchSeba mentioned this pull request Nov 25, 2023
@SchSeba
Copy link
Collaborator

SchSeba commented Dec 21, 2023

Hi @e0ne it's fine if we close this in favor of #555?

@SchSeba
Copy link
Collaborator

SchSeba commented Mar 10, 2024

closing this one

@SchSeba SchSeba closed this Mar 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants