Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POWERMON-235: Add support for Redfish #364

Merged

Conversation

vimalk78
Copy link
Collaborator

@vimalk78 vimalk78 commented Mar 2, 2024

This PR

  • Extends Kepler and KeplerInternal API to add Exporter.Redfish
  • Redfish spec has fields for adding reference to a secret which should
    contain Redfish credentials.
  • The user supplied secret is mounted in the kepler container, and daemon set spec is changed which causes kepler containers to restart.
  • Added Unit tests and e2e test

@vimalk78 vimalk78 force-pushed the add-redfish-credentials branch 10 times, most recently from 2fde234 to d1ab5ab Compare March 3, 2024 16:15
@vimalk78 vimalk78 requested review from sthaha and sunya-ch March 4, 2024 04:46
Copy link
Collaborator

@sunya-ch sunya-ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is worth merging for configuring the redfish. Please kindly find my detailed comments below.

@@ -93,82 +93,98 @@ func TestTolerations(t *testing.T) {
}
}

func TestHostPID(t *testing.T) {
func Test_DaemonSet(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to TestDaemonSet to keep the same format to the other test function.

type InternalExporterSpec struct {
// +kubebuilder:validation:Required
Deployment InternalExporterDeploymentSpec `json:"deployment"`

Redfish *InternalRedfishSpec `json:"redfish,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are going to extend spec for configuring the other metrics as well (such as to enable/disable cgroups metrics and so on) - check #202.
Thus, I think we should move Redfish to be under metric.energy and resource utilization metrics such as cGroups should go under metric.resource.

Refer to terms in Kepler:

// https://github.com/sustainable-computing-io/kepler/blob/3d0bad753d68f74a0955f6e25d2a9d8e93507d08/pkg/collector/stats/stats.go#L47
type Stats struct {
	ResourceUsage map[string]*types.UInt64StatCollection
	EnergyUsage   map[string]*types.UInt64StatCollection
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a bigger change to the API. Can we do this api change in a separate PR?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I think we can update that on a different PR. @sthaha What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favour of keeping redfish right under exporter since it isn't metrics or deployment related rather is is power meter and having exporter.redfish is acceptable. I might consider exporter.powermeter.redfish (or something else) at a later stage.

@vimalk78 vimalk78 force-pushed the add-redfish-credentials branch from d1ab5ab to 4fb666d Compare March 4, 2024 05:18
@sunya-ch
Copy link
Collaborator

sunya-ch commented Mar 4, 2024

This PR made a change on Kepler resource. Shall we first limit this change to only KeplerInternal?
Also, if redfish.secretRef is not set in the CR, it seems to make it not mounting since there is no default value for that.

@vimalk78
Copy link
Collaborator Author

vimalk78 commented Mar 4, 2024

Also, if redfish.secretRef is not set in the CR, it seems to make it not mounting

thats by design. once secretRef is set, then only -redfish-cred-file-path is added to container

Comment on lines +48 to +50
REDFISH_ARGS = "-redfish-cred-file-path=/etc/redfish/redfish.csv"
REDFISH_CSV = "redfish.csv"
REDFISH_ANNOTATION = "kepler.system.sustainable.computing.io/redfish-secret-ref"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please change this to follow the convention?
If these constants aren't used outside the package, lets make them private.

func (r KeplerInternalReconciler) getRedfishSecret(secretName string) (*corev1.Secret, error) {
redfishSecret := corev1.Secret{}
redFishSecretName := types.NamespacedName{
Namespace: KeplerDeploymentNS,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be the namespace that you get from the kepler-internal instead of the deployment namespace of kepler ?
As a rule of thumb, internal must not use (or have any knowledge) of Kepler related config or variables.

rs = append(rs,
KeplerDaemonSetReconciler{
ki: *ki,
updater: updateResource,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All single resource reconcilers should move to pkg/reconcilers to provide decoupling.
Since KeplerDaemonSetReconciler already has ki which should be be owner, it need not be passed the updater and can create a generic updater{Owner: ki, Resource: ds}.

Note: reconcileFn is a private type within the controller to ease creation of reconciler

// # pkg/reconciler/kepler_daemonset.go 

func (r ) Reconcile(..) {
  ... 
  return Updater{Owner: ki, }.Reconcile(...)
}

rs = append(rs, resourceReconcilers(updateResource, openshiftNamespacedResources(ki, cluster)...)...)
return rs
}

type KeplerDaemonSetReconciler struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should move to pkg/reconciler/ where this can be unit tested as well.


requests := []reconcile.Request{}
for _, ki := range ks.Items {
if secret.ObjectMeta.Name == ki.Spec.Exporter.Redfish.SecretRef {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we also check the namespace matches deployment.namespace

if Config.Cluster == k8s.OpenShift {
c = c.Owns(&secv1.SecurityContextConstraints{}, genChanged)
}
return c.Complete(r)
}

// mapSecretToRequests returns the reconcile requests for kepler-internal objects for which an associated redfish secret has been changed.
func (r *KeplerInternalReconciler) mapSecretToRequests(ctx context.Context, object client.Object) []reconcile.Request {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider Own(ing) the secret instead, which wouldn't require us to map secrets to k-i manually?

Comment on lines 591 to 594
if errors.IsNotFound(err) {
return nil, nil
}
return nil, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: return nil, errors.IgnoreNotFound(err)

@@ -287,7 +303,7 @@ func NewClusterRole(c components.Detail, k *v1alpha1.KeplerInternal) *rbacv1.Clu
},
Rules: []rbacv1.PolicyRule{{
APIGroups: []string{""},
Resources: []string{"nodes/metrics", "nodes/proxy", "nodes/stats", "pods"},
Resources: []string{"nodes/metrics", "nodes/proxy", "nodes/stats", "pods", "secrets"},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be needed since kepler does not need to watch secrets.

Comment on lines 262 to 267
var redfishSpec *v1alpha1.InternalRedfishSpec = nil
if k.Spec.Exporter.Redfish != nil {
redfishSpec = &v1alpha1.InternalRedfishSpec{
RedfishSpec: *k.Spec.Exporter.Redfish,
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this an api added to Kepler, we should first add this as v1alpha1.RedfishSpec and resuse that in the internal. As as rule of thumb, Internal can resue Kepler but not the other way round.

Comment on lines 39 to 42
type InternalRedfishSpec struct {
RedfishSpec `json:",inline"`
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need a separate internal redfish spec since the one in kepler can be reused.

// +optional
// +kubebuilder:validation:Pattern=`\d+`
// +kubebuilder:default:="60"
ProbeInterval string `json:"probeIntervalSec,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use metav1.Duration instead and convert to seconds when the configmap is created?
also rename the field to json:probeInterval ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func (r KeplerDaemonSetReconciler) Reconcile(ctx context.Context, cli client.Client, s *runtime.Scheme) reconciler.Result {

secretRef := r.ki.Spec.Exporter.Redfish.SecretRef
secret, err := r.getRedfishSecret(ctx, cli, secretRef)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should fail reconcile, set the status to available false, if a secret is specified and is missing or invalid (missing the redfish.csv) . Otherwise there is no way for user to know if kepler is using redfish or not.

@sthaha
Copy link
Collaborator

sthaha commented Mar 5, 2024

@vimalk78 nice work overall. It worked pretty well in my test. But few comments to address.
The main one being that secret must be present for kepler deployment to continue and the reason for failure to deploy must be reported back in Availability status.

@@ -74,12 +77,39 @@ func (r *KeplerInternalReconciler) SetupWithManager(mgr ctrl.Manager) error {
Owns(&rbacv1.ClusterRoleBinding{}, genChanged).
Owns(&rbacv1.ClusterRole{}, genChanged)

c = c.Watches(&corev1.Secret{},
handler.EnqueueRequestsFromMapFunc(r.mapSecretToRequests))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, handler.ResourceVersionChanged{}

}

ks := v1alpha1.KeplerInternalList{}
if err := r.List(ctx, &ks); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can improve the perf a bit more by using the fieldSelector
see: https://book.kubebuilder.io/reference/watching-resources/externally-managed

Comment on lines 92 to 95
secret, ok := object.(*corev1.Secret)
if !ok {
return nil
}
Copy link
Collaborator

@sthaha sthaha Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be needed since the watch is already set for v1.Secret and we can make use of GetName() and GetNamespace()

 - Extended Kepler and KeplerInternal API to add `Exporter.Redfish`
 - Redfish spec has fields for adding reference to a secret which should
   contain Redfish credentials.
 - The user supplied secret is mounted in the kepler container.
 - Added Unit tests and e2e test

Signed-off-by: Vimal Kumar <[email protected]>
@vimalk78 vimalk78 force-pushed the add-redfish-credentials branch from 4fb666d to 8ce215e Compare March 8, 2024 11:40
@@ -9,6 +9,7 @@ rules:
resources:
- daemonsets
- deployments
- secrets
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't create secrets do we ?

Comment on lines +148 to +150
ds.Spec.Template.Annotations = map[string]string{
REDFISH_ANNOTATION: secret.ResourceVersion,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be sufficient to just annotate the ds instead of the template. Lets handle this in a different PR.

//+kubebuilder:rbac:groups=security.openshift.io,resources=securitycontextconstraints,verbs=list;watch;create;update;patch;delete;use
//+kubebuilder:rbac:groups=monitoring.coreos.com,resources=servicemonitors;prometheusrules,verbs=list;watch;create;update;patch;delete

// RBAC required by Kepler exporter
//+kubebuilder:rbac:groups=core,resources=nodes/metrics;nodes/proxy;nodes/stats,verbs=get;list;watch
//+kubebuilder:rbac:groups=core,resources=nodes/metrics;nodes/proxy;nodes/stats;secrets,verbs=get;list;watch
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary since Kepler exporter does not watch or create secrets.

@@ -302,7 +302,7 @@ restart_operator() {
info "scale down Operator"
run kubectl scale -n "$OPERATORS_NS" --replicas=0 "$deployment"
run kubectl wait -n "$OPERATORS_NS" --for=delete \
pods -l app.kubernetes.io/component=operator --timeout=60s
pods -l app.kubernetes.io/component=manager --timeout=60s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good find!

@sthaha sthaha merged commit c46fb30 into sustainable-computing-io:v1alpha1 Mar 10, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants