Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: adds the LVMVolumeGroup CRD and processing logic #72

Merged
merged 6 commits into from
Jan 17, 2022
Merged

feat: adds the LVMVolumeGroup CRD and processing logic #72

merged 6 commits into from
Jan 17, 2022

Conversation

nbalacha
Copy link
Contributor

The LVMCluster reconciliation will now create individual LVMVolumeGroup objects for each deviceClass. The VGManager will now reconcile the LVMVolumeGroup instead of the LVMCluster CR.

@nbalacha nbalacha requested a review from jmolmo January 14, 2022 12:44
return lvmvgName
}

func (c lvmVG) ensureCreated(r *LVMClusterReconciler, ctx context.Context, lvmCluster *lvmv1alpha1.LVMCluster) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not related to this PR). Just realised the context should usually be the first argument of the function.

Scheme: mgr.GetScheme(),
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
Nodename: os.Getenv("NODE_NAME"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need Nodename in the VGReconciler object ? It s used only when we update the status. So we can directly call os.Getenv("NODE_NAME") over there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a problem with saving it. Similar to how we save the namespace elsewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. name should be NodeName.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -46,7 +46,7 @@ const (
func (r *VGReconciler) SetupWithManager(mgr ctrl.Manager) error {
r.deviceAgeMap = newAgeMap(&wallTime{})
return ctrl.NewControllerManagedBy(mgr).
For(&lvmv1alpha1.LVMCluster{}).
For(&lvmv1alpha1.LVMVolumeGroup{}).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will creation of a new LVMVolumeGroup trigger controller reconcile for all the other existing LVMVolumeGroups?
If yes, then I think we should avoid that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the request will have the name of the object it was triggered for.

Copy link
Contributor

@jmolmo jmolmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just minor things related with the use of Caps letters in log messages

Scheme: mgr.GetScheme(),
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
Nodename: os.Getenv("NODE_NAME"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is set this env. variable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here.

r.Log.Info("LVMVG deleted", "Name", vgcr.Name)
return nil
}
r.Log.Error(err, "failed to retrieve LVMVG resource", "Name", vgcr.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

":s/Name/name" in log lines

Copy link
Contributor Author

@nbalacha nbalacha Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Name is the fieldname. shouldn't that be capitalized?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not consistent through out. Both name and Name are used in the logs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Changing it to "name".

if err != nil {
// already deleted in previous reconcile
if errors.IsNotFound(err) {
r.Log.Info("LVMVG deleted", "Name", vgcr.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

":s/Name/name"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// if not deleted, initiate deletion
if existingLvmVg.GetDeletionTimestamp().IsZero() {
if err = r.Client.Delete(ctx, existingLvmVg); err != nil {
r.Log.Error(err, "failed to delete LVMVG", "Name", existingLvmVg.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

r.Log.Error(err, "failed to delete LVMVG", "Name", existingLvmVg.Name)
return err
} else {
r.Log.Info("initiated LVMVG deletion", "Name", existingLvmVg.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

err := r.Client.Get(ctx, req.NamespacedName, lvmCluster)

var status string = "available"
var reason string = "Success"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

":/s/Success/success"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Status strings are usually capitalized.

var matchingDevices []internal.BlockDevice
_, matchingDevices, err = filterMatchingDevices(remainingValidDevices, volumeGroup)
if err != nil {
r.Log.Error(err, "could not filter matching Devices", "VGName", volumeGroup.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

":S/Devices/devices"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

reason = "failed to create VG"
} else {
status = "degraded"
reason = "Failed to extend VG"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

":S/Failed/failed"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Status messages are usually capitalized.
Eg. This is a condition in a deployment status:

  • lastTransitionTime: "2022-01-14T13:48:03Z"
    lastUpdateTime: "2022-01-14T13:48:03Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 14, 2022
return nil
}

func (c lvmVG) getLvmVolumeGroups(r *LVMClusterReconciler, instance *lvmv1alpha1.LVMCluster) ([]*lvmv1alpha1.LVMVolumeGroup, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can only pass the namespace instead of entire LVMClusterReconciler, if we only plan to use the namespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to address that in a separate PR.


deviceClasses := instance.Spec.DeviceClasses
for _, deviceClass := range deviceClasses {
lvmVolumeGroup := &lvmv1alpha1.LVMVolumeGroup{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we decide which deviceClass is default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will need to be changed in the CRD design. I will take that up in a separate PR.

lvmVolumeGroups = append(lvmVolumeGroups, lvmVolumeGroup)
}
return lvmVolumeGroups, nil

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

)

const (
lvmvgName = "lvmvg-manager"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lvmvgName = "lvmvg-manager"
lvmVGName = "lvmvg-manager"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 14, 2022
})

if err != nil {
r.Log.Error(err, "LvmVolumeGroup reconcile failure", "name", volumeGroup.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency

Suggested change
r.Log.Error(err, "LvmVolumeGroup reconcile failure", "name", volumeGroup.Name)
r.Log.Error(err, "failed to reconcile LvmVolumeGroup", "name", volumeGroup.Name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


for _, vgcr := range vgcrs {
err := r.Client.Get(ctx, types.NamespacedName{Name: vgcr.Name, Namespace: vgcr.Namespace}, existingLvmVg)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// already deleted in previous reconcile
if errors.IsNotFound(err) {
r.Log.Info("LVMVG deleted", "Name", vgcr.Name)
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we continue here rather than returning nil ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

return nil
}

func (c lvmVG) getLvmVolumeGroups(r *LVMClusterReconciler, instance *lvmv1alpha1.LVMCluster) ([]*lvmv1alpha1.LVMVolumeGroup, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getLvmVolumeGroups does not return any error. Should remove error from the return types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

if err != nil {
// already deleted in previous reconcile
if errors.IsNotFound(err) {
r.Log.Info("LVMVG deleted", "Name", vgcr.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logs should be more consistent. ensureDeleted uses LVMVG in the log messages while ensureCreated uses LvmVolumeGroup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if err = r.Client.Delete(ctx, existingLvmVg); err != nil {
r.Log.Error(err, "failed to delete LVMVG", "Name", existingLvmVg.Name)
return err
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This else block can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

found = true
nodeStatus.Status = status
nodeStatus.Reason = reason
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

break for loop once NodeName is found.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

nodeStatus.Reason = reason
}
}
if !found {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it will update the Status when the NodeName was found but it has different Status or Reason. So we should add status if NodeName not found and update status if NodeName is found but its Status or Reason are different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't understand this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are only updating the instance.Status.NodeStatus in line 307, inside the if !found block. What if found is true but Status and/or Reason is different. In this case, the Status and Reason will never be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The status is being updated in the original array if the node is found. append should create a new slice which is why it is being assigned again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it.

VolumeGroup: deviceClass.Name,
Default: true,
})
status = "unavailable"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Status should be used as constants (enums) so that its easier to know what all statuses are available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be taken up in a different PR.

The LVM operator will create individual LVMVolumeGroup objects
for the deviceClasses defined in the LVMCluster resource.

Signed-off-by: N Balachandran <[email protected]>
This commit adds the LVMVolumeGroup CRD and rbac permissions
for the LVM controller.

Signed-off-by: N Balachandran <[email protected]>

func (r *VGReconciler) updateStatus(ctx context.Context, status string, reason string, instance *lvmv1alpha1.LVMVolumeGroup) error {

found := false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still have a situation where multiple controllers will be updating the same LVMVolumeGroup status, right?
If that's the case, should we first get the latest LVMVolumeGroup instance before verifying the NodeStatus?

Copy link
Contributor

@sp98 sp98 Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may still NOT* be enough to ensure that we are using the updated instance of LVMVolumeGroup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not the latest version, I expect the update to fail. I will confirm after testing on a multi node cluster.

Default: true,
})
status = "unavailable"
reason = "failed to create VG"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reason for unavailable starts with small f, while that of degraded starts with capital F

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 14, 2022

lvmVolumeGroups := c.getLvmVolumeGroups(r, lvmCluster)

for _, volumeGroup := range lvmVolumeGroups {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating the LVMVolumeGroup CRs from the LVMCluster CR is still a design concern. The admin should create the LVMVolumeGroup CRs directly. Not a blocker for feature complete, but we really need to discuss this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. The admin should create the LVMCluster CR and the controller should create the LVMVolumeGroup CRs. The intention is to have a single CR for the controller to create the internal pieces from.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still haven't seen a design backing this decision, so my position remains to disagree with it so we can follow standard K8s patterns.

This commit adds the LVMVolumeGroupNodeStatus CRD and related files.
It also removes the NodeStatus from the LVMVolumegroup.Status as that is
now part of LVMVolumeGroupNodeStatus.

Signed-off-by: N Balachandran <[email protected]>
This commit adds the rbac permissions required by the
vgmanager for LVMVolumeGroup and LVMVolumeGroupNodeStatus.

Signed-off-by: N Balachandran <[email protected]>
The vgmanager will watch and reconcile the LVMVolumeGroup
instead of the LVMCluster CR. This commit also makes minor changes
the vgmanager reconcile code.

Signed-off-by: N Balachandran <[email protected]>
This commit adds tests to check the creation and deletion of
LVMVolumeGroups by the LVM controller.

Signed-off-by: N Balachandran <[email protected]>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 15, 2022
return ctrl.Result{}, err
}

r.Log.Info("listing block devices")
//TODO: actually check the node against the nodeSelector.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to match with the node selector anymore? Since the LVMVolumeGroup CR is running on the correct node, should that already handle this check for us?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CR is the object stored in etcd and does not run on a node. The vgmanager pod is expected to check the node it is running on against the nodeselector specified in the LVMVolumeGroup CR to see if it expected to act on it. This is currently not being done.

@sp98
Copy link
Contributor

sp98 commented Jan 16, 2022

Q: When we add the functionality to read any new device add events on the nodes, how will we tell all the LVMVolumeGroup CRs associated with the node to reconcile?

@nbalacha
Copy link
Contributor Author

Q: When we add the functionality to read any new device add events on the nodes, how will we tell all the LVMVolumeGroup CRs associated with the node to reconcile?

The VG manager will need to reread the CRs once it receives the event. No changes will be made to the CR to trigger a reconcile.


for _, volumeGroup := range lvmVolumeGroups {
result, err := cutil.CreateOrUpdate(ctx, r.Client, volumeGroup, func() error {
// no need to mutate any field
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • might not always be true, since deviceSelector isn't implemented fully yet we can get away without not using mutate function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. For now this is fine. The comment can be removed when that is actually done.

Comment on lines +116 to +119
remainingValidDevices, delayedDevices, err := r.filterAvailableDevices(blockDevices)
if err != nil {
_ = err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • either don't handle the err at all, a, b, _ := func or pls add a comment on why this is being not handled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the earlier existing code which I have not changed.

} else {
if found {
// Update the status again just to be safe.
if statuserr := r.updateStatus(ctx, status, volumeGroup); statuserr != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • just a question, this has a possibility of conflicts when a vgcr matches multiple nodes right?
  • we didn't create the vg yet, why should we update the status as ready or the presence of name in config file is just enough for vg status to be ready?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because the lvmd.yaml file is only updated once the VG is created. This block is mainly to handle cases when the status might not have been updated correctly. That function can also be modified to not update if there is no change for this entry.


status := &lvmv1alpha1.VGStatus{
Name: req.Name,
Status: lvmv1alpha1.VGStatusReady,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • IMHO initial vgstatus should be unknown and we try to set it correctly by the end of reconcile 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be taken up in a separate PR?

if err != nil {
return ctrl.Result{}, err
if err == nil {
err = os.WriteFile(controllers.LvmdConfigFile, out, 0600)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not move this line to 218 and remove this conditional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checks work like this:
if unmarshall fails,-> return error
If unmarshall succeeds -> attempt to write
if write fails -> return error

Notice that there is no else check - so the err check will work for both scenarios.

Comment on lines +338 to +341
if nodeStatus.CreationTimestamp.IsZero() {
vgNodeStatus.DeepCopyInto(nodeStatus)
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • i think setStatus already handles this condition by creating a new status when one is not found?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is working on the existing CR in the system so we need to update that object.

Copy link
Contributor

@leelavg leelavg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 17, 2022
@nbalacha
Copy link
Contributor Author

/lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 17, 2022

@nbalacha: you cannot LGTM your own PR.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 17, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jmolmo, leelavg, nbalacha, sp98

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nbalacha nbalacha merged commit a5ee8b9 into openshift:main Jan 17, 2022
@nbalacha nbalacha deleted the new-crd-2 branch January 18, 2022 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants