Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for NoExecute taint, pod affinity, and bug fix volume attachment delete #102

Merged
merged 8 commits into from
Jan 26, 2022

Conversation

rbo54
Copy link
Collaborator

@rbo54 rbo54 commented Jan 21, 2022

Description

This PR contains:

  1. Changes to the podmontest helm script to support pod affinity and pod topology spread in order to test customer use case.
  2. A new script bounce.kubelet to test scenarios where a kubelet is not present.
  3. Changes to the controller code to process pods with affinity together when discovered by ArrayConnectivityLoss to try and help the k8s scheduler make better decisions.
  4. A change to the controller logic to only honor the existing flag SkipArrayConnectionValidation when the pod's node is tainted with a k8s No Execute taint (meaning kubernetes wants to evacuate the node.) This makes the code safer.
  5. A retry that fixes the problem in CSM-148 unable to delete volume attachment.
  6. A new flag on nway.sh "rebalance" that will rebalance the pod to node distribution after a failover if required rather than failing because some nodes need more than 110 pods. Also a separate rebalance.sh script to do the same thing.

GitHub Issues

List the GitHub issues impacted by this PR:

GitHub Issue #
CSM-148 - Fix for unable to delete volume attachment
CSM-87 - enhancement so that enabling SkipArrayConnectionValidation is safer as is only done when the node has a No Execute taint
CSM-165 - CSM Resiliency enhancement to consider pod affinity

Checklist:

  • I have performed a self-review of my own code to ensure there are no formatting, vetting, linting, or security issues
  • I have verified that new and existing unit tests pass locally with my changes
  • I have not allowed coverage numbers to degenerate
  • I have maintained at least 90% code coverage
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • Backward compatibility is not broken

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Please also list any relevant details for your test configuration

  • New unit tests for pod affinity
  • Executed VxFlex full integration test- results:
  • 18 scenarios (18 passed)
    184 steps (184 passed)
    1h33m23.697942654s
    time="2022-01-20T20:18:43-05:00" level=info msg="Integration test finished"
    --- PASS: TestPowerFlexIntegration (5603.73s)
    PASS

@rbo54
Copy link
Collaborator Author

rbo54 commented Jan 21, 2022

I will fix the forbidden words with a grep of 'mast.r' instead of what kubernetes calls them.

cmd/podmon/main.go Outdated Show resolved Hide resolved
@hoppea2 hoppea2 changed the title Tom/krv 2608 1662 Support for NoExecute taint, pod affinity, and bug fix volume attachment delete Jan 24, 2022
alikdell
alikdell previously approved these changes Jan 24, 2022
tdawe
tdawe previously approved these changes Jan 25, 2022
sharmilarama
sharmilarama previously approved these changes Jan 25, 2022
@@ -403,6 +454,23 @@ func (cm *PodMonitorType) ArrayConnectivityMonitor() {
}
}

// ProcessPodInfoForCleanup processes a ControllerPodInfo for cleanup, checking that the UID and object are the same, and then calling controllerCleanupPod.
func (cm *PodMonitorType) ProcessPodInfoForCleanup(podInfo *ControllerPodInfo, reason string) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This func is not required to expose!

@rbo54
Copy link
Collaborator Author

rbo54 commented Jan 26, 2022

I have updated the PR, addressing Trevor's concern. I have run full integration test for VxFlex and Unity, both passed. Unit test still also passing.

alikdell
alikdell previously approved these changes Jan 26, 2022
tdawe
tdawe previously approved these changes Jan 26, 2022
@rbo54 rbo54 dismissed stale reviews from tdawe, shaynafinocchiaro, and alikdell via fe24a65 January 26, 2022 20:57
@rbo54 rbo54 force-pushed the tom/KRV-2608-1662 branch from 18e5e12 to fe24a65 Compare January 26, 2022 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants