Preserve nodePort support with --preserve-nodeports flag #3095
Conversation
…" flag Signed-off-by: Yusuf Güngör <[email protected]>
Signed-off-by: Yusuf Güngör <[email protected]>
2d1dce4
to
1ff3498
Compare
This change is reasonable to me.
However, I am curious to know why https://github.com/vmware-tanzu/velero/blob/main/pkg/restore/service_action.go#L75-L79 doesn't work in this case.
Hi @ashish-amarnath, thanks for the review. It does not work because in our case the NodePort values were not explicitly specified; they are all auto-assigned. It is very hard for us to specify NodePorts explicitly for all Services: there are too many k8s clusters for too many teams, so converting auto-assigned NodePorts into explicitly specified ones would be a big operation for us.
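For context, the check referenced above relies on the `kubectl.kubernetes.io/last-applied-configuration` annotation: a port that appears there with a `nodePort` value was set by hand, while auto-assigned ports never show up in it. Here is a minimal Go sketch of that detection idea (an illustration, not the actual `service_action.go` code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// servicePort mirrors the relevant fields of a Service port as they appear
// in the last-applied-configuration annotation.
type servicePort struct {
	Port     int32 `json:"port"`
	NodePort int32 `json:"nodePort,omitempty"`
}

type lastAppliedService struct {
	Spec struct {
		Ports []servicePort `json:"ports"`
	} `json:"spec"`
}

// explicitNodePorts returns the nodePort values the user specified by hand,
// as recorded in the kubectl.kubernetes.io/last-applied-configuration
// annotation. Auto-assigned nodePorts never appear in the annotation.
func explicitNodePorts(lastApplied string) (map[int32]bool, error) {
	var svc lastAppliedService
	if err := json.Unmarshal([]byte(lastApplied), &svc); err != nil {
		return nil, err
	}
	explicit := map[int32]bool{}
	for _, p := range svc.Spec.Ports {
		if p.NodePort != 0 { // present only when set explicitly by the user
			explicit[p.NodePort] = true
		}
	}
	return explicit, nil
}

func main() {
	// Port 80 was pinned to nodePort 32000 by hand; port 443's nodePort
	// was auto-assigned and is therefore absent from the annotation.
	annotation := `{"spec":{"ports":[{"port":80,"nodePort":32000},{"port":443}]}}`
	ports, err := explicitNodePorts(annotation)
	if err != nil {
		panic(err)
	}
	fmt.Println(ports) // map[32000:true]
}
```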
Force-pushed "-> Using boolptr.IsSetToTrue for bool ptr check." from e7761ac to 872ce87 (Signed-off-by: Yusuf Güngör <[email protected]>).
@yusufgungor What happens when the node ports are preserved in the backup but are already allocated in the target cluster on restore with this change? Does the service just fail to restore? My understanding is that's what would happen. I've researched this before, and it's hard to get correct behavior due to how Kubernetes handles the error cases around port numbers. We're definitely open to addressing this, though, as you're right that this is a problem for services not managed directly by a human.
Hi @nrb, thanks for your explanation. I have now tested with 3 Services named hello-service, test-service-1, and test-service-2.
In my 1st test case, all 3 services already existed in the target cluster (so their ports were already allocated there), and I took a backup and restored it with the "--preserve-nodeports" flag. As you said, we get an error for a service and then Velero continues with the next Service; since all 3 services existed in the target cluster, we got errors for all 3.
In my 2nd test case, I removed test-service-1 before restoring the backup; we then get no error for that service but still get errors for the other 2.
When these 3 services already existed in the target cluster and I restored without the "--preserve-nodeports" flag, we got no errors in the logs.
So using the "--preserve-nodeports" flag does not break the restore operation even when the "provided port is already allocated" error occurs.
Thanks for confirming. I think my only issue with the code as it stands, then, is that these lines should definitely be logged at error or warning level, not info, so users can easily find what wasn't restored.
time="2020-11-20T06:38:56Z" level=info msg="error restoring test-service-2: Service \"test-service-2\" is invalid: spec.ports[0].nodePort: Invalid value: 31192: provided port is already allocated" logSource="pkg/restore/restore.go:1170" restore=velero/test-with-3-svc-20201120093856
One more question to consider - what happens if the services are restored into a cluster with a completely different port range? Have you tried? I think a similar allocator error would occur, specifically this error.
I think these things can be addressed via documentation, and using the flag gives users control over whether or not they want to migrate their Services, so overall I'm in favor of the idea that this PR puts forward.
Could you please also add documentation for use and these caveats to https://github.com/vmware-tanzu/velero/blob/main/site/content/docs/v1.5/restore-reference.md and https://github.com/vmware-tanzu/velero/blob/main/site/content/docs/main/restore-reference.md?
New commits: "-> Using boolptr.IsSetToTrue for bool ptr check." and "-> Documentation updated about Velero nodePort restore logic and preservation of them." (Signed-off-by: Yusuf Güngör <[email protected]>), plus a merge of …velero into preserve-node-ports.
Hi @nrb, thanks for your review. The "Error was something other than an AlreadyExists" case is now logged as Error instead of Info.
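A minimal sketch of that log-level split (an illustration, not the exact Velero code), assuming the usual logrus and apimachinery helpers:

```go
package main

import (
	"fmt"

	"github.com/sirupsen/logrus"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// logRestoreError keeps AlreadyExists errors informational and surfaces
// everything else at error level, so users can easily find what wasn't
// restored.
func logRestoreError(log *logrus.Entry, resource string, err error) {
	if apierrors.IsAlreadyExists(err) {
		// The object already exists in the target cluster; not fatal.
		log.Infof("%s already exists, skipping restore", resource)
		return
	}
	// Anything else, e.g. "provided port is already allocated", is an error.
	log.Errorf("error restoring %s: %v", resource, err)
}

func main() {
	log := logrus.NewEntry(logrus.New())
	logRestoreError(log, "test-service-2", fmt.Errorf("provided port is already allocated"))
}
```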
After you pointed it out, I tested the case where the services are restored into a cluster with a completely different port range, and you are exactly right: I got the error you guessed (that error). The backup has 3 services with NodePorts. I then updated a k8s cluster to serve NodePorts in the range 20000-22767 using this doc, removed the services, and tried the restore again.
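(For context: a cluster's NodePort range is controlled by the kube-apiserver flag `--service-node-port-range`, so the setup above amounts to running the API server with `--service-node-port-range=20000-22767`.)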
Velero printed the allocator error logs, and the restore operation was not interrupted. I also documented this flag and Velero's nodePort restore logic in the files you pointed out, along these lines:

Under "What happens to NodePorts when restoring Services": auto-assigned nodePorts are deleted by default, and Services get new auto-assigned nodePorts after restore. Explicitly specified nodePorts are auto-detected using the `kubectl.kubernetes.io/last-applied-configuration` annotation and are preserved.

Under "Always Preserve NodePorts": it is not always possible to set nodePorts explicitly on some big clusters because of operational complexity. The official Kubernetes documentation states that preventing port collisions is the responsibility of the user when explicitly specifying nodePorts. Clusters that do not explicitly specify nodePorts may still need to restore the original nodePorts in the event of a disaster: auto-assigned nodePorts are most probably configured on load balancers sitting in front of the cluster, and changing all of those nodePorts on the load balancers after a disaster would be yet another operational burden. Velero therefore has a flag that lets the user decide whether nodePorts are preserved. When this flag is set, Velero does not remove nodePorts when restoring a Service and tries to use the nodePorts recorded in the backup. Trying to preserve nodePorts may cause port conflicts when restoring in the situations below:

- the nodePort from the backup is already allocated on the destination cluster, or
- the nodePort from the backup is outside the destination cluster's nodePort range.
Hi @dsu-igeek, is there anything I should do?
@yusufgungor Thanks for your thorough changes and documentation! I appreciate it. Sorry for the delay in updating my review.
@ashish-amarnath When you have time, please revisit as well.
Thanks for the PR @yusufgungor
It is my pleasure to contribute to this great project. I am looking forward to seeing the 1.6.0 release. Thanks for your reviews and help, @nrb @ashish-amarnath
Merged: Preserve nodePort support when restoring via "--preserve-nodeports" flag (vmware-tanzu#3095)
* Preserve nodePort support when restoring via "--preserve-nodeports" flag
* Added changelog.
* Unit test added. Using boolptr.IsSetToTrue for bool ptr check.
* Other restore errors log level changed from info to error. Documentation updated about Velero nodePort restore logic and preservation of them.
Signed-off-by: Yusuf Güngör <[email protected]>
Co-authored-by: Yusuf Güngör <[email protected]>
Thanks for this PR @yusufgungor. When will this be GA?
We are planning to use Velero to restore a whole cluster in case of disaster (all namespaces, deployments, services, etc.).
Unfortunately, our k8s Services are exposed with auto-assigned nodePorts. There are hundreds of k8s clusters, and it is not feasible to set predefined nodePorts for our Services. These nodePorts are also configured on lots of nginx load balancers, and in case of disaster, changing all of those nodePorts would cause a lot of problems for us. We always restore to a new cluster, so nodePort conflicts are not a concern for us.
We respect Velero's nodePort restore logic, which uses annotations, but it does not help us (#354).
We have added a flag to the restore command ("--preserve-nodeports") that prevents nodePorts from being removed from Services when restoring; a sketch of the idea follows below.
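A minimal sketch of that behavior (an illustration under assumed names, not the actual patch): when preservation is requested, the backed-up nodePorts are left untouched; otherwise only the explicitly specified ones survive.

```go
// Package restoresketch illustrates the idea behind --preserve-nodeports;
// it is not the actual Velero implementation.
package restoresketch

import corev1 "k8s.io/api/core/v1"

// restoreServicePorts decides what happens to nodePorts on restore.
// preserveNodePorts corresponds to the --preserve-nodeports restore flag;
// explicit holds the nodePorts detected from the
// kubectl.kubernetes.io/last-applied-configuration annotation.
func restoreServicePorts(svc *corev1.Service, preserveNodePorts bool, explicit map[int32]bool) {
	if preserveNodePorts {
		return // keep every nodePort exactly as recorded in the backup
	}
	for i := range svc.Spec.Ports {
		// Clear only auto-assigned ports so the API server re-assigns them;
		// explicitly specified ports are kept.
		if !explicit[svc.Spec.Ports[i].NodePort] {
			svc.Spec.Ports[i].NodePort = 0
		}
	}
}
```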
We hope you merge this PR into Velero. (If not, we will have to build our own Velero docker image and keep it up to date.)
Thanks.