This repository has been archived by the owner on Oct 23, 2024. It is now read-only.

Verify that a namespace is truly deleted after an API call #32

Merged
rpalaznik merged 5 commits into master from fix-namespace-deletion
Oct 23, 2019

Conversation

rpalaznik
Contributor

What changes were proposed in this pull request and why are they needed?

Resolves integration test issues when a namespace is not immediately deleted after an API call, resulting in errors like this in further tests:

Error from server (Forbidden): error when creating "/tmp/job-886088610": sparkapplications.sparkoperator.k8s.io "linear-regression" is forbidden: unable to create new content in namespace kudo-spark-operator-testing because it is being terminated 

How were the changes tested?

With make cluster-create and make test

Contributor

@akirillov left a comment

thanks, @rpalaznik. Looks good with a few nits.

Can you please clarify the fix? The error indicates that an operator instance is deployed to a namespace which is being deleted at that moment. Do we create it before the test? If so, why is it still not available, is it due to API requests queueing?

tests/utils/k8s.go (two outdated review threads, resolved)
@rpalaznik
Contributor Author

Can you please clarify the fix? The error indicates that an operator instance is deployed to a namespace which is being deleted at that moment. Do we create it before the test?

No. An instance was never created in this test run, because InstallSparkOperator failed with an error before that.

First, there is a cleanup step before the operator is installed, at this line:
https://github.com/mesosphere/kudo-spark-operator/blob/master/tests/utils/spark_operator.go#L41

At the end, it calls the DropNamespace method, which is a simple Kubernetes API call:
https://github.com/mesosphere/kudo-spark-operator/blob/master/tests/utils/k8s.go#L46

But when the API call returns, the namespace can still be in the "being terminated" state, which causes re-creation of the namespace to fail at this line:
https://github.com/mesosphere/kudo-spark-operator/blob/master/tests/utils/spark_operator.go#L47
(This is what this PR aims to prevent.)

The error is handled here https://github.com/mesosphere/kudo-spark-operator/blob/master/tests/basic_test.go#L62 and the error message is listed at the end of the test output:
basic_test.go:62: object is being deleted: namespaces "kudo-spark-operator-testing" already exists

But t.Error() doesn't stop the execution of the test, so it continues and accumulates more errors. That is misleading and wastes time, so I've replaced it with t.Fatal() to stop the execution.
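
To illustrate the difference, here is a minimal, self-contained sketch. It does not use the real testing package: `reporter`, `run`, `installWithError`, and `installWithFatal` are hypothetical names invented for this example. The only behavior borrowed from Go's documentation is that Error logs and marks the test as failed, while Fatal additionally stops the calling goroutine via runtime.Goexit.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// reporter is a hypothetical stand-in for the small subset of *testing.T used here.
type reporter struct {
	logs   []string
	failed bool
}

// Log records a message.
func (r *reporter) Log(args ...interface{}) { r.logs = append(r.logs, fmt.Sprint(args...)) }

// Error records a message and marks the run as failed, but lets the body continue.
func (r *reporter) Error(args ...interface{}) { r.Log(args...); r.failed = true }

// Fatal does the same as Error and then stops the calling goroutine,
// which is how testing.T.Fatal is documented to behave (via runtime.Goexit).
func (r *reporter) Fatal(args ...interface{}) { r.Error(args...); runtime.Goexit() }

// run executes body on its own goroutine so that Goexit ends only the body.
func run(body func(r *reporter)) *reporter {
	r := &reporter{}
	done := make(chan struct{})
	go func() {
		defer close(done) // deferred calls still run after Goexit
		body(r)
	}()
	<-done
	return r
}

// errTerminating is an illustrative error, not the real API server message.
var errTerminating = errors.New("namespace is being terminated")

// installWithError reports the failure but keeps going, so the success
// message is logged even though the installation did not succeed.
func installWithError(r *reporter) {
	if err := errTerminating; err != nil {
		r.Error(err)
	}
	r.Log("operator installed")
}

// installWithFatal stops at the first failure, so nothing misleading follows.
func installWithFatal(r *reporter) {
	if err := errTerminating; err != nil {
		r.Fatal(err)
	}
	r.Log("operator installed")
}

func main() {
	fmt.Println(run(installWithError).logs)
	fmt.Println(run(installWithFatal).logs)
}
```

In this sketch, the Error variant records both the failure and the "operator installed" message, while the Fatal variant records only the failure.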

Hope this helps!

@akirillov
Contributor

thanks for such a detailed explanation, @rpalaznik. won't we hit the same issue while creating namespaces? shall we add retries to CreateNamespace too?

@rpalaznik
Contributor Author

won't we hit the same issue while creating namespaces? shall we add retries to CreateNamespace too?

I don't think creation would cause a similar issue. I've never encountered it so far, and I don't think it's a good idea to add timeouts to every API call defensively. We can always do this if the issue arises, but I'd prefer to leave it as is for now.

Contributor

@akirillov left a comment

this makes sense, thanks. LGTM 👍

Contributor

@samvantran left a comment

Thanks @rpalaznik. Couple nits from me but 👍 on the fix

 }

 k8sNamespace, err := spark.Clients.CoreV1().Namespaces().Get(spark.Namespace, v1.GetOptions{})
 if err != nil {
-	t.Error(err.Error())
+	t.Fatal(err.Error())
Contributor

Is an error from Namespaces().Get really considered fatal?

Contributor Author

This is just to stop test execution and prevent the message about a successful operator installation from being printed, since it may not be true depending on the error.

return retry(namespaceDeletionTimeout, namespaceDeletionCheckInterval, func() error {
	_, err := clientSet.CoreV1().Namespaces().Get(name, metav1.GetOptions{})
	if err == nil {
		return errors.New(fmt.Sprintf("Namespace %s is still there", name))
Contributor

Suggested change
-		return errors.New(fmt.Sprintf("Namespace %s is still there", name))
+		return errors.New(fmt.Sprintf("Namespace '%s' still exists. Retrying delete of namespace...", name))

Contributor Author

Ok, but I dropped the 'Retrying ...' part, since it doesn't actually do that; it just waits for the namespace to disappear completely.

Contributor

Ah, good point. Sorry, my suggestion was incorrect then, but you've already merged. It should be fine, but maybe a later fix could update the wording to be more accurate. Sorry about that!

	} else if statusErr, ok := err.(*apiErrors.StatusError); !ok || statusErr.Status().Reason != metav1.StatusReasonNotFound {
		return err
	} else {
		log.Info(fmt.Sprintf("Namespace %s is deleted", name))
Contributor

Suggested change
-		log.Info(fmt.Sprintf("Namespace %s is deleted", name))
+		log.Info(fmt.Sprintf("Namespace '%s' successfully deleted", name))

Contributor Author

done

@rpalaznik merged commit a8f17ac into master on Oct 23, 2019
@rpalaznik deleted the fix-namespace-deletion branch on October 23, 2019, 09:34
3 participants