Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for PDBs on deployment and statefulset #2141

Merged
merged 8 commits into from
Oct 6, 2023

Conversation

JorTurFer
Copy link
Contributor

Fixes: #2136

@JorTurFer JorTurFer marked this pull request as ready for review September 21, 2023 11:17
@JorTurFer JorTurFer requested a review from a team September 21, 2023 11:17
@JorTurFer
Copy link
Contributor Author

JorTurFer commented Sep 21, 2023

Sorry, I missed the bundle during the first try. I have updated the PR

@JorTurFer
Copy link
Contributor Author

awesome, a missing change in the stage 🤦

Copy link
Contributor

@jaronoff97 jaronoff97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good! Thank you for updating the logic for version detection as well to be more generic. Could you also update one of the e2e to set the PDB and ensure it gets reconciled correctly and the selector works as expected?

// for the OpenTelemetryCollector workload.
//
// +optional
PodDisruptionBudget *PodDisruptionBudgetSpec `json:"podDisruptionBudget,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for other reviewers, this is a paired down version of the main PDB spec because we don't want to allow the configuration of the selector

internal/manifests/collector/poddisruptionbudget.go Outdated Show resolved Hide resolved
@JorTurFer
Copy link
Contributor Author

Thanks for the quick review, tomorrow I'll address al the comments 😄

@JorTurFer
Copy link
Contributor Author

I have addressed all the comments. I'm not 100% if I have done the e2e test correctly because it's the 1st time I have used kuttl but at least it works in local 🤣

@JorTurFer
Copy link
Contributor Author

It looks like I have to review the e2e test to work on multiple cluster versions. Let me do it before triggering again the CI because the check will fail again. I'm on it

@JorTurFer
Copy link
Contributor Author

JorTurFer commented Sep 22, 2023

@pavolloffay ,
What should I do? I mean, should I wait until #2145 is decided? Will it require too much time? I'm not in rush with the pdb, during next month is enough for me (the natural month, not just September xD), but if the discussion goes longer I'd prefer to merge it with support to both if It's possible

@pavolloffay
Copy link
Member

What should I do?

Let's see how quickly goes the conversation in #2145 I am happy to open a PR next week to bump the version.

@JorTurFer
Copy link
Contributor Author

JorTurFer commented Sep 28, 2023

I have just rebased main branch and updated this PR. PTAL when you have some time @pavolloffay @jaronoff97

Signed-off-by: Jorge Turrado <[email protected]>
Copy link
Contributor

@jaronoff97 jaronoff97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to add something in the validating webhook to be sure that we don't attempt a bad config

apis/v1alpha1/opentelemetrycollector_types.go Show resolved Hide resolved
@pavolloffay
Copy link
Member

End-to-end tests / End-to-end tests (1.23, e2e-pdb) (pull_request) - failed

@JorTurFer
Copy link
Contributor Author

I'm spinning up a cluster with that version to check it

Signed-off-by: Jorge Turrado <[email protected]>
@JorTurFer
Copy link
Contributor Author

@pavolloffay , I've tested it locally with k8s 1.23 and (at least locally) the feature works:
image

The problem I have seen is that the deletion is producing timeouts

Signed-off-by: Jorge Turrado <[email protected]>
@JorTurFer
Copy link
Contributor Author

JorTurFer commented Sep 29, 2023

I've noticed that if I use my own namespace for the tests, the k8s GC removes the namespaces more quickly and I can also validate the generated labels, so I've updated my test to use it's own namespace, and now 30 seconds of timeout is more than enough (the test needs a few seconds indeed, so far from 30 seconds, at least locally).

I say this because I see other e2e tests with > 150 seconds of timeouts and maybe the root cause it's something like this. The cause of failures was that, timeout deleting the namespace :(
I can see pdb events and a timeout at the end

@pavolloffay
Copy link
Member

The PR looks good to me. But I am also interested in @jaronoff97 question whether we could do some validation of the PDB fields that we added to our CR.

@JorTurFer
Copy link
Contributor Author

Which kind of validation? I mean, I added a check in webhooks to validate if both have been set, and the other validations are already done by the openapi spec (but I can be missing any important thing at some point)

@jaronoff97 jaronoff97 requested a review from pavolloffay October 2, 2023 19:38
@jaronoff97 jaronoff97 merged commit 7a48b56 into open-telemetry:main Oct 6, 2023
@JorTurFer JorTurFer deleted the add-pdb branch October 6, 2023 14:00
@jaronoff97
Copy link
Contributor

@JorTurFer thank you so much for your contribution 🎊🎉

@JorTurFer
Copy link
Contributor Author

You're welcome!
Thank you for accepting it, I just did it because I need it 🤣

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for PodDisruptionBudgets
3 participants