-
Notifications
You must be signed in to change notification settings - Fork 337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(k8s): reenable deep copies when interacting with k8s resources #10561
Merged
michaelbeaumont
merged 1 commit into
kumahq:master
from
michaelbeaumont:fix/k8s-deep-copies
Jul 3, 2024
Merged
fix(k8s): reenable deep copies when interacting with k8s resources #10561
michaelbeaumont
merged 1 commit into
kumahq:master
from
michaelbeaumont:fix/k8s-deep-copies
Jul 3, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
michaelbeaumont
requested review from
jijiechen and
bartsmykla
and removed request for
a team
June 19, 2024 12:17
michaelbeaumont
added
the
ci/run-full-matrix
PR: Runs all possible e2e test combination (expensive use carefully)
label
Jun 19, 2024
michaelbeaumont
force-pushed
the
fix/k8s-deep-copies
branch
from
June 19, 2024 12:17
8373f55
to
3ccbe9c
Compare
bartsmykla
approved these changes
Jun 20, 2024
jijiechen
approved these changes
Jul 1, 2024
Signed-off-by: Mike Beaumont <[email protected]>
michaelbeaumont
force-pushed
the
fix/k8s-deep-copies
branch
from
July 1, 2024 08:10
3ccbe9c
to
3e35106
Compare
jakubdyszkiewicz
approved these changes
Jul 3, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge it and check the result of mesh-perf
jijiechen
changed the title
fix(k8s): reenable deep copies
fix(k8s): reenable deep copies when interacting with k8s resources
Oct 11, 2024
lukidzi
added a commit
that referenced
this pull request
Dec 19, 2024
## Motivation I was investigating a test fail on release-2.8 and noticed that we had a panic https://github.com/kumahq/kuma/actions/runs/12401863260/job/34624092612 ``` fatal error: concurrent map read and map write goroutine 1264 [running]: github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata.Annotations.getWithDefault(0x4004194e70, {0x247fb40?, 0x505e860?}, 0x4001a8d7c8, {0x4001a8d848?, 0x3272270?, 0x4001a8d7f8?}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata/annotations.go:347 +0x94 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata.Annotations.GetBooleanWithDefault(0x4001a8d858?, 0x0, 0x3c?, {0x4001a8d848?, 0x2203be4?, 0x25c85e0?}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata/annotations.go:231 +0x64 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata.Annotations.GetEnabledWithDefault(...) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata/annotations.go:227 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata.Annotations.GetEnabled(0x1?, {0x4001a8d848?, 0x4000cc6d88?, 0x40008b1590?}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata/annotations.go:219 +0x34 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*PodReconciler).findMatchingServices.Ignored.func1(0x25deb80?) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/util/util.go:56 +0x4c github.com/kumahq/kuma/pkg/plugins/runtime/k8s/util.MatchService(...) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/util/util.go:63 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*PodReconciler).findMatchingServices(0x400069ab40, {0x32d27f0, 0x4003e92ab0}, 0x4000cc6d88) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers/pod_controller.go:264 +0x5b4 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*PodReconciler).reconcileDataplane(0x400069ab40, {0x32d27f0, 0x4003e92ab0}, 0x4000cc6d88, {{0x32da180?, 0x4003e92ae0?}, 0x0?}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers/pod_controller.go:132 +0x2a0 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*PodReconciler).Reconcile(0x400069ab40, {0x32d27f0, 0x4003e92ab0}, {{{0x40026b05a0, 0x11}, {0x4002644ba0, 0x22}}}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers/pod_controller.go:101 +0x41c sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x32da180?, {0x32d27f0?, 0x4003e92ab0?}, {{{0x40026b05a0?, 0xb?}, {0x4002644ba0?, 0x0?}}}) sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0x8c ``` It's not an issue in > 2.8 since this change #10561 ## Implementation information While investigating a stack trace, I noticed that another goroutine was modifying a service object that we were also working with. I realized that we were not cloning the map before making edits to it. After reviewing the Kubernetes controller code, I identified the areas where these changes occur. I updated a few places to clone the map first before adding labels or making modifications. ``` goroutine 867 [runnable]: github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*ServiceReconciler).Reconcile(0x40007a8cf0, {0x32d27f0, 0x4004043320}, {{{0x40027c2c48, 0x11}, {0x4003e3e447, 0x7}}}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers/service_controller.go:83 +0x7f4 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x32da180?, {0x32d27f0?, 0x4004043320?}, {{{0x40027c2c48?, 0xb?}, {0x4003e3e447?, 0x0?}}}) sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0x8c sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0x4000651220, {0x32d2828, 0x40009d0370}, {0x27baca0, 0x40012fb460}) ``` ## Supporting documentation <!-- Is there a MADR? An Issue? A related PR? --> Fix #XX <!-- > Changelog: skip --> <!-- Uncomment the above section to explicitly set a [`> Changelog:` entry here](https://github.com/kumahq/kuma/blob/master/CONTRIBUTING.md#submitting-a-patch)? --> Signed-off-by: Lukasz Dziedziak <[email protected]>
kumahq bot
pushed a commit
that referenced
this pull request
Dec 19, 2024
## Motivation I was investigating a test fail on release-2.8 and noticed that we had a panic https://github.com/kumahq/kuma/actions/runs/12401863260/job/34624092612 ``` fatal error: concurrent map read and map write goroutine 1264 [running]: github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata.Annotations.getWithDefault(0x4004194e70, {0x247fb40?, 0x505e860?}, 0x4001a8d7c8, {0x4001a8d848?, 0x3272270?, 0x4001a8d7f8?}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata/annotations.go:347 +0x94 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata.Annotations.GetBooleanWithDefault(0x4001a8d858?, 0x0, 0x3c?, {0x4001a8d848?, 0x2203be4?, 0x25c85e0?}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata/annotations.go:231 +0x64 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata.Annotations.GetEnabledWithDefault(...) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata/annotations.go:227 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata.Annotations.GetEnabled(0x1?, {0x4001a8d848?, 0x4000cc6d88?, 0x40008b1590?}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata/annotations.go:219 +0x34 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*PodReconciler).findMatchingServices.Ignored.func1(0x25deb80?) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/util/util.go:56 +0x4c github.com/kumahq/kuma/pkg/plugins/runtime/k8s/util.MatchService(...) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/util/util.go:63 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*PodReconciler).findMatchingServices(0x400069ab40, {0x32d27f0, 0x4003e92ab0}, 0x4000cc6d88) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers/pod_controller.go:264 +0x5b4 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*PodReconciler).reconcileDataplane(0x400069ab40, {0x32d27f0, 0x4003e92ab0}, 0x4000cc6d88, {{0x32da180?, 0x4003e92ae0?}, 0x0?}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers/pod_controller.go:132 +0x2a0 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*PodReconciler).Reconcile(0x400069ab40, {0x32d27f0, 0x4003e92ab0}, {{{0x40026b05a0, 0x11}, {0x4002644ba0, 0x22}}}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers/pod_controller.go:101 +0x41c sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x32da180?, {0x32d27f0?, 0x4003e92ab0?}, {{{0x40026b05a0?, 0xb?}, {0x4002644ba0?, 0x0?}}}) sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0x8c ``` It's not an issue in > 2.8 since this change #10561 ## Implementation information While investigating a stack trace, I noticed that another goroutine was modifying a service object that we were also working with. I realized that we were not cloning the map before making edits to it. After reviewing the Kubernetes controller code, I identified the areas where these changes occur. I updated a few places to clone the map first before adding labels or making modifications. ``` goroutine 867 [runnable]: github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*ServiceReconciler).Reconcile(0x40007a8cf0, {0x32d27f0, 0x4004043320}, {{{0x40027c2c48, 0x11}, {0x4003e3e447, 0x7}}}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers/service_controller.go:83 +0x7f4 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x32da180?, {0x32d27f0?, 0x4004043320?}, {{{0x40027c2c48?, 0xb?}, {0x4003e3e447?, 0x0?}}}) sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0x8c sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0x4000651220, {0x32d2828, 0x40009d0370}, {0x27baca0, 0x40012fb460}) ``` ## Supporting documentation <!-- Is there a MADR? An Issue? A related PR? --> Fix #XX <!-- > Changelog: skip --> <!-- Uncomment the above section to explicitly set a [`> Changelog:` entry here](https://github.com/kumahq/kuma/blob/master/CONTRIBUTING.md#submitting-a-patch)? --> Signed-off-by: Lukasz Dziedziak <[email protected]>
kumahq bot
pushed a commit
that referenced
this pull request
Dec 19, 2024
## Motivation I was investigating a test fail on release-2.8 and noticed that we had a panic https://github.com/kumahq/kuma/actions/runs/12401863260/job/34624092612 ``` fatal error: concurrent map read and map write goroutine 1264 [running]: github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata.Annotations.getWithDefault(0x4004194e70, {0x247fb40?, 0x505e860?}, 0x4001a8d7c8, {0x4001a8d848?, 0x3272270?, 0x4001a8d7f8?}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata/annotations.go:347 +0x94 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata.Annotations.GetBooleanWithDefault(0x4001a8d858?, 0x0, 0x3c?, {0x4001a8d848?, 0x2203be4?, 0x25c85e0?}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata/annotations.go:231 +0x64 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata.Annotations.GetEnabledWithDefault(...) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata/annotations.go:227 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata.Annotations.GetEnabled(0x1?, {0x4001a8d848?, 0x4000cc6d88?, 0x40008b1590?}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/metadata/annotations.go:219 +0x34 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*PodReconciler).findMatchingServices.Ignored.func1(0x25deb80?) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/util/util.go:56 +0x4c github.com/kumahq/kuma/pkg/plugins/runtime/k8s/util.MatchService(...) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/util/util.go:63 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*PodReconciler).findMatchingServices(0x400069ab40, {0x32d27f0, 0x4003e92ab0}, 0x4000cc6d88) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers/pod_controller.go:264 +0x5b4 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*PodReconciler).reconcileDataplane(0x400069ab40, {0x32d27f0, 0x4003e92ab0}, 0x4000cc6d88, {{0x32da180?, 0x4003e92ae0?}, 0x0?}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers/pod_controller.go:132 +0x2a0 github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*PodReconciler).Reconcile(0x400069ab40, {0x32d27f0, 0x4003e92ab0}, {{{0x40026b05a0, 0x11}, {0x4002644ba0, 0x22}}}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers/pod_controller.go:101 +0x41c sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x32da180?, {0x32d27f0?, 0x4003e92ab0?}, {{{0x40026b05a0?, 0xb?}, {0x4002644ba0?, 0x0?}}}) sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0x8c ``` It's not an issue in > 2.8 since this change #10561 ## Implementation information While investigating a stack trace, I noticed that another goroutine was modifying a service object that we were also working with. I realized that we were not cloning the map before making edits to it. After reviewing the Kubernetes controller code, I identified the areas where these changes occur. I updated a few places to clone the map first before adding labels or making modifications. ``` goroutine 867 [runnable]: github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers.(*ServiceReconciler).Reconcile(0x40007a8cf0, {0x32d27f0, 0x4004043320}, {{{0x40027c2c48, 0x11}, {0x4003e3e447, 0x7}}}) github.com/kumahq/kuma/pkg/plugins/runtime/k8s/controllers/service_controller.go:83 +0x7f4 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x32da180?, {0x32d27f0?, 0x4004043320?}, {{{0x40027c2c48?, 0xb?}, {0x4003e3e447?, 0x0?}}}) sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0x8c sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0x4000651220, {0x32d2828, 0x40009d0370}, {0x27baca0, 0x40012fb460}) ``` ## Supporting documentation <!-- Is there a MADR? An Issue? A related PR? --> Fix #XX <!-- > Changelog: skip --> <!-- Uncomment the above section to explicitly set a [`> Changelog:` entry here](https://github.com/kumahq/kuma/blob/master/CONTRIBUTING.md#submitting-a-patch)? --> Signed-off-by: Lukasz Dziedziak <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Closes #8160
This was enabled in some form in #1113 and then in its current form in #6809
This is of course more correct but the performance implications are not 100% clear.
Checklist prior to review
syscall.Mkfifo
have equivalent implementation on the other OS --ci/
labels to run additional/fewer testsUPGRADE.md
? --