Improve unwind info persisting failure handling #23

Merged
merged 1 commit into main from improve-persisting-failures-handling
Apr 23, 2024

javierhonduco
Owner

On some production machines, persisting the unwind info fails. We are still investigating and don't yet know what the culprit is.

On those hosts we get pretty much 100% unwind errors, which should not happen. This led me to notice that errors while persisting the unwind info aren't handled properly.

For example, once a shard is full, the current code ignores this, wipes the in-memory shard, and assigns a new BPF shard. This is not correct.
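To make the intended behaviour concrete, here is a minimal sketch of "only wipe the in-memory shard after persisting succeeded". All names (`InMemoryShard`, `PersistError`, `flush`, and the `persist` callback standing in for the write to the BPF map) are made up for illustration and are not taken from this change.

```rust
// Hypothetical types for illustration; none of these names come from the project.
#[derive(Debug, PartialEq)]
enum PersistError {
    Failed,
}

struct InMemoryShard {
    shard_id: u32,
    rows: Vec<u64>,
}

impl InMemoryShard {
    fn new() -> Self {
        InMemoryShard { shard_id: 0, rows: Vec::new() }
    }

    /// Persists the buffered rows, then moves on to a fresh shard.
    /// `persist` stands in for whatever writes the shard to the BPF map.
    fn flush<F>(&mut self, mut persist: F) -> Result<(), PersistError>
    where
        F: FnMut(u32, &[u64]) -> Result<(), PersistError>,
    {
        // Return early on failure so the buffered rows and the shard id are
        // kept, and the caller can retry later.
        persist(self.shard_id, self.rows.as_slice())?;

        // Only after a successful write is it safe to wipe the in-memory
        // shard and assign the next one.
        self.rows.clear();
        self.shard_id += 1;
        Ok(())
    }
}
```

With this shape, a failed `flush` leaves `rows` and `shard_id` exactly as they were, so a later attempt can retry with the same data.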

Test Plan

Forced some errors in this logic and confirmed that the current in-memory state wasn't wiped. We need failure injection during testing to ensure all these cases are covered and don't regress.
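One possible shape for that failure injection, as a rough sketch: hide persisting behind a trait and use a test double that fails a configurable number of times before succeeding. The `Persister` trait, `FlakyPersister`, and `flush_pending` are hypothetical names invented here, not existing code in this repository.

```rust
// Hypothetical trait and types, invented for this sketch.
trait Persister {
    fn persist(&mut self, rows: &[u64]) -> Result<(), String>;
}

/// Test double that fails a fixed number of times before succeeding.
struct FlakyPersister {
    failures_left: u32,
    persisted: Vec<Vec<u64>>,
}

impl Persister for FlakyPersister {
    fn persist(&mut self, rows: &[u64]) -> Result<(), String> {
        if self.failures_left > 0 {
            // Injected failure: nothing is recorded as persisted.
            self.failures_left -= 1;
            return Err("injected failure".to_string());
        }
        self.persisted.push(rows.to_vec());
        Ok(())
    }
}

/// Clears `pending` only after the persister reports success.
fn flush_pending(pending: &mut Vec<u64>, persister: &mut dyn Persister) -> Result<(), String> {
    persister.persist(pending.as_slice())?;
    pending.clear();
    Ok(())
}
```

A test can then build a `FlakyPersister` with `failures_left: 2`, call `flush_pending` three times, and assert that the pending rows survive the first two calls and are cleared only by the third.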

cc @gmarler

@javierhonduco javierhonduco force-pushed the improve-persisting-failures-handling branch from b978e90 to fdebff4 Compare April 23, 2024 11:50
On some production machines, persisting the unwind info fails.
We are still investigating and don't yet know what the culprit
is. Typically, after some attempts, persisting eventually succeeds.

On those hosts we get pretty much 100% unwind errors, which
should not happen. This led me to notice that errors while
persisting the unwind info aren't handled properly.

For example, once a shard is full, the current code ignores this,
wipes the in-memory shard, and assigns a new BPF shard. This is
not correct.

Test Plan
=========

Forced some errors in this logic and confirmed that the current
in-memory state wasn't wiped. We need failure injection during
testing to ensure all these cases are covered and don't regress.
@javierhonduco javierhonduco force-pushed the improve-persisting-failures-handling branch from fdebff4 to 67e97db Compare April 23, 2024 13:23
@javierhonduco javierhonduco merged commit ab2dd8d into main Apr 23, 2024
4 checks passed
@javierhonduco javierhonduco deleted the improve-persisting-failures-handling branch April 23, 2024 13:39