Improve unwind info persisting failure handling
On some production machines, persisting the unwind info fails. We are still investigating and so far don't know the cause. Persisting typically succeeds after a few attempts. On the affected hosts we see close to 100% unwind errors, which should not happen.

While looking into this I noticed that errors from persisting the unwind info aren't handled properly. For example, once a shard is full, the current code ignores such an error, wipes the in-memory shard, and assigns a new BPF shard. This is not correct.

Test Plan
=========

Forced some errors in this logic and confirmed that the current in-memory state wasn't wiped.

We need failure injection during testing to ensure all these cases are covered and don't regress.