Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sysdump: don't specify --follow while collecting hubble flows #2240

Merged
merged 1 commit into from
Jan 22, 2024

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented Jan 18, 2024

Currently, hubble flows are retrieved during sysdump collection passing the --follow parameter to hubble observe. According to the comment, this appeared to be a necessary hack to prevent the "requested data has been overwritten and is no longer available" error. Yet, the consequence is that the hubble observe command becomes blocking, and we relying on the specified timeout only for its termination. When capturing a sysdump, though, we are interested in storing (as many as possible) flows prior to that moment (e.g., to investigate the causes of a connectivity test failure), not the ones occurring during the collection of the sysdump itself.

Given that the original reason for using the --follow parameter got fixed quite some time ago [1] and the fix is included in any Cilium versions supported today, let's just get rid of it. The side effects include the early termination of the collection process as soon as all the flows have been retrieved, as well as the reduction of the size of the sysdumps when increasing the timeout period, given that we do no longer block until its expiration (this is relevant especially in CI tests, as they are currently too large to be uploaded on GH). Nonetheless, the timeout parameter is preserved to interrupt the retrieval if taking too long.

/cc @michi-covalent as the original author of the hubble flows sysdump collector and of the fix.

[1]: cilium/cilium#17046

Currently, hubble flows are retrieved during sysdump collection passing
the `--follow` parameter to hubble observe. According to the comment,
this appeared to be a necessary hack to prevent the "requested data has
been overwritten and is no longer available" error. Yet, the consequence
is that the hubble observe command becomes blocking, and we relying on the
specified timeout only for its termination. When capturing a sysdump,
though, we are interested in storing (as many as possible) flows prior to
that moment (e.g., to investigate the causes of a connectivity test failure),
not the ones occurring during the collection of the sysdump itself.

Given that the original reason for using the `--follow` parameter got
fixed quite some time ago [1] and the fix is included in any Cilium
versions supported today, let's just get rid of it. The side effects
include the early termination of the collection process as soon as
all the flows have been retrieved, as well as the reduction of the
size of the sysdumps when increasing the timeout period, given that
we do no longer block until its expiration (this is relevant especially
in CI tests, as they are currently too large to be uploaded on GH).
Nonetheless, the timeout parameter is preserved to interrupt the
retrieval if taking too long.

[1]: cilium/cilium#17046

Signed-off-by: Marco Iorio <[email protected]>
Copy link
Contributor

@michi-covalent michi-covalent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! 🥰

@giorio94 giorio94 added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 19, 2024
@aditighag aditighag merged commit 5cfb677 into cilium:main Jan 22, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants