Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(kubernetes_logs source): Start reading logs at checkpoint #4043

Merged
merged 2 commits into from
Sep 24, 2020

Conversation

MOZGIII
Copy link
Contributor

@MOZGIII MOZGIII commented Sep 21, 2020

Closes #4042.

Also adds extensive comments to capture the motivation behind the decisions on the FileServer configuration.

@MOZGIII MOZGIII added the ci-condition: k8s e2e all targets Run Kubernetes E2E test suite for all targets (instead of just the essential subset) label Sep 21, 2020
@binarylogic
Copy link
Contributor

We should add this to list of gotchas with the file source in #3480.

@MOZGIII
Copy link
Contributor Author

MOZGIII commented Sep 21, 2020

It operates as expected in my testing so far.
If anyone is interested in trying it, I've pushed an image to docker hub at mozgiii/vector-k8s-wip:checkpoints-v1

@lukesteensen
Copy link
Member

We should add this to list of gotchas with the file source in #3480.

Could you expand on this? This is the start_at_beginning option working as designed and documented.

start_at_beginning: true,
// We want to use checkpoining mechanism, and resume from where we
// left off.
start_at_beginning: false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not relying on the Default implementation here? That would have avoided this bug in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileServer does not implement Default.
I know you've asked this before, and I was confused by this question last time, so might have forgot to address it, sorry.

I might still be confused, in that case could you please elaborate?

Copy link
Contributor

@Hoverbear Hoverbear Sep 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, FileServer does not have a safe constructor with defaults.

I think adding such would be a separate ticket, which I feel should be done, since right now it's very difficult to use the FileServer type in a 'default' way.

Copy link
Contributor Author

@MOZGIII MOZGIII Sep 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding such would be a separate ticket, which I feel should be done, since right now it's very difficult to use the FileServer type in a 'default' way.

I'm not sure if default for the whole thing makes sense. But we could extract a subset of parameters into a Default-constructible object and pass that.

However, even then, for something like a kubernetes_logs source, I'd assign most of the settings manually. For instance, this one should probably be set in the source level rather than being defined at the Default that may change independently of the source (and potentially break the compatibility).

Some of the options I'd definitely offload to Default, like max_read_bytes.

We also need a proper test for this behaviour at the E2E, I'll create an issue (UPD: #4047).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have been clearer. We have encoded a reasonable set of default values into the Default implementation for FileConfig. It would take some tweaking, but we should really try to rely on those defaults wherever the source is reused.

Copy link
Contributor Author

@MOZGIII MOZGIII Sep 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense! FileConfig isn't exactly a good fit for the use in kubernetes_logs source, so we probably need to add a layer of indirection.
I'll create an issue for it (UPD: #4048).

@binarylogic
Copy link
Contributor

Could you expand on this? This is the start_at_beginning option working as designed and documented.

This PR and past experiences (#1020) have alluded that it's confusing to users. I'm curious what @MOZGIII thought the option did.

@MOZGIII
Copy link
Contributor Author

MOZGIII commented Sep 21, 2020

I'm curious what @MOZGIII thought the option did.

I didn't properly address this at all, just copied from the previous implementation, my bad: #2653 (comment)

@MOZGIII
Copy link
Contributor Author

MOZGIII commented Sep 21, 2020

I don't know why I'd think of the start_at_beginning anything else but what it does. That said, I still got it wrong, even though I audited it.
Hmm, should we consider renaming it to ignore_checkpoints or something similar to expose the meaning of it better?

UPD: according to the table at #1020, the ignore_checkpoints name wouldn't properly capture the behaviour in one of the cases:

Old Startup Checkpoint start_at_beginning Result
N N Y Y Checkpoint

This is a quite odd outliner, but the logic seems to be right nonetheless for this case. It may only happen if the file disappears and then reappears during the regular operation, with the same contents. Not using a checkpoint would mean that we'd start processing the file from scratch, that definitely has been read and checkpointed before.

Copy link
Contributor

@Hoverbear Hoverbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops! Glad we caught this.

@ktff
Copy link
Contributor

ktff commented Sep 22, 2020

I should have documented this. So I remember thinking about this option for some time, but don't quite remember what was bugging me, it's quite possible that it was a misinterpretation of the option on my part. I think I expected that with a false value file source wouldn't capture logs found before the startup, so to avoid having a condition of starting Vector before other pods true was used.

@MOZGIII MOZGIII merged commit b995485 into master Sep 24, 2020
@MOZGIII MOZGIII deleted the k8s-logs-start-at-checkpoint branch September 24, 2020 02:41
mengesb pushed a commit to jacobbraaten/vector that referenced this pull request Dec 9, 2020
…dotdev#4043)

* Turn start_at_beginning off at kubernetes_logs source

Signed-off-by: MOZGIII <[email protected]>

* Add the extensive comments to motivate the decisions on the FileServer confguration

Signed-off-by: MOZGIII <[email protected]>
Signed-off-by: Brian Menges <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-condition: k8s e2e all targets Run Kubernetes E2E test suite for all targets (instead of just the essential subset)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kubernetes_logs source should have start_at_beginning set to false internally
5 participants