Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check if file has disappeared sooner in Log reader #13907

Merged

Conversation

ycombinator
Copy link
Contributor

@ycombinator ycombinator commented Oct 3, 2019

Resolves #13889.

This PR teach the Log reader to check file stat related changes, e.g. a file being moved or renamed, more aggressively than before. As a result a slow or flaky output can no longer hold up these checks from being performed.

@ycombinator ycombinator requested a review from urso October 3, 2019 20:29
@ycombinator ycombinator added bug Filebeat Filebeat review v6.8.4 v7.4.1 v7.5.0 v8.0.0 needs_backport PR is waiting to be backported to other branches. labels Oct 3, 2019
@urso urso self-assigned this Oct 4, 2019
return nil
}

func (f *Log) checkFileErrors() error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately github doesn't allow me to comment in code.
With this change every single read is preceded by an extra syscall, to check the file state. Maybe we can reduce the check to be only run if CloseRemoved or CloseRenamed is set.

The CloseInactive check MUST NOT be executed here. This function is run before the next line is read. If the harvester was blocked by the outputs for some amount of time, we might trigger an ErrInactive here, although the file was still written to. The CloseInactive check is supposed to trigger only after we did reach EOF.

The truncate case is somewhat tricky. I think checking truncate after EOF should be ok, as you should get a return of 0bytes read + EOF if you read on a truncated file. The check (and any synchronous check alike) is always subject to false-negatives, if the output was blocked for such a long time that the file was allowed to grow past offset.

By moving truncate and CloseInactive checks back to errorCheck, we don't need to execute the stat syscall for everyone line we want to read if CloseRemoved or CloseRenamed are not set.

@@ -111,16 +116,22 @@ func (f *Log) errorChecks(err error) error {
return err
}

// At this point we have hit EOF!
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding this detail to the godoc: errorChecks determines the cause for EOF errors, and how the EOF event should be handled based on the config options.

Copy link

@urso urso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments on checkFileErrors

@urso
Copy link

urso commented Oct 4, 2019

Potentially related flaky tests:
#7690
#10606
#12037

@ycombinator ycombinator force-pushed the close_removed+close_renamed_before_eof branch from 92bf942 to 5b4505e Compare October 4, 2019 13:13
@ycombinator
Copy link
Contributor Author

@urso Thanks for the review. I've addressed your feedback and this PR is ready for another round of review, when you get a chance.

@ycombinator ycombinator changed the title Check file stat related changes sooner in Log reader Check if file has disappeared sooner in Log reader Oct 4, 2019
@ycombinator ycombinator force-pushed the close_removed+close_renamed_before_eof branch from 44921d6 to 1fefbe4 Compare October 7, 2019 15:53
@ycombinator ycombinator force-pushed the close_removed+close_renamed_before_eof branch from 1fefbe4 to 71804f0 Compare October 7, 2019 18:10
@ycombinator
Copy link
Contributor Author

Failing CI job is unrelated to this PR. Merging.

@ycombinator ycombinator merged commit a37e8be into elastic:master Oct 7, 2019
@ycombinator ycombinator removed the needs_backport PR is waiting to be backported to other branches. label Oct 7, 2019
ycombinator added a commit that referenced this pull request Oct 8, 2019
* Adding comment about EOF and removing redundant logic

* Adding debug logging when close_remove and close_renamed situations are reached

* Check file stats related errors before adding to buffer

* Reordering checks to be same as before

* Add CHANGELOG entry

* Fixing issue # in CHANGELOG entry

* Move CloseInactive and truncate checks back to being after EOF check

* Don't perform stat call for CloseRemoved/CloseRenamed unless those settings are enabled

* Better comments / function godoc

* Renaming method to be more descriptive

* Fixing up CHANGELOG
ycombinator added a commit that referenced this pull request Oct 11, 2019
…3960)

* Check if file has disappeared sooner in Log reader (#13907)

* Adding comment about EOF and removing redundant logic

* Adding debug logging when close_remove and close_renamed situations are reached

* Check file stats related errors before adding to buffer

* Reordering checks to be same as before

* Add CHANGELOG entry

* Fixing issue # in CHANGELOG entry

* Move CloseInactive and truncate checks back to being after EOF check

* Don't perform stat call for CloseRemoved/CloseRenamed unless those settings are enabled

* Better comments / function godoc

* Renaming method to be more descriptive

* Fixing up CHANGELOG

* Fix CHANGELOG

* Fixed compile error

* [WIP] Trying to check if file is removed

* Adding Removed() to Source interface

* Initialize procGetFileInformationByHandleEx

* Adding missed import
@ycombinator ycombinator deleted the close_removed+close_renamed_before_eof branch December 25, 2019 11:08
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…lastic#13959)

* Adding comment about EOF and removing redundant logic

* Adding debug logging when close_remove and close_renamed situations are reached

* Check file stats related errors before adding to buffer

* Reordering checks to be same as before

* Add CHANGELOG entry

* Fixing issue # in CHANGELOG entry

* Move CloseInactive and truncate checks back to being after EOF check

* Don't perform stat call for CloseRemoved/CloseRenamed unless those settings are enabled

* Better comments / function godoc

* Renaming method to be more descriptive

* Fixing up CHANGELOG
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

close_removed and close_renamed only checked after EOF
2 participants