Track rotated #182

rockb1017 · 2021-06-11T21:39:35Z

continuing #168
fixes #85

Instead of closing all files at the end of a poll cycle, keep them open until the next poll cycle. Only after opening all matched files do we consume the files that were kept open from the previous poll, and then close them. This "overlap" at each cycle ensures that we keep track of files that were moved away, and that we consume any logs written to them but not yet read before they were rotated.
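The cycle described above can be sketched roughly as follows. This is a minimal illustration, not the operator's code: the names `tracked`, `poller`, `poll`, `drain`, and `runDemo` are hypothetical, files are matched across cycles with `os.SameFile` (an assumption made for brevity; the operator's own matching logic is not shown in this thread), copy/truncate rotation is not handled, and the demo assumes Unix-like rename semantics and Go 1.20+ for `errors.Join`.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// tracked is a hypothetical stand-in for the operator's reader.
type tracked struct {
	file   *os.File
	info   os.FileInfo // identity captured at open time; nil if Stat failed
	offset int64       // bytes already emitted
}

// drain emits every complete line past offset; a partial last line waits.
func (t *tracked) drain(emit func(string)) error {
	if _, err := t.file.Seek(t.offset, io.SeekStart); err != nil {
		return err
	}
	r := bufio.NewReader(t.file)
	line, err := r.ReadString('\n')
	for ; err == nil; line, err = r.ReadString('\n') {
		t.offset += int64(len(line))
		emit(line[:len(line)-1]) // drop the trailing newline
	}
	if err == io.EOF {
		return nil
	}
	return err
}

type poller struct {
	pattern string
	emit    func(string)
	prev    []*tracked // handles kept open since the previous cycle
}

func (p *poller) poll() error {
	paths, err := filepath.Glob(p.pattern) // fails only on a malformed pattern
	// 1. Open every file that matches now, before touching the old handles.
	var current []*tracked
	for _, path := range paths {
		if f, openErr := os.Open(path); openErr == nil { // skip vanished files
			info, _ := f.Stat() // on failure info stays nil: treated as new
			current = append(current, &tracked{file: f, info: info})
		}
	}
	// 2. Drain and close last cycle's handles; they still reach renamed files.
	for _, old := range p.prev {
		err = errors.Join(err, old.drain(p.emit), old.file.Close())
	}
	// 3. Resume known files from their previous offset and read to the end.
	for _, cur := range current {
		for _, old := range p.prev {
			if os.SameFile(cur.info, old.info) {
				cur.offset = old.offset
				break
			}
		}
		err = errors.Join(err, cur.drain(p.emit))
	}
	p.prev = current
	return err
}

// runDemo polls, writes once more, rotates (rename + recreate), polls again.
func runDemo() ([]string, error) {
	dir, err := os.MkdirTemp("", "rotate")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	var got []string
	p := &poller{pattern: filepath.Join(dir, "*.log"), emit: func(s string) { got = append(got, s) }}
	live := filepath.Join(dir, "app.log")
	w, err := os.Create(live)
	if err != nil {
		return nil, err
	}
	defer w.Close()
	_, e1 := w.WriteString("a\n")
	e2 := p.poll()
	_, e3 := w.WriteString("b\n") // written after the poll, just before rotation
	e4 := os.Rename(live, filepath.Join(dir, "app.old"))
	e5 := os.WriteFile(live, []byte("c\n"), 0o644)
	e6 := p.poll()
	for _, t := range p.prev {
		t.file.Close()
	}
	return got, errors.Join(e1, e2, e3, e4, e5, e6)
}

func main() {
	fmt.Println(runDemo()) // on a Unix-like system: [a b c] <nil>
}
```

In the demo, line `b` is written after the first poll and the file is then renamed to a name that no longer matches the pattern. The second poll still emits `b` because the handle from the first cycle stays open until it has been drained, and only then reads `c` from the newly created file.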

In addition, the "Move/Create" rotation tests were not being skipped on Windows. They are now skipped.

codecov · 2021-06-11T21:42:07Z

Codecov Report

Merging #182 (9e316c3) into main (44b6bf5) will increase coverage by 0.0%.
The diff coverage is 91.6%.

@@          Coverage Diff          @@
##            main    #182   +/-   ##
=====================================
  Coverage   75.6%   75.7%           
=====================================
  Files         95      95           
  Lines       4360    4371   +11     
=====================================
+ Hits        3299    3311   +12     
  Misses       738     738           
+ Partials     323     322    -1

| Impacted Files | Coverage Δ | |
|---|---|---|
| operator/builtin/input/file/config.go | `73.4% <0.0%> (ø)` | |
| operator/builtin/input/file/file.go | `73.5% <100.0%> (+2.1%)` | ⬆️ |
| operator/builtin/input/file/reader.go | `65.8% <100.0%> (+0.7%)` | ⬆️ |

rockb1017 · 2021-06-12T06:00:39Z

https://github.com/open-telemetry/opentelemetry-log-collection/pull/183/checks?check_run_id=2808466320
I tested that configuration on GitHub Actions. It failed on Windows.

djaglowski · 2021-06-14T16:22:12Z

> https://github.com/open-telemetry/opentelemetry-log-collection/pull/183/checks?check_run_id=2808466320
> I tested that configuration on GitHub Actions. It failed on Windows.

@rockb1017 What conclusion are you drawing from this?

rockb1017 · 2021-06-14T18:02:34Z

I naively deleted a block of code that seemed inefficient and unnecessary.
It handled the "copy/trunc" cases, and we were seeing flakiness in the "copy/trunc" tests. I added it back and the tests passed.
Thanks for the discussion we had today, @djaglowski.

Could you run it a few more times just to be sure?

rockb1017 · 2021-06-15T16:50:45Z

100% data ingestion rate.
(I am generating 100,000 + 3 messages per pod and I deployed 3 pods, so the total is 300,009.)

djaglowski

This looks good to me.

A few notes on this PR:

* See explanation here of what was wrong with the previous attempt.
* In addition to believing that we have a good handle on the issues involved, I've rerun unit tests about 10 times and have observed no failures.
* @rockb1017 has incorporated this branch into a build of the collector, and run it on k8s as part of a benchmark. He observed 100% delivery rate, and has posted the results here.
* I'm confident that this is an improvement upon the previous release because it verifiably addresses a clear problem that was identified and does not regress in any regard that is currently tested. However, it is possible that further edge cases may be identified in the future.
* I am drafting a design doc that explains how this operator works. I'll share this and will welcome feedback, especially as it pertains to possible improvements.
* The codebase for this operator could be improved through refactoring. This is out of scope for this PR, but will be an ongoing focus for this project.
* Similarly, the test suite for this operator needs improvement. Some tests are currently non-deterministic. Some edge cases could be tested independently. This will also be an ongoing effort, but should not hold up this PR.

@tigrannajaryan, I'd appreciate any thoughts you have on these changes, or on the suggestion that we merge this and continue to improve from there.

tigrannajaryan · 2021-06-15T20:27:45Z

@rockb1017 can you please update the PR description, squash the commits into one, and add a commit message that fully explains why the changes were made and how they fix the problem?
Thanks for spending time on this and for your patience. @djaglowski thanks a lot for your help and for the summary in your approval.

tigrannajaryan · 2021-06-16T14:11:23Z

Thanks @rockb1017 and @djaglowski !

…lection#182

* reorg test files only, ported from open-telemetry/opentelemetry-log-collection#165
* port otel enhanced file input benchmarking open-telemetry/opentelemetry-log-collection#166
* skip closing files that are already closed
* port otel file rotation tracking open-telemetry/opentelemetry-log-collection#182
* fix poll()

rockb1017 requested a review from a team June 11, 2021 21:39

rockb1017 mentioned this pull request Jun 12, 2021

[DRAFT] extreme case #183

Closed

rockb1017 force-pushed the track_rotated branch from 06a71c3 to 66aba1a Compare June 14, 2021 20:31

djaglowski mentioned this pull request Jun 15, 2021

Revert "track_rotated" #180

Merged

djaglowski approved these changes Jun 15, 2021

track files that are rotated out of Include parameter by keeping file…

9e316c3

…s open till next poll cycle. Fix test cases for new implementation and skip move-create tests for Windows

rockb1017 force-pushed the track_rotated branch from 66aba1a to 9e316c3 Compare June 16, 2021 02:46

tigrannajaryan merged commit cdbb6d6 into open-telemetry:main Jun 16, 2021

rockb1017 deleted the track_rotated branch June 16, 2021 15:56

This was referenced Jun 21, 2021

Several rotation tests are failing #179

Closed

Flaky test - File Input #8

Closed

FileReader - Data loss when tailing symlink file with log rotation (k8s container logs) open-telemetry/opentelemetry-collector-contrib#2946

Closed

jsirianni added a commit to observIQ/stanza that referenced this pull request Jun 25, 2021

port otel file rotation tracking open-telemetry/opentelemetry-log-col…

114e5cb

…lection#182

jsirianni mentioned this pull request Jun 25, 2021

Track rotated observIQ/stanza#346

Merged

4 tasks
