-
Notifications
You must be signed in to change notification settings - Fork 42
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like how simple this implementation is. It's much improved from the past version.
There is at least one significant issue that we needs to be addressed, relating to the modified test.
I have the following additional requests:
- Will you please add a note to the documentation, which describes the interaction between
max_concurrent_files
and files that are rotated out of the matching pattern. - Will you please rebase onto
main
, so we can run the new benchmarks before/after this change, and validate that performance changes are acceptable.
operator/builtin/input/file/file.go
Outdated
// Skip checking itself | ||
continue | ||
} | ||
for j := i + 1; j < len(fps); j++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this part back because it might be causing the issue on WIndow based machine.
I added closing all readers step to |
hmm for windows, it fails |
@rockb1017 It doesn't look like changing the poll interval fixed anything. Rerunning the test causes failure again. |
hmm but it is a different test that failed. |
Did increasing poll interval made it fail or was it because test failed before it got to run |
I couldn't trace back and understand why it failed on windows. So i spun up a windows server on EC2 and ran test.
everything passes.
it passes without failing. |
0146cca
to
5ae78d9
Compare
at this point,I think i am just fixing the flaky test on windows machine because I don't have any failing cases on my windows.
|
Codecov Report
@@ Coverage Diff @@
## main #168 +/- ##
=======================================
+ Coverage 75.6% 75.8% +0.1%
=======================================
Files 95 95
Lines 4360 4363 +3
=======================================
+ Hits 3298 3308 +10
+ Misses 739 733 -6
+ Partials 323 322 -1
|
I think we should not look at this as a matter of flaky tests. These tests have no history of being flaky, to my knowledge. Tuning timeouts/intervals is sometimes necessary at a macro-level, but our tests should pass with timeouts/intervals that span a reasonable range. Instead, let's assume that these changes may have introduced an issue that is causing the problem. We see the following error on several failing windows tests: The nature of this PR is to change the way in which we are "using" files, so it seems likely that these changes are directly related. |
@@ -355,6 +355,42 @@ func TestMoveFile(t *testing.T) { | |||
expectNoMessages(t, logReceived) | |||
} | |||
|
|||
func TestTrackMovedAwayFiles(t *testing.T) { | |||
if runtime.GOOS == "windows" { | |||
t.Skip("Moving files while open is unsupported on Windows") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think opening with with FILE_SHARE_DELETE flag should make this possible. See https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea?redirectedfrom=MSDN
yes failing test after rebase is a little different. so i ran it on my windows machine, and yay i can reproduce them. |
it passed. feel free to re run pipeline just to check. |
I tried running benchmarks on this and found a problem. Some clarification is needed.
The problem is that the benchmark never completes. It hangs, either at the end of The good news is that, for the 3 cases that do complete, we see almost identical performance. Nice job. On main:
|
#174 and #176 have been merged. 174 will take care of the noisy "problem closing reader" logs, but you'll have to determine whether a similar change makes sense on this branch. 176 adds benchmarks to CI, which should demonstrate the problem. I'm still planning to write a dedicated unit test for this, but for the sake of expediency, I wanted to provide at least one way for you to validate. |
i figured out why. for MaxConcurrent, it was using 1.
on main
|
Simple enough. However, we need to either support |
I think user putting 1 for max concurrent file is not very common use case. I will produce error for config validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your persistence @rockb1017!
LGTM
This reverts commit c46512e.
@djaglowski @rockb1017 judging from how much effort you needed to put into this it was not a trivial problem. Can we document how this works now? I do not feel that the code is sufficient for understanding. Even a retroactive design document with a review in Log SIG would not be bad to do (but maybe that's overkill and it is not that complicated, you tell me). |
Meta note: for the future let's make sure we have proper PR title, commit title and commit message. |
@tigrannajaryan, I'll document how it currently works and how this change would work. The general design is not particularly complicated but the code could certainly be made more readable. Some small steps have been taken toward this end. More will be needed. The primary challenge with this overall effort has been in establishing confidence in the edge cases, particularly where file rotation is involved. Several tests are probabilistic because they rely on the timing of system calls. We need to establish that these tests can be made deterministic without sidestepping some critical feature of the edge case. I think it will be most helpful to articulate a couple specific test scenarios and attempt to establish how they can be deterministically tested. I can spend some more time on this effort next week, and will share particulars with you. Perhaps we can together establish the best path forward from there. |
@djaglowski sounds good, as soon as we are confident that the feature is rock solid and it is documented well enough for the future generations we are good. |
fixes #85
implementing #85 (comment)
in addition