Skip to content
This repository has been archived by the owner on May 25, 2022. It is now read-only.

track_rotated #168

Merged
merged 4 commits into from
Jun 10, 2021
Merged

track_rotated #168

merged 4 commits into from
Jun 10, 2021

Conversation

rockb1017
Copy link
Contributor

fixes #85
implementing #85 (comment)

in addition

  • i removed unnecessary fingerprint comparison for copytrunc cases because it is repetitive.
  • I modified file batching test case (2nd part of test) because with this new approach, the way it handles already-discovered-recently-appended log files differently.

@rockb1017 rockb1017 requested a review from a team June 2, 2021 03:34
Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like how simple this implementation is. It's much improved from the past version.

There is at least one significant issue that we needs to be addressed, relating to the modified test.

I have the following additional requests:

  1. Will you please add a note to the documentation, which describes the interaction between max_concurrent_files and files that are rotated out of the matching pattern.
  2. Will you please rebase onto main, so we can run the new benchmarks before/after this change, and validate that performance changes are acceptable.

operator/builtin/input/file/file.go Outdated Show resolved Hide resolved
operator/builtin/input/file/file_test.go Outdated Show resolved Hide resolved
// Skip checking itself
continue
}
for j := i + 1; j < len(fps); j++ {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this part back because it might be causing the issue on WIndow based machine.

@rockb1017
Copy link
Contributor Author

I added closing all readers step to Stop() function. and removed fingerprint comparison again because it is not needed.

@rockb1017
Copy link
Contributor Author

hmm for windows, it fails TestMultiFileParallel_LiveFiles test. it consumed file1 and file4 twice.
I can't reproduce it. any suggestion?
Could you rerun just in case?

@djaglowski
Copy link
Member

@rockb1017 It doesn't look like changing the poll interval fixed anything. Rerunning the test causes failure again.

@rockb1017
Copy link
Contributor Author

hmm but it is a different test that failed. TestRotation/NoDeletion/MoveCreateSequential

@rockb1017
Copy link
Contributor Author

Did increasing poll interval made it fail or was it because test failed before it got to run TestRotation/NoDeletion/MoveCreateSequential test?

@rockb1017
Copy link
Contributor Author

I couldn't trace back and understand why it failed on windows. So i spun up a windows server on EC2 and ran test.

Administrator@EC2AMAZ-K6DFUOR MINGW64 ~/opentelemetry-log-collection/operator/builtin/input/file (track_rotated)
$ go test -v
=== RUN   TestUnmarshal
=== RUN   TestUnmarshal/default
=== RUN   TestUnmarshal/extra_field
=== RUN   TestUnmarshal/id_custom
=== RUN   TestUnmarshal/include_one
=== RUN   TestUnmarshal/include_multi
=== RUN   TestUnmarshal/include_glob
=== RUN   TestUnmarshal/include_glob_double_asterisk
=== RUN   TestUnmarshal/include_glob_double_asterisk_nested
=== RUN   TestUnmarshal/include_glob_double_asterisk_prefix
=== RUN   TestUnmarshal/include_inline
=== RUN   TestUnmarshal/include_invalid
=== RUN   TestUnmarshal/exclude_one
=== RUN   TestUnmarshal/exclude_multi
=== RUN   TestUnmarshal/exclude_glob
=== RUN   TestUnmarshal/exclude_glob_double_asterisk
=== RUN   TestUnmarshal/exclude_glob_double_asterisk_nested
=== RUN   TestUnmarshal/exclude_glob_double_asterisk_prefix
=== RUN   TestUnmarshal/exclude_inline
=== RUN   TestUnmarshal/exclude_invalid
=== RUN   TestUnmarshal/poll_interval_no_units
=== RUN   TestUnmarshal/poll_interval_1s
=== RUN   TestUnmarshal/poll_interval_1ms
=== RUN   TestUnmarshal/poll_interval_1000ms
=== RUN   TestUnmarshal/fingerprint_size_no_units
=== RUN   TestUnmarshal/fingerprint_size_1kb_lower
=== RUN   TestUnmarshal/fingerprint_size_1KB
=== RUN   TestUnmarshal/fingerprint_size_1kib_lower
=== RUN   TestUnmarshal/fingerprint_size_1KiB
=== RUN   TestUnmarshal/fingerprint_size_float
=== RUN   TestUnmarshal/include_file_name_lower
=== RUN   TestUnmarshal/include_file_name_upper
=== RUN   TestUnmarshal/include_file_name_on
=== RUN   TestUnmarshal/include_file_name_yes
=== RUN   TestUnmarshal/include_file_path_lower
=== RUN   TestUnmarshal/include_file_path_upper
=== RUN   TestUnmarshal/include_file_path_on
=== RUN   TestUnmarshal/include_file_path_yes
=== RUN   TestUnmarshal/include_file_path_off
=== RUN   TestUnmarshal/include_file_path_no
=== RUN   TestUnmarshal/include_file_path_nonbool
=== RUN   TestUnmarshal/multiline_line_start_string
=== RUN   TestUnmarshal/multiline_line_start_special
=== RUN   TestUnmarshal/multiline_line_end_string
=== RUN   TestUnmarshal/multiline_line_end_special
=== RUN   TestUnmarshal/multiline_random
=== RUN   TestUnmarshal/start_at_string
=== RUN   TestUnmarshal/max_concurrent_large
=== RUN   TestUnmarshal/max_log_size_mib_lower
=== RUN   TestUnmarshal/max_log_size_mib_upper
=== RUN   TestUnmarshal/max_log_size_mb_upper
=== RUN   TestUnmarshal/max_log_size_mb_lower
=== RUN   TestUnmarshal/encoding_lower
=== RUN   TestUnmarshal/encoding_upper
--- PASS: TestUnmarshal (0.02s)
    --- PASS: TestUnmarshal/default (0.00s)
    --- PASS: TestUnmarshal/extra_field (0.00s)
    --- PASS: TestUnmarshal/id_custom (0.00s)
    --- PASS: TestUnmarshal/include_one (0.00s)
    --- PASS: TestUnmarshal/include_multi (0.00s)
    --- PASS: TestUnmarshal/include_glob (0.00s)
    --- PASS: TestUnmarshal/include_glob_double_asterisk (0.00s)
    --- PASS: TestUnmarshal/include_glob_double_asterisk_nested (0.00s)
    --- PASS: TestUnmarshal/include_glob_double_asterisk_prefix (0.00s)
    --- PASS: TestUnmarshal/include_inline (0.00s)
    --- PASS: TestUnmarshal/include_invalid (0.00s)
    --- PASS: TestUnmarshal/exclude_one (0.00s)
    --- PASS: TestUnmarshal/exclude_multi (0.00s)
    --- PASS: TestUnmarshal/exclude_glob (0.00s)
    --- PASS: TestUnmarshal/exclude_glob_double_asterisk (0.00s)
    --- PASS: TestUnmarshal/exclude_glob_double_asterisk_nested (0.00s)
    --- PASS: TestUnmarshal/exclude_glob_double_asterisk_prefix (0.00s)
    --- PASS: TestUnmarshal/exclude_inline (0.00s)
    --- PASS: TestUnmarshal/exclude_invalid (0.00s)
    --- PASS: TestUnmarshal/poll_interval_no_units (0.00s)
    --- PASS: TestUnmarshal/poll_interval_1s (0.00s)
    --- PASS: TestUnmarshal/poll_interval_1ms (0.00s)
    --- PASS: TestUnmarshal/poll_interval_1000ms (0.00s)
    --- PASS: TestUnmarshal/fingerprint_size_no_units (0.00s)
    --- PASS: TestUnmarshal/fingerprint_size_1kb_lower (0.00s)
    --- PASS: TestUnmarshal/fingerprint_size_1KB (0.00s)
    --- PASS: TestUnmarshal/fingerprint_size_1kib_lower (0.00s)
    --- PASS: TestUnmarshal/fingerprint_size_1KiB (0.00s)
    --- PASS: TestUnmarshal/fingerprint_size_float (0.00s)
    --- PASS: TestUnmarshal/include_file_name_lower (0.00s)
    --- PASS: TestUnmarshal/include_file_name_upper (0.00s)
    --- PASS: TestUnmarshal/include_file_name_on (0.00s)
    --- PASS: TestUnmarshal/include_file_name_yes (0.00s)
    --- PASS: TestUnmarshal/include_file_path_lower (0.00s)
    --- PASS: TestUnmarshal/include_file_path_upper (0.00s)
    --- PASS: TestUnmarshal/include_file_path_on (0.00s)
    --- PASS: TestUnmarshal/include_file_path_yes (0.00s)
    --- PASS: TestUnmarshal/include_file_path_off (0.00s)
    --- PASS: TestUnmarshal/include_file_path_no (0.00s)
    --- PASS: TestUnmarshal/include_file_path_nonbool (0.00s)
    --- PASS: TestUnmarshal/multiline_line_start_string (0.00s)
    --- PASS: TestUnmarshal/multiline_line_start_special (0.00s)
    --- PASS: TestUnmarshal/multiline_line_end_string (0.00s)
    --- PASS: TestUnmarshal/multiline_line_end_special (0.00s)
    --- PASS: TestUnmarshal/multiline_random (0.01s)
    --- PASS: TestUnmarshal/start_at_string (0.00s)
    --- PASS: TestUnmarshal/max_concurrent_large (0.00s)
    --- PASS: TestUnmarshal/max_log_size_mib_lower (0.00s)
    --- PASS: TestUnmarshal/max_log_size_mib_upper (0.00s)
    --- PASS: TestUnmarshal/max_log_size_mb_upper (0.00s)
    --- PASS: TestUnmarshal/max_log_size_mb_lower (0.00s)
    --- PASS: TestUnmarshal/encoding_lower (0.00s)
    --- PASS: TestUnmarshal/encoding_upper (0.00s)
=== RUN   TestBuild
=== PAUSE TestBuild
=== RUN   TestMapStructureDecodeConfigWithHook
--- PASS: TestMapStructureDecodeConfigWithHook (0.00s)
=== RUN   TestMapStructureDecodeConfig
--- PASS: TestMapStructureDecodeConfig (0.00s)
=== RUN   TestCleanStop
=== PAUSE TestCleanStop
=== RUN   TestAddFileFields
=== PAUSE TestAddFileFields
=== RUN   TestReadExistingLogs
=== PAUSE TestReadExistingLogs
=== RUN   TestReadNewLogs
=== PAUSE TestReadNewLogs
=== RUN   TestReadExistingAndNewLogs
=== PAUSE TestReadExistingAndNewLogs
=== RUN   TestStartAtEnd
=== PAUSE TestStartAtEnd
=== RUN   TestStartAtEndNewFile
=== PAUSE TestStartAtEndNewFile
=== RUN   TestNoNewline
=== PAUSE TestNoNewline
=== RUN   TestSkipEmpty
=== PAUSE TestSkipEmpty
=== RUN   TestSplitWrite
=== PAUSE TestSplitWrite
=== RUN   TestDecodeBufferIsResized
=== PAUSE TestDecodeBufferIsResized
=== RUN   TestMultiFileSimple
=== PAUSE TestMultiFileSimple
=== RUN   TestMultiFileParallel_PreloadedFiles
=== PAUSE TestMultiFileParallel_PreloadedFiles
=== RUN   TestMultiFileParallel_LiveFiles
=== PAUSE TestMultiFileParallel_LiveFiles
=== RUN   TestOffsetsAfterRestart
=== PAUSE TestOffsetsAfterRestart
=== RUN   TestOffsetsAfterRestart_BigFiles
=== PAUSE TestOffsetsAfterRestart_BigFiles
=== RUN   TestOffsetsAfterRestart_BigFilesWrittenWhileOff
=== PAUSE TestOffsetsAfterRestart_BigFilesWrittenWhileOff
=== RUN   TestManyLogsDelivered
=== PAUSE TestManyLogsDelivered
=== RUN   TestFileBatching
=== PAUSE TestFileBatching
=== RUN   TestFileReader_FingerprintUpdated
=== PAUSE TestFileReader_FingerprintUpdated
=== RUN   TestEncodings
=== PAUSE TestEncodings
=== RUN   TestExclude
--- PASS: TestExclude (0.00s)
=== RUN   TestExcludeEmpty
--- PASS: TestExcludeEmpty (0.00s)
=== RUN   TestExcludeMany
--- PASS: TestExcludeMany (0.01s)
=== RUN   TestExcludeDuplicates
--- PASS: TestExcludeDuplicates (0.00s)
=== RUN   TestNewFingerprintDoesNotModifyOffset
--- PASS: TestNewFingerprintDoesNotModifyOffset (0.00s)
=== RUN   TestNewFingerprint
=== RUN   TestNewFingerprint/defaultExactFileSize
=== PAUSE TestNewFingerprint/defaultExactFileSize
=== RUN   TestNewFingerprint/defaultWithFileHalfOfFingerprint
=== PAUSE TestNewFingerprint/defaultWithFileHalfOfFingerprint
=== RUN   TestNewFingerprint/defaultWithFileTwiceFingerprint
=== PAUSE TestNewFingerprint/defaultWithFileTwiceFingerprint
=== RUN   TestNewFingerprint/minFingerprintExactFileSize
=== PAUSE TestNewFingerprint/minFingerprintExactFileSize
=== RUN   TestNewFingerprint/minFingerprintWithSmallerFileSize
=== PAUSE TestNewFingerprint/minFingerprintWithSmallerFileSize
=== RUN   TestNewFingerprint/minFingerprintWithLargerFileSize
=== PAUSE TestNewFingerprint/minFingerprintWithLargerFileSize
=== RUN   TestNewFingerprint/largeFingerprintSmallFile
=== PAUSE TestNewFingerprint/largeFingerprintSmallFile
=== RUN   TestNewFingerprint/largeFingerprintLargeFile
=== PAUSE TestNewFingerprint/largeFingerprintLargeFile
=== CONT  TestNewFingerprint/defaultExactFileSize
=== CONT  TestNewFingerprint/minFingerprintWithLargerFileSize
=== CONT  TestNewFingerprint/largeFingerprintLargeFile
=== CONT  TestNewFingerprint/largeFingerprintSmallFile
=== CONT  TestNewFingerprint/minFingerprintExactFileSize
=== CONT  TestNewFingerprint/defaultWithFileTwiceFingerprint
=== CONT  TestNewFingerprint/minFingerprintWithSmallerFileSize
=== CONT  TestNewFingerprint/defaultWithFileHalfOfFingerprint
--- PASS: TestNewFingerprint (0.00s)
    --- PASS: TestNewFingerprint/defaultExactFileSize (0.15s)
    --- PASS: TestNewFingerprint/largeFingerprintLargeFile (0.17s)
    --- PASS: TestNewFingerprint/minFingerprintWithLargerFileSize (0.17s)
    --- PASS: TestNewFingerprint/largeFingerprintSmallFile (0.18s)
    --- PASS: TestNewFingerprint/minFingerprintExactFileSize (0.18s)
    --- PASS: TestNewFingerprint/defaultWithFileHalfOfFingerprint (0.18s)
    --- PASS: TestNewFingerprint/minFingerprintWithSmallerFileSize (0.18s)
    --- PASS: TestNewFingerprint/defaultWithFileTwiceFingerprint (0.18s)
=== RUN   TestFingerprintCopy
=== PAUSE TestFingerprintCopy
=== RUN   TestFingerprintStartsWith
=== RUN   TestFingerprintStartsWith/same
=== PAUSE TestFingerprintStartsWith/same
=== RUN   TestFingerprintStartsWith/aStartsWithB
=== PAUSE TestFingerprintStartsWith/aStartsWithB
=== RUN   TestFingerprintStartsWith/bStartsWithA
=== PAUSE TestFingerprintStartsWith/bStartsWithA
=== RUN   TestFingerprintStartsWith/neither
=== PAUSE TestFingerprintStartsWith/neither
=== CONT  TestFingerprintStartsWith/same
=== CONT  TestFingerprintStartsWith/bStartsWithA
=== CONT  TestFingerprintStartsWith/neither
=== CONT  TestFingerprintStartsWith/aStartsWithB
--- PASS: TestFingerprintStartsWith (0.00s)
    --- PASS: TestFingerprintStartsWith/same (0.00s)
    --- PASS: TestFingerprintStartsWith/bStartsWithA (0.00s)
    --- PASS: TestFingerprintStartsWith/neither (0.00s)
    --- PASS: TestFingerprintStartsWith/aStartsWithB (0.00s)
=== RUN   TestFingerprintStartsWith_FromFile
--- PASS: TestFingerprintStartsWith_FromFile (1.08s)
=== RUN   TestMultiFileRotate
=== PAUSE TestMultiFileRotate
=== RUN   TestMultiFileRotateSlow
    rotation_test.go:88:
--- SKIP: TestMultiFileRotateSlow (0.00s)
=== RUN   TestMultiCopyTruncateSlow
--- PASS: TestMultiCopyTruncateSlow (2.42s)
=== RUN   TestRotation
=== RUN   TestRotation/NoRotation/MoveCreateTimestamped
=== RUN   TestRotation/NoRotation/MoveCreateSequential
=== RUN   TestRotation/NoRotation/CopyTruncateTimestamped
=== RUN   TestRotation/NoRotation/CopyTruncateSequential
=== RUN   TestRotation/NoDeletion/MoveCreateTimestamped
=== RUN   TestRotation/NoDeletion/MoveCreateSequential
=== RUN   TestRotation/NoDeletion/CopyTruncateTimestamped
=== RUN   TestRotation/NoDeletion/CopyTruncateSequential
=== RUN   TestRotation/Deletion/MoveCreateTimestamped
    rotation_test.go:277: Potentially unstable ephemerality expectation for test: Deletion
=== RUN   TestRotation/Deletion/MoveCreateSequential
    rotation_test.go:277: Potentially unstable ephemerality expectation for test: Deletion
=== RUN   TestRotation/Deletion/CopyTruncateTimestamped
    rotation_test.go:277: Potentially unstable ephemerality expectation for test: Deletion
=== RUN   TestRotation/Deletion/CopyTruncateSequential
    rotation_test.go:277: Potentially unstable ephemerality expectation for test: Deletion
=== RUN   TestRotation/Deletion/ExceedFingerprint/MoveCreateTimestamped
    rotation_test.go:277: Potentially unstable ephemerality expectation for test: Deletion/ExceedFingerprint
=== RUN   TestRotation/Deletion/ExceedFingerprint/MoveCreateSequential
    rotation_test.go:277: Potentially unstable ephemerality expectation for test: Deletion/ExceedFingerprint
=== RUN   TestRotation/Deletion/ExceedFingerprint/CopyTruncateTimestamped
    rotation_test.go:277: Potentially unstable ephemerality expectation for test: Deletion/ExceedFingerprint
=== RUN   TestRotation/Deletion/ExceedFingerprint/CopyTruncateSequential
    rotation_test.go:277: Potentially unstable ephemerality expectation for test: Deletion/ExceedFingerprint
--- PASS: TestRotation (22.00s)
    --- PASS: TestRotation/NoRotation/MoveCreateTimestamped (0.32s)
    --- PASS: TestRotation/NoRotation/MoveCreateSequential (0.34s)
    --- PASS: TestRotation/NoRotation/CopyTruncateTimestamped (0.33s)
    --- PASS: TestRotation/NoRotation/CopyTruncateSequential (0.31s)
    --- PASS: TestRotation/NoDeletion/MoveCreateTimestamped (0.48s)
    --- PASS: TestRotation/NoDeletion/MoveCreateSequential (0.40s)
    --- PASS: TestRotation/NoDeletion/CopyTruncateTimestamped (0.46s)
    --- PASS: TestRotation/NoDeletion/CopyTruncateSequential (0.48s)
    --- PASS: TestRotation/Deletion/MoveCreateTimestamped (0.65s)
    --- PASS: TestRotation/Deletion/MoveCreateSequential (0.60s)
    --- PASS: TestRotation/Deletion/CopyTruncateTimestamped (0.60s)
    --- PASS: TestRotation/Deletion/CopyTruncateSequential (0.59s)
    --- PASS: TestRotation/Deletion/ExceedFingerprint/MoveCreateTimestamped (4.17s)
    --- PASS: TestRotation/Deletion/ExceedFingerprint/MoveCreateSequential (4.19s)
    --- PASS: TestRotation/Deletion/ExceedFingerprint/CopyTruncateTimestamped (4.04s)
    --- PASS: TestRotation/Deletion/ExceedFingerprint/CopyTruncateSequential (4.04s)
=== RUN   TestMoveFile
    rotation_test.go:334: Moving files while open is unsupported on Windows
--- SKIP: TestMoveFile (0.00s)
=== RUN   TestTrackMovedAwayFiles
    rotation_test.go:360: Moving files while open is unsupported on Windows
--- SKIP: TestTrackMovedAwayFiles (0.00s)
=== RUN   TestTruncateThenWrite
=== PAUSE TestTruncateThenWrite
=== RUN   TestCopyTruncateWriteBoth
=== PAUSE TestCopyTruncateWriteBoth
=== RUN   TestFileMovedWhileOff_BigFiles
=== PAUSE TestFileMovedWhileOff_BigFiles
=== CONT  TestBuild
=== CONT  TestEncodings
=== CONT  TestStartAtEndNewFile
=== RUN   TestEncodings/Nop
=== PAUSE TestEncodings/Nop
=== RUN   TestEncodings/InvalidUTFReplacement
=== PAUSE TestEncodings/InvalidUTFReplacement
=== RUN   TestBuild/Basic
=== CONT  TestMultiFileParallel_PreloadedFiles
=== CONT  TestMultiFileSimple
=== CONT  TestDecodeBufferIsResized
=== CONT  TestSplitWrite
=== CONT  TestCopyTruncateWriteBoth
=== RUN   TestEncodings/ValidUTF8
=== PAUSE TestBuild/Basic
=== PAUSE TestEncodings/ValidUTF8
=== RUN   TestBuild/BadIncludeGlob
=== PAUSE TestBuild/BadIncludeGlob
=== RUN   TestEncodings/ChineseCharacter
=== RUN   TestBuild/BadExcludeGlob
=== PAUSE TestBuild/BadExcludeGlob
=== PAUSE TestEncodings/ChineseCharacter
=== RUN   TestEncodings/SmileyFaceUTF16
=== RUN   TestBuild/MultilineConfiguredStartAndEndPatterns
=== PAUSE TestBuild/MultilineConfiguredStartAndEndPatterns
=== PAUSE TestEncodings/SmileyFaceUTF16
=== RUN   TestBuild/MultilineConfiguredStartPattern
=== PAUSE TestBuild/MultilineConfiguredStartPattern
=== RUN   TestEncodings/SmileyFaceNewlineUTF16
=== RUN   TestBuild/MultilineConfiguredEndPattern
=== PAUSE TestEncodings/SmileyFaceNewlineUTF16
=== PAUSE TestBuild/MultilineConfiguredEndPattern
=== RUN   TestEncodings/SmileyFaceNewlineUTF16LE
=== RUN   TestBuild/InvalidEncoding
=== PAUSE TestEncodings/SmileyFaceNewlineUTF16LE
=== PAUSE TestBuild/InvalidEncoding
=== RUN   TestBuild/LineStartAndEnd
=== PAUSE TestBuild/LineStartAndEnd
=== RUN   TestEncodings/ChineseCharacterBig5
=== PAUSE TestEncodings/ChineseCharacterBig5
=== RUN   TestBuild/NoLineStartOrEnd
=== PAUSE TestBuild/NoLineStartOrEnd
=== CONT  TestSkipEmpty
=== RUN   TestBuild/InvalidLineStartRegex
=== PAUSE TestBuild/InvalidLineStartRegex
=== RUN   TestBuild/InvalidLineEndRegex
=== PAUSE TestBuild/InvalidLineEndRegex
=== CONT  TestNoNewline
    file_test.go:174:
--- SKIP: TestNoNewline (0.00s)
=== CONT  TestReadNewLogs
--- PASS: TestStartAtEndNewFile (0.01s)
=== CONT  TestStartAtEnd
--- PASS: TestReadNewLogs (0.01s)
=== CONT  TestReadExistingAndNewLogs
--- PASS: TestSplitWrite (0.01s)
=== CONT  TestAddFileFields
--- PASS: TestReadExistingAndNewLogs (0.00s)
=== CONT  TestReadExistingLogs
--- PASS: TestStartAtEnd (0.21s)
=== CONT  TestFileMovedWhileOff_BigFiles
--- PASS: TestSkipEmpty (0.21s)
=== CONT  TestCleanStop
    file_test.go:36: Skipping due to goroutine leak in opencensus.
        See this issue for details: https://github.com/census-instrumentation/opencensus-go/issues/1191#issuecomment-610440163
=== CONT  TestManyLogsDelivered
--- SKIP: TestCleanStop (0.00s)
=== CONT  TestFileReader_FingerprintUpdated
--- PASS: TestAddFileFields (0.20s)
=== CONT  TestFileBatching
--- PASS: TestDecodeBufferIsResized (0.21s)
--- PASS: TestFileReader_FingerprintUpdated (0.00s)
=== CONT  TestOffsetsAfterRestart_BigFiles
--- PASS: TestReadExistingLogs (0.21s)
=== CONT  TestOffsetsAfterRestart
--- PASS: TestFileBatching (0.28s)
=== CONT  TestOffsetsAfterRestart_BigFilesWrittenWhileOff
=== CONT  TestMultiFileParallel_LiveFiles
--- PASS: TestManyLogsDelivered (0.41s)
--- PASS: TestOffsetsAfterRestart_BigFiles (0.41s)
=== CONT  TestMultiFileRotate
--- PASS: TestFileMovedWhileOff_BigFiles (0.41s)
=== CONT  TestTruncateThenWrite
--- PASS: TestOffsetsAfterRestart (0.40s)
=== CONT  TestFingerprintCopy
--- PASS: TestFingerprintCopy (0.00s)
=== CONT  TestEncodings/Nop
--- PASS: TestTruncateThenWrite (0.21s)
=== CONT  TestEncodings/SmileyFaceNewlineUTF16
=== CONT  TestEncodings/SmileyFaceUTF16
--- PASS: TestOffsetsAfterRestart_BigFilesWrittenWhileOff (0.42s)
=== CONT  TestEncodings/ChineseCharacter
--- PASS: TestCopyTruncateWriteBoth (1.01s)
=== CONT  TestEncodings/ValidUTF8
=== CONT  TestEncodings/InvalidUTFReplacement
=== CONT  TestEncodings/ChineseCharacterBig5
=== CONT  TestEncodings/SmileyFaceNewlineUTF16LE
=== CONT  TestBuild/Basic
--- PASS: TestMultiFileSimple (1.21s)
=== CONT  TestBuild/InvalidEncoding
=== CONT  TestBuild/InvalidLineEndRegex
=== CONT  TestBuild/InvalidLineStartRegex
=== CONT  TestBuild/NoLineStartOrEnd
=== CONT  TestBuild/LineStartAndEnd
=== CONT  TestBuild/MultilineConfiguredStartAndEndPatterns
=== CONT  TestBuild/MultilineConfiguredEndPattern
=== CONT  TestBuild/MultilineConfiguredStartPattern
=== CONT  TestBuild/BadExcludeGlob
=== CONT  TestBuild/BadIncludeGlob
--- PASS: TestBuild (0.00s)
    --- PASS: TestBuild/InvalidEncoding (0.00s)
    --- PASS: TestBuild/InvalidLineEndRegex (0.00s)
    --- PASS: TestBuild/Basic (0.00s)
    --- PASS: TestBuild/InvalidLineStartRegex (0.00s)
    --- PASS: TestBuild/LineStartAndEnd (0.00s)
    --- PASS: TestBuild/MultilineConfiguredStartAndEndPatterns (0.00s)
    --- PASS: TestBuild/NoLineStartOrEnd (0.00s)
    --- PASS: TestBuild/MultilineConfiguredStartPattern (0.00s)
    --- PASS: TestBuild/MultilineConfiguredEndPattern (0.00s)
    --- PASS: TestBuild/BadExcludeGlob (0.00s)
    --- PASS: TestBuild/BadIncludeGlob (0.00s)
--- PASS: TestMultiFileParallel_PreloadedFiles (1.25s)
--- PASS: TestEncodings (0.00s)
    --- PASS: TestEncodings/Nop (0.20s)
    --- PASS: TestEncodings/SmileyFaceNewlineUTF16 (0.20s)
    --- PASS: TestEncodings/SmileyFaceUTF16 (0.21s)
    --- PASS: TestEncodings/ChineseCharacter (0.22s)
    --- PASS: TestEncodings/ValidUTF8 (0.20s)
    --- PASS: TestEncodings/InvalidUTFReplacement (0.20s)
    --- PASS: TestEncodings/ChineseCharacterBig5 (0.21s)
    --- PASS: TestEncodings/SmileyFaceNewlineUTF16LE (0.20s)
--- PASS: TestMultiFileRotate (1.21s)
--- PASS: TestMultiFileParallel_LiveFiles (1.24s)
PASS
ok      github.com/open-telemetry/opentelemetry-log-collection/operator/builtin/input/file      27.607s

everything passes.
i kept running just in case.

Administrator@EC2AMAZ-K6DFUOR MINGW64 ~/opentelemetry-log-collection/operator/builtin/input/file (track_rotated)
$ go test
PASS
ok      github.com/open-telemetry/opentelemetry-log-collection/operator/builtin/input/file      27.225s

Administrator@EC2AMAZ-K6DFUOR MINGW64 ~/opentelemetry-log-collection/operator/builtin/input/file (track_rotated)
$ go test
gPASS
ok      github.com/open-telemetry/opentelemetry-log-collection/operator/builtin/input/file      20.838s

Administrator@EC2AMAZ-K6DFUOR MINGW64 ~/opentelemetry-log-collection/operator/builtin/input/file (track_rotated)
$ go test
PASS
ok      github.com/open-telemetry/opentelemetry-log-collection/operator/builtin/input/file      20.087s

Administrator@EC2AMAZ-K6DFUOR MINGW64 ~/opentelemetry-log-collection/operator/builtin/input/file (track_rotated)
$ go test
PASS
ok      github.com/open-telemetry/opentelemetry-log-collection/operator/builtin/input/file      27.322s

Administrator@EC2AMAZ-K6DFUOR MINGW64 ~/opentelemetry-log-collection/operator/builtin/input/file (track_rotated)
$ go test
PASS
ok      github.com/open-telemetry/opentelemetry-log-collection/operator/builtin/input/file      26.866s

it passes without failing.
Any clue on why the test is flaky on github actions?

@rockb1017 rockb1017 force-pushed the track_rotated branch 2 times, most recently from 0146cca to 5ae78d9 Compare June 9, 2021 06:39
@rockb1017
Copy link
Contributor Author

at this point,I think i am just fixing the flaky test on windows machine because I don't have any failing cases on my windows.

  1. I have rebased it.
  2. I have increase timeouts for unit test in the attempt to fix it.

@codecov
Copy link

codecov bot commented Jun 9, 2021

Codecov Report

Merging #168 (e4664e5) into main (d988d23) will increase coverage by 0.1%.
The diff coverage is 91.6%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main    #168     +/-   ##
=======================================
+ Coverage   75.6%   75.8%   +0.1%     
=======================================
  Files         95      95             
  Lines       4360    4363      +3     
=======================================
+ Hits        3298    3308     +10     
+ Misses       739     733      -6     
+ Partials     323     322      -1     
Impacted Files Coverage Δ
operator/builtin/input/file/config.go 73.4% <0.0%> (ø)
operator/builtin/input/file/file.go 76.1% <100.0%> (+4.7%) ⬆️
operator/builtin/input/file/reader.go 68.2% <100.0%> (+3.1%) ⬆️
operator/builtin/input/file/fingerprint.go 76.4% <0.0%> (-11.8%) ⬇️

@djaglowski
Copy link
Member

djaglowski commented Jun 9, 2021

at this point,I think i am just fixing the flaky test on windows machine because I don't have any failing cases on my windows.

  1. I have rebased it.
  2. I have increase timeouts for unit test in the attempt to fix it.

I think we should not look at this as a matter of flaky tests. These tests have no history of being flaky, to my knowledge. Tuning timeouts/intervals is sometimes necessary at a macro-level, but our tests should pass with timeouts/intervals that span a reasonable range.

Instead, let's assume that these changes may have introduced an issue that is causing the problem.

We see the following error on several failing windows tests: The process cannot access the file because it is being used by another process

The nature of this PR is to change the way in which we are "using" files, so it seems likely that these changes are directly related.

@@ -355,6 +355,42 @@ func TestMoveFile(t *testing.T) {
expectNoMessages(t, logReceived)
}

func TestTrackMovedAwayFiles(t *testing.T) {
if runtime.GOOS == "windows" {
t.Skip("Moving files while open is unsupported on Windows")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think opening with with FILE_SHARE_DELETE flag should make this possible. See https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea?redirectedfrom=MSDN

@rockb1017
Copy link
Contributor Author

yes failing test after rebase is a little different. so i ran it on my windows machine, and yay i can reproduce them.
I added defer operator.Stop() in tests. test itself tries to delete the file after testing, but operator has them open. that was the issue. after I added it, it passed on my windows. let's see if what github action says.

@rockb1017
Copy link
Contributor Author

it passed. feel free to re run pipeline just to check.

@rockb1017 rockb1017 requested a review from djaglowski June 9, 2021 19:28
@djaglowski
Copy link
Member

djaglowski commented Jun 9, 2021

I tried running benchmarks on this and found a problem. Some clarification is needed.

  1. Address issues detected by gosec #157 introduced a noisy log message, which I think can be ignored. I will open a PR shortly. In the meantime you can search for "Problem closing reader" and comment out the log.
  2. This benchmark is apparently demonstrating a problem that unit tests are not. I will try to convert this into a unit test so that we can catch this more easily.

The problem is that the benchmark never completes. It hangs, either at the end of MultiGlob, or during MaxConcurrent.

The good news is that, for the 3 cases that do complete, we see almost identical performance. Nice job.

On main:

> make bench
go test -benchmem -run=^$ -bench ^* ./...
...
goos: darwin
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-log-collection/operator/builtin/input/file
BenchmarkFileInput/Single-8               652237              2193 ns/op             688 B/op         10 allocs/op
BenchmarkFileInput/Glob-8                 111085             10936 ns/op            2762 B/op         42 allocs/op
BenchmarkFileInput/MultiGlob-8            113517             10727 ns/op            2762 B/op         42 allocs/op
BenchmarkFileInput/MaxConcurrent-8         41650             24309 ns/op            2757 B/op         42 allocs/op
BenchmarkFileInput/FngrPrntLarge-8        485581              2089 ns/op             689 B/op         10 allocs/op
BenchmarkFileInput/FngrPrntSmall-8        487039              2088 ns/op             688 B/op         10 allocs/op
> make bench
go test -benchmem -run=^$ -bench ^* ./...
...
goos: darwin
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-log-collection/operator/builtin/input/file
BenchmarkFileInput/Single-8               473388              2140 ns/op             688 B/op         10 allocs/op
BenchmarkFileInput/Glob-8                 116604             10445 ns/op            2761 B/op         42 allocs/op
BenchmarkFileInput/MultiGlob-8            114943             10600 ns/op            2762 B/op         42 allocs/op

@djaglowski
Copy link
Member

#174 and #176 have been merged.

174 will take care of the noisy "problem closing reader" logs, but you'll have to determine whether a similar change makes sense on this branch.

176 adds benchmarks to CI, which should demonstrate the problem. I'm still planning to write a dedicated unit test for this, but for the sake of expediency, I wanted to provide at least one way for you to validate.

@rockb1017
Copy link
Contributor Author

i figured out why. for MaxConcurrent, it was using 1.
cfg.MaxConcurrentFiles = 1
now per batch, we open half of this number. so it wasn't able to open any file. I changed it to 2 and it works.
on this branch

pkg: github.com/open-telemetry/opentelemetry-log-collection/operator/builtin/input/file
BenchmarkFileInput/Single-16              504951              2003 ns/op             688 B/op         10 allocs/op
BenchmarkFileInput/Glob-16                122710              9888 ns/op            2761 B/op         42 allocs/op
BenchmarkFileInput/MultiGlob-16           123349              9809 ns/op            2761 B/op         42 allocs/op
BenchmarkFileInput/MaxConcurrent-16        65827             15271 ns/op            2757 B/op         42 allocs/op
BenchmarkFileInput/FngrPrntLarge-16       511382              2378 ns/op             690 B/op         10 allocs/op
BenchmarkFileInput/FngrPrntSmall-16       510836              2095 ns/op             688 B/op         10 allocs/op

on main

pkg: github.com/open-telemetry/opentelemetry-log-collection/operator/builtin/input/file
BenchmarkFileInput/Single-16              472940              2210 ns/op             688 B/op         10 allocs/op
BenchmarkFileInput/Glob-16                116560             12118 ns/op            2763 B/op         42 allocs/op
BenchmarkFileInput/MultiGlob-16           119990             13379 ns/op            2765 B/op         42 allocs/op
BenchmarkFileInput/MaxConcurrent-16        32445             30870 ns/op            2759 B/op         42 allocs/op
BenchmarkFileInput/FngrPrntLarge-16       508389              2413 ns/op             689 B/op         10 allocs/op
BenchmarkFileInput/FngrPrntSmall-16       474009              2588 ns/op             688 B/op         10 allocs/op

@djaglowski
Copy link
Member

Simple enough. However, we need to either support max_concurrent_files=1, or at least produce an error when validating the config if a user has specified 1.

operator/builtin/input/file/util_test.go Outdated Show resolved Hide resolved
operator/builtin/input/file/util_test.go Outdated Show resolved Hide resolved
docs/operators/file_input.md Outdated Show resolved Hide resolved
@rockb1017
Copy link
Contributor Author

Simple enough. However, we need to either support max_concurrent_files=1, or at least produce an error when validating the config if a user has specified 1.

I think user putting 1 for max concurrent file is not very common use case. I will produce error for config validation

@rockb1017 rockb1017 requested a review from djaglowski June 10, 2021 17:49
Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your persistence @rockb1017!

LGTM

@djaglowski djaglowski merged commit c46512e into open-telemetry:main Jun 10, 2021
@rockb1017 rockb1017 deleted the track_rotated branch June 10, 2021 19:25
djaglowski added a commit that referenced this pull request Jun 10, 2021
@tigrannajaryan
Copy link
Member

@djaglowski @rockb1017 judging from how much effort you needed to put into this it was not a trivial problem. Can we document how this works now? I do not feel that the code is sufficient for understanding. Even a retroactive design document with a review in Log SIG would not be bad to do (but maybe that's overkill and it is not that complicated, you tell me).

@tigrannajaryan
Copy link
Member

Meta note: for the future let's make sure we have proper PR title, commit title and commit message.

djaglowski added a commit that referenced this pull request Jun 10, 2021
@rockb1017 rockb1017 restored the track_rotated branch June 10, 2021 21:06
@djaglowski
Copy link
Member

@tigrannajaryan, I'll document how it currently works and how this change would work. The general design is not particularly complicated but the code could certainly be made more readable. Some small steps have been taken toward this end. More will be needed.

The primary challenge with this overall effort has been in establishing confidence in the edge cases, particularly where file rotation is involved. Several tests are probabilistic because they rely on the timing of system calls. We need to establish that these tests can be made deterministic without sidestepping some critical feature of the edge case. I think it will be most helpful to articulate a couple specific test scenarios and attempt to establish how they can be deterministically tested.

I can spend some more time on this effort next week, and will share particulars with you. Perhaps we can together establish the best path forward from there.

@tigrannajaryan
Copy link
Member

@djaglowski sounds good, as soon as we are confident that the feature is rock solid and it is documented well enough for the future generations we are good.

@rockb1017 rockb1017 mentioned this pull request Jun 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

file_input - Data loss when tailing symlink file with log rotation (k8s container logs)
3 participants