Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto removal no block #914

Merged
merged 3 commits into from
Oct 19, 2023
Merged

Auto removal no block #914

merged 3 commits into from
Oct 19, 2023

Conversation

adam-mateen
Copy link
Contributor

Description of the issue

Previously there this issue - #381
Which was fixed by - #452

But now instead of an edge case where the agent will DROP log lines, the issue is that the agent will BLOCK waiting to upload ALL log lines and potentially miss discovering and reporting on a new file.
Consider the following events:

  1. File 1 is created, discovered, and reported
  2. File 2 is created, and discovered, but the agent is not finished reading File 1, so LogFile.FindLogSrc() is blocking.
  3. Before LogFile.FindLogSrc() completes, another 2 files matching the monitored pattern are created (File 3 and File 4)
  4. When the agent is done reading FIle 1 it deletes it and begins monitoring File 2.
  5. The Agent discovers FIle 4, but not File 3.

If this series of events repeats the host could be left with a bunch of unread, unreported, and not deleted files.

Description of changes

  • LogFile.FindLogSrc() will no longer block waiting for the tail to exit. This means there could be more than 1 tail running and reading LogEvents for a single log source. (This is already possible though)

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

  • Modified existing unit test case for auto removal.

BEFORE the fix:

adamym@88665a370425 amazon-cloudwatch-agent % go test -v -count=1  ./plugins/inputs/logfile/ -run TestLogsFileAutoRemoval
=== RUN   TestLogsFileAutoRemoval
    logfile_test.go:390: Created temp file, /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval1899158973
    logfile_test.go:377: create LogFile with FilePath = /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval*
    logfile_test.go:390: Created temp file, /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval33957811
2023/10/19 08:24:10 The state file  for /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval33957811 does not exist: stat : no such file or directory
    logfile_test.go:415: start writing, /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval33957811
    logfile_test.go:420: stop writing, /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval33957811
    logfile_test.go:440: Verify every line written to the temp file is received.
    logfile_test.go:390: Created temp file, /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval1680826170
    logfile_test.go:451: Verify child completed.
2023/10/19 08:24:17 The state file  for /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval1680826170 does not exist: stat : no such file or directory
    logfile_test.go:402: 
        	Error Trace:	/Users/adamym/go/src/github.com/aws/amazon-cloudwatch-agent/plugins/inputs/logfile/logfile_test.go:402
        	            				/Users/adamym/go/src/github.com/aws/amazon-cloudwatch-agent/plugins/inputs/logfile/logfile_test.go:432
        	            				/usr/local/Cellar/go/1.21.1/libexec/src/runtime/asm_amd64.s:1650
        	Error:      	"1.437046513s" is not less than "100ms"
        	Test:       	TestLogsFileAutoRemoval
    logfile_test.go:454: Child completed before timeout (as expected)
    logfile_test.go:458: Verify 1st temp file was auto deleted.
2023/10/19 08:24:17 I! [logfile] Successfully removed file /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval33957811 with auto_removal feature
    logfile_test.go:460: 
        	Error Trace:	/Users/adamym/go/src/github.com/aws/amazon-cloudwatch-agent/plugins/inputs/logfile/logfile_test.go:460
        	            				/Users/adamym/go/src/github.com/aws/amazon-cloudwatch-agent/plugins/inputs/logfile/logfile_test.go:482
        	Error:      	Should be true
        	Test:       	TestLogsFileAutoRemoval
    logfile_test.go:483: Verify 1st tmp file created and discovered.
    logfile_test.go:486: Parent completed before timeout (as expected)
    logfile_test.go:493: cleanup, /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval1680826170
--- FAIL: TestLogsFileAutoRemoval (6.52s)
FAIL
FAIL	github.com/aws/amazon-cloudwatch-agent/plugins/inputs/logfile	7.098s
FAIL

AFTER THE FIX:

adamym@88665a370425 amazon-cloudwatch-agent % go test -v -count=1  ./plugins/inputs/logfile/ -run TestLogsFileAutoRemoval
=== RUN   TestLogsFileAutoRemoval
    logfile_test.go:390: Created temp file, /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval48041360
    logfile_test.go:377: create LogFile with FilePath = /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval*
    logfile_test.go:390: Created temp file, /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval2879642736
2023/10/19 08:24:32 The state file  for /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval2879642736 does not exist: stat : no such file or directory
    logfile_test.go:415: start writing, /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval2879642736
    logfile_test.go:420: stop writing, /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval2879642736
    logfile_test.go:440: Verify every line written to the temp file is received.
    logfile_test.go:390: Created temp file, /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval2728170068
2023/10/19 08:24:37 The state file  for /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval2728170068 does not exist: stat : no such file or directory
    logfile_test.go:415: start writing, /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval2728170068
    logfile_test.go:451: Verify child completed.
2023/10/19 08:24:38 I! [logfile] Successfully removed file /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval2879642736 with auto_removal feature
    logfile_test.go:420: stop writing, /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval2728170068
    logfile_test.go:440: Verify every line written to the temp file is received.
    logfile_test.go:454: Child completed before timeout (as expected)
    logfile_test.go:458: Verify 1st temp file was auto deleted.
    logfile_test.go:483: Verify 1st tmp file created and discovered.
    logfile_test.go:486: Parent completed before timeout (as expected)
2023/10/19 08:24:44 I! [logfile] Successfully removed file /var/folders/7c/rx6r0v6n44j4pwnc_77gt46w0000gs/T/TestLogsFileAutoRemoval2728170068 with auto_removal feature
--- PASS: TestLogsFileAutoRemoval (11.89s)
PASS
ok  	github.com/aws/amazon-cloudwatch-agent/plugins/inputs/logfile	12.466s

Requirements

Before commit the code, please do the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

@adam-mateen adam-mateen requested a review from a team as a code owner October 19, 2023 13:31
@codecov-commenter
Copy link

codecov-commenter commented Oct 19, 2023

Codecov Report

Attention: 37 lines in your changes are missing coverage. Please review.

Comparison is base (96d4763) 57.58% compared to head (b5b9593) 62.66%.
Report is 428 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #914      +/-   ##
==========================================
+ Coverage   57.58%   62.66%   +5.07%     
==========================================
  Files         370      338      -32     
  Lines       17548    17044     -504     
==========================================
+ Hits        10105    10680     +575     
+ Misses       6848     5807    -1041     
+ Partials      595      557      -38     
Files Coverage Δ
cfg/commonconfig/commonconfig.go 8.00% <ø> (ø)
...md/amazon-cloudwatch-agent-config-wizard/wizard.go 59.55% <ø> (-8.51%) ⬇️
...amazon-cloudwatch-agent/amazon-cloudwatch-agent.go 2.64% <ø> (ø)
...oudwatch-agent/register_event_logger_notwindows.go 0.00% <ø> (ø)
...-cloudwatch-agent/register_event_logger_windows.go 0.00% <ø> (ø)
cmd/config-translator/translator.go 0.00% <ø> (ø)
cmd/xray-migration/commands_unix.go 42.50% <ø> (ø)
cmd/xray-migration/commands_windows.go 42.50% <ø> (ø)
cmd/xray-migration/xray-migration.go 30.28% <ø> (ø)
handlers/agentinfo/info.go 84.94% <ø> (ø)
... and 22 more

... and 208 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@SaxyPandaBear SaxyPandaBear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall my only concern is around e2e testing. Can we do a stress test on this to validate that under some workload, the current agent falls behind and skips logs, but the agent with this change doesn't? How do we prove that it's deterministic? I think that's the crux of my concern

plugins/inputs/logfile/logfile_test.go Show resolved Hide resolved
@adam-mateen
Copy link
Contributor Author

Overall my only concern is around e2e testing. Can we do a stress test on this to validate that under some workload, the current agent falls behind and skips logs, but the agent with this change doesn't? How do we prove that it's deterministic? I think that's the crux of my concern

Yes good call. I can work on adding a E2E which creates files every few seconds and spams them with log lines such that the agent will fall behind trying to read and upload (most likely throttled)... Beware this will be "expensive" since we are actually uploading those log lines.

@adam-mateen adam-mateen merged commit b1bee71 into main Oct 19, 2023
@adam-mateen adam-mateen deleted the auto-removal-no-block branch October 19, 2023 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants