Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix empty file edge case #36076

Merged
merged 5 commits into from
Jul 18, 2023
Merged

Fix empty file edge case #36076

merged 5 commits into from
Jul 18, 2023

Conversation

rdner
Copy link
Member

@rdner rdner commented Jul 17, 2023

What does this PR do?

It's possible that file scanning happens when the file was created but was not yet written to. In this case the size is 0.

We should not spawn any resources (e.g. harvesters) for such files until they actually have some content. We create events only when read something from a file, so having a harvester on an empty file is not useful.

Not handling this edge case also causes our tests to be flaky, sometimes an expected size does not match an actual size (0).

Why is it important?

To fix flaky tests and to make the code more efficient.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
    - [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Run Filebeat with this configuration:
filebeat.inputs:
  - type: filestream
    id: my-filestream-id
    paths:
      - "/path/to/your/logs/*.log"
    prospector.scanner.check_interval: 5s
path.data: "/path/to/your/data"
logging:
  level: debug
output.console:
  enabled: true

Note the path placeholders you need to fix first.

  1. Place an empty file (touch some.log) in /path/to/your/logs
  2. Wait up to 5 seconds to see this log message in Filebeat's output:
{                                                                   
  "log.level": "warn",                                              
  "@timestamp": "2023-07-17T09:51:27.315+0200",                     
  "log.logger": "scanner",                                          
  "log.origin": {                                                   
    "file.name": "filestream/fswatch.go",                           
    "file.line": 369                                                
  },                                                                
  "message": "file \"/Users/rdner/Projects/es_confs/empty-files/logs/some.log\" has no content yet, skipping",                          
  "service.name": "filebeat",                                       
  "ecs.version": "1.6.0"                                            
}
  1. Add a line to the file using the following command:
echo "line" >> some.log

you can see that Filebeat now detected the file, started ingestion and created an event:

{                                                                   
  "log.level": "debug",                                             
  "@timestamp": "2023-07-17T09:52:21.436+0200",                                                                                         
  "log.logger": "processors",                                       
  "log.origin": {                                                   
    "file.name": "processing/processors.go",                        
    "file.line": 213                                                
  },                                                                
  "message": "Publish event: {\n  \"@timestamp\": \"2023-07-17T07:52:21.436Z\",\n  \"@metadata\": {\n    \"beat\": \"filebeat\",\n    \"type\": \"_doc\",\n    \"version\": \"8.10.0\"\n  },\n  \"input\": {\n    \"type\": \"filestream\"\n  },\n  \"ecs\": {\n    \"version\": 
\"8.0.0\"\n  },\n  \"host\": {\n    \"name\": \"MacBook-Pro.localdomain\"\n  },\n  \"agent\": {\n    \"name\": \"MacBook-Pro.localdomain\",\n    \"type\": \"filebeat\",\n    \"version\": \"8.10.0\",\n    \"ephemeral_id\": \"97426136-07de-4b0a-99b4-bcc60a8c201a\",\n    \"id
\": \"7326a55b-6e54-470c-9557-7f2d0c5f0efc\"\n  },\n  \"log\": {\n    \"offset\": 0,\n    \"file\": {\n      \"path\": \"/Users/rdner/Projects/es_confs/empty-files/logs/some.log\"\n    }\n  },\n  \"message\": \"some\"\n}",
  "service.name": "filebeat",                                                                                                           
  "ecs.version": "1.6.0"                                                                                                                
} 

@rdner rdner added enhancement Filebeat Filebeat backport-7.17 Automated backport to the 7.17 branch with mergify labels Jul 17, 2023
@rdner rdner self-assigned this Jul 17, 2023
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jul 17, 2023
@rdner rdner force-pushed the fix-fswatcher-flaky-test branch 2 times, most recently from 74e3ce4 to 1db1158 Compare July 17, 2023 07:56
@rdner rdner added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Jul 17, 2023
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jul 17, 2023
@rdner
Copy link
Member Author

rdner commented Jul 17, 2023

filebeat-goIntegTest tests make sure that the external behaviour stayed the same, I double-checked.

It's possible that file scanning happens when the file was created but
was not yet written to. In this case the size is 0.

We should not spawn any resources (e.g. harvesters) for such files until they actually
have some content. We create events only when read something from a
file, so having a harvester on an empty file is not useful.

Not handling this edge case also causes our tests to be flaky,
sometimes an expected size does not match an actual size (0).
@rdner rdner force-pushed the fix-fswatcher-flaky-test branch from 1db1158 to cc4c9c7 Compare July 17, 2023 13:25
@elasticmachine
Copy link
Collaborator

elasticmachine commented Jul 17, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-07-18T06:56:43.955+0000

  • Duration: 70 min 4 sec

Test stats 🧪

Test Results
Failed 0
Passed 8058
Skipped 757
Total 8815

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@rdner rdner marked this pull request as ready for review July 17, 2023 15:18
@rdner rdner requested a review from a team as a code owner July 17, 2023 15:18
@rdner rdner requested review from ycombinator and leehinman July 17, 2023 15:18
@rdner rdner enabled auto-merge (squash) July 17, 2023 15:18
@rdner rdner requested a review from leehinman July 17, 2023 17:32
Copy link
Contributor

@leehinman leehinman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rdner rdner merged commit 0cd5775 into elastic:main Jul 18, 2023
mergify bot pushed a commit that referenced this pull request Jul 18, 2023
It's possible that file scanning happens when the file was created but
was not yet written to. In this case the size is 0.

We should not spawn any resources (e.g. harvesters) for such files until they actually
have some content. We create events only when read something from a
file, so having a harvester on an empty file is not useful.

Not handling this edge case also causes our tests to be flaky,
sometimes an expected size does not match an actual size (0).

(cherry picked from commit 0cd5775)

# Conflicts:
#	filebeat/input/filestream/fswatch_test.go
@rdner rdner deleted the fix-fswatcher-flaky-test branch July 18, 2023 08:33
rdner added a commit that referenced this pull request Jul 18, 2023
* Fix empty new file edge case (#36076)

It's possible that file scanning happens when the file was created but
was not yet written to. In this case the size is 0.

We should not spawn any resources (e.g. harvesters) for such files until they actually
have some content. We create events only when read something from a
file, so having a harvester on an empty file is not useful.

Not handling this edge case also causes our tests to be flaky,
sometimes an expected size does not match an actual size (0).

(cherry picked from commit 0cd5775)

# Conflicts:
#	filebeat/input/filestream/fswatch_test.go

* Resolve conflicts

---------

Co-authored-by: Denis <[email protected]>
@miksir
Copy link

miksir commented Nov 17, 2023

my logs now full of spam "has no content yet, skipping" for all zero size logs. And it's on the "warning" level

Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
It's possible that file scanning happens when the file was created but
was not yet written to. In this case the size is 0.

We should not spawn any resources (e.g. harvesters) for such files until they actually
have some content. We create events only when read something from a
file, so having a harvester on an empty file is not useful.

Not handling this edge case also causes our tests to be flaky,
sometimes an expected size does not match an actual size (0).
@Alphayeeeet
Copy link

Maybe change the log level of that message to info or even debug to prevent spam? @Team

@rdner
Copy link
Member Author

rdner commented Mar 19, 2024

@miksir @Alphayeeeet sorry for the log spam, addressed in #38421

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-7.17 Automated backport to the 7.17 branch with mergify enhancement Filebeat Filebeat Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants