Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create fsnotify watcher only when starting file_integrity module #19505

Merged
merged 2 commits into from
Jul 7, 2020

Conversation

vjsamuel
Copy link
Contributor

@vjsamuel vjsamuel commented Jun 30, 2020

Bug

What does this PR do?

It makes the creation of an FS notify watcher lazy. It is only set up when the metricset starts up.

Why is it important?

When FIM module is used with autodiscover, the Check function is called too many times where fsnotify watchers end up leaking and keeps increasing the number of file handles. It eventually causes auditbeat OOM.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

@vjsamuel vjsamuel requested a review from a team as a code owner June 30, 2020 08:01
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

1 similar comment
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jun 30, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented Jun 30, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [andrewkroh commented: jenkins, run tests]

  • Start Time: 2020-07-06T12:38:51.962+0000

  • Duration: 40 min 58 sec

Test stats 🧪

Test Results
Failed 0
Passed 393
Skipped 68
Total 461

Steps errors

Expand to view the steps failures

  • Name: Report to Codecov
    • Description: curl -sSLo codecov https://codecov.io/bash for i in auditbeat filebeat heartbeat libbeat metricbeat packetbeat winlogbeat journalbeat do FILE="${i}/build/coverage/full.cov" if [ -f "${FILE}" ]; then bash codecov -f "${FILE}" fi done

    • Duration: 1 min 28 sec

    • Start Time: 2020-07-06T13:09:29.776+0000

    • log

@elasticmachine
Copy link
Collaborator

Pinging @elastic/siem (Team:SIEM)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jun 30, 2020
@@ -39,19 +39,19 @@ type reader struct {

// NewEventReader creates a new EventProducer backed by fsnotify.
func NewEventReader(c Config) (EventProducer, error) {
watcher, err := monitor.New(c.Recursive)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this was here to detect config mistakes early, @andrewkroh probably you can confirm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think much validation is gained with this particular instantiation from a config validation perspective. So moving it into Start should be OK.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like this problem could affect any Metricbeat or Auditbeat module. The CheckConfig function instantiates modules (and hence metricsets), but it never invokes the optionally implemented mb.Closer on the metricsets that it creates. I expect that this produces a leak in more places than Auditbeat FIM.

If this correct then I think the best solution would be to fix the root of the problem and ensure that the CheckConfig implementation in Metricbeat always closes the MetricSets that it creates. Perhaps something like

diff --git a/metricbeat/mb/module/factory.go b/metricbeat/mb/module/factory.go
index d5b342afe..d35ef24e3 100644
--- a/metricbeat/mb/module/factory.go
+++ b/metricbeat/mb/module/factory.go
@@ -75,10 +75,13 @@ func (r *Factory) Create(p beat.PipelineConnector, c *common.Config) (cfgfile.Ru
 
 // CheckConfig checks if a config is valid or not
 func (r *Factory) CheckConfig(config *common.Config) error {
-       _, err := NewWrapper(config, mb.Registry, r.options...)
+       wrapper, err := NewWrapper(config, mb.Registry, r.options...)
        if err != nil {
                return err
        }
+       for _, ms := range wrapper.MetricSets() {
+               ms.close()
+       }
 
        return nil
 }

@andrewkroh
Copy link
Member

andrewkroh commented Jun 30, 2020

I think there is also a potential resource leak within NewWrapper from initMetricSets when an error occurs. This needs to make sure that any MetricSets instantiated prior to the error are closed().

@vjsamuel vjsamuel force-pushed the fix_watcher_leak branch from c0c3de7 to 042a502 Compare July 1, 2020 05:46
@vjsamuel
Copy link
Contributor Author

vjsamuel commented Jul 1, 2020

ideally we need to add a Check(*common.Config) error to the metricset interface. This will allow modules to optionally do an unmarshal of the config and do Validate() on the config if the config implements the same.

@vjsamuel vjsamuel force-pushed the fix_watcher_leak branch from 042a502 to f293c68 Compare July 1, 2020 23:15
Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this change. But the root cause still needs to be addressed otherwise other modules could be leaking resources with every CheckConfig.

@@ -39,19 +39,19 @@ type reader struct {

// NewEventReader creates a new EventProducer backed by fsnotify.
func NewEventReader(c Config) (EventProducer, error) {
watcher, err := monitor.New(c.Recursive)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think much validation is gained with this particular instantiation from a config validation perspective. So moving it into Start should be OK.

@andrewkroh
Copy link
Member

jenkins, run tests

@andrewkroh
Copy link
Member

jenkins, run tests

@exekias exekias merged commit de897b0 into elastic:master Jul 7, 2020
@vjsamuel vjsamuel deleted the fix_watcher_leak branch July 8, 2020 05:15
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this pull request Oct 14, 2020
…stic#19505)

* Create fsnotify watcher only when starting file_integrity module

* Add close to Start() so that failed starts don't create resource leaks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants