Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry-pick #16849 to 7.x: [Winlogbeat] Add experimental event log reader with increased performance #17022

Merged
merged 1 commit into from
Mar 20, 2020

Conversation

andrewkroh
Copy link
Member

@andrewkroh andrewkroh commented Mar 16, 2020

Cherry-pick of PR #16849 to 7.x branch. Original message:

What does this PR do?

This PR adds a new event log reader implementation that's behind a
feature flag for now. It achieves higher event throughput than the current
reader by not using XML and by heavily caching static metadata about events.
To enable it add api to each event log reader.

winlogbeat.event_logs:
- name: Security
  api: wineventlog-experimental

The existing reader requests each event as XML and then must unmarshal the XML
document. EvtFormatMessage is used to get the XML document from Windows. Then the
Go stdlib encoder/xml package is used to parse it. Both of these operations are
relatively slow (see golang/go#21823 about encoding/xml).

This new reader utilizes the publisher metadata APIs to fetch and cache metadata
about all event IDs associated with a provider. It does this the first time it
encounters a provider ID while reading events. Risk: Caching this info could
lead to having stale information in memory if metadata changes via software
update (see Edge Cases).
It caches the names of the event data parameters
and a templatized version of the message string.

To get the data for an event this reader receives EVT_VARIANT structs containing
the parameters rather than receiving and parsing XML. This is more efficient because
there are fewer and smaller memory allocations and no XML encoding or decoding.

To get the message for an event it utilizes the cached text/template it has
for the event ID and passes it the list of parameter values.

Edge Cases

There is no provider metadata installed on the host. Could happen for forwarded
events or reading from .evtx files.

  • Mitigate by falling back to getting parameter names by the event XML and rendering
    the message with EvtFormatMessage for each event.

Software is updated and an event ID changes it's event data parameters. Saw this
between Sysmon versions 9 and 10 with event ID 5.

  • Mitigate by fingerprinting the number of event data parameters and their types.
  • If the fingerprint changes, fetch the XML for the event and store the parameter
    names.

Benchmark Comparison

Comparing batch_size 500, that's a 1396% increase in events/sec, a -81% reduction in bytes allocated per event, and -86% decrease in the number of allocations.

PS C:\Gopath\src\github.com\elastic\beats\winlogbeat\eventlog> go test -run TestBenchmarkRead -benchmem -benchtime 10s -benchtest -v .
--- PASS: TestBenchmarkRead (231.68s)
--- PASS: TestBenchmarkRead/api=wineventlog (53.57s)
--- PASS: TestBenchmarkRead/api=wineventlog/batch_size=10 (12.19s)
bench_test.go:128: 2067.28 events/sec 18283 B/event 182836 B/batch 251 allocs/event 2516 allocs/batch
--- PASS: TestBenchmarkRead/api=wineventlog/batch_size=100 (16.73s)
bench_test.go:128: 2144.50 events/sec 17959 B/event 1795989 B/batch 250 allocs/event 25020 allocs/batch
--- PASS: TestBenchmarkRead/api=wineventlog/batch_size=500 (13.48s)
bench_test.go:128: 1888.40 events/sec 17648 B/event 8824455 B/batch 250 allocs/event 125018 allocs/batch
--- PASS: TestBenchmarkRead/api=wineventlog/batch_size=1000 (11.18s)
bench_test.go:128: 2064.14 events/sec 17650 B/event 17650459 B/batch 250 allocs/event 250012 allocs/batch
--- PASS: TestBenchmarkRead/api=wineventlog-experimental (98.28s)
--- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=10 (18.72s)
bench_test.go:128: 16813.52 events/sec 3974 B/event 39744 B/batch 34 allocs/event 344 allocs/batch
--- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=100 (25.39s)
bench_test.go:128: 28300.30 events/sec 3634 B/event 363498 B/batch 33 allocs/event 3324 allocs/batch
--- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=500 (26.40s)
bench_test.go:128: 28266.73 events/sec 3332 B/event 1666041 B/batch 33 allocs/event 16597 allocs/batch
--- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=1000 (27.77s)
bench_test.go:128: 28387.74 events/sec 3330 B/event 3330690 B/batch 33 allocs/event 33127 allocs/batch
--- PASS: TestBenchmarkRead/api=eventlogging (13.29s)
bench_test.go:128: 56243.80 events/sec 8043 B/event 6513053 B/batch 31 allocs/event 25151 allocs/batch
PASS
ok github.com/elastic/beats/v7/winlogbeat/eventlog 231.932s

Why is it important?

Some users were experiencing a bottleneck with Winlogbeat reading events. Most often this was with the Windows Event Collector architecture where all events were configured to flow into a single machine and single channel. Eventually this could lead to a data loss if the rates were sustained for so long that Winlogbeat could not catch up and log rotation occurred.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files N/A
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • Address XML character escaping.
  • Add the ability to test both the old and new implementation at the same time.
  • Implement include_xml. I documented this as not supported.

Author's Other Notes

  • Possible future enhancements
    • Disable event normalization
    • Change EventLog API to have a callback.
    • Make the reader multi-threaded. The new EventIterator and Renderer are ready for mutli-threading.

How to test this PR locally

Related issues

Screenshots

CPU Flame Graphs

Before
Screen Shot 2020-03-05 at 12 42 23 PM

After
Screen Shot 2020-03-05 at 12 42 47 PM

Mem Flame Graphs

Before
Screen Shot 2020-03-05 at 12 40 39 PM

After
Screen Shot 2020-03-05 at 12 41 51 PM

@elasticmachine
Copy link
Collaborator

Pinging @elastic/siem (Team:SIEM)

Copy link
Contributor

@leehinman leehinman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andrewkroh
Copy link
Member Author

One of the files I was using for testing also needed a backport so I opened: #17090

…#16849)

This PR adds a new event log reader implementation that's behind a
feature flag for now. It achieves higher event throughput than the current
reader by not using XML and by heavily caching static metadata about events.
To enable it add `api` to each event log reader.

```
winlogbeat.event_logs:
- name: Security
  api: wineventlog-experimental
```

The existing reader requests each event as XML and then must unmarshal the XML
document. EvtFormatMessage is used to get the XML document from Windows. Then the
Go stdlib encoder/xml package is used to parse it. Both of these operations are
relatively slow (see golang/go#21823 about encoding/xml).

This new reader utilizes the publisher metadata APIs to fetch and cache metadata
about all event IDs associated with a provider. It does this the first time it
encounters a provider ID while reading events. __Risk: Caching this info could
lead to having stale information in memory if metadata changes via software
update (see Edge Cases).__ It caches the names of the event data parameters
and a templatized version of the message string.

To get the data for an event this reader receives EVT_VARIANT structs containing
the parameters rather than receiving and parsing XML. This is more efficient because
there are fewer and smaller memory allocations and no XML encoding or decoding.

To get the message for an event it utilizes the cached text/template it has
for the event ID and passes it the list of parameter values.

Edge Cases

There is no provider metadata installed on the host. Could happen for forwarded
events or reading from .evtx files.
- Mitigate by falling back to getting parameter names by the event XML and rendering
  the message with EvtFormatMessage for each event.

Software is updated and an event ID changes it's event data parameters. Saw this
between Sysmon versions 9 and 10 with event ID 5.
- Mitigate by fingerprinting the number of event data parameters and their types.
- If the fingerprint changes, fetch the XML for the event and store the parameter
  names.

Benchmark Comparison

Comparing batch_size 500, that's a 1396% increase in events/sec, a -81% reduction in bytes allocated per event, and -86% decrease in the number of allocations.

PS C:\Gopath\src\github.com\elastic\beats\winlogbeat\eventlog> go test -run TestBenchmarkRead -benchmem -benchtime 10s -benchtest -v .
--- PASS: TestBenchmarkRead (231.68s)
    --- PASS: TestBenchmarkRead/api=wineventlog (53.57s)
        --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=10 (12.19s)
            bench_test.go:128: 2067.28 events/sec        18283 B/event   182836 B/batch  251 allocs/event        2516 allocs/batch
        --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=100 (16.73s)
            bench_test.go:128: 2144.50 events/sec        17959 B/event   1795989 B/batch         250 allocs/event        25020 allocs/batch
        --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=500 (13.48s)
            bench_test.go:128: 1888.40 events/sec        17648 B/event   8824455 B/batch         250 allocs/event        125018 allocs/batch
        --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=1000 (11.18s)
            bench_test.go:128: 2064.14 events/sec        17650 B/event   17650459 B/batch        250 allocs/event        250012 allocs/batch
    --- PASS: TestBenchmarkRead/api=wineventlog-experimental (98.28s)
        --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=10 (18.72s)
            bench_test.go:128: 16813.52 events/sec       3974 B/event    39744 B/batch   34 allocs/event         344 allocs/batch
        --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=100 (25.39s)
            bench_test.go:128: 28300.30 events/sec       3634 B/event    363498 B/batch  33 allocs/event         3324 allocs/batch
        --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=500 (26.40s)
            bench_test.go:128: 28266.73 events/sec       3332 B/event    1666041 B/batch         33 allocs/event         16597 allocs/batch
        --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=1000 (27.77s)
            bench_test.go:128: 28387.74 events/sec       3330 B/event    3330690 B/batch         33 allocs/event         33127 allocs/batch
    --- PASS: TestBenchmarkRead/api=eventlogging (13.29s)
        bench_test.go:128: 56243.80 events/sec   8043 B/event    6513053 B/batch         31 allocs/event         25151 allocs/batch
PASS
ok      github.com/elastic/beats/v7/winlogbeat/eventlog 231.932s

(cherry picked from commit d81ef73)
@andrewkroh
Copy link
Member Author

jenkins, test this

2 similar comments
@andrewkroh
Copy link
Member Author

jenkins, test this

@andrewkroh
Copy link
Member Author

jenkins, test this

@andrewkroh andrewkroh merged commit 020d42a into elastic:7.x Mar 20, 2020
@andrewkroh andrewkroh deleted the backport_16849_7.x branch January 14, 2022 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants