-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Winlogbeat] Add experimental event log reader with increased performance #16849
[Winlogbeat] Add experimental event log reader with increased performance #16849
Conversation
Pinging @elastic/siem (Team:SIEM) |
15aca79
to
943c6f2
Compare
f2af7a9
to
faa6e80
Compare
jenkins run the tests please |
faa6e80
to
a3cc7a2
Compare
This PR adds a new event log reader implementation that's behind a feature flag for now. It achieves higher event throughput than the current reader by not using XML and by heavily caching static metadata about events. To enable it add `api` to each event log reader. ``` winlogbeat.event_logs: - name: Security api: wineventlog-experimental ``` The existing reader requests each event as XML and then must unmarshal the XML document. EvtFormatMessage is used to get the XML document from Windows. Then the Go stdlib encoder/xml package is used to parse it. Both of these operations are relatively slow (see golang/go#21823 about encoding/xml). This new reader utilizes the publisher metadata APIs to fetch and cache metadata about all event IDs associated with a provider. It does this the first time it encounters a provider ID while reading events. __Risk: Caching this info could lead to having stale information in memory if metadata changes via software update (see Edge Cases).__ It caches the names of the event data parameters and a templatized version of the message string. To get the data for an event this reader receives EVT_VARIANT structs containing the parameters rather than receiving and parsing XML. This is more efficient because there are fewer and smaller memory allocations and no XML encoding or decoding. To get the message for an event it utilizes the cached text/template it has for the event ID and passes it the list of parameter values. Edge Cases There is no provider metadata installed on the host. Could happen for forwarded events or reading from .evtx files. - Mitigate by falling back to getting parameter names by the event XML and rendering the message with EvtFormatMessage for each event. Software is updated and an event ID changes it's event data parameters. Saw this between Sysmon versions 9 and 10 with event ID 5. - Mitigate by fingerprinting the number of event data parameters and their types. - If the fingerprint changes, fetch the XML for the event and store the parameter names. Benchmark Comparison Comparing batch_size 500, that's a 1396% increase in events/sec, a -81% reduction in bytes allocated per event, and -86% decrease in the number of allocations. PS C:\Gopath\src\github.com\elastic\beats\winlogbeat\eventlog> go test -run TestBenchmarkRead -benchmem -benchtime 10s -benchtest -v . --- PASS: TestBenchmarkRead (231.68s) --- PASS: TestBenchmarkRead/api=wineventlog (53.57s) --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=10 (12.19s) bench_test.go:128: 2067.28 events/sec 18283 B/event 182836 B/batch 251 allocs/event 2516 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=100 (16.73s) bench_test.go:128: 2144.50 events/sec 17959 B/event 1795989 B/batch 250 allocs/event 25020 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=500 (13.48s) bench_test.go:128: 1888.40 events/sec 17648 B/event 8824455 B/batch 250 allocs/event 125018 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=1000 (11.18s) bench_test.go:128: 2064.14 events/sec 17650 B/event 17650459 B/batch 250 allocs/event 250012 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental (98.28s) --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=10 (18.72s) bench_test.go:128: 16813.52 events/sec 3974 B/event 39744 B/batch 34 allocs/event 344 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=100 (25.39s) bench_test.go:128: 28300.30 events/sec 3634 B/event 363498 B/batch 33 allocs/event 3324 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=500 (26.40s) bench_test.go:128: 28266.73 events/sec 3332 B/event 1666041 B/batch 33 allocs/event 16597 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=1000 (27.77s) bench_test.go:128: 28387.74 events/sec 3330 B/event 3330690 B/batch 33 allocs/event 33127 allocs/batch --- PASS: TestBenchmarkRead/api=eventlogging (13.29s) bench_test.go:128: 56243.80 events/sec 8043 B/event 6513053 B/batch 31 allocs/event 25151 allocs/batch PASS ok github.com/elastic/beats/v7/winlogbeat/eventlog 231.932s
This explain how and why to enable the experimental implementation. Bug fixes Wrap errors to add more details to the cause Account for sysmon not being installed in tests Increase event log size everywhere Use a unique test event log per package TIL Go will run tests for different packages in parallel.
493e36f
to
02483ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work! I couldn't find anything wrong with it, but it could use another pair of eyes.
Also, it needs a changelog entry.
Any reasons why you prefer not to enable it by default?
|
||
// SetLength resets the buffer to a specific length. It may allocate a new | ||
// underlying buffer discarding any existing contents. | ||
func (b *byteBuffer) SetLength(n int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this method is the main reason why you can't use a bytes.Buffer, am I right? I just find the name a bit misleading, what about Resize(n) or Reserve(n)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of those names are better. I'll go with Reserve
.
The odd API comes out of its it dual use as both a generic byte array in Windows API calls and in its use as an io.Writer
in the UTF16 to UTF8 conversions. I struggled with whether or not to leave this as it is. I'm sure another layer could be added to clean up the API.
I couldn't use bytes.Buffer because I needed direct access to the backing slice. And also in the past I found that simple wrapper around a slice was a little faster than bytes.Buffer. The result seems much closer now than I recall.
pkg: github.com/elastic/beats/winlogbeat/sys
BenchmarkByteBuffer/byteBuffer-12 233508294 4.98 ns/op
BenchmarkByteBuffer/bytes.Buffer-12 154466978 7.81 ns/op
PASS
jenkins, test this |
I want it to get some more testing before rolling it out to everyone by default. If everything goes well with it then we can make it the default in a future 7.x release. Then in 8.0 fully remove the old reader. This will provide an opportunity to remove a lot of the old code. |
…#16849) This PR adds a new event log reader implementation that's behind a feature flag for now. It achieves higher event throughput than the current reader by not using XML and by heavily caching static metadata about events. To enable it add `api` to each event log reader. ``` winlogbeat.event_logs: - name: Security api: wineventlog-experimental ``` The existing reader requests each event as XML and then must unmarshal the XML document. EvtFormatMessage is used to get the XML document from Windows. Then the Go stdlib encoder/xml package is used to parse it. Both of these operations are relatively slow (see golang/go#21823 about encoding/xml). This new reader utilizes the publisher metadata APIs to fetch and cache metadata about all event IDs associated with a provider. It does this the first time it encounters a provider ID while reading events. __Risk: Caching this info could lead to having stale information in memory if metadata changes via software update (see Edge Cases).__ It caches the names of the event data parameters and a templatized version of the message string. To get the data for an event this reader receives EVT_VARIANT structs containing the parameters rather than receiving and parsing XML. This is more efficient because there are fewer and smaller memory allocations and no XML encoding or decoding. To get the message for an event it utilizes the cached text/template it has for the event ID and passes it the list of parameter values. Edge Cases There is no provider metadata installed on the host. Could happen for forwarded events or reading from .evtx files. - Mitigate by falling back to getting parameter names by the event XML and rendering the message with EvtFormatMessage for each event. Software is updated and an event ID changes it's event data parameters. Saw this between Sysmon versions 9 and 10 with event ID 5. - Mitigate by fingerprinting the number of event data parameters and their types. - If the fingerprint changes, fetch the XML for the event and store the parameter names. Benchmark Comparison Comparing batch_size 500, that's a 1396% increase in events/sec, a -81% reduction in bytes allocated per event, and -86% decrease in the number of allocations. PS C:\Gopath\src\github.com\elastic\beats\winlogbeat\eventlog> go test -run TestBenchmarkRead -benchmem -benchtime 10s -benchtest -v . --- PASS: TestBenchmarkRead (231.68s) --- PASS: TestBenchmarkRead/api=wineventlog (53.57s) --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=10 (12.19s) bench_test.go:128: 2067.28 events/sec 18283 B/event 182836 B/batch 251 allocs/event 2516 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=100 (16.73s) bench_test.go:128: 2144.50 events/sec 17959 B/event 1795989 B/batch 250 allocs/event 25020 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=500 (13.48s) bench_test.go:128: 1888.40 events/sec 17648 B/event 8824455 B/batch 250 allocs/event 125018 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=1000 (11.18s) bench_test.go:128: 2064.14 events/sec 17650 B/event 17650459 B/batch 250 allocs/event 250012 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental (98.28s) --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=10 (18.72s) bench_test.go:128: 16813.52 events/sec 3974 B/event 39744 B/batch 34 allocs/event 344 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=100 (25.39s) bench_test.go:128: 28300.30 events/sec 3634 B/event 363498 B/batch 33 allocs/event 3324 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=500 (26.40s) bench_test.go:128: 28266.73 events/sec 3332 B/event 1666041 B/batch 33 allocs/event 16597 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=1000 (27.77s) bench_test.go:128: 28387.74 events/sec 3330 B/event 3330690 B/batch 33 allocs/event 33127 allocs/batch --- PASS: TestBenchmarkRead/api=eventlogging (13.29s) bench_test.go:128: 56243.80 events/sec 8043 B/event 6513053 B/batch 31 allocs/event 25151 allocs/batch PASS ok github.com/elastic/beats/v7/winlogbeat/eventlog 231.932s (cherry picked from commit d81ef73)
…#17022) This PR adds a new event log reader implementation that's behind a feature flag for now. It achieves higher event throughput than the current reader by not using XML and by heavily caching static metadata about events. To enable it add `api` to each event log reader. ``` winlogbeat.event_logs: - name: Security api: wineventlog-experimental ``` The existing reader requests each event as XML and then must unmarshal the XML document. EvtFormatMessage is used to get the XML document from Windows. Then the Go stdlib encoder/xml package is used to parse it. Both of these operations are relatively slow (see golang/go#21823 about encoding/xml). This new reader utilizes the publisher metadata APIs to fetch and cache metadata about all event IDs associated with a provider. It does this the first time it encounters a provider ID while reading events. __Risk: Caching this info could lead to having stale information in memory if metadata changes via software update (see Edge Cases).__ It caches the names of the event data parameters and a templatized version of the message string. To get the data for an event this reader receives EVT_VARIANT structs containing the parameters rather than receiving and parsing XML. This is more efficient because there are fewer and smaller memory allocations and no XML encoding or decoding. To get the message for an event it utilizes the cached text/template it has for the event ID and passes it the list of parameter values. Edge Cases There is no provider metadata installed on the host. Could happen for forwarded events or reading from .evtx files. - Mitigate by falling back to getting parameter names by the event XML and rendering the message with EvtFormatMessage for each event. Software is updated and an event ID changes it's event data parameters. Saw this between Sysmon versions 9 and 10 with event ID 5. - Mitigate by fingerprinting the number of event data parameters and their types. - If the fingerprint changes, fetch the XML for the event and store the parameter names. Benchmark Comparison Comparing batch_size 500, that's a 1396% increase in events/sec, a -81% reduction in bytes allocated per event, and -86% decrease in the number of allocations. PS C:\Gopath\src\github.com\elastic\beats\winlogbeat\eventlog> go test -run TestBenchmarkRead -benchmem -benchtime 10s -benchtest -v . --- PASS: TestBenchmarkRead (231.68s) --- PASS: TestBenchmarkRead/api=wineventlog (53.57s) --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=10 (12.19s) bench_test.go:128: 2067.28 events/sec 18283 B/event 182836 B/batch 251 allocs/event 2516 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=100 (16.73s) bench_test.go:128: 2144.50 events/sec 17959 B/event 1795989 B/batch 250 allocs/event 25020 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=500 (13.48s) bench_test.go:128: 1888.40 events/sec 17648 B/event 8824455 B/batch 250 allocs/event 125018 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=1000 (11.18s) bench_test.go:128: 2064.14 events/sec 17650 B/event 17650459 B/batch 250 allocs/event 250012 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental (98.28s) --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=10 (18.72s) bench_test.go:128: 16813.52 events/sec 3974 B/event 39744 B/batch 34 allocs/event 344 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=100 (25.39s) bench_test.go:128: 28300.30 events/sec 3634 B/event 363498 B/batch 33 allocs/event 3324 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=500 (26.40s) bench_test.go:128: 28266.73 events/sec 3332 B/event 1666041 B/batch 33 allocs/event 16597 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=1000 (27.77s) bench_test.go:128: 28387.74 events/sec 3330 B/event 3330690 B/batch 33 allocs/event 33127 allocs/batch --- PASS: TestBenchmarkRead/api=eventlogging (13.29s) bench_test.go:128: 56243.80 events/sec 8043 B/event 6513053 B/batch 31 allocs/event 25151 allocs/batch PASS ok github.com/elastic/beats/v7/winlogbeat/eventlog 231.932s (cherry picked from commit d81ef73)
Hello, what's the right way to report an issue with this experimental flag? I get these log entries in some custom Windows channels created by LogBinder SuperCharger WARN [wineventlog.renderer] wineventlog/renderer.go:319 The number of event data parameters doesn't match the number of parameters in the template. {"channel": "Supercharger-Destination-WEC_ApplicationCrashes/Log", "provider": "Application Error", "event_id": 1000, "event_parameter_count": 13, "template_parameter_count": 15, "template_version": 0, "event_version": 0} WARN [wineventlog.renderer] wineventlog/renderer.go:319 The number of event data parameters doesn't match the number of parameters in the template. {"channel": "Supercharger-Destination-WEC_Firewall/Log", "provider": "Microsoft-Windows-Windows Firewall With Advanced Security", "event_id": 2004, "event_parameter_count": 23, "template_parameter_count": 26, "template_version": 0, "event_version": 0} WARN [wineventlog.renderer] wineventlog/renderer.go:319 The number of event data parameters doesn't match the number of parameters in the template. {"channel": "Supercharger-Destination-WEC_BITSClient/Log", "provider": "Microsoft-Windows-Bits-Client", "event_id": 59, "event_parameter_count": 9, "template_parameter_count": 10, "template_version": 1, "event_version": 1} WARN [wineventlog.renderer] wineventlog/renderer.go:319 The number of event data parameters doesn't match the number of parameters in the template. {"channel": "Supercharger-Destination-WEC_BITSClient/Log", "provider": "Microsoft-Windows-Bits-Client", "event_id": 60, "event_parameter_count": 15, "template_parameter_count": 17, "template_version": 1, "event_version": 1} |
Please open a new issue. See if you can grab the XML content of an event id 1000 sample from |
What does this PR do?
This PR adds a new event log reader implementation that's behind a
feature flag for now. It achieves higher event throughput than the current
reader by not using XML and by heavily caching static metadata about events.
To enable it add
api
to each event log reader.The existing reader requests each event as XML and then must unmarshal the XML
document. EvtFormatMessage is used to get the XML document from Windows. Then the
Go stdlib encoder/xml package is used to parse it. Both of these operations are
relatively slow (see golang/go#21823 about encoding/xml).
This new reader utilizes the publisher metadata APIs to fetch and cache metadata
about all event IDs associated with a provider. It does this the first time it
encounters a provider ID while reading events. Risk: Caching this info could
lead to having stale information in memory if metadata changes via software
update (see Edge Cases). It caches the names of the event data parameters
and a templatized version of the message string.
To get the data for an event this reader receives EVT_VARIANT structs containing
the parameters rather than receiving and parsing XML. This is more efficient because
there are fewer and smaller memory allocations and no XML encoding or decoding.
To get the message for an event it utilizes the cached text/template it has
for the event ID and passes it the list of parameter values.
Edge Cases
There is no provider metadata installed on the host. Could happen for forwarded
events or reading from .evtx files.
the message with EvtFormatMessage for each event.
Software is updated and an event ID changes it's event data parameters. Saw this
between Sysmon versions 9 and 10 with event ID 5.
names.
Benchmark Comparison
Comparing batch_size 500, that's a 1396% increase in events/sec, a -81% reduction in bytes allocated per event, and -86% decrease in the number of allocations.
Why is it important?
Some users were experiencing a bottleneck with Winlogbeat reading events. Most often this was with the Windows Event Collector architecture where all events were configured to flow into a single machine and single channel. Eventually this could lead to a data loss if the rates were sustained for so long that Winlogbeat could not catch up and log rotation occurred.
Checklist
I have made corresponding change to the default configuration filesN/AAuthor's Checklist
include_xml
. I documented this as not supported.Author's Other Notes
How to test this PR locally
Related issues
Screenshots
CPU Flame Graphs
Before
After
Mem Flame Graphs
Before
After