Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML #11006

Merged
merged 6 commits into from
Mar 1, 2019

Conversation

adriansr
Copy link
Contributor

@adriansr adriansr commented Mar 1, 2019

Golang's xml parser is refusing to process XML documents that contain an ASCII control character, resulting in those events being dropped. This patch pre-processes the XML document in order to replace invalid characters with the unicode escape sequence \uNNNN.

adriansr added 2 commits March 1, 2019 03:40
Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".
This updates the wineventlog logic to ingest windows event logs whose
rendering fails.
@adriansr adriansr requested a review from a team as a code owner March 1, 2019 03:10
return output(n)
}

func newXmlSafeReader(rawXML []byte) io.Reader {

This comment was marked as resolved.

@andrewkroh
Copy link
Member

andrewkroh commented Mar 1, 2019

As a side-effect, at batch_size=100 it's 47.2% faster! 🥇

(update: Something else must have been using CPU during the first run b/c I couldn't replicate the results from the first run. Both master and this branch performed about the same.)

PS C:\Gopath\src\github.com\elastic\beats\winlogbeat\eventlog> go test -v -benchtime 10s -benchtest -benchmem -run TestB
enchmarkBatchReadSize  .
=== RUN   TestBenchmarkBatchReadSize
--- PASS: TestBenchmarkBatchReadSize (96.50s)
    bench_test.go:98: batch_size=10, total_events=30000, batch_time=4.452175ms, events_per_sec=2246.0932016374018, bytes_alloced_per_event=21 kB, total_allocs=7344626
    bench_test.go:98: batch_size=100, total_events=30000, batch_time=43.886812ms, events_per_sec=2278.5888389432344, bytes_alloced_per_event=19 kB, total_allocs=6592877
    bench_test.go:98: batch_size=500, total_events=25000, batch_time=230.723254ms, events_per_sec=2167.098423464503, bytes_alloced_per_event=14 kB, total_allocs=2437639
    bench_test.go:98: batch_size=1000, total_events=30000, batch_time=444.63076ms, events_per_sec=2249.0571727426145, bytes_alloced_per_event=6.9 kB, total_allocs=235204
PASS
ok      github.com/elastic/beats/winlogbeat/eventlog    96.544s
PS C:\Gopath\src\github.com\elastic\beats\winlogbeat\eventlog> go test -v -benchtime 10s -benchtest -benchmem -run TestB
enchmarkBatchReadSize  .
=== RUN   TestBenchmarkBatchReadSize
--- PASS: TestBenchmarkBatchReadSize (65.73s)
    bench_test.go:98: batch_size=10, total_events=50000, batch_time=3.090045ms, events_per_sec=3236.1988255834463, bytes_alloced_per_event=18 kB, total_allocs=12380608
    bench_test.go:98: batch_size=100, total_events=50000, batch_time=29.812684ms, events_per_sec=3354.276991632152, bytes_alloced_per_event=18 kB, total_allocs=12310026
    bench_test.go:98: batch_size=500, total_events=50000, batch_time=148.310654ms, events_per_sec=3371.301969985245, bytes_alloced_per_event=18 kB, total_allocs=12301886
    bench_test.go:98: batch_size=1000, total_events=50000, batch_time=297.246802ms, events_per_sec=3364.2077669854966, bytes_alloced_per_event=18 kB, total_allocs=12300631
PASS
ok      github.com/elastic/beats/winlogbeat/eventlog    65.767s

@adriansr adriansr changed the title [WIP] [Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML Mar 1, 2019
@adriansr adriansr added bug Winlogbeat needs_backport PR is waiting to be backported to other branches. review labels Mar 1, 2019
@adriansr
Copy link
Contributor Author

adriansr commented Mar 1, 2019

@andrewkroh please have another look to see if you like the changes I did to RenderErr. Now error.message can be a list of strings.

Kill me if I know where that speedup comes from. Buffering? In my tests, decoding the XML was marginally slower than before (from 105 to 125 us per op)

@adriansr
Copy link
Contributor Author

adriansr commented Mar 1, 2019

jenkins, test this

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test case passes that causes some invalid XML.

Otherwise it LGTM.

@adriansr adriansr merged commit a6102a8 into elastic:master Mar 1, 2019
@adriansr adriansr added v6.7.0 and removed needs_backport PR is waiting to be backported to other branches. labels Mar 1, 2019
adriansr added a commit to adriansr/beats that referenced this pull request Mar 1, 2019
…elastic#11006)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)
@adriansr adriansr added the v6.6.2 label Mar 1, 2019
@adriansr adriansr added the v7.2.0 label Mar 1, 2019
adriansr added a commit that referenced this pull request Mar 2, 2019
…#11006) (#11039)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)
adriansr added a commit that referenced this pull request Mar 2, 2019
…#11006) (#11040)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)
adriansr added a commit that referenced this pull request Mar 2, 2019
…#11006) (#11041)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)
adriansr added a commit that referenced this pull request Mar 2, 2019
…#11006) (#11038)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)
andrewkroh pushed a commit to andrewkroh/beats that referenced this pull request Mar 4, 2019
…elastic#11006)

* [Winlogbeat] Escape control characters in XML

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)
andrewkroh added a commit that referenced this pull request Mar 4, 2019
…#11006) (#11066)

* [Winlogbeat] Escape control characters in XML

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)
adriansr added a commit to adriansr/beats that referenced this pull request Mar 21, 2019
Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328
adriansr added a commit that referenced this pull request Mar 21, 2019
Previous fix (#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes #11328
adriansr added a commit to adriansr/beats that referenced this pull request Mar 21, 2019
Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328

(cherry picked from commit 6865403)
adriansr added a commit that referenced this pull request Mar 21, 2019
Previous fix (#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes #11328

(cherry picked from commit 6865403)
adriansr added a commit to adriansr/beats that referenced this pull request Mar 21, 2019
Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328

(cherry picked from commit 6865403)
adriansr added a commit to adriansr/beats that referenced this pull request Mar 21, 2019
Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328

(cherry picked from commit 6865403)
adriansr added a commit to adriansr/beats that referenced this pull request Mar 21, 2019
Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328

(cherry picked from commit 6865403)
adriansr added a commit that referenced this pull request Mar 21, 2019
Previous fix (#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes #11328

(cherry picked from commit 6865403)
adriansr added a commit that referenced this pull request Mar 22, 2019
…1372)

Previous fix (#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes #11328

(cherry picked from commit 6865403)
adriansr added a commit that referenced this pull request Mar 28, 2019
…1370)

Previous fix (#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes #11328

(cherry picked from commit 6865403)
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…elastic#11006) (elastic#11039)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit 6a078f5)
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…ces (elastic#11370)

Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328

(cherry picked from commit 5db0f15)
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…elastic#11006) (elastic#11066)

* [Winlogbeat] Escape control characters in XML

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…ces (elastic#11372)

Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328

(cherry picked from commit 6865403)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants