Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows Event Log Receiver doesn't collect some legacy events correctly #35135

Closed
dharley-atlastix opened this issue Sep 11, 2024 · 5 comments · Fixed by #35227
Closed

Windows Event Log Receiver doesn't collect some legacy events correctly #35135

dharley-atlastix opened this issue Sep 11, 2024 · 5 comments · Fixed by #35227
Labels
bug Something isn't working needs triage New item requiring triage os:windows receiver/windowseventlog

Comments

@dharley-atlastix
Copy link

Component(s)

receiver/windowseventlog

What happened?

Description

When the Windows Event Log Receiver encounters an older style event, it is unable to correctly ingest the event details as the vendor xml template is seemingly missing. This "appears" to be because the source of the event uses some other legacy templating system (DLL) to create the event, and the receiver is setup to expect that a Vendor XML Template will always exist. Assuming the XML lookup is used because it is more efficient than directly rendering, it maybe best to implement a fallback strategy to ensure all data is captured using an alternative rendering API.

Steps to Reproduce

Setup a windows event log receiver and capture the events generated by the following built in Windows commands:
diskpart
chkdsk /c
This has also been observed from events generated by the DHCPServer service and VSS, however recreating these conditions is more complicated.

Expected Result

No error should be logged about a missing template as the event is structured correctly in the Windows Event Log viewer.
All fields should be populated the same as the Event Log viewer and per other events extracted via the receiver.

Actual Result

The following error is logged:

2024-09-11T12:01:22.070+1000    error   windows/input.go:283    Failed to render formatted event        {"kind": "receiver", "name": "windowseventlog/system", "data_type": "logs", "operator_id": "windows_eventlog_input", "operator_type": "windows_eventlog_input", "error": "syscall to 'EvtFormatMessage' failed: The message ID for the desired message could not be found."}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/windows.(*Input).processEvent
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/input/windows/input.go:283
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/windows.(*Input).read
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/input/windows/input.go:223
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/windows.(*Input).readToEnd
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/input/windows/input.go:198
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/windows.(*Input).readOnInterval
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/input/windows/input.go:186

Fields such as the Message, Keywords, Opcode contain missing or unexpected data.

{
    "channel": "System",
    "computer": "SVRSYDDHCP01",
    "event_data": {
        "data": [
            {
                "": "@2010001"
            }
        ]
    },
    "event_id": {
        "id": 4,
        "qualifiers": 16896
    },
    "execution": {
        "process_id": 0,
        "thread_id": 0
    },
    "keywords": [
        "0x80000000000000"
    ],
    "level": "4",
    "message": "",
    "opcode": "0",
    "provider": {
        "event_source": "",
        "guid": "",
        "name": "Virtual Disk Service"
    },
    "record_id": 29962,
    "system_time": "2024-09-11T02:01:21.5315561Z",
    "task": "0"
}

image

The Provider used to contain incomplete data in 0.107 and below, but the recent change to rendering the name using system properties appears to have addressed this.
#34914

Collector version

v0.109.0

Environment information

Environment

OS: ("Windows 2022")
Compiler(n/a)

OpenTelemetry Collector configuration

receivers:
  windowseventlog/application:
    channel: Application
    poll_interval: 1s
    exclude_providers: [VSS]
  windowseventlog/system:
    channel: System
    poll_interval: 1s
  windowseventlog/security:
    channel: Security
    poll_interval: 1s
    exclude_providers: [Service Control Manager]

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [windowseventlog/application, windowseventlog/system, windowseventlog/security]
      exporters: [debug]

Log output

No response

Additional context

No response

@dharley-atlastix dharley-atlastix added bug Something isn't working needs triage New item requiring triage labels Sep 11, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@pjanotti
Copy link
Contributor

/label os:windows

@pjanotti
Copy link
Contributor

pjanotti commented Sep 16, 2024

@dharley-atlastix as you said the receiver should actually fallback to the non-formatted event rendering when the event source is empty - at least the JSON that you attached and the event that I tested had an empty event source. If there is no source there isn't much to be "formatted" anyway. That said I couldn't repro the issue, as if the EvtFormatMessage API automatically does the fallback in my case. I ran chkdsk /c on Windows 11 and both the Event Viewer and the collector were able to render the event properly and no error logged by the collector. It seems that the box used in the report behaves different than the one that I used. Could you check if the collector logged any error opening an event source provider? The error should contain the failed to open the metadata for the string.

Anyway, this behavior can be changed on the collector to avoid the issue.

For reference the chkdsk /c event captured and processed on a Windows 11 box:

image

JSON captured by the collector:

{
    "channel": "Application",
    "computer": "<computer_name>",
    "event_data": {
        "binary": "00BB570053625100016C8D0000000000F20300004CCC0B000000000000000000",
        "data": [
            {
                "": "<chkdsk /c output omitted for brevity>"
            }
        ]
    },
    "opcode": "Info",
    "provider": {
        "event_source": "",
        "guid": "",
        "name": "Chkdsk"
    },
    "record_id": 3917664,
    "system_time": "2024-09-16T20:45:49.4408698Z",
    "task": "0"
}

@dharley-atlastix
Copy link
Author

@pjanotti Your interesting results made me go back and take a look in to the issue a bit further. Testing on Windows 11 as per your method yielded no issues, likewise testing an RTM Server 2016 was the same.
The test machines that are exhibiting and issue are all Server 2022 RTM (minimal test images with no updates). Updating the machines the the latest CU has resolved the issue. So this appears to be some kind of Windows bug or environmental issue, which if I'm honest, I don't intend to spend anymore time looking in to unless it pops it head up somewhere else.

Interestingly I tried test rendering the message field using a EvtFormatMessageEvent parameter against the a modifed evtFormatMessage function, and it returned the message data as per the eventlog. This suggest its possible to build a method to fallback when there when ErrorEvtMessageIdNotFound is returned (such as no vendor template), but for now that seems more like a feature request outside the scope of this ticket.

@pjanotti
Copy link
Contributor

Thanks for looking back at all these combinations @dharley-atlastix - this is useful information! I appreciate that.

Given that it happened in your deployment and we can workaround the issue in a transparent fashion, I'm going ahead with PR #35227. When that is released you shouldn't see the issue even on the 2022 RTM with no updates.

jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this issue Oct 4, 2024
…ider (open-telemetry#35227)

**Description:**
When the event doesn't have a Publisher we should not rely on the
behavior of `EvtFormatMessage` API, given, that in some cases it reports
error. Instead we should fallback to the non-formatted message to avoid
logging error messages on the collector. See issue open-telemetry#35135.

**Link to tracking Issue:**
Fix open-telemetry#35135

**Testing:**
Local validation of the processing of the events reported in the issue.

**Documentation:**
N/A

---------

Co-authored-by: Daniel Jaglowski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage New item requiring triage os:windows receiver/windowseventlog
Projects
None yet
2 participants