Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: ECS compliant NAGIOSLOGLINE captures #276

Merged
merged 4 commits into from
Dec 21, 2020

Conversation

kares
Copy link
Contributor

@kares kares commented Jul 31, 2020

Here I have mostly struggled with getting the naming right - too much "state"/"type" use in Nagios terminology.
Have decided to name "state status" as status (previously nagios_state) and state type as state_type (legacy name nagios_statetype)
The structure is rather flat to mirror Nagios (introducing state: { type: "..." } would be an option but the terminology would not follow that well). It's a bit of a compromise, sample capture:

p.s. Some of the names were incorrect those have been fixed and specs should clearly reflect the change.

[1427925600] CURRENT HOST STATE: nagioshost;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 2.24 ms

  "nagios_epoch"=>"1427925600", 
  "nagios_type"=>"CURRENT HOST STATE", 
  "nagios_hostname"=>"nagioshost", 
  "nagios_state"=>"UP", 
  "nagios_statetype"=>"HARD", 
  "nagios_statecode"=>"1", # incorrect -> simply an attempt counter
  "nagios_message"=>"PING OK - Packet loss = 0%, RTA = 2.24 ms", 
  "timestamp"=>"1427925600", 
  "nagios"=>{
    "log"=>{
      "type"=>"CURRENT HOST STATE", 
      "status"=>"UP", 
      "state_type"=>"HARD",
      "attempt"=>1, 
    }
  }, 
  "host"=>{"hostname"=>"nagioshost"},
  "message"=>"PING OK - Packet loss = 0%, RTA = 2.24 ms" ...

NOTE: now that I looked at ^^^ again I am going to revisit naming once again as the trio feels ackward:

      "type"=>"CURRENT HOST STATE", 
      "status"=>"UP", 
      "state_type"=>"HARD",

... maybe rename status -> state and potentially come up with a better name for the type

@kares
Copy link
Contributor Author

kares commented Jul 31, 2020

Should be good for an (ECS) naming review, if anyone finds time to do so 🙇 // @webmat @ebeahan

@kares kares mentioned this pull request Aug 4, 2020
21 tasks
@ebeahan
Copy link

ebeahan commented Aug 5, 2020

Would there be an opportunity to leverage the service.* field set at all? For example, could [nagios][log][service] be mapped as [service][name]?

@webmat might have some better guidance on whether if we've ever advised using service.state for something like Nagios UP/DOWN states.

@kares
Copy link
Contributor Author

kares commented Aug 6, 2020

Would there be an opportunity to leverage the service.* field set at all? For example, could [nagios][log][service] be mapped as [service][name]?

that should do thanks.
have tried to keep everything under nagios.* since there's usually 2 types of the same kind of log lines e.g.
CURRENT SERVICE STATE: localhost;Current Load;OK;HARD;1;OK - load average: 2.54, 1.58, 0.86
CURRENT HOST STATE: s-host08;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 1.58 ms

service names are often nagios "internal" e.g. service.name => "Current Load" in this case ...

@webmat might have some better guidance on whether if we've ever advised using service.state for something like Nagios UP/DOWN states.

thing to consider is whether we would use different state labels, which might be confusing, for the 2 mentioned log lines.
(there's no service.name for the CURRENT HOST STATE log entry, are we okay with using service.state there?)

@webmat
Copy link

webmat commented Aug 6, 2020

Yes, I think service.name and service.state would both fit well for external checks. E.g. ssh or varnish from the tests:

{ "service": { "name": "Varnish Backend Connections", "state": "CRITICAL" }

👍

Since these groks don't come with a pipeline, the host checks are a bit tricky though. I think "PING [hostname]" or equivalent could be an appropriate service name in this case. But you can't fill that only with a grok. But I would use service.state for the status check nonetheless. I think a user these groks will be able to fill service.name in their pipeline to get the desired consistency.

@webmat
Copy link

webmat commented Aug 6, 2020

Everything else makes sense to me 👍

@kares kares requested a review from yaauie August 10, 2020 18:24
Copy link
Contributor

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to just about everything here. Yet again, the field names are really well thought-through, and the additional spec coverage is super useful.

I noticed one weirdness with a spec that anticipates the captured message being appended onto the existing message, and would appreciate confirmation that my reasoning is sound. We may benefit from the plain and lines codecs being able to capture directly to log.original in their own ECS modes.

end

it "generates the nagios_message field" do
expect(grok).to include("nagios_message" => "PING OK - Packet loss = 0%, RTA = 2.24 ms")
if ecs_compatibility?
expect(grok).to include("message" => [message, "PING OK - Packet loss = 0%, RTA = 2.24 ms"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this artifact feels a little weird to me. I assume this is because we are matching on [message] so a capture appends to the existing message?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we encourage people to move the inbound message to log.original before running grok as a general best practice? I assume that would solve this weirdness.

Copy link
Contributor Author

@kares kares Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a general pattern we're following here when a message field is to be replaced due ECS.
This is present on other patterns as well that do 'message' replacement".

Not sure at what point to "fix" this, was mostly thinking grok would eventually run with overwrite => [ "message" ]. Since this is fairly common pattern, we could adopt overwrite => [ "message" ] as a default (in ECS mode).

Although I do not like we would have different defaults for a plugin depending on ecs_compatibility (but there's already a precedence for that in the ES output plugin).
I was thinking the .original copy could happen "earlier" (on event creation) when using an ECS factory ...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah copying or moving the original payload in the beginning is a good idea. Note however that ECS unfortunately has 2 such fields, log.original and event.original. At the next major we're likely to remove log.original, so I would recommend using event.original instead. Issue for deprecation/removal is here elastic/ecs#841

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we have Event.new("message" => message) and "match" => [ "message" => .. ] hard-coded in spec helper, and also due testing in legacy mode, this seems sufficient for now moving forward.

have added the concern to our meta-issue's post TODO points, as when it gets addressed at a higher level we could get back here and refactor the helpers.

@kares kares merged commit beeb8b6 into logstash-plugins:ecs-wip Dec 21, 2020
@kares kares changed the title Feat: make NAGIOSLOGLINE captures ECS compliant Feat: ECS compliant NAGIOSLOGLINE captures Dec 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants