Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[nagios] Add Nagios package #2824

Closed
wants to merge 5 commits into from

Conversation

kush-elastic
Copy link
Collaborator

@kush-elastic kush-elastic commented Mar 15, 2022

  • Enhancement

What does this PR do?

  • Generated the skeleton of Nagios integration package.
  • Added 3 data stream (Logs, Host and Service)
  • Added data collection logic for the data streams.
  • Added the ingest pipeline for the data streams.
  • Mapped fields according to the ECS schema and added Fields metadata in the appropriate yaml files.
  • Added dashboards and visualizations.
  • Added system test cases for the data stream.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

How to test this PR locally

  • Clone integrations repo.
  • Install elastic-package locally.
  • Start elastic stack using elastic-package.
  • Move to integrations/packages/nagios directory.
  • Run the following command to run tests.
    elastic-package test

Related issues

Screenshots

Integration Page
Integration Page

Nagios XI Overview Dashboard
Metrics Nagios XI Overview Dashboard

@kush-elastic kush-elastic requested a review from a team as a code owner March 15, 2022 12:43
@mtojek mtojek requested a review from a team March 15, 2022 12:45
@elasticmachine
Copy link

elasticmachine commented Mar 15, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-03-31T08:05:03.004+0000

  • Duration: 16 min 40 sec

Test stats 🧪

Test Results
Failed 0
Passed 34
Skipped 0
Total 34

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much experience with Nagios, especially from the configuration site and I'm not following the product since a few years. Based on your PR, I assume that we're tracking the "Nagios XI" integration, not the "Nagios"? Should we rename the integration to "nagios_xi"?

The first round, so mostly general comments:

  1. Please rename "logs" to "events" as this is about catching notifications and alerts.
  2. Are there more commands that Nagios provides, not just the "host check"? If so, maybe we need to add the "commands" data stream instead.
  3. Could we split the "services" into a few different data streams? I appreciate the idea of having a multi-select box, but we'd rather go with multiple data streams if possible. Feel free to propose any way of dividing them.

@kush-elastic kush-elastic self-assigned this Mar 15, 2022
@kush-elastic kush-elastic added enhancement New feature or request Team:Integrations Label for the Integrations team New Integration Issue or pull request for creating a new integration package. labels Mar 15, 2022
@elasticmachine
Copy link

Pinging @elastic/integrations (Team:Integrations)

@kush-elastic
Copy link
Collaborator Author

Hey Marcin,
Thanks for the Review.
We found a few things while changes were requested in Demo Call.

Please rename "logs" to "events" as this is about catching notifications and alerts.

Done

Are there more commands that Nagios provides, not just the "host check"? If so, maybe we need to add the "commands" data stream instead.

There is no more host check command other than the two we have added support for other commands are for services so I guess we don't need to change the data_stream name.

Could we split the "services" into a few different data streams? I appreciate the idea of having a multi-select box, but we'd rather go with multiple data streams if possible. Feel free to propose any way of dividing them.

So as we discussed over the same that we decided to have a single data_stream.
The suggestion was to use toggle buttons with fix 8 types mentioned and have all check commands based on selected toggles.
But the problem is we can not pass multiple values in the query. Check commands can also be dynamic (check_xi_service_ping!3000.0!80%!5000.0!100%). and Current API doesn't support this kind of query (query: check_command=lk:check_xi_service_ping).
we tried this : check_command=in:check_xi_service_ping but it also doesn't support multiple queries in a single call.
So we have to use service names in queries. (display_name=in:HTTP,Current Users,etc.....)

@kush-elastic kush-elastic requested a review from mtojek March 17, 2022 12:18
@mtojek
Copy link
Contributor

mtojek commented Mar 17, 2022

Thanks for the update, @kush-elastic.

So as we discussed over the same that we decided to have a single data_stream.
The suggestion was to use toggle buttons with fix 8 types mentioned and have all check commands based on selected toggles.
But the problem is we can not pass multiple values in the query. Check commands can also be dynamic (check_xi_service_ping!3000.0!80%!5000.0!100%). and Current API doesn't support this kind of query (query: check_command=lk:check_xi_service_ping).
we tried this : check_command=in:check_xi_service_ping but it also doesn't support multiple queries in a single call.
So we have to use service names in queries. (display_name=in:HTTP,Current Users,etc.....)

I'm wondering if there is an option to perform multiple queries, one per selected item. Having it defined as the query isn't the best user experience as @akshay-saraswat raised yesterday, it's error prone. Maybe we should split it into different data streams?

Let's collect some feedback around this - @ruflin, @jsoriano WDYT? You can take a look at the "query" on the screenshot above.

@ruflin
Copy link
Collaborator

ruflin commented Mar 17, 2022

Is there a limit to the nagios services metrics? How many are there? Do you have a link to the query language by chance?

@jsoriano
Copy link
Member

You can take a look at the "query" on the screenshot above.

What are these queries? Are these names of checks defined in Nagios?

@kush-elastic
Copy link
Collaborator Author

Hey @ruflin,
You can try the demo version which the official Nagios XI provides for a query language reference. Object Reference
Here's link.

@kush-elastic
Copy link
Collaborator Author

Hey @jsoriano,
We are using display_name from the below response.
https://nagiosxi.demos.nagios.com/nagiosxi/api/v1/objects/servicestatus?apikey=7I4CZHGQSbbE79Isaop9LG4IRkpFOC6bAaMP7oVYC7TZvBrJYT80PcIWvnCEgX56&pretty=1&display_name=in:SSH,Current Users

{
    "recordcount": 1,
    "servicestatus": [
        {
            "host_name": "localhost",
            "service_description": "Current Load",
            "display_name": "Current Load",
            "host_object_id": "145",
            "host_address": "127.0.0.1",
            "host_alias": "localhost",
            "icon_image": "",
            "icon_image_alt": "",
            "notes": "Testing 123",
            "notes_url": "",
            "action_url": "",
            "servicestatus_id": "21",
            "instance_id": "1",
            "service_object_id": "157",
            "status_update_time": "2020-04-06 14:16:30",
            "output": "OK - load average: 0.26, 0.32, 0.30",
            "long_output": "",
            "perfdata": "load1=0.260;5.000;10.000;0; load5=0.320;4.000;6.000;0; load15=0.300;3.000;4.000;0;",
            "current_state": "0",
            "has_been_checked": "1",
            "should_be_scheduled": "1",
            "current_check_attempt": "1",
            "max_check_attempts": "4",
            "last_check": "2020-04-06 14:16:30",
            "next_check": "2020-04-06 14:21:30",
            "check_type": "0",
            "check_options": "0",
            "last_state_change": "2020-02-01 16:10:38",
            "last_hard_state_change": "2020-02-01 16:10:38",
            "last_hard_state": "0",
            "last_time_ok": "2020-04-06 14:16:30",
            "last_time_warning": "1969-12-31 18:00:00",
            "last_time_unknown": "1969-12-31 18:00:00",
            "last_time_critical": "1969-12-31 18:00:00",
            "state_type": "1",
            "last_notification": "1969-12-31 18:00:00",
            "next_notification": "1969-12-31 18:00:00",
            "no_more_notifications": "0",
            "notifications_enabled": "1",
            "problem_has_been_acknowledged": "0",
            "acknowledgement_type": "0",
            "current_notification_number": "0",
            "passive_checks_enabled": "1",
            "active_checks_enabled": "1",
            "event_handler_enabled": "1",
            "flap_detection_enabled": "1",
            "is_flapping": "0",
            "percent_state_change": "0",
            "latency": "0.00219999998807907",
            "execution_time": "0.00254",
            "scheduled_downtime_depth": "0",
            "failure_prediction_enabled": "0",
            "process_performance_data": "1",
            "obsess_over_service": "1",
            "modified_service_attributes": "0",
            "event_handler": "",
            "check_command": "check_local_load!5.0,4.0,3.0!10.0,6.0,4.0\\!test!!!!!!",
            "normal_check_interval": "5",
            "retry_check_interval": "1",
            "check_timeperiod_object_id": "129"
        }
    ]
}

Users have to specify the Services Names that they want to monitor.
image

@kush-elastic
Copy link
Collaborator Author

Hey @mtojek,
If we split data_streams with commands, then we have this many commands. and we will also have to create as many data_streams. and in the future, if we want to support more data_streams. we will have to add as many data_streams.

check_local_users
check_local_load
check_ssh
check_xi_service_ssh
check_ping
check_xi_service_ping
check_local_swap
check_local_procs
check_http
check_xi_service_http
check_local_disk

What do you think we should do??

@mtojek
Copy link
Contributor

mtojek commented Mar 21, 2022

Don't worry about the number of data streams, it's still fine. Take a look at the zeek package :)

@jsoriano
Copy link
Member

Current approach of filtering by display name can be error prone, and it can be also seen as arbitrary to some users that may prefer to filter by checks or hosts.

Having datastreams per check may not fit so well for something like nagios, where it is very likely that the customer has custom checks. Nagios checks are basically scripts, and someone relying on nagios for monitoring probably has custom scripts for their custom applications. We wouldn't be supporting this common use case. Or we would need something different for these use cases, what can be a bad user experience.

Have we considered to start by collecting everything by default? This is what we use to do when collecting data from other monitoring systems (see prometheus, statsd or munin).

Then we can add specific fields for specific filters. For example it seems possible to filter per host using the host_address field (example). We could have an optional field to filter per host. Display name could be another optional filter.
I haven't found a reliable way to filter per command when they have parameters, but there may be one.

With this kind of filters we could also leverage autodiscover at some point. There could be a nagios autodiscover provider that queries the inventory of hosts there, and generates one config per monitored host, with the possibility of using conditions, variables and all the available features.

Btw, another question I have is how are we handling very big API responses? I guess that the size of the responses may easily grow for users with hundreds or thousands of hosts.

@mtojek mtojek requested a review from ruflin March 29, 2022 08:42

## Compatibility

This module has been tested against `Nagios-XI Version: 5.8.7`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This integration

@@ -0,0 +1,14 @@
- name: nagios_xi.event
type: group
release: beta
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can drop this property.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, Thanks.

field: json.last_check
formats:
- yyyy-MM-dd HH:mm:ss
target_field: nagios_xi.metrics.host.last_check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is fishy here. Why do we need the . metrics. group in the middle? It should be nagios_xi.host.....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for confusion, we updated logs to events. to avoid conflicts we added metrics before host and services. as we have events now we will update this with your suggestion.
Thanks

type: group
release: beta
fields:
- name: host
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please re-check these fields against ECS? I guess you can map some of them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will check it out.
Thanks

type: group
release: beta
fields:
- name: service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same story here: please check these fields against ECS.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

- monitoring
release: beta
conditions:
kibana.version: ^8.0.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that you can pair it with next release of Kibana, ^8.2.0.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @akshay-saraswat ,
What are your thoughts on this? Should we just keep the next Kibana release i.e. 8.2.0. If so, we would have to test it on 8.2.0 image since as we discussed, we have just tested it on 8.0.0 and 8.1.0.

Copy link
Contributor

@akshay-saraswat akshay-saraswat Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8.2.0 sounds okay to me. If we release it with 8.2.0, we anyways would have to test it with that version. If we keep the minimum Kibana version 8.2.0, we will have fewer versions to consider for backward compatibility and support.

icons:
- src: /img/nagios_xi-logo.svg
title: Nagios XI logo
size: 279x187
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm.. could you please use a standard-sized logo :) like 256x256 or 512x512.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

- name: api_key
type: text
title: API Key
description: "API key for the Nagios XI API. (Location of API_KEY in Nagios XI: Click on your username on the top right corner of Nagios XI Panel -> My Account -> Account Information -> General Account Settings -> API Key)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -0,0 +1,434 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about splitting it into following dashboards:

  1. Overview
  2. Service dashboard
  3. Events dashboard
  4. Hosts dashboard
    ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have much visualizations in Nagios XI as most of them are metrics only.
What do you think of current dashboard as it has more compact and full view of current status? (I have Updated SS in PR description)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of current dashboard as it has more compact a

  1. I would say it's packed with too many information at once. For example, I'd like to see a screen with logged events and the percentage of incoming ones, to see which are the most frequent.
  2. What does the HTTP response size mean? and response time? is it related to a particular endpoint?
  3. What kind of hosts does it present?

@akshay-saraswat
Copy link
Contributor

For dashboards, I have the same suggestion that Marcin said during the demo. Let's move the Event and Host visualizations somewhere after the service visualizations because those are lists of logs and IP addresses that may not be apt for an overview dashboard that is constantly displayed on a TV in an office.

@ruflin
Copy link
Collaborator

ruflin commented Mar 30, 2022

Can we please follow the same approach as recommended in #2811 (comment) here and split up the work to data streams?

@kush-elastic kush-elastic marked this pull request as draft March 30, 2022 18:22
@kush-elastic kush-elastic marked this pull request as draft March 30, 2022 18:22
@kush-elastic
Copy link
Collaborator Author

kush-elastic commented Mar 31, 2022

@ruflin
Copy link
Collaborator

ruflin commented Mar 31, 2022

Is my understanding correct that the one we should start with reviewing is #2940 Lets set the other 2 to draft until we have this one in to make sure we don't review things twice. As soon as the base skeleton is in, the PR's can be opened in parallel.

@kush-elastic
Copy link
Collaborator Author

Yes @ruflin, Right you can start reviewing from #2940.
I have set other two PRs as Draft PR.
Thanks

@kush-elastic
Copy link
Collaborator Author

kush-elastic commented May 9, 2022

Closing this PR as it was split up into multiple PRs as mentioned in the comment #2824 (comment). All the parts are now merged and the linked issue (#2626) has been closed.

Thanks a lot @mtojek, @ruflin, @jsoriano, @akshay-saraswat and @lalit-satapathy for taking out time to review the PRs and providing valuable feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request New Integration Issue or pull request for creating a new integration package. Team:Integrations Label for the Integrations team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants