- Overview
- Module Description - What the module does and why it is useful
- Usage - Configuration options and additional functionality
- Examples
- Limitations - OS compatibility, etc.
- Development - Guide for contributing to the module
This module installs and manages Nagios, NRPE, NSCA, BPI and PNP4Nagios to give you a full monitoring stack.
While Nagios itself is not too complex, a full stack installation includes a number of optional components. Let's have a look at the terminology - if you are new to Nagios you should definitely read and understand the definitions before attempting to use this module.
This module is quite opinionated about how Nagios should be set up. I've made it as configurable as I can without deviating from the model that I believe is best, which has been extensively tested in our local environment before publishing. It makes assumptions about how you want to group things that mean you can start benefiting from Nagios quickly without having to set too much up.
This module makes heavy use of Puppet exported resources to configure Nagios. You must have a working Puppet and PuppetDB environment with exported resources before using this module.
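Before installing the module, it can help to confirm that exported resources actually round-trip through PuppetDB. The following is a minimal smoke test, not part of this module: the file path and the `exported_demo` tag are made-up names used only for illustration.

```puppet
# On any monitored node: export a resource (note the @@ prefix).
# The path and tag below are hypothetical, chosen only for this test.
@@file { "/tmp/exported-demo-${::fqdn}":
  ensure  => file,
  content => "Exported by ${::fqdn}\n",
  tag     => 'exported_demo',
}

# On the node that will become the Nagios server: collect every
# resource exported with that tag.
File <<| tag == 'exported_demo' |>>
```

If the collecting node creates one file per exporting node after a couple of Puppet runs, exported resources are working.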
Warning: This module uses puppetlabs/apache to configure the web frontend. Be aware that puppetlabs/apache will purge all other Apache config that is not managed with puppetlabs/apache. This Nagios module will play nicely with other web sites configured with apache::vhost but it will break anything else that has been configured manually.
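As a sketch of what a site that survives the purge might look like, the snippet below declares an unrelated virtual host through puppetlabs/apache. The hostname and docroot are placeholders, and the exact parameters you need depend on your version of puppetlabs/apache.

```puppet
# Hypothetical second web site on the same machine as Nagios.
# Because it is declared with apache::vhost, puppetlabs/apache
# manages it and will not purge it.
apache::vhost { 'intranet.example.com':
  port    => 80,
  docroot => '/var/www/intranet',
}
```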
Nagios is the name of the main monitoring application, and it includes a web application and a backend daemon. The daemon does the actual monitoring by executing plugins which send probes to clients, and then displaying the results in the web application or sending them via notifications.
Be careful with the terminology: here we use server to refer to the Nagios server, and client to refer to the Nagios clients, even though they may be servers in their own right.
+--------+ +--------+
| Nagios | ---> | Client |
+--------+ +--------+
While Nagios is good at sending probes to clients that are offering services (e.g. sending HTTP requests to web servers) it needs something extra to probe non-public aspects of a client, e.g. checking CPU usage.
To achieve this, we run the NRPE daemon on the client which listens for the server and executes plugins to probe the local system. The Nagios server probes NRPE on the client which runs the plugin and returns the result to Nagios.
+--------+ +--------+ +--------+
| Nagios | ---> | NRPE | ---> | Client |
+--------+ +--------+ +--------+
NSCA works the other way round from NRPE. NSCA runs on the server and listens for clients to submit passive checks to Nagios on their own schedule (e.g. via cron) rather than the Nagios server initiating the probes.
+--------+ +--------+ +--------+
| Nagios | <--- | NSCA | <--- | Client |
+--------+ +--------+ +--------+
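To make the passive flow concrete, here is a rough sketch of a client submitting a result on its own schedule using Puppet's `cron` resource. It is not something this module creates for you. The binary path, config path, hostnames and the `Backup` service name are all assumptions; the host and service names must match objects that already exist in Nagios.

```puppet
# Assumption: send_nsca is installed at /usr/sbin/send_nsca and its
# config lives at /etc/nagios/send_nsca.cfg; adjust for your distro.
cron { 'nsca_backup_heartbeat':
  ensure  => present,
  user    => 'root',
  minute  => '*/15',
  # send_nsca reads tab-separated fields from stdin:
  # host name, service description, return code (0 = OK), plugin output.
  command => "printf 'web1.example.com\\tBackup\\t0\\tBackup completed\\n' | /usr/sbin/send_nsca -H nagios.example.com -c /etc/nagios/send_nsca.cfg",
}
```

In practice the command would usually be a script that works out the real status before submitting it, rather than a fixed OK result.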
BPI (Business Process Intelligence) is an addon for Nagios which is able to model real-world applications based on a set of probes. For example: you may have a cluster of 2 web servers and so long as either server is up, the overall service is up. You might not care if only one server is down. BPI uses logic like this to work out if your real services are up or down and send appropriate alerts.
Some Nagios plugins return performance data as well as a status code. Out of the box, Nagios can't do anything with this data, but PNP4Nagios can process this data with RRD and automatically draw graphs.
This module is designed so the base class ::nagios configures a Nagios monitoring server. Other classes are available such as ::nagios::client which configures a Nagios client to be monitored. There are also some defined types which should be directly called where necessary to configure extras.
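As a minimal sketch of that split, the node definitions below put the server class on one machine and the client class on another. The hostnames are placeholders, and the only parameter used (`nrpe`) is taken from the server example later in this document.

```puppet
# Hypothetical monitoring server.
node 'nagios.example.com' {
  class { '::nagios':
    nrpe => true,  # allow checks to be executed on clients via NRPE
  }
}

# Hypothetical monitored system, using the client class defaults.
node 'web1.example.com' {
  include ::nagios::client
}
```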
The ::nagios class installs a Nagios monitoring server and related components.
Install components to run a Nagios client, i.e. a server that is monitored. Default: true
Install components to run a Nagios monitoring server. Default: false
Install support for NRPE, which is required if you want to execute Nagios checks on remote servers (clients). Default: false
Install support for NSCA, which is required if you want to execute passive Nagios checks. Default: false
Manage SELinux rules to allow Nagios components to run properly on the clients and server. Strongly recommended if you are running a Red Hat family distro, and SELinux is enabled on your system. Requires puppet/selinux. Default: false
Manage firewall rules on Nagios clients and server. Strongly recommended to allow Nagios components to work properly. Caution: firewall rules are managed by puppetlabs/firewall. That module purges any firewall rules that are not managed with puppetlabs/firewall so be extremely careful before enabling this option. Default: false
Override the hostname that your Nagios server will run on, if you don't want it to run on the server's $::fqdn. Default: $::fqdn
Array of alternative hostnames that your Nagios server should respond to. Don't forget to set these as alternate names in your SSL certificate. Default: []
Set a flag to mark this Nagios server as a development/testing server. This suppresses active notifications from Nagios. Default: false
Server admin email address for use by Apache. Default: root@localhost
Whether to send Nagios host and service notifications to $serveradmin. Default: false
Whether to automatically add this client to a hostgroup of its OS type. Default: true
Whether to automatically add this client to a hostgroup of its hardware/virtualised platform. Default: true
Array of other hostgroups to add the system to. Default: []
Name of a parent object. Default: undef
Set alias for a host. Default: undef
Name of the NRPE package. You shouldn't need to override this. If you need to add support for a new distro, please send a pull request or raise an issue.
Location of the webroot on the filesystem. If you need to add support for a new distro, please send a pull request or raise an issue.
Location of the CGI root on the filesystem. If you need to add support for a new distro, please send a pull request or raise an issue.
Name of the NSCA client package. If you need to add support for a new distro, please send a pull request or raise an issue.
Name of the NRPE service. If you need to add support for a new distro, please send a pull request or raise an issue.
Path to the NRPE config file. If you need to add support for a new distro, please send a pull request or raise an issue.
Path to the NRPE conf.d directory. If you need to add support for a new distro, please send a pull request or raise an issue.
Name of the NRPE plugin package. If you need to add support for a new distro, please send a pull request or raise an issue.
Name of the NSCA server package. If you need to add support for a new distro, please send a pull request or raise an issue.
Name of the NSCA service. If you need to add support for a new distro, please send a pull request or raise an issue.
Path to the NSCA config file. If you need to add support for a new distro, please send a pull request or raise an issue.
Name of the Nagios package. If you need to add support for a new distro, please send a pull request or raise an issue.
Name of the Nagios service. If you need to add support for a new distro, please send a pull request or raise an issue.
The ::nagios::client class installs components needed for a system to be monitored by a Nagios monitoring server.
Whether to enable support for NRPE. Default: true
Whether to enable support for NSCA. Default: true
Whether to manage SELinux policies to allow plugins to execute properly via NRPE. Default: true
Whether to manage firewall rules to allow plugins to execute properly via NRPE. Default: true
Whether to set up a basic set of checks that should work on all systems (e.g. ping). Default: true
Name of the NRPE client package. If you need to add support for a new distro, please send a pull request or raise an issue.
Name of the NSCA client package. If you need to add support for a new distro, please send a pull request or raise an issue.
Name of the NRPE service. If you need to add support for a new distro, please send a pull request or raise an issue.
Path to the NRPE config file. If you need to add support for a new distro, please send a pull request or raise an issue.
Path to the NRPE conf.d directory. If you need to add support for a new distro, please send a pull request or raise an issue.
Name of the NRPE plugin package. If you need to add support for a new distro, please send a pull request or raise an issue.
Path to SSL server certificate. Default: /path/to/cert.crt
Path to SSL private key. Default: /path/to/key.key
Path to SSL certificate chain file. Default: undef
Allowed SSL ciphers. Defaults to a more secure list than ships with puppetlabs/apache. Default: HIGH:!MEDIUM:!aNULL:!MD5:!RC4:!3DES
The ::nagios::service defined type installs a service, a command and other related components required to monitor something.
Hostname of the system that the check should be associated with. Default: $::fqdn
Override the name of the check command in the service definition. Default: $title
Human-readable name for the service.
Name of the Nagios template to inherit from. Default: undef
One or more additional servicegroups that this service should be a member of. It will automatically be added to a servicegroup with the same name as the check. Default: undef
Whether to automatically create the servicegroup that this service belongs to by default. Default: true
Whether to automatically add a service dependency on NRPE, if this service is an NRPE-based check. Default: true
Whether to override active checks. Default: undef
Whether to override the maximum number of check attempts before reporting hard state. Default: undef
Override check freshness. Probably only useful for passive checks. Default: undef
Override freshness threshold. Probably only useful for passive checks. Default: undef
The command line used to execute the plugin. The default can be used only if no arguments are required. Default: $check_command
Override the check interval on a per-service basis. This is usually inherited from a template with use. Default: undef
Whether to execute this check on the monitored host via NRPE. Default: false
Whether to use sudo when executing this check. Default: false
The username to use when executing plugins with sudo when $use_sudo = true. Default: undef
Whether to install the Nagios plugin on the system. Default: true
Provider for the plugin installation, if $install_plugin = true. Default: package
Source for installation of the plugin if $install_plugin = true. Default: undef
Add arbitrary service dependencies on other services on this host. Default: undef
The hostname of the Nagios server that will be monitoring this host. Default: hiera('nagios_server')
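The options above can be combined. As a sketch only, the declaration below runs a disk check on the monitored host via NRPE and wraps it in sudo. The parameter names are those that appear elsewhere in this document; the package name and thresholds are illustrative assumptions, and this particular plugin may not need sudo on your system.

```puppet
# Hypothetical NRPE-based check executed with sudo on the client.
nagios::service { 'check_disk':
  service_description => 'Root disk',              # Human-readable name
  use_nrpe            => true,                     # Run on the monitored host
  use_sudo            => true,                     # Wrap the plugin call in sudo
  plugin_source       => 'nagios-plugins-disk',    # Assumed package name
  command_definition  => 'check_disk -w 20% -c 10% -p /',
}
```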
The ::nagios::bpi::config defined type configures a BPI "service", i.e. a group of one or more monitored objects in Nagios. The title of this resource forms the BPI groupID and must be alphanumeric characters with no spaces. This ID is used internally by the program as well as for the check_bpi.php plugin.
This can be a bit confusing to configure, especially the members option, so it is probably best to read the examples below.
# Group of DNS servers created by checking the `DNS` Nagios service on all DNS servers.
# If one or more DNS servers is up, this group counts as up.
nagios::bpi::config { 'dns':
  displayname => 'DNS',
  members     => [
    {
      host    => 'dns1.example.com',
      service => 'DNS',
      opt     => '&',
    },
    {
      host    => 'dns2.example.com',
      service => 'DNS',
      opt     => '&',
    },
  ],
  priority    => 2,
  primary     => 0,
}
# Group of DHCP servers created by checking the `DHCP` Nagios service on all DHCP servers.
# If one or more DHCP servers is up, this group counts as up.
nagios::bpi::config { 'dhcp':
  displayname => 'DHCP',
  members     => [
    {
      host    => 'dhcp1.example.com',
      service => 'DHCP',
      opt     => '&',
    },
    {
      host    => 'dhcp2.example.com',
      service => 'DHCP',
      opt     => '&',
    },
  ],
  priority    => 2,
  primary     => 0,
}
# Virtual group to reflect the state of the whole network. If the DNS and DHCP groups
# are both up, this group is up. If either DNS or DHCP is down, this group is down.
nagios::bpi::config { 'network':
  displayname => 'Network',
  members     => [
    {
      host => '$dns',
      opt  => '|',
    },
    {
      host => '$dhcp',
      opt  => '|',
    },
  ],
  priority    => 1,
  primary     => 1,
}
The display name for the BPI group (required)
Members of this BPI group, which can consist of services and other BPI groups. Data should be expressed as an array of hashes with the following keys:
- host: The hostname of a host in Nagios or the groupID of a BPI group. Required.
- service: The servicename of a service in Nagios, if host is a Nagios host. Not required if host is a BPI group.
- opt: an & or | character, where & means the service is part of a cluster and | means it is an essential service for the group.
For example: a critical service with an | option will cause a critical state for the entire group. For clusters, critical is only reached when ALL services in a cluster are NOT OK.
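The two opt values can be mixed in one group. The sketch below is a hypothetical example (the hostnames, service names and group ID are made up) of an application that needs its database plus at least one of two web servers:

```puppet
# Hypothetical group: the database is essential ('|'), while the two
# web servers form a cluster ('&').
nagios::bpi::config { 'shop':
  displayname => 'Online shop',
  members     => [
    { host => 'db1.example.com',  service => 'MySQL', opt => '|' },
    { host => 'web1.example.com', service => 'HTTP',  opt => '&' },
    { host => 'web2.example.com', service => 'HTTP',  opt => '&' },
  ],
}
```

Going by the descriptions above, a critical MySQL service would make the whole group critical, whereas the HTTP cluster would only do so once both web servers are not OK.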
Automatically create a Nagios check for this BPI group. Default: true
Description for a BPI group. Optional, default: undef
Primary/Top-Level groups are 1, subgroups are 0. Setting 0 hides the BPI group except where it is explicitly referenced as a component of another BPI group. Default: 1
Link to internal or external webpage. Optional, default: undef
The number of problems a group reaches before going 'warning'. Default: 0
The number of problems a group reaches before going 'critical'. Default: 0
The display priority on screen between 1-3, 1 being 'high priority'. Default: 1
Set an event handler for this BPI group's Nagios check. Only makes sense if nagios=true. Default: undef
Enable periodic email reports about uptime of BPI services. Choose from yesterday, lastweek, lastmonth, lastyear. Default: undef
One or more email addresses that should receive the uptime report. Must be expressed as an array even if there is only one email address. Default: $serveradmin
class ::profile::nagios {
  # Install Nagios server
  class { 'nagios':
    nrpe        => true,  # Set up NRPE for monitoring of remote hosts
    nsca        => false,  # Skip NSCA, which is needed for passive checks
    selinux     => true,  # Manage SELinux policies to allow Nagios to run smoothly
    firewall    => true,  # Manage firewall rules to allow Nagios/NRPE to run smoothly
    url         => 'nagios.example.com',  # Service URL of Nagios, if different from the system hostname
    serveradmin => '[email protected]',  # Admin's email address
    ssl_cert    => '/etc/pki/tls/certs/nagios.example.com.pem',  # Path to SSL cert for HTTPS
    ssl_key     => '/etc/pki/tls/private/nagios.example.com.key',  # Path to SSL key for HTTPS
    auth_type   => 'CAS',  # Override Apache basic auth and use CAS single sign-on instead
  }

  # Deploy HTTPS certificate
  file { '/etc/pki/tls/certs/nagios.example.com.pem':
    source => 'puppet:///modules/profile/nagios/nagios.example.com.pem',
    mode   => '0644',
    owner  => 'root',
    group  => 'root',
  }

  # Deploy HTTPS private key
  file { '/etc/pki/tls/private/nagios.example.com.key':
    source => 'puppet:///modules/profile/nagios/nagios.example.com.key',
    mode   => '0600',
    owner  => 'root',
    group  => 'root',
  }
}
This service definition monitors the host remotely, directly from the Nagios server. This is ideal for monitoring services that are available on the remote host, such as HTTP.
nagios::service { 'check_http':
  service_description => 'HTTP',
  plugin_source       => 'nagios-plugins-http',
  command_definition  => 'check_http -I $HOSTADDRESS$ $ARG1$',
}
This service definition installs the plugin on the monitored host and configures NRPE. The check itself is installed on the Nagios server. This is ideal for monitoring attributes of the remote host that are not available externally.
nagios::service { 'check_users':
  use_nrpe            => true,  # Execute this on the host via NRPE
  service_description => 'Current users',  # Human-readable description
  plugin_source       => 'nagios-plugins-users',  # Package that provides this plugin
  command_definition  => 'check_users -w 10 -c 20',  # Syntax for actually calling the plugin
}
This service definition is applied to the Nagios server, and the host name is overridden to point at a different system (one that is not managed by Puppet). This is ideal for monitoring "dumb" devices such as switches or other people's servers that you have no access to.
nagios::service { 'check_ping_router':
  host_name           => 'router.example.com',
  plugin_source       => 'nagios-plugins-ping',
  service_description => 'Ping',
  command_definition  => 'check_ping -H $HOSTADDRESS$ -w 100,10% -c 1000,50% -p 5',
}
This service definition is applied to the Nagios server and the host name is overridden to point at a different system which is manually managed, and has a manually-configured NRPE agent but no Puppet agent. This is ideal for monitoring legacy servers where you can't retrofit Puppet.
nagios::service { 'check_load_legacysystem.example.com':
  check_command       => 'check_load',  # Name of the command we have manually set on the remote system
  use_nrpe            => true,  # Use NRPE, which we have manually set up
  service_description => 'Load',
  host_name           => 'legacysystem.example.com',  # Override monitored server name
  install_plugin      => false,  # Don't attempt to manage the plugin
}
This module has been developed for Nagios 4 on CentOS 7. It's pretty flexible so it should work on other platforms too but they have had little-to-no testing.
This module is currently functional but not feature-complete. There are rough edges and things not implemented yet. Please check the issue tracker for outstanding issues and feature requests.
In particular the HTTPS/SSL config is rough around the edges: quite a few options are hard-coded and still need to be exposed as parameters.
This module was written initially for internal use - features we haven't needed to use probably haven't been written. Please send pull requests with new features and bug fixes. You are also welcome to file issues but I make no guarantees of development effort if the features aren't useful to my employer.