Overview
Nagios can send automated notifications via email and SMS text messages. This article explains how to request notifications for a server and how to interpret the somewhat cryptic messages that you will receive.
Procedures
Requesting Changes to Notifications
All requests for changes to the notification settings for a server need to be submitted via an ITSSC form. Use the instructions in the “How to Modify an Existing Server or Test in Nagios” article to do so.
What Notification Options are Available?
Nagios notifications can be customized in the following ways:
- Contact List
  - This is a list of the people, and their email addresses, that you wish to have notified.
  - The use of RPI email addresses is preferred, but not required.
  - The notifications can be sent to a cellular phone as an SMS message using the carrierʼs email-to-SMS gateway.
- Notification Period
  - This is the time frame during which you wish to receive notifications. While 24×7 is the default, you can specify fairly complex sets of time frames (e.g., 09:00–17:00, M–F and 12:00–13:00 on weekends).
- Notification Triggers
  - These are the events that will cause notifications to be sent. They are:
    - The host disappears from the network.
    - The host reappears on the network.
    - The service passes the warning or critical alert threshold.
    - The service recovers.
    - The host/service is quickly switching between failures and recoveries.
    - The host/service enters or exits a Scheduled Downtime.
  - The default settings are:
    - Alerts will be sent for everything except Scheduled Downtime.
- Notification Interval
  - The amount of time that will elapse between repetitions of a notification email.
  - If you request an interval, the notifications will be repeatedly sent until the test recovers.
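To make the Notification Period and Notification Interval options more concrete, here is a minimal Python sketch. It is not how Nagios is configured or implemented; it only models the example period mentioned above (09:00–17:00 on weekdays, 12:00–13:00 on weekends) and a repeating reminder. The names EXAMPLE_PERIOD, in_notification_period, and notification_times are hypothetical and exist only for this illustration.

```python
from datetime import datetime, time, timedelta

# Hypothetical model of the example period from this article.
# datetime.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAY_WINDOW = (time(9, 0), time(17, 0))
WEEKEND_WINDOW = (time(12, 0), time(13, 0))
EXAMPLE_PERIOD = {day: [WEEKDAY_WINDOW] for day in range(5)}
EXAMPLE_PERIOD.update({day: [WEEKEND_WINDOW] for day in (5, 6)})


def in_notification_period(moment, period=EXAMPLE_PERIOD):
    """Return True if the moment falls inside one of that day's windows."""
    windows = period.get(moment.weekday(), [])
    return any(start <= moment.time() < end for start, end in windows)


def notification_times(first_sent, interval, recovered_at):
    """List the send times of a notification repeated until recovery."""
    times = []
    current = first_sent
    while current < recovered_at:
        times.append(current)
        current += interval
    return times


if __name__ == "__main__":
    # 08/21/2020 was a Friday, so 21:15 is outside the weekday window.
    print(in_notification_period(datetime(2020, 8, 21, 21, 15)))
    # A 60-minute interval with recovery a little over two hours later.
    for sent in notification_times(
        datetime(2020, 8, 21, 21, 15),
        timedelta(minutes=60),
        datetime(2020, 8, 21, 23, 30),
    ):
        print(sent)
```

The sketch treats each window as including its start time and excluding its end time, which is an assumption made for simplicity. It also does not model what happens to an alert raised outside the notification period.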
Interpreting the Notification Email
Nagiosʼs notification emails can be cryptic.
Here is an example of a notification telling you that a server has disappeared from the network:
***** Nagios XI Alert *****
Nagios has detected a problem with this host.
Notification Type: PROBLEM
Host: myserver.net
State: DOWN
Address: 19.168.0.1
Info: CRITICAL - 19.168.0.1: rta nan, lost 100%
Date/Time: 08/21/2020 21:15:52
Respond: https://nagios.itops.rpi.edu/nagiosxi/rr.php?oid=13245&token=c0460dc
Meanwhile, here is an example of a notification telling you that a specific aspect of a server is having issues:
***** Nagios XI Alert *****
Nagios has detected a problem with this service.
Notification Type: PROBLEM
Service: CPU Usage
Host: myserver.net
Address: 19.168.0.1
State: WARNING
Info:
WARNING: Percent was 51.10 %
Date/Time: 08/21/2020 22:39:05
Respond: https://nagios.itops.rpi.edu/nagiosxi/rr.php?oid=13245&token=67dd75a76
Here is what the fields mean:
- The “Notification Type” field tells you why the notification was sent. In these examples, “PROBLEM” means that Nagios has detected a problem.
- The “Service” field only appears on service-related notifications, and it tells you which service test is causing problems. In this case, the CPU for the server is experiencing heavier-than-usual loads.
- The “Host” field tells you the name of the server being tested.
- The “State” field tells you, roughly, what's going on.
  - “DOWN”: The server has disappeared from the network.
  - “WARNING” / “CRITICAL”: The service test is returning a result that exceeds the warning/critical threshold values (e.g., the disk is running out of space).
  - “UP” / “OK”: The server / service has recovered.
- The “Address” field tells you the IP address of the server being tested.
- The “Info” field shows the output returned by the test, such as the measured value that triggered the alert.
- The “Date/Time” field tells you what time the notification was sent. It does not indicate when the problem first occurred.
- The “Respond” field provides you with a URL that will quickly allow you to log in to the Nagios system and view the status of your server. Please note that you need to be connected to RPIʼs DUO VPN network in order to use this URL.
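If you want to filter or summarize these emails automatically, the field layout described above can be read with a short script. The following Python sketch is only an illustration based on the two sample messages in this article; the function names parse_notification and summarize are hypothetical, and the date format is assumed to be month/day/year as in the samples. Real notifications may contain other fields or layouts.

```python
from datetime import datetime

# Field labels copied from the sample notifications in this article.
KNOWN_FIELDS = {
    "Notification Type", "Service", "Host", "State",
    "Address", "Info", "Date/Time", "Respond",
}


def parse_notification(body):
    """Return a dict mapping each recognized field label to its value."""
    fields = {}
    current = None
    for raw_line in body.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # Split on the first colon only, so URLs and times stay intact.
        label, separator, value = line.partition(":")
        if separator and label in KNOWN_FIELDS:
            current = label
            fields[current] = value.strip()
        elif current is not None:
            # Not a known label: treat the line as a continuation of the
            # previous field (the service sample puts the "Info" text on
            # the line after the label).
            fields[current] = f"{fields[current]} {line}".strip()
    return fields


def summarize(fields):
    """Build a one-line summary from the parsed fields."""
    # Only service-related notifications carry a "Service" field.
    kind = "service" if "Service" in fields else "host"
    # Assumption: month/day/year order, as in the samples.
    sent = datetime.strptime(fields["Date/Time"], "%m/%d/%Y %H:%M:%S")
    target = fields["Host"]
    if kind == "service":
        target = f"{fields['Service']} on {target}"
    return f"{fields['State']} {kind} alert for {target}, sent {sent:%Y-%m-%d %H:%M}"
```

Passing the text of a notification email to parse_notification and then to summarize yields a single line naming the state, the affected host or service, and the time the notification was sent. Lines that appear before the first recognized field, such as the alert banner, are ignored.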