What is Checkmk?
DotCIO runs a network and server monitoring application called Checkmk. Checkmk provides monitoring and alerting for servers, switches, applications, and services. It is the primary tool DotCIO uses to monitor its IT infrastructure.
What happened to Nagios?
Prior to Checkmk, DotCIO ran a monitoring application known as Nagios XI. In 2023, that application was retired in favor of Checkmk. In the process, all of the servers being monitored by Nagios were automatically migrated to Checkmk.
How does Checkmk monitor my server?
Checkmk monitors servers in two ways:
- Remotely, by connecting to a server’s external services (such as the web server port) to verify that each service is working. This allows Checkmk to detect failures and notify you quickly.
- Locally, through an agent installed on the server that reports information (such as available disk space) which can be used to detect impending problems.
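The two kinds of check described above can be illustrated with a minimal Python sketch. This is not Checkmk's own code or API. The function names and the 80%/90% thresholds are hypothetical and chosen only for illustration; the OK/WARN/CRIT labels mirror the state names Checkmk reports.

```python
import shutil
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Remote-style check: try to open a TCP connection to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name lookup failed.
        return False


def classify(used_pct: float, warn_pct: float, crit_pct: float) -> str:
    """Map a usage percentage onto a monitoring state (illustrative thresholds)."""
    if used_pct >= crit_pct:
        return "CRIT"
    if used_pct >= warn_pct:
        return "WARN"
    return "OK"


def disk_state(path: str = "/", warn_pct: float = 80.0, crit_pct: float = 90.0) -> str:
    """Agent-style check: measure local disk usage and classify it."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    return classify(used_pct, warn_pct, crit_pct)
```

For example, `port_is_open("www.example.edu", 443)` (a placeholder hostname) would report whether a web server accepts connections, while `disk_state("/")` run on the server itself would warn as the disk fills up, before it is actually full.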
How do I access Checkmk?
New users must submit an ITSSC support request to obtain access to the Checkmk system.
Be aware that, for security reasons, the Checkmk web interface is only accessible from authorized locations. DotCIO staff will work with you to ensure that you have access to such a location.
Once access has been granted, the Checkmk system can be accessed using this link: https://checkmk.itops.rpi.edu/production/
Why should I use Checkmk?
Your servers are important, and people depend on them to be running, so you want to minimize their downtime. However, monitoring a server manually around the clock is rarely practical. An automated monitoring tool handles the grunt work of polling services, collecting statistics, and crunching the numbers 24×7, with no time off for holidays. You could implement your own home-grown monitoring system, but using RPI’s Checkmk system gives you the following benefits:
- You won’t need to spend time building and maintaining the monitoring system.
- You will get enterprise-grade monitoring tools and redundancy.
- The Checkmk system is monitored 24×7 by DotCIO’s Operations department. Thus, you get both the advantages of automated monitoring and a human brain to do sanity checking and phone call escalation.