This machine runs a lot of services and I don't use all of them. After breaking several of them and not noticing (again), I decided to finally set up service monitoring. After some research, I settled on Monit, which was relatively easy to set up and seems to meet my needs. I figured other people might want some examples of how to use it, so this post describes how to set it up, and you can see my config file at the end.
Why Monit
My use case is one server monitoring itself. The obvious question is "who monitors the monitor?", but my main concern is services I don't use breaking without my noticing. If the entire server is down, I'll probably notice eventually. A bigger problem is that I'm not testing the firewall rules and routing. A separate server really would be ideal, but since I can't run it from home (ISP port filtering blocks SMTP), I'd have to pay for another VPS, and it doesn't seem worth it right now.
I've worked with sysadmins in the past who liked Nagios, so that was my first choice for this, but it's complicated to set up and extreme overkill for monitoring one server. I looked at Sensu too, and it seems nicer but still overkill. I chose Monit because it's easy to set up (see the next section) and the barebones UI doesn't matter to me (I just want a basic up/down status and emails).
Setup
To install Monit on Fedora, you run:
sudo dnf install monit
# edit config files
sudo monit -t # check config syntax
sudo systemctl start monit
sudo systemctl enable monit
Configuration
The "edit config files" step is by far the longest. First you'll want to edit /etc/monitrc and set your mail server and who to send alerts to. For me this was just:
set mailserver localhost
set alert self@brendanlong.com
If you're using a remote mail server, you probably need to configure a user name and password. I also uncommented the eventqueue lines so alerts won't be lost if the mail server goes down.
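For a remote mail server, those two settings might look roughly like the sketch below. The host name, port, and credentials are placeholders I made up, and the exact TLS keyword has changed between Monit releases, so check the comments in your own /etc/monitrc before copying this.

```
# Hypothetical remote mail server; host, port and credentials are placeholders.
set mailserver smtp.example.com port 587
    username "monit" password "changeme"
    using tls        # assumption: older Monit releases spell this option differently

# Queue alerts on disk while the mail server is unreachable.
# The stock monitrc ships these lines commented out; the basedir may differ on your distro.
set eventqueue
    basedir /var/monit   # directory where undelivered events are stored
    slots 100            # optional cap on the number of queued events
```

Run sudo monit -t after editing to confirm the syntax is accepted by your version.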
The only other thing I changed in this file is the set httpd port ... section, where I changed the admin password and removed the localhost restriction (so I can access it remotely at http://status.brendanlong.com).
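As a rough sketch of that change, the stock section and an edited version might look like this. The port is Monit's usual default, and the user name and password are placeholders, not my real values.

```
# Stock section (roughly): web UI bound to localhost with the default login.
# set httpd port 2812 and
#     use address localhost   # only listen on the loopback interface
#     allow localhost         # only accept connections from localhost
#     allow admin:monit       # default user and password

# Edited sketch: no localhost restriction, non-default password (placeholder).
set httpd port 2812
    allow admin:changeme
```

As far as I understand it, with no host entries in the allow list any host can connect as long as the login succeeds, so pick a strong password if you do this.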
I put the rest of my configuration in individual files in /etc/monit.d. For example, monitoring of things accessed at "brendanlong.com" is in a file named /etc/monit.d/brendanlong.com and http://etherealspring.com/ is in /etc/monit.d/etherealspring.com. This is just personal preference, but it will be easier to handle package updates this way, and I should be able to find things faster.
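Files in /etc/monit.d are only read if the main config file includes them. The packaged /etc/monitrc may already have a line like the one below (I'm assuming it does on Fedora); if not, adding it is enough:

```
# In /etc/monitrc: pull in every file under /etc/monit.d
include /etc/monit.d/*
```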
To write these config files, refer to the Monit documentation. Monit can check a lot of things like processes and system health, but I'm a believer in checking the thing you actually care about. I don't care what processes are running or what files exist; I just want to be sure that you can get the correct pages from each HTTP server and that the other servers are responding in reasonable ways. To do that, I used check host rules exclusively.
Host rules take the form:
check host [unique name] with address [actual domain name]
    if failed
        [rules]
    then [alert / restart / etc.]
The unique name part is annoying, since Monit creates a rule for your server automatically, and you can't add to it (as far as I can tell; email me if this isn't true). I got around this by naming my rule "localhost" instead of "brendanlong.com".
The part I had the most trouble with was figuring out the rules. Here's what I found:
- You'll always want a port = [num] rule. This works how you'd expect.
- If you're testing one of the supported protocols, add a protocol section. It supports all of the major protocols like HTTP(S), SMTP(S), IMAP(S), etc. If you tell it the protocol, it will ensure that the endpoint not only connects, but gives a reasonable response.
- For protocols that have logins, you can give a username and password and it will test if the login succeeds.
- For HTTP, you can give the expected status and text that should be in the response with content.
- Order matters. For example, protocol http status 200 content = "Brendan Long" is valid, but protocol http content = "Brendan Long" status 200 is not. See the syntax in the documentation for the correct order.
- If you don't set protocol, it will just test if a TCP connection succeeds. You can do more complicated checks with send and expect (send a text or binary message and check the response).
- You probably want to set fault tolerance for some rules. In my case, the connection to my SMTP server would randomly fail, but it doesn't really matter as long as a retry works. I made it quieter by adding for 3 cycles to that rule.
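None of my own rules use send and expect yet, so here is a minimal sketch of what one could look like. The service is hypothetical: the host name, port 7000, and the PING/PONG exchange are all invented for illustration.

```
# Hypothetical line-based TCP service; name, address, port and messages are made up.
check host pingservice with address service.example.com
    if failed
        port 7000
        send "PING\r\n"            # text written to the socket after connecting
        expect "PONG\r\n"          # the check fails if the reply does not match
        with timeout 10 seconds    # give up if nothing comes back in time
    then alert
```

Because there is no protocol line, Monit only knows what you tell it in send and expect, so this is mostly useful for simple request/response services.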
Examples
Here's the config file for brendanlong.com. At some point I'll make it do more extensive testing for Minecraft and SyncThing, but this gets me 90% of what I wanted:
check host localhost with address brendanlong.com
    if failed
        port 22
        protocol ssh
    then alert
    if failed
        port 443
        protocol https
        status = 301
    then alert
    if failed
        port 80
        protocol http
        status = 301
    then alert
    if failed
        port 25
        protocol smtp
        for 3 cycles
    then alert
    if failed
        port 465
        protocol smtps
    then alert
    if failed
        port 143
        protocol imap
    then alert
    if failed
        port 993
        protocol imaps
    then alert
    # Minecraft
    if failed
        port 25565
    then alert
    # SyncThing
    if failed
        port 22000
    then alert

check host www.brendanlong.com with address www.brendanlong.com
    if failed
        port 443
        protocol https
        status = 200
        content = "Brendan Long"
    then alert
    if failed
        port 80
        protocol http
        status = 301
    then alert

check host wiki.brendanlong.com with address wiki.brendanlong.com
    if failed
        port 80
        protocol http
        status = 403
    then alert