[Box Backup] monitoring bbackupd
Martin Ebourne
boxbackup@fluffy.co.uk
Fri, 26 Aug 2005 00:31:47 +0100
On Fri, 2005-08-26 at 08:49 +1000, Scott McNee wrote:
> Hi Martin,
> Any chance of saving some work and getting a copy of the
> script?
I put restart_service (attached) in /etc/nagios/eventhandlers. It's a
generic service restart script, so it can be used for anything, including
bbstored and bbackupd.
In misccommands.cfg I have:
define command {
    command_name  restart_service
    command_line  $USER6$/restart_service "$SERVICESTATE$" "$STATETYPE$" "$SERVICEATTEMPT$" "$SERVICEDESC$" >> /var/log/nagios/event_handler.log 2>&1
}
You'll need to make sure $USER6$ is set to the directory holding the
restart_service script, or hard-code the path.
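
For reference, $USERn$ macros normally live in the Nagios resource file
as lines of the form $USER6$=/etc/nagios/eventhandlers. A minimal sketch
for checking what is currently set follows. The helper name
lookup_user_macro and the default path /etc/nagios/resource.cfg are
assumptions, not part of the setup described here, so adjust them for
your install.

```shell
#!/bin/sh
# Print the value of a $USERn$ macro from a Nagios resource file.
# Helper name and default path are illustrative assumptions.
lookup_user_macro() {
    macro="$1"                              # e.g. USER6
    file="${2:-/etc/nagios/resource.cfg}"   # assumed location, varies by distro
    # Match lines such as: $USER6$=/etc/nagios/eventhandlers
    # and print only the part after the "=".
    sed -n 's/^[[:space:]]*\$'"$macro"'\$=//p' "$file"
}

# Example (prints nothing if the macro is not defined):
#   lookup_user_macro USER6
```

If this prints nothing, $USER6$ is unset and the command_line above would
expand to a path starting with "/restart_service".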
I then have the following in services.cfg:
define service {
    use                  generic-service
    host_name            hordein
    service_description  bbackupd
    check_command        check_service!bbackupd
    event_handler        restart_service
}

define service {
    use                  generic-service
    host_name            hordein
    service_description  bbstored
    check_command        check_service!bbstored
    event_handler        restart_service
}
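You'll need a suitable generic-service template set up. Note that the
event handler restarts whatever $SERVICEDESC$ names, so each
service_description must match the name of the init script.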
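
The check_service command used in these definitions is not shown in this
post. As a rough sketch only, a plugin along the following lines would
fit the check_service!bbackupd usage. It assumes the init script supports
a "status" action that exits 0 when the daemon is running. The function
name and the STATUS_CMD override are hypothetical, and it would still
need its own define command entry.

```shell
#!/bin/sh
# Hypothetical check_service plugin (not from the original post).
# Maps an init script's "status" result onto Nagios plugin return codes:
#   0 = OK, 2 = CRITICAL, 3 = UNKNOWN

# STATUS_CMD can be overridden so the logic can be tried without a real service.
STATUS_CMD="${STATUS_CMD:-/sbin/service}"

check_service() {
    name="$1"
    if [ -z "$name" ]; then
        echo "UNKNOWN: no service name given"
        return 3
    fi
    # Assumption: "status" exits 0 only when the service is running.
    if $STATUS_CMD "$name" status >/dev/null 2>&1; then
        echo "OK: $name is running"
        return 0
    fi
    echo "CRITICAL: $name is not running"
    return 2
}

# As a standalone plugin, the file would end with:
#   check_service "$1"; exit $?
```

Only the CRITICAL result matters to restart_service, since it ignores
WARNING and UNKNOWN.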
If you want a central machine to monitor remote ones, use the same
script on each remote machine and set up NRPE. If you need help with
that, I think the Nagios lists would be more appropriate.
One day I'll write a check that uses bbackupquery to log in and inspect
the store, free space, etc.
This is all on Fedora Core. YMMV.
Cheers,
Martin.
[Attachment: restart_service (application/x-shellscript)]
#!/bin/sh
#
# Event handler script for restarting unix services on the local machine
#
# Arguments, in the order given by the restart_service command definition:
#   $1 = $SERVICESTATE$  $2 = $STATETYPE$  $3 = $SERVICEATTEMPT$  $4 = $SERVICEDESC$

state="$1"
failure="$2"
attempt="$3"
service="$4"

# What state is the service in?
case "$state" in
OK)
    # The service just came back up, so don't do anything...
    ;;
WARNING)
    # We don't really care about warning states, since the service is probably still running...
    ;;
UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;
CRITICAL)
    # Is this a "soft" or a "hard" state?
    case "$failure" in
    SOFT)
        # What check attempt are we on?
        case "$attempt" in
        2)
            echo "Restarting $service service (2nd soft critical state)..."
            /usr/bin/sudo /sbin/service "$service" restart
            ;;
        esac
        ;;
    HARD)
        echo "Restarting $service service..."
        # Call the init script to restart the server
        /usr/bin/sudo /sbin/service "$service" restart
        ;;
    esac
    ;;
esac

exit 0