[Box Backup] monitoring bbackupd

Martin Ebourne boxbackup@fluffy.co.uk
Fri, 26 Aug 2005 00:31:47 +0100


--=-AXj9aJ6skXtDXjouqBsg
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

On Fri, 2005-08-26 at 08:49 +1000, Scott McNee wrote:
> Hi Martin,
> Any chance of saving some work and getting a copy of the script?

I put restart_service (attached) in /etc/nagios/eventhandlers. It's a
generic service restart script, so it can be used for any service,
including bbstored and bbackupd.

In misccommands.cfg I have:

define command {
  command_name restart_service
  command_line $USER6$/restart_service "$SERVICESTATE$" "$STATETYPE$" "$SERVICEATTEMPT$" "$SERVICEDESC$" >> /var/log/nagios/event_handler.log 2>&1
}

You'll need to make sure $USER6$ is set to the directory containing the
restart_service script, or hard-code the path instead.
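
To see whether $USER6$ is already defined, something like the following
could be used. It is only a sketch: the user6_path helper name is made up
for illustration, and the default /etc/nagios/resource.cfg path is an
assumption about where the resource file lives on your install.

```shell
#!/bin/sh
# Hypothetical helper: print the value assigned to $USER6$ in a Nagios
# resource file. The default path below is an assumption; pass your own
# resource file as the first argument if it lives elsewhere.
user6_path() {
  resource_file="${1:-/etc/nagios/resource.cfg}"
  # Resource macros are defined on lines of the form: $USER6$=/some/path
  # Print only what follows the '=' on the matching line.
  sed -n 's/^\$USER6\$=//p' "$resource_file"
}
```

Calling user6_path with no arguments reads the assumed default file. If it
prints nothing, $USER6$ is not defined there (or is commented out).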

I then have the following in services.cfg:

define service {
  use                           generic-service
  host_name                     hordein
  service_description           bbackupd
  check_command                 check_service!bbackupd
  event_handler                 restart_service
}

define service {
  use                           generic-service
  host_name                     hordein
  service_description           bbstored
  check_command                 check_service!bbstored
  event_handler                 restart_service
}

You'll need a suitable generic-service set up.
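
The check_service command used above isn't shown in this message. As a
rough sketch only, a plugin behind it could look like the function below.
This assumes the init script's "status" action exits non-zero when the
service is stopped, which depends on the init script. The function name
and the SERVICE_CMD override are made up for illustration and are not part
of Nagios or Box Backup.

```shell
#!/bin/sh
# Hypothetical check_service plugin body (not the one from this setup).
# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
# SERVICE_CMD is an illustrative override so the logic can be tried
# without touching real services; it defaults to /sbin/service.
check_service() {
  name="$1"
  if [ -z "$name" ]; then
    echo "UNKNOWN: no service name given"
    return 3
  fi
  # Assumption: "status" exits 0 only when the service is running.
  if ${SERVICE_CMD:-/sbin/service} "$name" status >/dev/null 2>&1; then
    echo "OK: $name is running"
    return 0
  fi
  echo "CRITICAL: $name is not running"
  return 2
}
```

A real plugin script would end with something like
check_service "$1"; exit $? so that Nagios sees the return code.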

If you want to get a central machine to monitor remote ones then use the
same script on the remote machine and set up nrpe. If you need help with
that then I think the nagios lists would be more appropriate.

One day I'll write one that uses bbackupquery to log in and check the
store out, free space etc.

This is all on Fedora Core. YMMV.

Cheers,

Martin.

--=-AXj9aJ6skXtDXjouqBsg
Content-Disposition: attachment; filename=restart_service
Content-Type: application/x-shellscript; name=restart_service
Content-Transfer-Encoding: 7bit

#!/bin/sh
#
# Event handler script for restarting unix services on the local machine

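state="$1"    # $SERVICESTATE$: OK, WARNING, UNKNOWN or CRITICAL
failure="$2"  # $STATETYPE$: SOFT or HARD
attempt="$3"  # $SERVICEATTEMPT$: number of the current check attempt
service="$4"  # $SERVICEDESC$: passed to /sbin/service as the service name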

# What state is the service in?
case "$state" in
  OK)
    # The service just came back up, so don't do anything...
    ;;

  WARNING)
    # We don't really care about warning states, since the service is probably still running...
    ;;

  UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;

  CRITICAL)
    # Is this a "soft" or a "hard" state?
    case "$failure" in
      SOFT)
        # What check attempt are we on?
        case "$attempt" in
          2)
            echo "Restarting $service service (2nd soft critical state)..."
            /usr/bin/sudo /sbin/service "$service" restart
            ;;
        esac
        ;;

      HARD)
        echo "Restarting $service service..."

        # Call the init script to restart the server
        /usr/bin/sudo /sbin/service "$service" restart
        ;;
    esac
    ;;
esac
exit 0

--=-AXj9aJ6skXtDXjouqBsg--