Advanced usage and scheduling notes
===================================

upsmon can call out to a helper script or program when the device changes
state.  The example upsmon.conf has a full list of which state changes
are available - ONLINE, ONBATT, LOWBATT, and more.

There are two options, both presented in detail below:

- the simple approach: create your own helper, and manage all events and
actions yourself,
- the advanced approach: use the NUT provided helper, called 'upssched'.

The simple approach, using your own script
------------------------------------------

How it works relative to upsmon
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Your command will be called with the full text of the message as one
argument.  For the default values, refer to the sample upsmon.conf file.

The environment string NOTIFYTYPE will contain the type string of
whatever caused this event to happen - ONLINE, ONBATT, LOWBATT, ...

Making this some sort of shell script might be a good idea, but the
helper can be in any programming or scripting language.

NOTE: Remember that your helper must be *executable*.  If you are using
a script, make sure the execution flags are set.

For more information, refer to linkman:upsmon[8] and
linkman:upsmon.conf[5] manual pages.

Setting up everything
~~~~~~~~~~~~~~~~~~~~~

- Set EXEC flags on various things in linkman:upsmon.conf[5]:
+
    NOTIFYFLAG ONBATT EXEC
    NOTIFYFLAG ONLINE EXEC
+
If you want other things like WALL or SYSLOG to happen, just add them:
+
    NOTIFYFLAG ONBATT EXEC+WALL+SYSLOG
+
You get the idea.

- Tell upsmon where your script is:

    NOTIFYCMD /path/to/my/script

- Make a simple script like this at that location:

    #! /bin/bash
    echo "$*" | sendmail -F"ups@mybox" bofh@pager.example.com

- Restart upsmon, pull the plug, and see what happens.

That approach is bare-bones, but you should get the text content of the
alert in the body of the message, since upsmon passes the alert text
(from NOTIFYMSG) as an argument.

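As a slightly fuller sketch of such a helper, the script below prefixes the
alert text with a severity chosen from NOTIFYTYPE and hands it to logger
instead of mailing it.  The function name format_alert, the severity labels
and the ups-notify syslog tag are made up for illustration; only the message
argument and the NOTIFYTYPE variable come from upsmon.

```shell
#!/bin/sh
# Hypothetical NOTIFYCMD helper (names and labels are illustrative only).
# upsmon passes the alert text as the first argument and sets NOTIFYTYPE
# in the environment.

# format_alert TYPE MESSAGE: print one line with a severity prefix.
format_alert() {
    case "$1" in
        ONBATT)  printf 'WARNING [%s]: %s\n' "$1" "$2" ;;
        LOWBATT) printf 'CRITICAL [%s]: %s\n' "$1" "$2" ;;
        ONLINE)  printf 'OK [%s]: %s\n' "$1" "$2" ;;
        *)       printf 'NOTICE [%s]: %s\n' "${1:-UNKNOWN}" "$2" ;;
    esac
}

# Only act when a message argument was supplied, as upsmon does.
if [ "$#" -gt 0 ]; then
    format_alert "${NOTIFYTYPE:-}" "$1" | logger -t ups-notify
fi
```

Point NOTIFYCMD at it as in the steps above.  Swapping the logger call for
the sendmail pipeline restores the mail behaviour of the bare-bones version.
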
Using more advanced features
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Your helper script will be run with a few environment variables set.

- UPSNAME: the name of the system that generated the change.
+
This will be one of your identifiers from the MONITOR lines in upsmon.conf.

- NOTIFYTYPE: this will be ONLINE, ONBATT, or whatever event took place
which made upsmon call your script.

You can use these to do different things based on which system has
changed state.  You could have it only send pages for an important
system while totally ignoring a known trouble spot, for example.

Suppressing notify storms
~~~~~~~~~~~~~~~~~~~~~~~~~

upsmon will call your script every time an event happens that has the
EXEC flag set.  This means a quick power failure that lasts mere seconds
might generate a notification storm.  To suppress this sort of annoyance,
use upssched as your NOTIFYCMD program, and configure it to call your
command after a timer has elapsed.

The advanced approach, using upssched
-------------------------------------

upssched is a helper for upsmon that will invoke commands for you at
some interval relative to a UPS event.  It can be used to send pages,
mail out notices about things, or even shut down the box early.

There will be examples scattered throughout.  Change them to suit your
pathnames, UPS locations, and so forth.

How upssched works relative to upsmon
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When an event occurs, upsmon will call whatever you specify as
'NOTIFYCMD' in your upsmon.conf, provided you also enable the 'EXEC' flag
in the matching 'NOTIFYFLAG' line.  In this case, we want upsmon to call
upssched as the notifier, since it will be doing all the work for us.
So, in the upsmon.conf:

    NOTIFYCMD /usr/local/ups/sbin/upssched

Then we want upsmon to actually _use_ it for the notify events, so
again in the upsmon.conf we set the flags:

    NOTIFYFLAG ONLINE SYSLOG+EXEC
    NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
    NOTIFYFLAG LOWBATT SYSLOG+WALL+EXEC
    ... and so on.

For the purposes of this document I will only use those three, but you can
set the flags for any of the valid notify types.

Setting up your upssched.conf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once upsmon has been configured with the NOTIFYCMD and EXEC flags, you're
ready to deal with the upssched.conf details.  In this file, you specify
just what will happen when a given event occurs on a particular UPS.

First you need to define the name of the script or program that will
handle timers that trigger.  This is your CMDSCRIPT, and it needs to be
above any AT lines.  There's an example provided with the program, so
we'll use that here:

    CMDSCRIPT /usr/local/ups/bin/upssched-cmd

Then you have to define the variables PIPEFN and LOCKFN.  The former sets
the file name of the FIFO that passes communications between processes to
start and stop timers.  The latter sets the file name of a temporary file
that upssched creates in order to avoid a race condition under some
circumstances.  Please see the relevant comments in upssched.conf for
additional information and advice about these variables.

Now you can tell upssched what to do when it is called by upsmon.

The big picture
^^^^^^^^^^^^^^^

The design in a nutshell is:

    upsmon ---> calls upssched ---> calls your CMDSCRIPT

Ultimately, the CMDSCRIPT does the actual useful work, whether that's
initiating an early shutdown with 'upsmon -c fsd', sending a page by
calling sendmail, or opening a subspace channel to V'ger.

Establishing timers
^^^^^^^^^^^^^^^^^^^

Let's say that you want to receive a page when any UPS has been running
on battery for 30 seconds.  Create a handler that starts a 30-second timer
for an ONBATT condition.

    AT ONBATT * START-TIMER onbattwarn 30

This means "when any UPS (the *) goes on battery, start a timer called
onbattwarn that will trigger in 30 seconds".  We'll come back to the
onbattwarn part in a moment.
Right now we need to make sure that we don't trigger that timer if the UPS
happens to come back before the time is up.  In essence, if it goes back
on line, we need to cancel it.  So, let's tell upssched that.

    AT ONLINE * CANCEL-TIMER onbattwarn

Executing commands immediately
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As an example, consider the scenario where a UPS goes onto battery power,
but the users are not informed until 60 seconds later, using a timer as
described above.  Whilst this may let the *logged in* users know that the
UPS is on battery power, it does not inform any users subsequently logging
in.  To cover them we could, at the same time, create a file which is read
and displayed to any user trying to log in whilst the UPS is on battery
power.  If the UPS comes back onto utility power within 60 seconds, then
we can cancel the timer and remove the file, as described above.  However,
if the UPS comes back onto utility power, say, 5 minutes later, then there
is no timer left to cancel, but we still want to remove the file.  To do
this we could use:

    AT ONLINE * EXECUTE ups-back-on-power

This means that when upsmon detects that the UPS is back on utility power
it will signal upssched.  upssched will see the above command and simply
pass 'ups-back-on-power' as an argument directly to CMDSCRIPT.  This
occurs immediately; there are no timers involved.

Writing the command script handler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OK, now that upssched knows how the timers are supposed to work, let's
give it something to do when one actually triggers.  The name of the
example timer is onbattwarn, so that's the argument that will be passed
into your CMDSCRIPT when it triggers.  This means we need to do some
shell script writing to deal with that input.

--------------------------------------------------------------------------------

#! /bin/sh

# $1 is the timer name or EXECUTE argument passed in by upssched
case "$1" in
    onbattwarn)
        # the onbattwarn timer elapsed: send the page
        echo "The UPS has been on battery for a while" \
        | mail -s "UPS monitor" bofh@pager.example.com
        ;;
    ups-back-on-power)
        # back on utility power: remove the login notice file
        /bin/rm -f /some/path/ups-on-battery
        ;;
    *)
        logger -t upssched-cmd "Unrecognized command: $1"
        ;;
esac

--------------------------------------------------------------------------------

This is a very simple script example, but it shows how you can test for
the presence of a given trigger.  With multiple ATs creating various
timer names, you will need to test for each possibility and handle it
as you see fit.

NOTE: You can invoke just about anything from inside the CMDSCRIPT.  It
doesn't need to be a shell script, either - that's just an example.  If
you want to write a program that will parse argv[1] and deal with the
possibilities, that will work too.

Early Shutdowns
~~~~~~~~~~~~~~~

One thing that gets requested a lot is early shutdowns in upsmon.  With
upssched, you can now have this functionality.  Just set a timer for
some length of time at ONBATT which will invoke a shutdown command if it
elapses.  Just be sure to cancel this timer if you go back ONLINE before
then.

The best way to do this is to use the upsmon callback feature.  You can
make upsmon set the "forced shutdown" (FSD) flag on the upsd so your
slave systems shut down early too.  Just do something like this in your
CMDSCRIPT:

    /usr/local/ups/sbin/upsmon -c fsd

It's not a good idea to call your system's shutdown routine directly
from the CMDSCRIPT, since there's no synchronization with the slave
systems hooked to the same UPS.  FSD is the master's way of saying "we're
shutting down *now* like it or not, so you'd better get ready".

Background
~~~~~~~~~~

This program was written primarily to fulfill the requests of users for
the early shutdown scenario.  The "outboard" design of the program
(relative to upsmon) was intended to reduce the load on the average system.
Most people don't have the requirement of shutting down after n seconds
on battery, since the usual OB+LB testing is sufficient.

This program was created separately so those people don't have to spend
CPU time and RAM on something that will never be used in their
environments.

The design of the timer handler is also geared towards minimizing impact.
It will come and go from the process list as necessary.  When a new timer
is started, a process will be forked to actually watch the clock and
eventually start the CMDSCRIPT.  When a timer triggers, it is removed from
the queue.  Canceling a timer will also remove it from the queue.  When
no timers are present in the queue, the background process exits.

This means that you will only see upssched running when one of two things
is happening:

1. There's a timer of some sort currently running
2. upsmon just called it, and you managed to catch the brief instant

The final optimization handles the possibility of trying to cancel a timer
when there's none running.  If there's no process already running, there
are no timers to cancel, and furthermore there is no need to start
a clock-watcher.  As a result, it skips that step and exits sooner.

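To tie the early shutdown discussion back to the timer directives shown
earlier, here is a minimal sketch of a CMDSCRIPT that requests FSD when a
timer fires.  The timer name earlyshutdown, the 120 second delay in the
comments, the handle_timer function and the UPSMON override variable are
all assumptions made for illustration.  The upsmon path is the one used in
the examples above and will likely differ on your system.

```shell
#!/bin/sh
# Sketch of an early-shutdown CMDSCRIPT.  Assumed upssched.conf entries
# (timer name and delay are arbitrary examples):
#   AT ONBATT * START-TIMER earlyshutdown 120
#   AT ONLINE * CANCEL-TIMER earlyshutdown

# Path to upsmon; overridable so the script can be dry-run safely.
UPSMON="${UPSMON:-/usr/local/ups/sbin/upsmon}"

# handle_timer NAME: act on the timer or EXECUTE name passed by upssched.
handle_timer() {
    case "$1" in
        earlyshutdown)
            # Ask upsmon to set FSD so slave systems shut down as well.
            "$UPSMON" -c fsd
            ;;
        *)
            logger -t upssched-cmd "Unrecognized command: $1"
            ;;
    esac
}

# upssched passes the name as the first argument.
if [ "$#" -gt 0 ]; then
    handle_timer "$1"
fi
```

Running it by hand with UPSMON set to echo and earlyshutdown as the
argument should print the upsmon arguments rather than start a shutdown,
which is a cheap way to check the dispatch logic before relying on it.
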