Desc: Configuring automatic UPS shutdowns File: shutdown.txt Date: 24 August 2003 Auth: Russell Kroll Shutdown design =============== When your UPS batteries get low, the operating system needs to be brought down cleanly. Also, the UPS load should be turned off so that all devices that are attached to it are forcibly rebooted. Here are the steps that occur when a critical power event happens: 1. The UPS goes on battery 2. The UPS reaches low battery (a "critical" UPS) 3. The upsmon master notices and sets "FSD" - the "forced shutdown" flag to tell all slave systems that it will soon power down the load. (If you have no slaves, skip to step 6) 4. upsmon slave systems see "FSD" and: - generate a NOTIFY_SHUTDOWN event - wait FINALDELAY seconds - typically 5 - call their SHUTDOWNCMD - disconnect from upsd 5. The upsmon master system waits up to HOSTSYNC seconds (typically 15) for the slaves to disconnect from upsd. If any are connected after this time, upsmon stops waiting and proceeds with the shutdown process. 6. The upsmon master: - generates a NOTIFY_SHUTDOWN event - waits FINALDELAY seconds - typically 5 - creates the POWERDOWNFLAG file - usually /etc/killpower - calls the SHUTDOWNCMD 7. On most systems, init takes over, kills your processes, syncs and unmounts some filesystems, and remounts some read-only. 8. init then runs your shutdown script. This checks for the POWERDOWNFLAG, finds it, and tells the UPS driver(s) to power off the load. 9. The system loses power. 10. Time passes. The power returns, and the UPS switches back on. 11. All systems reboot and go back to work. How you set it up ================= 1. Make sure your POWERDOWNFLAG setting in upsmon.conf points somewhere reasonable. Specifically, that filesystem must be mounted when your shutdown script runs. 2. Edit your shutdown scripts to check for the POWERDOWNFLAG so they know when to power off the UPS. You must check for this file, as you don't want this to happen during normal shutdowns! You can use upsdrvctl to start the shutdown process in your UPS hardware. Use this script as an example, but change the paths to suit your system: if (test -f /etc/killpower) then echo "Killing the power, bye!" /usr/local/ups/bin/upsdrvctl shutdown sleep 120 # uh oh... the UPS poweroff failed! # you probably should reboot here to avoid getting stuck # *** see the section on power races below *** fi Make sure the filesystem containing upsdrvctl, ups.conf and your UPS driver(s) is mounted when the system gets to this point. Otherwise it won't be able to figure out what to do. RAID warning ============ NOTE: If you run any sort of RAID equipment, make sure your arrays are either halted (if possible) or switched to "read-only" mode. Otherwise you may suffer a long resync once the system comes back up. The kernel may not ever run its final shutdown procedure, so you must take care of all array shutdowns in userspace before upsdrvctl runs. If you use software RAID (md) on Linux, get mdadm and try using 'mdadm --readonly' to put your arrays in a safe state. This has to happen after your shutdown scripts have remounted the filesystems. On hardware RAID or other kernels, you have to do some detective work. It may be necessary to contact the vendor or the author of your driver to find out how to put the array in a state where a power loss won't leave it "dirty". My understanding is that 3ware devices on Linux will be fine unless there are pending writes. Make sure your filesystems are remounted read-only and you should be covered. Multiple UPS shutdowns ====================== If you have multiple UPSes connected to your system, chances are that you need to shut them down in a specific order. The goal is to shut down everything but the one keeping upsmon alive at first, then you do that one last. To set the order in which your UPSes receive the shutdown commands, define the "sdorder" value in your ups.conf. [bigone] driver = apcsmart port = /dev/ttyS0 sdorder = 2 [littleguy] driver = bestups port = /dev/ttyS1 sdorder = 1 [misc] driver = megatec port = /dev/ttyS2 sdorder = 0 The order runs from 0 to the highest number available. So, for this configuration, the order of shutdowns would be misc, littleguy, and then bigone. If you have a UPS that shouldn't be shutdown when running "upsdrvctl shutdown", set the sdorder to -1. Testing shutdowns ================= To see how upsdrvctl will behave without actually turning off power, use the -t argument. It will display the sequence without actually calling the drivers. Other issues ============ You may delete the POWERDOWNFLAG in the startup scripts, but it is not necessary. upsmon will clear that file for you when it starts. Remember that some operating systems unmount a good number of filesystems when going into read-only mode. If the UPS software is installed to /usr and it's not mounted, your shutdowns will fail. If this happens, either make sure it stays mounted at shutdown, or install to another partition. Power races =========== There is a situation where the power may return during the shutdown process. This is known as a race. Here's how we handle it. "Smart" UPSes typically handle this by using a command that forces the UPS to power the load off and back on. This way, you are assured that the systems will restart even if the power returns at the worst possible moment. Contact closure units (ala genericups), on the other hand, have the potential for a race when feeding multiple systems. This is due to the design of most contact closure UPSes. Typically, the "kill power" line only functions when running on battery. As a result, if the line power returns during the shutdown process, there is no way to power down the load. The workaround is to force your systems to reboot after some interval. This way, they won't be stuck in the halted state with the UPS running on line power. Testing power races =================== The easiest way to see if your configuration will handle a power race successfully is to do 'upsmon -c fsd'. This will force the UPS software to shut down as if it had a OB+LB situation, and your shutdown script should call the UPS driver(s) in shutdown mode. If everything works correctly, the computer will be forcibly powered off, may remain off for a few seconds to a few minutes (depending on the driver and UPS type), then will power on again. If your UPS just sits there and never resets the load, you are vulnerable to the above power race and should add the "reboot after timeout" hack at the very least. Know your hardware ================== UPS equipment varies from manufacturer to manufacturer and even within model lines. You should test the shutdown sequence on your systems before leaving them unattended. A successful sequence is one where the OS halts before the battery runs out, and the system restarts when power returns. One more tip ============ If your UPS powers up immediately after a power failure instead of waiting for the batteries to recharge, you can rig up a little hack to handle it in software. Essentially, you need to test for the POWERDOWNFLAG in your *startup* scripts while the filesystems are still read-only. If it's there, you know your last shutdown was caused by a power failure and the UPS battery is probably still quite weak. In this situation, your best bet is to sleep it off. Pausing in your startup script to let the batteries recharge with the filesystems in a safe state is recommended. This way, if the power goes out again, you won't face a situation where there's not enough battery capacity left for upsmon to do its thing. Exactly how long to wait is a function of your UPS hardware, and will require careful testing. If this is too evil for you, buy another kind of UPS that will either wait for a minimum amount of charge, a minimum amount of time, or both.