UPSMON.CONF(5)
==============

NAME
----

upsmon.conf - Configuration for Network UPS Tools upsmon

DESCRIPTION
-----------

This file's primary job is to define the systems that linkman:upsmon[8]
will monitor and to tell it how to shut down the system when necessary.
It will contain passwords, so keep it secure. Ideally, only the upsmon
process should be able to read it.

Additionally, other optional configuration values can be set in this
file.

CONFIGURATION DIRECTIVES
------------------------

*DEADTIME* 'seconds'::

upsmon allows a UPS to go missing for this many seconds before declaring
it "dead". The default is 15 seconds.
+
upsmon requires a UPS to provide status information every few seconds
(see POLLFREQ and POLLFREQALERT) to keep things updated. If the status
fetch fails, the UPS is marked stale. If it stays stale for more than
DEADTIME seconds, the UPS is marked dead.
+
A dead UPS that was last known to be on battery is assumed to have
changed to a low battery condition. This may force a shutdown if it is
providing a critical amount of power to your system. This seems
disruptive, but the alternative is barreling ahead into oblivion and
crashing when you run out of power.
+
Note: DEADTIME should be a multiple of POLLFREQ and POLLFREQALERT.
Otherwise, you'll have "dead" UPSes simply because upsmon isn't polling
them quickly enough. Rule of thumb: take the larger of the two POLLFREQ
values, and multiply by 3.
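+
As a sketch of the rule of thumb, using the default polling intervals
documented under POLLFREQ and POLLFREQALERT (the values are illustrative,
not a recommendation):
+
	# Both poll intervals are 5 seconds, so 3 x 5 = 15
	POLLFREQ 5
	POLLFREQALERT 5
	DEADTIME 15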

*FINALDELAY* 'seconds'::

When running in master mode, upsmon waits this long after sending the
NOTIFY_SHUTDOWN to warn the users. After the timer elapses, it then
runs your SHUTDOWNCMD. By default this is set to 5 seconds.
+
If you need to let your users do something in between those events,
increase this number. Remember, at this point your UPS battery is
almost depleted, so don't make this too big.
+
Alternatively, you can set this very low so you don't wait around when
it's time to shut down. Some UPSes don't give much warning for low
battery and will require a value of 0 here for a safe shutdown.
+
NOTE: If FINALDELAY on the slave is greater than HOSTSYNC on the master,
the master will give up waiting for the slave to disconnect.
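+
For a UPS that gives little low-battery warning, a minimal sketch might
be the following (whether 0 is appropriate depends on your hardware):
+
	# Run SHUTDOWNCMD right after the shutdown notification is sent
	FINALDELAY 0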

*HOSTSYNC* 'seconds'::

upsmon will wait up to this many seconds in master mode for the slaves
to disconnect during a shutdown situation. By default, this is 15
seconds.
+
When a UPS goes critical (on battery + low battery, or "FSD": forced
shutdown), the slaves are supposed to disconnect and shut down right
away. The HOSTSYNC timer keeps the master upsmon from sitting there
forever if one of the slaves gets stuck.
+
This value is also used to keep slave systems from getting stuck if
the master fails to respond in time. After a UPS becomes critical,
the slave will wait up to HOSTSYNC seconds for the master to set the
FSD flag. If that timer expires, the slave will assume that the master
is broken and will shut down anyway.
+
This keeps the slaves from shutting down during a short-lived status
change to "OB LB" that the slaves see but the master misses.
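+
To illustrate how HOSTSYNC interacts with FINALDELAY, here is a sketch
with one master and one slave. The values are illustrative; the point is
that the slave's FINALDELAY stays below the master's HOSTSYNC:
+
	# upsmon.conf on the master
	HOSTSYNC 15
+
	# upsmon.conf on the slave
	FINALDELAY 5
	HOSTSYNC 15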

*MINSUPPLIES* 'num'::

Set the number of power supplies that must be receiving power to keep
this system running. Normal computers have just one power supply, so
the default value of 1 is acceptable.
+
Large/expensive server type systems usually have more, and can run
with a few missing. The HP NetServer LH4 can run with 2 out of 4, for
example, so you'd set it to 2. The idea is to keep the box running
as long as possible, right?
+
Obviously you have to put the redundant supplies on different UPS
circuits for this to make sense! See big-servers.txt in the docs
subdirectory for more information and ideas on how to use this
feature.
+
Also see the section on "power values" in linkman:upsmon[8].
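+
A sketch for the four-supply case described above, assuming two
hypothetical UPSes ("ups-a" and "ups-b" on localhost) that each feed two
of the supplies, and placeholder credentials:
+
	# Each UPS feeds 2 of the 4 supplies; the box needs at least 2
	MONITOR ups-a@localhost 2 monmaster blah master
	MONITOR ups-b@localhost 2 monmaster blah master
	MINSUPPLIES 2
+
With these values the system should keep running while either UPS is
critical, since a power value of 2 remains, and shut down only when the
total drops below 2.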

*MONITOR* 'system' 'powervalue' 'username' 'password' 'type'::

Each UPS that you need to monitor should have a MONITOR line. Not
all of these need to supply power to the system that is running upsmon.
You may monitor other systems if you want to be able to send
notifications about status changes on them.

You must have at least one MONITOR directive in `upsmon.conf`.

'system' is a UPS identifier. It is in this form:

+<upsname>[@<hostname>[:<port>]]+

The default hostname is "localhost". Some examples:

- "su700@mybox" means a UPS called "su700" on a system called "mybox".
This is the normal form.

- "fenton@bigbox:5678" is a UPS called "fenton" on a system called
"bigbox" which runs linkman:upsd[8] on port "5678".

'powervalue' is an integer representing the number of power supplies
that the UPS feeds on this system. Most normal computers have one power
supply, and the UPS feeds it, so this value will be 1. You need a very
large or special system to have anything higher here.

You can set the 'powervalue' to 0 if you want to monitor a UPS that
doesn't actually supply power to this system. This is useful when you
want to have upsmon do notifications about status changes on a UPS
without shutting down when it goes critical.

The 'username' and 'password' on this line must match an entry
in that system's linkman:upsd.users[5]. If your username is "monmaster"
and your password is "blah", the MONITOR line might look like this:

+MONITOR myups@bigserver 1 monmaster blah master+

Meanwhile, the `upsd.users` on `bigserver` would look like this:

	[monmaster]
		password = blah
		upsmon master # (or slave)

The 'type' refers to the relationship with linkman:upsd[8]. It can
be either "master" or "slave". See linkman:upsmon[8] for more information
on the meaning of these modes. The mode you pick here also goes in
the `upsd.users` file, as seen in the example above.
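
As further sketches of the forms described above (the UPS names, host
name and credentials are hypothetical placeholders):

	# No hostname given, so upsd on "localhost" is contacted
	MONITOR myups 1 monmaster blah master
	# Notifications only: a powervalue of 0 means this UPS going
	# critical does not shut down the local system
	MONITOR otherups@otherbox 0 monslave blah slave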

*NOCOMMWARNTIME* 'seconds'::

upsmon will trigger a NOTIFY_NOCOMM after this many seconds if it can't
reach any of the UPS entries in this configuration file. It keeps
warning you until the situation is fixed. By default this is 300
seconds.

*NOTIFYCMD* 'command'::

upsmon calls this to send messages when things happen.
+
This command is called with the full text of the message as one
argument. The environment variable NOTIFYTYPE will contain the type
string of whatever caused this event to happen.
+
If you need to use linkman:upssched[8], then you must make it your
NOTIFYCMD by listing it here.
+
Note that this is only called for NOTIFY events that have EXEC set with
NOTIFYFLAG. See NOTIFYFLAG below for more details.
+
Making this some sort of shell script might not be a bad idea. For
more information and ideas, see pager.txt in the docs directory.
+
Remember, this command also needs to be one element in the configuration file,
so if your command has spaces, then wrap it in quotes.
+
+NOTIFYCMD "/path/to/script --foo --bar"+
+
This script is run in the background--that is, upsmon forks before it
calls out to start it. This means that your NOTIFYCMD may have multiple
instances running simultaneously if a lot of stuff happens all at once.
Keep this in mind when designing complicated notifiers.

*NOTIFYMSG* 'type' 'message'::

upsmon comes with a set of stock messages for various events. You can
change them if you like.

	NOTIFYMSG ONLINE "UPS %s is getting line power"

	NOTIFYMSG ONBATT "Someone pulled the plug on %s"
+
Note that +%s+ is replaced with the identifier of the UPS in question.
+
The message must be one element in the configuration file, so if it
contains spaces, you must wrap it in quotes.

	NOTIFYMSG NOCOMM "Someone stole UPS %s"
+
Possible values for 'type':

ONLINE;; UPS is back online

ONBATT;; UPS is on battery

LOWBATT;; UPS is on battery and has a low battery (is critical)

FSD;; UPS is being shut down by the master (FSD = "Forced Shutdown")

COMMOK;; Communications established with the UPS

COMMBAD;; Communications lost to the UPS

SHUTDOWN;; The system is being shut down

REPLBATT;; The UPS battery is bad and needs to be replaced

NOCOMM;; A UPS is unavailable (can't be contacted for monitoring)

*NOTIFYFLAG* 'type' 'flag'[\+'flag'][+'flag']...::

By default, upsmon sends walls (global messages to all logged in users)
via /bin/wall and writes to the syslog when things happen. You can
change this.
+
Examples:
+
	NOTIFYFLAG ONLINE SYSLOG
	NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
+
Possible values for the flags:
+
SYSLOG;; Write the message to the syslog

WALL;; Write the message to all users with /bin/wall

EXEC;; Execute NOTIFYCMD (see above) with the message

IGNORE;; Don't do anything
+
If you use IGNORE, don't use any other flags on the same line.
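+
Putting NOTIFYCMD, NOTIFYMSG and NOTIFYFLAG together, a sketch might look
like this (the script path and the message texts are hypothetical):
+
	# Hypothetical notifier script; it receives the message as one argument
	NOTIFYCMD /usr/local/bin/ups-notify
	NOTIFYMSG ONBATT "UPS %s is running on battery"
	NOTIFYMSG LOWBATT "UPS %s battery is low"
	# EXEC must be listed, or NOTIFYCMD is not called for that event
	NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
	NOTIFYFLAG LOWBATT SYSLOG+WALL+EXEC
	# Suppress all notifications for one event type
	NOTIFYFLAG COMMOK IGNORE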

*POLLFREQ* 'seconds'::

Normally upsmon polls the linkman:upsd[8] server every 5 seconds. If this
is flooding your network with activity, you can make it higher. You can
also make it lower to get faster updates in some cases.
+
There are some catches. First, if you set the POLLFREQ too high, you
may miss short-lived power events entirely. You also risk triggering
the DEADTIME (see above) if you use a very large number.
+
Second, there is a point of diminishing returns if you set it too low.
While upsd normally has all of the data available to it instantly, most
drivers only refresh the UPS status once every 2 seconds. Polling any
more than that usually doesn't get you the information any faster.

*POLLFREQALERT* 'seconds'::

This is the interval that upsmon waits between polls if any of its UPSes
are on battery. You can use this along with POLLFREQ above to slow down
polls during normal behavior, but get quicker updates when something bad
happens.
+
This should always be equal to or lower than the POLLFREQ value. By
default it is also set to 5 seconds.
+
The warnings from the POLLFREQ entry about too-high and too-low values
also apply here.
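+
For example, to poll slowly in normal operation but quickly while on
battery, a sketch could be the following. The values are illustrative,
and DEADTIME follows the rule of thumb given under DEADTIME (3 times the
larger interval):
+
	POLLFREQ 30
	POLLFREQALERT 5
	# 3 x 30 = 90, which is also a multiple of 5
	DEADTIME 90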

*POWERDOWNFLAG* 'filename'::

upsmon creates this file when running in master mode when the UPS needs
to be powered off. You should check for this file in your shutdown
scripts and call `upsdrvctl shutdown` if it exists.
+
This is done to forcibly reset the slaves, so they don't get stuck at
the "halted" stage even if the power returns during the shutdown
process. This usually does not work well on contact-closure UPSes that
use the genericups driver.
+
See the shutdown.txt file in the docs subdirectory for more information.
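+
A minimal sketch of the directive itself. The path is an assumption;
pick one that still exists on a mounted filesystem late in your shutdown
sequence:
+
	# Assumed location; your shutdown script tests for this file
	# before calling "upsdrvctl shutdown"
	POWERDOWNFLAG /etc/killpower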

*RBWARNTIME* 'seconds'::

When a UPS says that it needs to have its battery replaced, upsmon will
generate a NOTIFY_REPLBATT event. By default, this happens every 43200
seconds (12 hours).
+
If you need another value, set it here.
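+
For instance, to be reminded once a day instead of twice (an
illustrative value):
+
	# 24 hours x 3600 seconds = 86400
	RBWARNTIME 86400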

*RUN_AS_USER* 'username'::

upsmon normally runs the bulk of the monitoring duties under another user
ID after dropping root privileges. On most systems this means it runs
as "nobody", since that's the default from compile-time.
+
The catch is that "nobody" can't read your upsmon.conf, since by default
it is installed so that only root can open it. This means you won't be
able to reload the configuration file, since it will be unavailable.
+
The solution is to create a new user just for upsmon, then make it run
as that user. I suggest "nutmon", but you can use anything that isn't
already taken on your system. Just create a regular user with no special
privileges and an impossible password.
+
Then, tell upsmon to run as that user, and make `upsmon.conf` readable by it.
Your reloads will work, and your config file will stay secure.
+
This file should not be writable by the upsmon user, as it would be
possible to exploit a hole, change the SHUTDOWNCMD to something
malicious, then wait for upsmon to be restarted.
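+
A sketch using the user name suggested above. The ownership and mode in
the comment are one possible arrangement, not the only one:
+
	# Assumes a "nutmon" user exists, and that upsmon.conf is owned by
	# root with group nutmon and mode 0640 (readable, not writable)
	RUN_AS_USER nutmon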

*SHUTDOWNCMD* 'command'::

upsmon runs this command when the system needs to be brought down. If
it is a slave, it will do that immediately whenever the current overall
power value drops below the MINSUPPLIES value above.
+
When upsmon is a master, it will allow any slaves to log out before
starting the local shutdown procedure.
+
Note that the command needs to be one element in the config file. If
your shutdown command includes spaces, then put it in quotes to keep it
together, e.g.:

	SHUTDOWNCMD "/sbin/shutdown -h +0"

SEE ALSO
--------

linkman:upsmon[8], linkman:upsd[8], linkman:nutupsdrv[8].

Internet resources:
~~~~~~~~~~~~~~~~~~~

The NUT (Network UPS Tools) home page: http://www.networkupstools.org/