UPSMON.CONF(5)
==============

NAME
----

upsmon.conf - Configuration for Network UPS Tools upsmon

DESCRIPTION
-----------

This file's primary job is to define the systems that linkman:upsmon[8]
will monitor and to tell it how to shut down the system when necessary.
It will contain passwords, so keep it secure. Ideally, only the upsmon
process should be able to read it.

Additionally, other optional configuration values can be set in this
file.

CONFIGURATION DIRECTIVES
------------------------

*DEADTIME* 'seconds'::

upsmon allows a UPS to go missing for this many seconds before declaring
it "dead". The default is 15 seconds.
+
upsmon requires a UPS to provide status information every few seconds
(see POLLFREQ and POLLFREQALERT) to keep things updated. If the status
fetch fails, the UPS is marked stale. If it stays stale for more than
DEADTIME seconds, the UPS is marked dead.
+
A dead UPS that was last known to be on battery is assumed to have
changed to a low battery condition. This may force a shutdown if it is
providing a critical amount of power to your system. This seems
disruptive, but the alternative is barreling ahead into oblivion and
crashing when you run out of power.
+
Note: DEADTIME should be a multiple of POLLFREQ and POLLFREQALERT.
Otherwise, you'll have "dead" UPSes simply because upsmon isn't polling
them quickly enough. Rule of thumb: take the larger of the two POLLFREQ
values, and multiply by 3.
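+
As a rough illustration of the rule of thumb above, the following sketch
assumes polling has been slowed to every 10 seconds during normal
operation. The values are illustrative assumptions, not recommendations:
+
----
# Poll every 10 seconds normally, every 5 seconds while on battery.
POLLFREQ 10
POLLFREQALERT 5
# Larger of the two poll intervals (10) multiplied by 3.
DEADTIME 30
----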

*FINALDELAY* 'seconds'::

When running in primary mode, upsmon waits this long after sending the
NOTIFY_SHUTDOWN to warn the users. After the timer elapses, it then
runs your SHUTDOWNCMD. By default this is set to 5 seconds.
+
If you need to let your users do something in between those events,
increase this number. Remember, at this point your UPS battery is
almost depleted, so don't make this too big.
+
Alternatively, you can set this very low so you don't wait around when
it's time to shut down. Some UPSes don't give much warning for low
battery and will require a value of 0 here for a safe shutdown.
+
NOTE: If FINALDELAY on the secondary is greater than HOSTSYNC on the
primary, the primary will give up waiting for that secondary upsmon
to disconnect.

*HOSTSYNC* 'seconds'::

upsmon will wait up to this many seconds in primary mode for the secondaries
to disconnect during a shutdown situation. By default, this is 15
seconds.
+
When a UPS goes critical (on battery + low battery, or "FSD": forced
shutdown), the secondary systems are supposed to disconnect and shut
down right away. The HOSTSYNC timer keeps the primary upsmon from sitting
there forever if one of the secondaries gets stuck.
+
This value is also used to keep secondary systems from getting stuck if
the primary fails to respond in time. After a UPS becomes critical, the
secondary will wait up to HOSTSYNC seconds for the primary to set the
FSD flag. If that timer expires, the secondary upsmon will assume that the
primary (or communications path to it) is broken and will shut down anyway.
+
This keeps the secondaries from shutting down during a short-lived status
change to "OB LB" and back that the secondaries see but the primary misses.
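+
To illustrate how the two timers relate, here is a sketch for a primary
and one secondary. The numbers are simply the documented defaults, which
keep the secondary's FINALDELAY below the primary's HOSTSYNC. The two
halves belong in separate files on separate hosts:
+
----
# upsmon.conf on the primary: wait up to 15 seconds for secondaries.
HOSTSYNC 15
FINALDELAY 5

# upsmon.conf on the secondary: FINALDELAY (5) is below the primary's
# HOSTSYNC (15), so the primary should not give up on this host early.
HOSTSYNC 15
FINALDELAY 5
----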

*MINSUPPLIES* 'num'::

Set the number of power supplies that must be receiving power to keep
this system running. Normal computers have just one power supply, so
the default value of 1 is acceptable.
+
Large/expensive server type systems usually have more, and can run
with a few missing. The HP NetServer LH4 can run with 2 out of 4, for
example, so you'd set it to 2. The idea is to keep the box running
as long as possible, right?
+
Obviously you have to put the redundant supplies on different UPS
circuits for this to make sense! See big-servers.txt in the docs
subdirectory for more information and ideas on how to use this
feature.
+
Also see the section on "power values" in linkman:upsmon[8].
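+
As a sketch of the four-supply case described above, assume two
hypothetical UPSes, `ups-a` and `ups-b`, each feeding two of the four
power supplies. The UPS names, host, user, and password below are
placeholders:
+
----
# Each UPS feeds two of this system's four power supplies.
MONITOR ups-a@localhost 2 upswired blah primary
MONITOR ups-b@localhost 2 upswired blah primary
# The system keeps running as long as two supplies still have power.
MINSUPPLIES 2
----
+
With these values, losing either UPS still leaves a power value of 2, so
a shutdown would only be expected when both UPSes go critical.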

*MONITOR* 'system' 'powervalue' 'username' 'password' 'type'::

Each UPS that you need to monitor should have a MONITOR line. Not
all of these need to supply power to the system that is running upsmon.
You may monitor other systems if you want to be able to send
notifications about status changes on them.

You must have at least one MONITOR directive in `upsmon.conf`.

'system' is a UPS identifier. It is in this form:

+<upsname>[@<hostname>[:<port>]]+

The default hostname is "localhost". Some examples:

- "su700@mybox" means a UPS called "su700" on a system called "mybox".
This is the normal form.
- "fenton@bigbox:5678" is a UPS called "fenton" on a system called
"bigbox" which runs linkman:upsd[8] on port "5678".

'powervalue' is an integer representing the number of power supplies
that the UPS feeds on this system. Most normal computers have one power
supply, and the UPS feeds it, so this value will be 1. You need a very
large or special system to have anything higher here.

You can set the 'powervalue' to 0 if you want to monitor a UPS that
doesn't actually supply power to this system. This is useful when you
want to have upsmon do notifications about status changes on a UPS
without shutting down when it goes critical.

The 'username' and 'password' on this line must match an entry in
the `upsd` server system's linkman:upsd.users[5] file.

If your username is "observer" and your password is "abcd", the MONITOR
line might look like this (likely on a remote secondary system):

+MONITOR myups@bigserver 1 observer abcd secondary+

Meanwhile, the `upsd.users` on `bigserver` would look like this:

    [observer]
        password = abcd
        upsmon secondary

    [upswired]
        password = blah
        upsmon primary

And the copy of upsmon on that bigserver would run with the primary
configuration:

+MONITOR myups@bigserver 1 upswired blah primary+

The 'type' refers to the relationship with linkman:upsd[8]. It can
be either "primary" or "secondary". See linkman:upsmon[8] for more
information on the meaning of these modes. The mode you pick here
also goes in the `upsd.users` file, as seen in the example above.
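
As a sketch of the monitoring-only case described above, the following
line watches a UPS that does not power this system. The UPS name
`otherups` and the host `otherbox` are placeholders, and the credentials
reuse the "observer" example:

----
# Power value 0: report status changes, but never shut down for this UPS.
MONITOR otherups@otherbox 0 observer abcd secondary
----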

*NOCOMMWARNTIME* 'seconds'::

upsmon will trigger a NOTIFY_NOCOMM after this many seconds if it can't
reach any of the UPS entries in this configuration file. It keeps
warning you until the situation is fixed. By default this is 300
seconds.

*NOTIFYCMD* 'command'::

upsmon calls this to send messages when things happen.
+
This command is called with the full text of the message as one
argument. The environment variable NOTIFYTYPE will contain the type
string of whatever caused this event to happen.
+
If you need to use linkman:upssched[8], then you must make it your
NOTIFYCMD by listing it here.
+
Note that this is only called for NOTIFY events that have EXEC set with
NOTIFYFLAG. See NOTIFYFLAG below for more details.
+
Making this some sort of shell script might not be a bad idea. For
more information and ideas, see docs/scheduling.txt.
+
Remember, this command also needs to be one element in the configuration file,
so if your command has spaces, then wrap it in quotes.
+
+NOTIFYCMD "/path/to/script --foo --bar"+
+
This script is run in the background--that is, upsmon forks before it
calls out to start it. This means that your NOTIFYCMD may have multiple
instances running simultaneously if a lot of stuff happens all at once.
Keep this in mind when designing complicated notifiers.

*NOTIFYMSG* 'type' 'message'::

upsmon comes with a set of stock messages for various events. You can
change them if you like.

    NOTIFYMSG ONLINE "UPS %s is getting line power"

    NOTIFYMSG ONBATT "Someone pulled the plug on %s"
+
Note that +%s+ is replaced with the identifier of the UPS in question.
+
The message must be one element in the configuration file, so if it
contains spaces, you must wrap it in quotes.

    NOTIFYMSG NOCOMM "Someone stole UPS %s"
+
Possible values for 'type':

ONLINE;; UPS is back online

ONBATT;; UPS is on battery

LOWBATT;; UPS is on battery and has a low battery (is critical)

FSD;; UPS is being shut down by the primary (FSD = "Forced Shutdown")

COMMOK;; Communications established with the UPS

COMMBAD;; Communications lost to the UPS

SHUTDOWN;; The system is being shut down

REPLBATT;; The UPS battery is bad and needs to be replaced

NOCOMM;; A UPS is unavailable (can't be contacted for monitoring)

*NOTIFYFLAG* 'type' 'flag'[+'flag']...::

By default, upsmon sends walls (global messages to all logged in users)
via /bin/wall and writes to the syslog when things happen. You can
change this.
+
Examples:
+
    NOTIFYFLAG ONLINE SYSLOG
    NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
+
Possible values for the flags:
+
SYSLOG;; Write the message to the syslog

WALL;; Write the message to all users with /bin/wall

EXEC;; Execute NOTIFYCMD (see above) with the message

IGNORE;; Don't do anything
+
If you use IGNORE, don't use any other flags on the same line.
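+
To show how NOTIFYCMD, NOTIFYMSG, and NOTIFYFLAG fit together, here is a
small sketch. The script path is a placeholder, and the choice of events
is only an assumption about what a site might want:
+
----
# Hypothetical notifier script; quoted because it takes an argument.
NOTIFYCMD "/path/to/script --foo"
# Custom text for the on-battery event; %s becomes the UPS identifier.
NOTIFYMSG ONBATT "UPS %s is running on battery"
# Log, wall, and run NOTIFYCMD when a UPS goes on battery.
NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
# Only log when line power returns.
NOTIFYFLAG ONLINE SYSLOG
# Stay silent when communications are established.
NOTIFYFLAG COMMOK IGNORE
----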

*POLLFREQ* 'seconds'::

Normally upsmon polls the linkman:upsd[8] server every 5 seconds. If this
is flooding your network with activity, you can make it higher. You can
also make it lower to get faster updates in some cases.
+
There are some catches. First, if you set the POLLFREQ too high, you
may miss short-lived power events entirely. You also risk triggering
the DEADTIME (see above) if you use a very large number.
+
Second, there is a point of diminishing returns if you set it too low.
While upsd normally has all of the data available to it instantly, most
drivers only refresh the UPS status once every 2 seconds. Polling any
more often than that usually doesn't get you the information any faster.

*POLLFREQALERT* 'seconds'::

This is the interval that upsmon waits between polls if any of its UPSes
are on battery. You can use this along with POLLFREQ above to slow down
polls during normal behavior, but get quicker updates when something bad
happens.
+
This should always be equal to or lower than the POLLFREQ value. By
default it is also set to 5 seconds.
+
The warnings from the POLLFREQ entry about too-high and too-low values
also apply here.

*POWERDOWNFLAG* 'filename'::

upsmon creates this file when running in primary mode when the UPS needs
to be powered off. You should check for this file in your shutdown
scripts and call `upsdrvctl shutdown` if it exists.
+
This is done to forcibly reset the secondary systems, so they don't get
stuck at the "halted" stage even if the power returns during the shutdown
process. This usually does not work well on contact-closure UPSes that
use the genericups driver.
+
See the config-notes.txt file in the docs subdirectory for more information.
Refer to the section:
[[UPS_shutdown]] "Configuring automatic shutdowns for low battery events",
or refer to the online version.

*RBWARNTIME* 'seconds'::

When a UPS says that it needs to have its battery replaced, upsmon will
generate a NOTIFY_REPLBATT event. By default, this happens every 43200
seconds (12 hours).
+
If you need another value, set it here.

*RUN_AS_USER* 'username'::

upsmon normally runs the bulk of the monitoring duties under another user
ID after dropping root privileges. On most systems this means it runs
as "nobody", since that's the default from compile-time.
+
The catch is that "nobody" can't read your upsmon.conf, since by default
it is installed so that only root can open it. This means you won't be
able to reload the configuration file, since it will be unavailable.
+
The solution is to create a new user just for upsmon, then make it run
as that user. I suggest "nutmon", but you can use anything that isn't
already taken on your system. Just create a regular user with no special
privileges and an impossible password.
+
Then, tell upsmon to run as that user, and make `upsmon.conf` readable by it.
Your reloads will work, and your config file will stay secure.
+
This file should not be writable by the upsmon user, as it would be
possible to exploit a hole, change the SHUTDOWNCMD to something
malicious, then wait for upsmon to be restarted.

*SHUTDOWNCMD* 'command'::

upsmon runs this command when the system needs to be brought down. If
it is a secondary, it will do that immediately whenever the current
overall power value drops below the MINSUPPLIES value above.
+
When upsmon is a primary, it will allow any secondaries to log out before
starting the local shutdown procedure.
+
Note that the command needs to be one element in the config file. If
your shutdown command includes spaces, then put it in quotes to keep it
together, e.g.:

    SHUTDOWNCMD "/sbin/shutdown -h +0"
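+
Putting several of the directives above together, a minimal primary
configuration might be sketched as follows. The UPS name, credentials,
and the POWERDOWNFLAG path are placeholders; "nutmon" is the user
suggested under RUN_AS_USER:
+
----
# Run the monitoring duties as a dedicated, unprivileged user.
RUN_AS_USER nutmon
# One local UPS feeding this system's single power supply.
MONITOR myups@localhost 1 upswired blah primary
MINSUPPLIES 1
# Quoted because the command contains spaces.
SHUTDOWNCMD "/sbin/shutdown -h +0"
# Hypothetical path; shutdown scripts would check for this file.
POWERDOWNFLAG /etc/killpower
----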

*CERTPATH* 'certificate file or database'::

When compiled with SSL support, you can enter the certificate path here.
+
With NSS:;;
Certificates are stored in a dedicated database (data split in 3 files).
Specify the path of the database directory.
With OpenSSL:;;
Directory containing CA certificates in PEM format, used to verify
the server certificate presented by the upsd server. The files each
contain one CA certificate. The files are looked up by the CA subject
name hash value, which must hence be available.

*CERTIDENT* 'certificate name' 'database password'::

When compiled with SSL support with NSS, you can specify the name of the
certificate that upsmon retrieves from the database to authenticate itself,
and the password required to access the certificate's private key.

*CERTHOST* 'hostname' 'certificate name' 'certverify' 'forcessl'::

When compiled with SSL support with NSS, you can specify security directives
for each server you contact.
+
Each entry maps a server name to the expected certificate name and to flags
indicating whether the server certificate is verified and whether the
connection must be secure.

*CERTVERIFY* '0 | 1'::

When compiled with SSL support, make upsmon verify all connections with
certificates.
+
Without this, there is no guarantee that the upsd is the right host.
Enabling this greatly reduces the risk of man-in-the-middle attacks.
This effectively forces the use of SSL, so don't use this unless
all of your upsd hosts are ready for SSL and have their certificates
in order.
+
When compiled with NSS support of SSL, this can be overridden for a host
specified with a CERTHOST directive.

*FORCESSL* '0 | 1'::

When compiled with SSL, specify that a secured connection must be used
to communicate with upsd.
+
If you don't use 'CERTVERIFY 1', then this will at least make sure
that nobody can sniff your sessions without a large effort. Setting
this will make upsmon drop connections if the remote upsd doesn't
support SSL, so don't use it unless all of them have it running.
+
When compiled with NSS support of SSL, this can be overridden for a host
specified with a CERTHOST directive.
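+
As a sketch for an OpenSSL build, the following combines the directives
above to require verified, encrypted connections. The directory is a
placeholder and should point at wherever your hashed CA certificates live:
+
----
# Hypothetical directory of PEM CA certificates, looked up by subject hash.
CERTPATH /path/to/ca-certs
# Verify the certificate presented by every upsd host.
CERTVERIFY 1
# Drop connections to upsd hosts that do not support SSL.
FORCESSL 1
----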

*DEBUG_MIN* 'INTEGER'::

Optionally specify a minimum debug level for `upsmon` daemon, e.g. for
troubleshooting a deployment, without impacting foreground or background
running mode directly. Command-line option `-D` can only increase this
verbosity level.
+
NOTE: if the running daemon receives a `reload` command, presence of the
`DEBUG_MIN NUMBER` value in the configuration file can be used to tune
debugging verbosity in the running service daemon (it is recommended to
comment it away or set the minimum to explicit zero when done, to avoid
huge journals and I/O system abuse). Keep in mind that for this run-time
tuning, the `DEBUG_MIN` value *present* in *reloaded* configuration files
is applied instantly and overrides any previously set value, from file
or CLI options, regardless of older logging level being higher or lower
than the newly found number; a missing (or commented away) value however
does not change the previously active logging verbosity.

SEE ALSO
--------

linkman:upsmon[8], linkman:upsd[8], linkman:nutupsdrv[8].

Internet resources:
~~~~~~~~~~~~~~~~~~~

The NUT (Network UPS Tools) home page: http://www.networkupstools.org/