In retry() the function do_outgoing_connection() is called, which can delete
items from the connection_tree, so when walking the tree we must first save the
pointer to the next item.
DeviceType = multicast allows one to specify a multicast address and port with
a Device statement. Tinc will then read/send packets to that multicast group
instead of to a tun/tap device. This allows interaction with UML, QEMU and KVM
instances that are listening on the same group.
When making outgoing connections, tinc goes through the list of Addresses and
tries all of them until one succeeds. However, before it would consider
establishing a TCP connection a success, even when the authentication failed.
This would be a problem if the first Address would point to a hostname and port
combination that belongs to the wrong tinc node, or perhaps even to a non-tinc
service, causing tinc to endlessly try this Address instead of moving to the
next one.
Problem found by Delf Eldkraft.
This allows tincctl to receive log messages from a running tincd,
independent of what is logged to syslog or to file. Tincctl can receive
debug messages with an arbitrary level.
Apart from the platform specific tun/tap driver, link with the dummy and
raw_socket devices, and optionally with support for UML and VDE devices.
At runtime, the DeviceType option can be used to select which driver to
use.
First of all, if there really are two nodes with the same name, much
more than 10 contradicting ADD_EDGE and DEL_EDGE messages will be sent.
Also, we forgot to reset the counters when nothing happened.
In case there is a ADD_EDGE/DEL_EDGE storm, we do not shut down, but
sleep an increasing amount of time, allowing tinc to recover gracefully
from temporary failures.
Instead of UNIX time, the log messages now start with the time in RFC3339
format, which human-readable and still easy for the computer to parse and sort.
The HUP signal will also cause the log file to be closed and reopened, which is
useful when log rotation is used. If there is an error while opening the log
file, this is logged to stderr.
In commit 4a21aabada, code was added to detect
contradicting ADD_EDGE and DEL_EDGE messages being sent, which is an indication
of two nodes with the same Name connected to the same VPN. However, these
contradictory messages can also happen when there is a network partitioning. In
the former case a loop happens which causes many contradictory message, while
in the latter case only a few of those messages will be sent. So, now we
increase the threshold to at least 10 of both ADD_EDGE and DEL_EDGE messages.
Although transient errors sometimes happen on the tun/tap device (for example,
if the kernel is temporarily out of buffer space), there are situations where
the tun/tap device becomes permanently broken. Instead of endlessly spamming
the syslog, we now sleep an increasing amount of time between consecutive read
errors, and if reads still fail after 10 attempts (approximately 3 seconds),
tinc will quit.
In this situation, the two nodes will start fighting over the edges they announced.
When we have to contradict both ADD_EDGE and DEL_EDGE messages, we log a warning,
and with 25% chance per PingTimeout we quit.
If a node is unreachable, and not connected to an edge anymore, it gets
deleted. When this happens its subnets are also removed, which should
not happen with StrictSubnets=yes.
Solution:
- do not remove subnets in src/net.c::purge(), we know that all subnets
in the list came from our hosts files.
I think here you got the check wrong by looking at the tunnelserver
code below it - with strictsubnets we still inform others but do not
remove the subnet from our data.
- do not remove nodes in net.c::purge() that still have subnets
attached.
When this option is enabled, tinc will not accept dynamic updates of Subnets
from other nodes, but will only use Subnets read from local host config files
to build its routing table.
Before, we immediately retried select() if it returned -1 and errno is EAGAIN
or EINTR, and if it returned 0 it would check for network events even if we
know there are none. Now, if -1 or 0 is returned we skip checking network
events, but we do check for timer and signal events.
One reason to send the ALRM signal is to let tinc immediately try to connect to
outgoing nodes, for example when PPP or DHCP configuration of the outgoing
interface finished. Conversely, when the outgoing interface goes down one can
now send this signal to let tinc quickly detect that links are down too.
UNIX domain sockets, of course, don't exist on Windows. For now, when compiling
tinc in a MinGW environment, try to use a TCP socket bound to localhost as an
alternative.