Apart from the platform specific tun/tap driver, link with the dummy and
raw_socket devices, and optionally with support for UML and VDE devices.
At runtime, the DeviceType option can be used to select which driver to
use.
First of all, if there really are two nodes with the same name, much
more than 10 contradicting ADD_EDGE and DEL_EDGE messages will be sent.
Also, we forgot to reset the counters when nothing happened.
In case there is a ADD_EDGE/DEL_EDGE storm, we do not shut down, but
sleep an increasing amount of time, allowing tinc to recover gracefully
from temporary failures.
Instead of UNIX time, the log messages now start with the time in RFC3339
format, which human-readable and still easy for the computer to parse and sort.
The HUP signal will also cause the log file to be closed and reopened, which is
useful when log rotation is used. If there is an error while opening the log
file, this is logged to stderr.
In commit 4a21aabada, code was added to detect
contradicting ADD_EDGE and DEL_EDGE messages being sent, which is an indication
of two nodes with the same Name connected to the same VPN. However, these
contradictory messages can also happen when there is a network partitioning. In
the former case a loop happens which causes many contradictory message, while
in the latter case only a few of those messages will be sent. So, now we
increase the threshold to at least 10 of both ADD_EDGE and DEL_EDGE messages.
Although transient errors sometimes happen on the tun/tap device (for example,
if the kernel is temporarily out of buffer space), there are situations where
the tun/tap device becomes permanently broken. Instead of endlessly spamming
the syslog, we now sleep an increasing amount of time between consecutive read
errors, and if reads still fail after 10 attempts (approximately 3 seconds),
tinc will quit.
In this situation, the two nodes will start fighting over the edges they announced.
When we have to contradict both ADD_EDGE and DEL_EDGE messages, we log a warning,
and with 25% chance per PingTimeout we quit.
If a node is unreachable, and not connected to an edge anymore, it gets
deleted. When this happens its subnets are also removed, which should
not happen with StrictSubnets=yes.
Solution:
- do not remove subnets in src/net.c::purge(), we know that all subnets
in the list came from our hosts files.
I think here you got the check wrong by looking at the tunnelserver
code below it - with strictsubnets we still inform others but do not
remove the subnet from our data.
- do not remove nodes in net.c::purge() that still have subnets
attached.
When this option is enabled, tinc will not accept dynamic updates of Subnets
from other nodes, but will only use Subnets read from local host config files
to build its routing table.
Before, we immediately retried select() if it returned -1 and errno is EAGAIN
or EINTR, and if it returned 0 it would check for network events even if we
know there are none. Now, if -1 or 0 is returned we skip checking network
events, but we do check for timer and signal events.
One reason to send the ALRM signal is to let tinc immediately try to connect to
outgoing nodes, for example when PPP or DHCP configuration of the outgoing
interface finished. Conversely, when the outgoing interface goes down one can
now send this signal to let tinc quickly detect that links are down too.
UNIX domain sockets, of course, don't exist on Windows. For now, when compiling
tinc in a MinGW environment, try to use a TCP socket bound to localhost as an
alternative.
When the HUP signal is sent while some outgoing connections have not been made
yet, or are being retried, a NULL pointer could be dereferenced resulting in
tinc crashing. We fix this by more careful handling of outgoing_ts, and by
deleting all connections that have not been fully activated yet at the HUP
signal is received.
Git's log and blame tools were used to find out which files had significant
contributions from authors who sent in patches that were applied before we used
git.
This feature is not necessary anymore since we have tools like valgrind today
that can catch stack overflow errors before they make a backtrace in gdb
impossible.
The TAP-Win32 device is not a socket, and select() under Windows only works
with sockets. Tinc used a separate thread to read from the TAP-Win32 device,
and passed this via a local socket to the main thread which could then select()
from it. We now use a global mutex, which is only unlocked when the main thread
is waiting for select(), to allow the TAP reader thread to process packets
directly.