This adds a new mechanism by which tinc can determine if a node is
reachable via UDP. The new mechanism is currently redundant with the
PMTU discovery mechanism - that will be fixed in a future commit.
Conceptually, the UDP discovery mechanism works similarly to PMTU
discovery: it sends UDP probes (of minmtu size, to make sure the tunnel
is fully usable), and assumes UDP is usable if it gets replies. It
assumes UDP is broken if too much time has passed since the last reply.
The big difference from the current PMTU discovery mechanism, however,
is that UDP discovery probes are only triggered as part of the
packet TX path (through try_tx()). This is quite interesting, because
it means tinc will never send UDP pings more often than normal packets,
and most importantly, it will automatically stop sending pings as soon
as packets stop flowing, thereby nicely reducing network chatter.
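A minimal sketch of the idea (the names try_tx(), send_udp_probe() and the
timeout values below are illustrative, not the actual tinc code):

    #include <stdbool.h>
    #include <stddef.h>
    #include <time.h>

    /* Hypothetical, trimmed-down per-node state. */
    typedef struct {
        bool udp_confirmed;     /* got a probe reply recently */
        time_t last_udp_probe;  /* when we last sent a probe */
        time_t last_udp_reply;  /* when we last got a reply */
        int minmtu;             /* lower bound on the path MTU */
    } node_state_t;

    void send_udp(node_state_t *n, const void *pkt, size_t len);
    void send_tcp(node_state_t *n, const void *pkt, size_t len);
    void send_udp_probe(node_state_t *n, int size);

    #define UDP_REPLY_TIMEOUT 30 /* seconds; illustrative value */
    #define PROBE_INTERVAL 10    /* seconds; illustrative value */

    /* Called from the packet TX path: probes are only ever sent when
     * there is real traffic, so they stop as soon as traffic stops. */
    void try_tx(node_state_t *n, const void *pkt, size_t len, time_t now) {
        if (n->udp_confirmed && now - n->last_udp_reply < UDP_REPLY_TIMEOUT) {
            send_udp(n, pkt, len);            /* tunnel known to work */
        } else {
            send_tcp(n, pkt, len);            /* fall back to TCP */
            if (now - n->last_udp_probe >= PROBE_INTERVAL) {
                send_udp_probe(n, n->minmtu); /* minmtu-sized probe */
                n->last_udp_probe = now;
            }
        }
    }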
Of course, there are small drawbacks in some edge cases: for example,
if a node only sends one packet every minute to another node, these
packets will only be sent over TCP, because the interval between packets
is too long for tinc to maintain the UDP tunnel. I consider this a
feature, not a bug: I believe it is appropriate to use TCP in scenarios
where traffic is negligible, so that we don't pollute the network with
pings just to maintain a UDP tunnel that's seeing negligible usage.
The option "--disable-legacy-protocol" was added to the configure
script. The new protocol does not depend on any external crypto
libraries, so when the option is used tinc is no longer linked to
OpenSSL's libcrypto.
This introduces a new type of identifier for nodes, which complements
node names: node IDs. Node IDs are defined as the first 6 bytes of the
SHA-256 hash of the node name. They will be used in future code in lieu
of node names as unique node identifiers in contexts where space is at
a premium (such as VPN packets).
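For illustration, the derivation could look like this (using OpenSSL's
SHA256() here purely as an example hash implementation):

    #include <openssl/sha.h>
    #include <string.h>

    typedef struct { unsigned char x[6]; } node_id_t;

    /* A node ID is the first 6 bytes of the SHA-256 hash of the name. */
    node_id_t node_id_from_name(const char *name) {
        unsigned char digest[SHA256_DIGEST_LENGTH];
        node_id_t id;
        SHA256((const unsigned char *)name, strlen(name), digest);
        memcpy(id.x, digest, sizeof id.x); /* truncate to 48 bits */
        return id;
    }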
The semantics of node IDs is that they are supposed to be unique in a
tinc graph; i.e. two different nodes that are part of the same graph
should not have the same ID; otherwise things could break. This
solution provides that guarantee with overwhelmingly high probability:
indeed, according to the birthday problem, with a 48-bit hash, the
probability of at least one collision is 1e-13 with 10 nodes, 1e-11
with 100 nodes, 1e-9 with 1000 nodes and 1e-7 with 10000 nodes. Things
only start getting hairy with more than 1 million nodes, as the
probability gets over 0.2%.
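These figures follow from the standard birthday approximation with 48-bit IDs:

    P(\text{collision}) \approx 1 - e^{-n(n-1)/2^{49}}

For example, n = 10^4 gives n(n-1)/2^{49} \approx 1.8 \times 10^{-7},
matching the figure above.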
This introduces a new way of doing local discovery: when tinc has
local address information for the recipient node, it will send local
discovery packets directly to the local address of that node, instead
of using broadcast packets.
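A sketch of what the targeted probing amounts to (the types and field names
below are hypothetical stand-ins for tinc's real structures):

    #include <stddef.h>
    #include <sys/socket.h>

    typedef struct {
        struct sockaddr_storage addr;
        socklen_t addrlen;
    } local_address_t;

    typedef struct {
        local_address_t *local_addresses; /* learned from this node's edges */
        size_t num_local_addresses;
    } node_info_t;

    /* Instead of a broadcast, send the probe straight to every local
     * address we learned for the node. */
    void send_local_discovery(int udp_fd, const node_info_t *n,
                              const void *probe, size_t len) {
        for (size_t i = 0; i < n->num_local_addresses; i++)
            sendto(udp_fd, probe, len, 0,
                   (const struct sockaddr *)&n->local_addresses[i].addr,
                   n->local_addresses[i].addrlen);
    }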
This new way of doing local discovery provides numerous advantages compared to
using broadcasts:
- No broadcast packets "polluting" the local network;
- Reliable even if the sending host has multiple network interfaces (in
contrast, broadcasts will only be sent through one unpredictable
interface);
- Works even if the two hosts are not on the same broadcast domain. One
example is a large LAN where the two hosts might be on different local
subnets. In fact, thanks to UDP hole punching this might even work if
there is a NAT sitting in the middle of the LAN between the two nodes!
- Sometimes a node is reachable through its "normal" address, and via a
local subnet as well. One might think the local subnet is the best route
to the node in this case, but more often than not it's actually worse -
one example is where the local segment is a third-party VPN running in
parallel, or ironically it can be the local segment formed by the tinc
VPN itself! Because this new algorithm only checks the addresses for
which an edge is already established, it is less likely to fall into
these traps.
This gets rid of the rest of the symbolic links. However, as a consequence, the
crypto header files have now moved to src/, and can no longer contain
library-specific declarations. Therefore, cipher_t, digest_t, ecdh_t, ecdsa_t
and rsa_t are now all opaque types, and only pointers to those types can be
used.
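The pattern looks roughly like this (a sketch; the function names are modeled
on, but not guaranteed to match, the real headers):

    /* cipher.h -- public, backend-agnostic interface. Only a forward
     * declaration is exposed, so callers can hold pointers but never
     * see library-specific members. */
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct cipher cipher_t;

    cipher_t *cipher_open_by_name(const char *name);
    bool cipher_encrypt(cipher_t *c, const void *in, size_t inlen,
                        void *out, size_t *outlen);
    void cipher_close(cipher_t *c);

    /* Each crypto backend's cipher.c completes the type privately, e.g.
     * an OpenSSL backend might contain:
     *     struct cipher { EVP_CIPHER_CTX *ctx; };
     */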
Without adding any extra traffic, we can measure round-trip times and estimate the
bandwidth and packet loss between nodes. The RTT and bandwidth can be measured
by timing the MTU probe packets. The RTT is the difference between the time a
burst of MTU probes was sent and when the first reply is received. The
bandwidth can be estimated by dividing the size of the probe packets by the
time between successive received probe replies of the same burst. The packet
loss can be estimated for incoming traffic by comparing how many packets have
actually been received to the increase in the sequence numbers.
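A sketch of the two probe-based estimators (the struct and its fields are
hypothetical; timestamps are in seconds):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
        double burst_sent;   /* when the probe burst was sent */
        double prev_reply;   /* arrival time of the previous reply */
        double rtt;          /* latest RTT estimate, seconds */
        double bandwidth;    /* latest bandwidth estimate, bytes/s */
    } probe_stats_t;

    void probe_reply_received(probe_stats_t *s, double now,
                              size_t probe_len, bool first_of_burst) {
        if (first_of_burst) {
            s->rtt = now - s->burst_sent;       /* first reply bounds RTT */
        } else {
            double gap = now - s->prev_reply;
            if (gap > 0)
                s->bandwidth = probe_len / gap; /* size / reply spacing */
        }
        s->prev_reply = now;
    }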
The estimates are not perfect. Bandwidth especially is difficult to measure:
the only accurate way is to continuously send as much data as possible, but
that is obviously not desirable. The packet loss rate is also almost always
a few percent when sending a lot of data over the VPN via TCP, since TCP
*needs* packet loss to work properly.
Keep track of the number of correct, non-replayed UDP packets that have been
received, regardless of their content. This can be compared to the sequence
number to determine the real packet loss.
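In code, the estimate is simply (a sketch, assuming 32-bit sequence numbers):

    #include <stdint.h>

    /* received counts valid, non-replayed packets; the sequence numbers
     * tell us how many were actually sent in the same window. */
    double packet_loss(uint32_t received, uint32_t first_seq,
                       uint32_t last_seq) {
        uint32_t expected = last_seq - first_seq + 1;
        return expected ? 1.0 - (double)received / expected : 0.0;
    }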
There are several reasons for replacing libevent with our own event handling:
- MacOS/X doesn't support polling the tap device using kqueue, requiring a
workaround to fall back to select().
- On Windows, only sockets are properly handled; therefore, tinc uses a second
thread that does a blocking ReadFile() on the TAP-Win32/64 device. However,
this does not mix well with libevent.
- Libevent, even just the core, is quite large, and although it is easy to get
and install on many platforms, it can be a burden.
- Libev is more lightweight and seems technically superior, but it doesn't
abstract away all the platform differences (for example, async events are not
supported on Windows).
Similar to old-style key exchange requests, keep track of whether a key
exchange is already in progress and how long it has been running. If no key is known yet
or if key exchange takes too long, (re)start a new key exchange.
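A sketch of that logic (field names and the timeout value are illustrative):

    #include <stdbool.h>
    #include <time.h>

    #define KEY_EXCHANGE_TIMEOUT 10 /* seconds; illustrative value */

    typedef struct {
        bool have_key;            /* a usable key is known */
        bool exchange_in_progress;
        time_t exchange_started;  /* when the current exchange began */
    } key_state_t;

    void start_key_exchange(key_state_t *k); /* sends a new REQ_KEY */

    void maybe_start_key_exchange(key_state_t *k, time_t now) {
        bool stalled = k->exchange_in_progress &&
                       now - k->exchange_started > KEY_EXCHANGE_TIMEOUT;
        if ((!k->have_key && !k->exchange_in_progress) || stalled) {
            start_key_exchange(k);
            k->exchange_in_progress = true;
            k->exchange_started = now;
        }
    }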
When two nodes which support SPTPS want to send packets to each other, they now
always use SPTPS. The node initiating the SPTPS session sends the first SPTPS
packet via an extended REQ_KEY message. All other handshake messages are sent
using ANS_KEY messages. This ensures that intermediate nodes using an older
version of tinc can still help with NAT traversal. After the authentication
phase is over, SPTPS packets are sent via UDP, or are encapsulated in extended
REQ_KEY messages instead of PACKET messages.
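The resulting routing decision, roughly (all names below are hypothetical
stand-ins, not the actual tinc functions):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct node node_t;              /* stand-in for tinc's node_t */
    bool node_udp_confirmed(const node_t *);
    bool send_udp(node_t *, const void *, size_t);
    bool send_meta_req_key(node_t *, const void *, size_t);

    /* Handshake records always travel in meta messages (REQ_KEY/ANS_KEY)
     * so older intermediate nodes can still forward them; data records
     * prefer UDP and fall back to REQ_KEY encapsulation. */
    bool send_sptps_record(node_t *to, const void *rec, size_t len,
                           bool handshake) {
        if (handshake || !node_udp_confirmed(to))
            return send_meta_req_key(to, rec, len);
        return send_udp(to, rec, len);
    }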
REQ_KEY requests have an extra field indicating key exchange version.
If it is present and > 0, the sender supports ECDH. If the receiver also
does, then it will generate a new keypair and send the public key in an
ANS_KEY request with "ECDH:" prefixed. The ans_key_h() function will
compute the shared secret, which, at the moment, is used as-is to set the
cipher and HMAC keys. However, this must be changed to use a proper KDF.
In the future, the ECDH key exchange must also be signed.
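A sketch of what the proper-KDF version could look like (hkdf() here is a
hypothetical helper, not an existing tinc or OpenSSL function):

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical HKDF-style expand helper. */
    void hkdf(const unsigned char *ikm, size_t ikm_len, const char *info,
              unsigned char *okm, size_t okm_len);

    /* Derive independent cipher and HMAC keys from the raw ECDH secret
     * instead of using it as-is. */
    void derive_session_keys(const unsigned char *shared, size_t shared_len,
                             unsigned char cipher_key[32],
                             unsigned char hmac_key[32]) {
        unsigned char okm[64];
        hkdf(shared, shared_len, "tinc ECDH key expansion", okm, sizeof okm);
        memcpy(cipher_key, okm, 32);    /* first half: cipher key */
        memcpy(hmac_key, okm + 32, 32); /* second half: HMAC key */
    }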
Previously, when we got a key request for or from a node we don't know, we
disconnected the node that forwarded us that request. However, especially in
TunnelServer mode,
disconnecting does not help. We now ignore such requests, but since there is no
way of telling the original sender that the request was dropped, we now retry
sending REQ_KEY requests when we don't get an ANS_KEY back.
The control socket code was completely different from how meta connections are
handled, resulting in lots of extra code to handle requests. Also, not every
operating system has UNIX sockets, so we have to resort to another type of
sockets or pipes for those anyway. To reduce code duplication and make control
sockets work the same on all platforms, we now just connect to the TCP port
on which tincd is already listening.
To authenticate, the program that wants to control a running tinc daemon must
send the contents of a cookie file. The cookie is a random 256-bit number that
is regenerated every time tincd starts. The cookie file should only be readable
by the same user that can start a tincd.
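A sketch of the cookie generation (get_random_bytes() is a stand-in for
whatever CSPRNG the build provides; the hex encoding is illustrative):

    #include <fcntl.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <unistd.h>

    bool get_random_bytes(void *buf, size_t len); /* hypothetical CSPRNG */

    bool write_control_cookie(const char *path) {
        unsigned char cookie[32];                  /* 256 bits */
        if (!get_random_bytes(cookie, sizeof cookie))
            return false;
        /* 0600: only the user who can start tincd may read the cookie. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
            return false;
        FILE *f = fdopen(fd, "w");
        if (!f) {
            close(fd);
            return false;
        }
        for (size_t i = 0; i < sizeof cookie; i++)
            fprintf(f, "%02x", cookie[i]);         /* hex for the client */
        fprintf(f, "\n");
        return fclose(f) == 0;
    }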
Instead of the binary-ish protocol previously used, we now use an ASCII
protocol similar to that of the meta connections, but this can still change.
Since event.h is not part of tinc, we include it in have.h, where all other
system header files are included. We also ensure -levent comes before -lgdi32
when compiling with MinGW; apparently it doesn't work when the order is
reversed.
Options should have a fixed width anyway, but this also fixes a possible MinGW
compiler bug where %lx tries to print a 64-bit value, even though a long int is
only 32 bits.
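For example, a fixed-width type with its matching format macro avoids the
ambiguity entirely:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void) {
        /* uint32_t is 32 bits everywhere, and PRIx32 always matches it,
         * unlike %lx, whose width depends on the platform's long. */
        uint32_t options = 0x42;
        printf("options = %" PRIx32 "\n", options);
        return 0;
    }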
Previously, tinc used a fixed address and port for each node for UDP packet
exchange. The port was the one advertised by that node as its listening port.
However, due to NAT the port might be different. Now, tinc sends a different
session key to each node. This way, the sending node can be determined from
incoming packets by checking the MAC against all session keys. If a match is
found, the address and port for that node are updated.
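A sketch of the lookup (all helper names are hypothetical):

    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/socket.h>

    typedef struct node node_t;  /* stand-in for tinc's node_t */

    node_t *first_node(void);
    node_t *next_node(node_t *);
    bool mac_matches(const node_t *, const void *pkt, size_t len);
    void update_node_address(node_t *, const struct sockaddr *, socklen_t);

    /* Since every node got a different session key from us, the sender of
     * an incoming packet can be found by trying each node's key. */
    node_t *identify_sender(const void *pkt, size_t len,
                            const struct sockaddr *from, socklen_t fromlen) {
        for (node_t *n = first_node(); n; n = next_node(n)) {
            if (mac_matches(n, pkt, len)) {
                update_node_address(n, from, fromlen); /* learn NATed port */
                return n;
            }
        }
        return NULL;  /* unknown sender; drop the packet */
    }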