This makes sure UDP_INFO messages are only sent at the maximum rate of
5 per second (by default). As usual with these "probe" mechanisms, the
rate of these messages cannot be higher than the rate of data packets
themselves, since they are sent from the RX path.
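A minimal sketch of that rate limit, with invented names (udp_info_interval,
last_udp_info, send_udp_info) standing in for whatever the actual identifiers
are:

    #include <time.h>

    typedef struct node {
        time_t last_udp_info;            /* when we last sent UDP_INFO to this node */
    } node_t;

    extern time_t now;                   /* kept up to date by the event loop */
    extern time_t udp_info_interval;     /* configured minimum interval between messages */

    extern void send_udp_info(node_t *to);   /* hypothetical: builds and sends UDP_INFO */

    /* Called from the RX path, i.e. only when we actually received data from
     * this node, so UDP_INFO can never outpace the data packets themselves. */
    static void maybe_send_udp_info(node_t *to) {
        if(now - to->last_udp_info < udp_info_interval)
            return;                      /* still inside the rate limit window */

        send_udp_info(to);
        to->last_udp_info = now;
    }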
If we receive any traffic from another node, we periodically send back a
gratuitous type 2 probe reply with the maximum received packet length.
On the other node, this causes the udp and perhaps mtu probe timers to
be reset, so it does not need to send a probe request. Gratuitous probe
replies from another node also count as received traffic for this
purpose, so for nodes that also have a meta-connection, UDP keepalive packets
can in principle now consist solely of type 2 replies. This reduces the
amount of probe traffic even more.
For this to work, gratuitous replies must be sent slightly more often than
udp_discovery_keepalive_interval, so that probe requests won't be triggered.
This also means that the timer resolution must be smaller than the
difference between the two; at the moment this is somewhat of a hack.
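A rough sketch of the gratuitous reply logic under those constraints; the
per-node fields and the send_mtu_probe_reply() helper are assumptions, not the
actual code:

    #include <time.h>

    typedef struct node {
        time_t last_rx;                   /* last time any traffic arrived from this node */
        time_t last_gratuitous_reply;     /* last time we sent it a type 2 reply */
        int maxrecentlen;                 /* maximum packet length received recently */
    } node_t;

    extern time_t now;
    extern int udp_discovery_keepalive_interval;

    extern void send_mtu_probe_reply(node_t *to, int type, int len);  /* hypothetical */

    /* Must be called slightly more often than udp_discovery_keepalive_interval,
     * so the peer's probe request timer never gets a chance to fire. */
    static void maybe_send_gratuitous_reply(node_t *n) {
        if(now - n->last_rx > udp_discovery_keepalive_interval)
            return;     /* nothing received recently, stay quiet */

        if(now - n->last_gratuitous_reply < udp_discovery_keepalive_interval - 1)
            return;     /* we already sent one recently enough */

        /* Type 2 reply carrying the maximum received packet length. */
        send_mtu_probe_reply(n, 2, n->maxrecentlen);
        n->last_gratuitous_reply = now;
    }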
This is not working at all anymore. Just remove it, and we'll do another
attempt at RTT, bandwidth and packet loss estimation after the new
probing code stabilizes.
In tinc 1.0.x, this was tracked in node->inkey; however, in tinc 1.1 we have an abstraction layer for
the legacy cipher and digest, and we don't keep an explicit copy of the key around. We cannot use
cipher_active() or digest_active(), since it is possible to set both to the null algorithm. So add a bit to
node_status_t.
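A sketch of what that could look like; the new bit's name here is made up:

    typedef struct node_status_t {
        unsigned int validkey:1;          /* existing bit: node has a usable key */
        unsigned int has_legacy_key:1;    /* hypothetical new bit: a legacy cipher/digest
                                             key has actually been set for this node */
        /* ... other status bits ... */
    } node_status_t;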
tinc bandwidth estimation has always been quite unreliable (at least in
my experience), but there's no chance of it working anymore since the
last changes to MTU discovery code, because packets are not sent in
batches of three anymore.
This commit removes the dead code - fortunately, nothing depends on this
estimation (it's not even shown in node info). We probably need to be
smarter about this if we do want this estimation back.
Currently, the PMTU discovery code is run by a timeout callback,
independently of tunnel activity. This commit moves it into the TX
path, meaning that send_mtu_probe_handler() is only called if a
packet is about to be sent. Consequently, it has been renamed to
try_mtu() for consistency with try_tx(), try_udp() and try_sptps().
Running PMTU discovery code only as part of the TX path prevents
PMTU discovery from generating unreasonable amounts of traffic when
the "real" traffic is negligible. One extreme example is sending one
real packet and then going silent: in the current code this one little
packet will result in the entire PMTU discovery algorithm being run
from start to finish, resulting in absurd write traffic amplification.
With this patch, PMTU discovery stops as soon as "real" packets stop
flowing, and will be no more aggressive than the underlying traffic.
Furthermore, try_mtu() only runs if there is confirmed UDP
connectivity as per the UDP discovery mechanism. This prevents
unnecessary network chatter - previously, the PMTU discovery code
would send bursts of (potentially large) probe packets every second
even if there was nothing on the other side. With this patch, the
PMTU code only does that if something replied to the lightweight UDP
discovery pings.
These inefficiencies were made even worse when the node is not a
direct neighbour, as tinc will use PMTU discovery both on the
destination node *and* the relay. UDP discovery is more lightweight for
this purpose.
As a bonus, this change simplifies the overall code somewhat - state is
easier to manage when code runs in predictable contexts as opposed
to "surprise callbacks". In addition, there is no need to call PMTU
discovery code outside of net_packet.c anymore, thereby simplifying
module boundaries.
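In simplified form, the TX-path flow now looks roughly like this (the real
functions take more parameters and also handle relays; udp_confirmed() stands
in for whatever state the UDP discovery mechanism keeps):

    #include <stdbool.h>

    typedef struct node node_t;

    extern bool udp_confirmed(node_t *n);   /* assumed: UDP discovery got a recent reply */
    extern void try_udp(node_t *n);         /* send a UDP discovery ping if one is due */
    extern void try_mtu(node_t *n);         /* send PMTU probes if one is due */

    /* Called only when a packet is about to be sent to n, never from a timer,
     * so all probing stops as soon as real traffic stops. */
    static void try_tx(node_t *n) {
        try_udp(n);

        /* Only probe the path MTU once UDP connectivity is confirmed, so we
         * never send bursts of large probes at a peer that isn't answering. */
        if(udp_confirmed(n))
            try_mtu(n);
    }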
This adds a new mechanism by which tinc can determine if a node is
reachable via UDP. The new mechanism is currently redundant with the
PMTU discovery mechanism - that will be fixed in a future commit.
Conceptually, the UDP discovery mechanism works similarly to PMTU
discovery: it sends UDP probes (of minmtu size, to make sure the tunnel
is fully usable), and assumes UDP is usable if it gets replies. It
assumes UDP is broken if too much time has passed since the last reply.
The big difference with the current PMTU discovery mechanism, however,
is that UDP discovery probes are only triggered as part of the
packet TX path (through try_tx()). This is quite interesting, because
it means tinc will never send UDP pings more often than normal packets,
and most importantly, it will automatically stop sending pings as soon
as packets stop flowing, thereby nicely reducing network chatter.
Of course, there are small drawbacks in some edge cases: for example,
if a node only sends one packet every minute to another node, these
packets will only be sent over TCP, because the interval between packets
is too long for tinc to maintain the UDP tunnel. I consider this a
feature, not a bug: I believe it is appropriate to use TCP in scenarios
where traffic is negligible, so that we don't pollute the network with
pings just to maintain a UDP tunnel that's seeing negligible usage.
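A sketch of the discovery logic itself, again with assumed field and option
names (udp_discovery_interval, udp_discovery_timeout, send_udp_probe):

    #include <stdbool.h>
    #include <time.h>

    typedef struct node {
        bool udp_confirmed;         /* we currently believe UDP works towards this node */
        time_t last_udp_reply;      /* when the last probe reply arrived */
        time_t last_udp_probe;      /* when we last sent a probe */
        int minmtu;                 /* probes are sent at this size */
    } node_t;

    extern time_t now;
    extern int udp_discovery_interval;    /* how often to ping while traffic flows */
    extern int udp_discovery_timeout;     /* how long without replies before giving up */

    extern void send_udp_probe(node_t *n, int len);   /* hypothetical */

    /* Called from try_tx(), i.e. only while we actually have packets to send. */
    static void try_udp(node_t *n) {
        /* Too long without replies: consider UDP broken and fall back to TCP. */
        if(n->udp_confirmed && now - n->last_udp_reply > udp_discovery_timeout)
            n->udp_confirmed = false;

        if(now - n->last_udp_probe >= udp_discovery_interval) {
            /* Probe at minmtu size so a reply proves the tunnel is fully usable. */
            send_udp_probe(n, n->minmtu);
            n->last_udp_probe = now;
        }
    }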
The option "--disable-legacy-protocol" was added to the configure
script. The new protocol does not depend on any external crypto
libraries, so when the option is used tinc is no longer linked to
OpenSSL's libcrypto.
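Illustrative only: such a configure switch typically ends up as a preprocessor
definition (assumed here to be called DISABLE_LEGACY) guarding the legacy code
paths:

    extern void init_legacy_crypto(void);   /* hypothetical: needs OpenSSL's libcrypto */
    extern void init_sptps(void);           /* hypothetical: new protocol, no external crypto */

    void init_crypto(void) {
    #ifndef DISABLE_LEGACY
        init_legacy_crypto();
    #endif
        init_sptps();
    }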
This introduces a new type of identifier for nodes, which complements
node names: node IDs. Node IDs are defined as the first 6 bytes of the
SHA-256 hash of the node name. They will be used in future code in lieu
of node names as unique node identifiers in contexts where space is at
a premium (such as VPN packets).
The semantics of node IDs is that they are supposed to be unique in a
tinc graph; i.e. two different nodes that are part of the same graph
should not have the same ID, otherwise things could break. This
solution provides this guarantee based on realistic probabilities:
indeed, according to the birthday problem, with a 48-bit hash, the
probability of at least one collision is 1e-13 with 10 nodes, 1e-11
with 100 nodes, 1e-9 with 1000 nodes and 1e-7 with 10000 nodes. Things
only start getting hairy with more than 1 million nodes, as the
probability gets over 0.2%.
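For reference, these figures follow from the standard birthday-problem
approximation for a 48-bit identifier space:

    \[
    P_{\text{collision}} \approx 1 - e^{-\frac{n(n-1)}{2 \cdot 2^{48}}} \approx \frac{n(n-1)}{2^{49}}
    \]

which works out to roughly 1.6e-13 for n = 10, 1.8e-11 for n = 100, 1.8e-9 for
n = 1000, 1.8e-7 for n = 10000, and about 0.18% for n = 1000000, in line with
the numbers quoted above.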
This introduces a new way of doing local discovery: when tinc has
local address information for the recipient node, it will send local
discovery packets directly to the local address of that node, instead
of using broadcast packets.
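A rough sketch of the idea; the local address field and helpers are
assumptions about the code, not its actual interface:

    #include <stdbool.h>

    typedef struct sockaddr_t sockaddr_t;

    typedef struct node {
        bool has_local_address;
        const sockaddr_t *local_address;   /* learned from this node's edges */
    } node_t;

    extern void send_probe_to(node_t *n, const sockaddr_t *sa);   /* hypothetical unicast probe */
    extern void send_broadcast_probe(node_t *n);                  /* the old broadcast method */

    static void send_local_discovery(node_t *n) {
        if(n->has_local_address)
            send_probe_to(n, n->local_address);   /* unicast straight to its local address */
        else
            send_broadcast_probe(n);              /* fall back to broadcasting */
    }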
This new way of doing local discovery provides numerous advantages compared to
using broadcasts:
- No broadcast packets "polluting" the local network;
- Reliable even if the sending host has multiple network interfaces (in
contrast, broadcasts will only be sent through one unpredictable
interface)
- Works even if the two hosts are not on the same broadcast domain. One
example is a large LAN where the two hosts might be on different local
subnets. In fact, thanks to UDP hole punching this might even work if
there is a NAT sitting in the middle of the LAN between the two nodes!
- Sometimes a node is reachable through its "normal" address, and via a
local subnet as well. One might think the local subnet is the best route
to the node in this case, but more often than not it's actually worse -
one example is where the local segment is a third party VPN running in
parallel, or ironically it can be the local segment formed by the tinc
VPN itself! Because this new algorithm only checks the addresses for
which an edge is already established, it is less likely to fall into
these traps.
This gets rid of the rest of the symbolic links. However, as a consequence, the
crypto header files have now moved to src/, and can no longer contain
library-specific declarations. Therefore, cipher_t, digest_t, ecdh_t, ecdsa_t
and rsa_t are now all opaque types, and only pointers to those types can be
used.
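The pattern looks roughly like this (function names are just an example of the
style of interface, not an exact listing):

    /* cipher.h -- library-agnostic, lives in src/ */
    typedef struct cipher cipher_t;          /* opaque: layout is private to the backend */

    extern cipher_t *cipher_alloc(void);
    extern void cipher_free(cipher_t *cipher);

    /* The OpenSSL backend's cipher.c would then privately define:
     *
     *     struct cipher {
     *         EVP_CIPHER_CTX *ctx;
     *         const EVP_CIPHER *cipher;
     *     };
     */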
Without adding any extra traffic, we can measure round trip times and estimate
the bandwidth and packet loss between nodes. The RTT and bandwidth can be measured
by timing the MTU probe packets. The RTT is the difference between the time a
burst of MTU probes was sent and when the first reply is received. The
bandwidth can be estimated by dividing the size of the probe packets by the
time between successive received probe replies of the same burst. The packet
loss can be estimated for incoming traffic by comparing how many packets have
actually been received to the increase in the sequence numbers.
The estimates are not perfect. Bandwidth in particular is difficult to measure;
the only accurate way is to continuously send as much data as possible, but
that is obviously not desirable. The packet loss rate is also almost always
a few percent when sending a lot of data over the VPN via TCP, since TCP
*needs* packet loss to work properly.
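A sketch of the estimators, with invented field names and times kept as
doubles (seconds) for brevity:

    #include <stdbool.h>

    typedef struct node {
        double probe_burst_sent;    /* when the current burst of MTU probes went out */
        double last_reply_time;     /* arrival time of the previous reply in this burst */
        int probe_len;              /* size of the probe packets in this burst */
        double rtt;                 /* estimated round trip time in seconds */
        double bandwidth;           /* estimated bandwidth in bytes per second */
    } node_t;

    /* Called whenever an MTU probe reply belonging to the current burst arrives. */
    static void update_estimates(node_t *n, double now, bool first_reply_of_burst) {
        if(first_reply_of_burst) {
            /* RTT: time from sending the burst to receiving its first reply. */
            n->rtt = now - n->probe_burst_sent;
        } else {
            /* Bandwidth: probe size divided by the spacing between successive
             * replies of the same burst. */
            double interval = now - n->last_reply_time;
            if(interval > 0)
                n->bandwidth = n->probe_len / interval;
        }

        n->last_reply_time = now;
    }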
Keep track of the number of correct, non-replayed UDP packets that have been
received, regardless of their content. This can be compared to the sequence
number to determine the real packet loss.
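A minimal sketch of that bookkeeping, with invented field names:

    #include <stdint.h>

    typedef struct node {
        uint32_t received;          /* correct, non-replayed packets actually received */
        uint32_t received_seqno;    /* highest sequence number seen so far */
        uint32_t prev_received;
        uint32_t prev_seqno;
    } node_t;

    /* Estimated fraction of incoming packets lost since the previous call. */
    static double packet_loss(node_t *n) {
        uint32_t got = n->received - n->prev_received;
        uint32_t expected = n->received_seqno - n->prev_seqno;

        n->prev_received = n->received;
        n->prev_seqno = n->received_seqno;

        return expected ? 1.0 - (double)got / expected : 0.0;
    }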
There are several reasons for this:
- MacOS/X doesn't support polling the tap device using kqueue, requiring a
workaround to fall back to select().
- On Windows only sockets are properly handled, therefore tinc uses a second
thread that does a blocking ReadFile() on the TAP-Win32/64 device. However,
this does not mix well with libevent.
- Libevent, even just the core, is quite large, and although it is easy to get
and install on many platforms, it can be a burden.
- Libev is more lightweight and seems technically superior, but it doesn't
abstract away all the platform differences (for example, async events are not
supported on Windows).
Similar to old style key exchange requests, keep track of whether a key
exchange is already in progress and how long it took. If no key is known yet
or if key exchange takes too long, (re)start a new key exchange.
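Roughly, and with assumed field names and timeout:

    #include <stdbool.h>
    #include <time.h>

    typedef struct node {
        bool validkey;            /* an SPTPS session key is established */
        bool waitingforkey;       /* a key (re)exchange is in progress */
        time_t last_req_key;      /* when that exchange was started */
    } node_t;

    extern time_t now;
    extern int key_exchange_timeout;        /* assumed name for the timeout */

    extern void send_req_key(node_t *n);    /* starts a new SPTPS handshake */

    static void try_sptps(node_t *n) {
        if(n->validkey)
            return;

        /* No key yet, or the previous attempt is taking too long: (re)start. */
        if(!n->waitingforkey || now - n->last_req_key > key_exchange_timeout) {
            n->last_req_key = now;
            n->waitingforkey = true;
            send_req_key(n);
        }
    }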
When two nodes which support SPTPS want to send packets to each other, they now
always use SPTPS. The node initiating the SPTPS session sends the first SPTPS
packet via an extended REQ_KEY message. All other handshake messages are sent
using ANS_KEY messages. This ensures that intermediate nodes using an older
version of tinc can still help with NAT traversal. After the authentication
phase is over, SPTPS packets are sent via UDP, or are encapsulated in extended
REQ_KEY messages instead of PACKET messages.
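A very rough sketch of that transport choice; message construction is elided
and the helper names are placeholders, not tinc's actual wire format:

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct node node_t;

    extern bool udp_confirmed(node_t *n);
    extern void send_udp(node_t *n, const void *data, size_t len);
    extern void send_extended_req_key(node_t *n, const void *data, size_t len);  /* via the meta protocol */

    /* Callback invoked by the SPTPS layer whenever it has a record to send. */
    static void send_sptps_record(node_t *to, const void *record, size_t len) {
        if(udp_confirmed(to))
            send_udp(to, record, len);
        else
            /* Wrap it in an extended REQ_KEY message so that older
             * intermediate nodes can still forward it unmodified. */
            send_extended_req_key(to, record, len);
    }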
REQ_KEY requests have an extra field indicating key exchange version.
If it is present and > 0, the sender supports ECDH. If the receiver also
does, it will generate a new keypair and send the public key in an
ANS_KEY request prefixed with "ECDH:". The ans_key_h() function will
compute the shared secret, which, at the moment, is used as-is to set the
cipher and HMAC keys. However, this must be changed to use a proper KDF.
In the future, the ECDH key exchange must also be signed.
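A simplified guess at how the version check could look; the exact request
syntax and the "ECDH:" handling are not shown literally here:

    #include <stdio.h>

    #define MAX_STRING_SIZE 2048

    /* Returns the key exchange version found in a REQ_KEY request, 0 if the
     * field is absent (i.e. the sender predates ECDH), or -1 on a parse error. */
    static int req_key_kex_version(const char *request) {
        char from[MAX_STRING_SIZE], to[MAX_STRING_SIZE];
        int version = 0;

        if(sscanf(request, "%*d %2047s %2047s %d", from, to, &version) < 2)
            return -1;

        return version;
    }

    /* The receiving side would then, roughly:
     *
     *     if(version > 0 && we_support_ecdh) {
     *         generate an ephemeral keypair;
     *         send ANS_KEY with the public key prefixed by "ECDH:";
     *     } else {
     *         fall back to the old symmetric key exchange;
     *     }
     */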
When we got a key request for or from a node we don't know, we disconnected the
node that forwarded us that request. However, especially in TunnelServer mode,
disconnecting does not help. We now ignore such requests, but since there is no
way of telling the original sender that the request was dropped, we now retry
sending REQ_KEY requests when we don't get an ANS_KEY back.
The control socket code was completely different from how meta connections are
handled, resulting in lots of extra code to handle requests. Also, not every
operating system has UNIX sockets, so we have to resort to another type of
sockets or pipes for those anyway. To reduce code duplication and make control
sockets work the same on all platforms, we now just connect to the TCP port
that tincd is already listening on.
To authenticate, the program that wants to control a running tinc daemon must
send the contents of a cookie file. The cookie is a random 256-bit number that
is regenerated every time tincd starts. The cookie file should only be readable
by the same user that can start a tincd.
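A sketch of the cookie handling described here; randomize() is assumed to be a
wrapper around a cryptographically secure RNG, and error handling is minimal:

    #include <stdbool.h>
    #include <stdio.h>
    #ifndef _WIN32
    #include <sys/stat.h>
    #endif

    extern void randomize(void *buf, size_t len);   /* assumed CSPRNG wrapper */

    static bool write_cookie(const char *filename) {
        unsigned char cookie[32];   /* 256 random bits, regenerated at every start */
        char hex[65];

        randomize(cookie, sizeof(cookie));

        for(size_t i = 0; i < sizeof(cookie); i++)
            snprintf(hex + 2 * i, 3, "%02x", cookie[i]);

        FILE *f = fopen(filename, "w");
        if(!f)
            return false;

    #ifndef _WIN32
        chmod(filename, 0600);      /* readable only by the user that started tincd */
    #endif

        fprintf(f, "%s\n", hex);
        fclose(f);
        return true;
    }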
Instead of the binary-ish protocol previously used, we now use an ASCII
protocol similar to that of the meta connections, but this can still change.
Since event.h is not part of tinc, we include it in have.h, where all other
system header files are included. We also ensure -levent comes before -lgdi32
when compiling with MinGW; apparently it doesn't work when the order is
reversed.
Options should have a fixed width anyway, but this also fixes a possible MinGW
compiler bug where %lx tries to print a 64-bit value, even though a long int is
only 32 bits.
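One portable way to do this (a sketch of the general fix, not necessarily the
exact change made here) is to use a fixed-width type together with the matching
<inttypes.h> format macro:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void) {
        uint32_t options = 0x0c;

        /* PRIx32 always expands to the right conversion for a 32-bit value,
         * no matter how wide "long" happens to be on the platform. */
        printf("Options: %" PRIx32 "\n", options);
        return 0;
    }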