Recent improvements to the local discovery mechanism makes it cheaper,
more network-friendly, and now it cannot make things worse (as opposed
to the old mechanism). Thus there is no reason not to enable it by
default.
The new local address based local discovery mechanism is technically
superior to the old broadcast-based one. In fact, the old algorithm
can technically make things worse by e.g. sending broadcasts over the
VPN itself and then selecting the VPN address as the node's UDP
address. This cannot happen with the new mechanism.
Note that this means old nodes that don't send their local addresses in
ADD_EDGE messages can't be discovered, because there is no address to
send discovery packets to. Old nodes can still discover new nodes by
sending them broadcasts, though.
This introduces a new way of doing local discovery: when tinc has
local address information for the recipient node, it will send local
discovery packets directly to the local address of that node, instead
of using broadcast packets.
This new way of doing local discovery provides numerous advantages compared to
using broadcasts:
- No broadcast packets "polluting" the local network;
- Reliable even if the sending host has multiple network interfaces (in
contrast, broadcasts will only be sent through one unpredictable
interface)
- Works even if the two hosts are not on the same broadcast domain. One
example is a large LAN where the two hosts might be on different local
subnets. In fact, thanks to UDP hole punching this might even work if
there is a NAT sitting in the middle of the LAN between the two nodes!
- Sometimes a node is reachable through its "normal" address, and via a
local subnet as well. One might think the local subnet is the best route
to the node in this case, but more often than not it's actually worse -
one example is where the local segment is a third party VPN running in
parallel, or ironically it can be the local segment formed by the tinc
VPN itself! Because this new algorithm only checks the addresses for
which an edge is already established, it is less likely to fall into
these traps.
When using socket functions, "sockerrno" is supposed to be used to
retrieve the error code as opposed to "errno", so that it is translated
to the correct call on Windows (WSAGetLastError() - Windows does not
update errno on socket errors). Unfortunately, the use of sockerrno is
inconsistent throughout the tinc codebase, as errno is often used
incorrectly on socket-related calls.
This commit fixes these oversights, which improves socket error
handling on Windows.
In send_sptps_data(), the len variable contains the length of the whole
datagram that needs to be sent to the peer, including the overhead from SPTPS
itself.
It seems like a lot of overhead to call access() for every possible extension
defined in PATHEXT, but apparently this is what Windows does itself too. At
least this avoids calling system() when the script one is looking for does not
exist at all.
Since the tinc utility also needs to call scripts, execute_script() is now
split off into its own source file.
This patch adds timestamp information to type 2 MTU probe replies. This
timestamp can then be used by the recipient to estimate bandwidth more
accurately, as jitter in the RX direction won't affect the results.
When replying to a PMTU probe, tinc sends a packet with the same length
as the PMTU probe itself, which is usually large (~1450 bytes). This is
not necessary: the other node wants to know the size of the PMTU probes
that have been received, but encoding this information as the actual
reply length is probably the most inefficient way to do it. It doubles
the bandwidth usage of the PMTU discovery process, and makes it less
reliable since large packets are more likely to be dropped.
This patch introduces a new PMTU probe reply type, encoded as type "2"
in the first byte of the packet, that indicates that the length of the
PMTU probe that is being replied to is encoded in the next two bytes of
the packet. Thus reply packets are only 3 bytes long.
(This also protects against very broken networks that drop very small
packets - yes, I've seen it happen on a subnet of a national ISP - in
such a case the PMTU probe replies will be dropped, and tinc won't
enable UDP communication, which is a good thing.)
Because legacy nodes won't understand type 2 probe replies, the minor
protocol number is bumped to 3.
Note that this also improves bandwidth estimation, as it is able to
measure bandwidth in both directions independently (the node receiving
the replies is measuring in the TX direction) and the use of smaller
reply packets might decrease the influence of jitter.
When LocalDiscovery is enabled, tinc normally sends broadcast packets during
PMTU discovery to the broadcast address (255.255.255.255 or ff02::1). This
option lets tinc use a different address.
At the moment only one LocalDiscoveryAddress can be specified.
This gets rid of the rest of the symbolic links. However, as a consequence, the
crypto header files have now moved to src/, and can no longer contain
library-specific declarations. Therefore, cipher_t, digest_t, ecdh_t, ecdsa_t
and rsa_t are now all opaque types, and only pointers to those types can be
used.
Normally all requests sent via the meta connections are checked so that they
cannot be larger than the input buffer. However, when packets are forwarded via
meta connections, they are copied into a packet buffer without checking whether
it fits into it. Since the packet buffer is allocated on the stack, this in
effect allows an authenticated remote node to cause a stack overflow.
This issue was found by Martin Schobert.
Tinc never restarts PMTU discovery unless a node becomes unreachable. However,
it can be that the PMTU was very low during the initial discovery, but has
increased later. To detect this, tinc now tries to send an extra packet every
PingInterval, with a size slightly higher than the currently known PMTU. If
this packet is succesfully received back, we partially restart PMTU discovery
to find out the new maximum.
Conflicts:
src/net_packet.c
Without adding any extra traffic, we can measure round trip times, estimate the
bandwidth and packet loss between nodes. The RTT and bandwidth can be measured
by timing the MTU probe packets. The RTT is the difference between the time a
burst of MTU probes was sent and when the first reply is received. The
bandwidth can be estimated by multiplying the size of the probe packets by the
time between succesive received probe replies of the same burst. The packet
loss can be estimated for incoming traffic by comparing how many packets have
actually been received to the increase in the sequence numbers.
The estimates are not perfect. Especially bandwidth is difficult to measure,
the only accurate way is to continuously send as much data as possible, but
that is obviously not desirable. The packet loss rate is also almost always
a few percent when sending a lot of data over the VPN via TCP, since TCP
*needs* packet loss to work properly.
Keep track of the number of correct, non-replayed UDP packets that have been
received, regardless of their content. This can be compared to the sequence
number to determine the real packet loss.
There are several reasons for this:
- MacOS/X doesn't support polling the tap device using kqueue, requiring a
workaround to fall back to select().
- On Windows only sockets are properly handled, therefore tinc uses a second
thread that does a blocking ReadFile() on the TAP-Win32/64 device. However,
this does not mix well with libevent.
- Libevent, event just the core, is quite large, and although it is easy to get
and install on many platforms, it can be a burden.
- Libev is more lightweight and seems technically superior, but it doesn't
abstract away all the platform differences (for example, async events are not
supported on Windows).
We don't need to search the whole edge tree, we can use the node's own edge
tree since each edge has a pointer to its reverse. Also, we do need to make
sure we try the reflexive address often.
Before it would always use the first socket, and always send an IPv4 broadcast packet. That
works fine in a lot of situations, but it is better to try all sockets, and to send IPv6 packets
on IPv6 sockets. This is especially important for users that are on IPv6-only networks or that
have multiple physical network interfaces, although in the latter case it probably requires
them to use the ListenAddress variable to create a separate socket for each interface.
Only the very first packet of an SPTPS session should be send with REQ_KEY,
this signals the peer to abort any previous session and start a new one as
well.
The tree functions were never used on the connection_tree, a list is more appropriate.
Also be more paranoid about connections disappearing while traversing the list.
Similar to old style key exchange requests, keep track of whether a key
exchange is already in progress and how long it took. If no key is known yet
or if key exchange takes too long, (re)start a new key exchange.
When two nodes which support SPTPS want to send packets to each other, they now
always use SPTPS. The node initiating the SPTPS session send the first SPTPS
packet via an extended REQ_KEY messages. All other handshake messages are sent
using ANS_KEY messages. This ensures that intermediate nodes using an older
version of tinc can still help with NAT traversal. After the authentication
phase is over, SPTPS packets are sent via UDP, or are encapsulated in extended
REQ_KEY messages instead of PACKET messages.
When the "Broadcast = direct" option is used, broadcast packets are not sent
and forwarded via the Minimum Spanning Tree to all nodes, but are sent directly
to all nodes that can be reached in one hop.
One use for this is to allow running ad-hoc routing protocols, such as OLSR, on
top of tinc.
This allows tincctl to receive log messages from a running tincd,
independent of what is logged to syslog or to file. Tincctl can receive
debug messages with an arbitrary level.
Apart from the platform specific tun/tap driver, link with the dummy and
raw_socket devices, and optionally with support for UML and VDE devices.
At runtime, the DeviceType option can be used to select which driver to
use.
Probably due to a merge, the try_harder() function had duplicated the
rate-limiting code for detecting the sender node based on the HMAC of the
packet. This prevented this detection from running at all. The function is now
identical again to that in the 1.0 branch.
Because we don't want to keep track of that, and this will cause the node
structure from being relinked into the node tree, which results in myself
pointing to an invalid address.
When a UDP packet was received with an unknown source address/port, and if it
failed a HMAC check against known keys, it could still incorrectly assign that
UDP address to another node. This would temporarily cause outgoing UDP packets
to go to the wrong destination address, until packets from the correct address
were received again.
Before, if MTU probes failed, tinc would stop sending probes until the next
time keys were regenerated (by default, once every hour). Now it continues to
send them every PingInterval, so it recovers faster from temporary failures.
When we got a key request for or from a node we don't know, we disconnected the
node that forwarded us that request. However, especially in TunnelServer mode,
disconnecting does not help. We now ignore such requests, but since there is no
way of telling the original sender that the request was dropped, we now retry
sending REQ_KEY requests when we don't get an ANS_KEY back.
This wasn't working at all, since we didn't do HMAC but just a plain hash.
Also, verification of packets failed because it was checking the whole packet,
not the packet minus the HMAC.
If MTU probing discovered a node was not reachable via UDP, packets for it were
forwarded to the next hop, but always via TCP, even if the next hop was
reachable via UDP. This is now fixed by retrying to send the packet using
send_packet() if the destination is not the same as the nexthop.