Currently, the PMTU discovery code is run by a timeout callback,
independently of tunnel activity. This commit moves it into the TX
path, meaning that send_mtu_probe_handler() is only called if a
packet is about to be sent. Consequently, it has been renamed to
try_mtu() for consistency with try_tx(), try_udp() and try_sptps().
Running PMTU discovery code only as part of the TX path prevents
PMTU discovery from generating unreasonable amounts of traffic when
the "real" traffic is negligible. One extreme example is sending one
real packet and then going silent: in the current code this one little
packet will result in the entire PMTU discovery algorithm being run
from start to finish, resulting in absurd write traffic amplification.
With this patch, PMTU discovery stops as soon as "real" packets stop
flowing, and will be no more aggressive than the underlying traffic.
Furthermore, try_mtu() only runs if there is confirmed UDP
connectivity as per the UDP discovery mechanism. This prevents
unnecessary network chatter - previously, the PMTU discovery code
would send bursts of (potentially large) probe packets every second
even if there was nothing on the other side. With this patch, the
PMTU code only does that if something replied to the lightweight UDP
discovery pings.
These inefficiencies were made even worse when the node is not a
direct neighbour, as tinc will use PMTU discovery both on the
destination node *and* the relay. UDP discovery is more lightweight for
this purpose.
As a bonus, this code simplifies overall code somewhat - state is
easier to manage when code is run in predictable contexts as opposed
to "surprise callbacks". In addition, there is no need to call PMTU
discovery code outside of net_packet.c anymore, thereby simplifying
module boundaries.
This adds a new mechanism by which tinc can determine if a node is
reachable via UDP. The new mechanism is currently redundant with the
PMTU discovery mechanism - that will be fixed in a future commit.
Conceptually, the UDP discovery mechanism works similarly to PMTU
discovery: it sends UDP probes (of minmtu size, to make sure the tunnel
is fully usable), and assumes UDP is usable if it gets replies. It
assumes UDP is broken if too much time has passed since the last reply.
The big difference with the current PMTU discovery mechanism, however,
is that UDP discovery probes are only triggered as part of the
packet TX path (through try_tx()). This is quite interesting, because
it means tinc will never send UDP pings more often than normal packets,
and most importantly, it will automatically stop sending pings as soon
as packets stop flowing, thereby nicely reducing network chatter.
Of course, there are small drawbacks in some edge cases: for example,
if a node only sends one packet every minute to another node, these
packets will only be sent over TCP, because the interval between packets
is too long for tinc to maintain the UDP tunnel. I consider this a
feature, not a bug: I believe it is appropriate to use TCP in scenarios
where traffic is negligible, so that we don't pollute the network with
pings just to maintain a UDP tunnel that's seeing negligible usage.
The offset value indicates where the actual payload starts, so we can
process both legacy and SPTPS UDP packets without having to do casting
tricks and/or moving memory around.
This commit changes the layout of UDP datagrams to include a 6-byte
destination node ID at the very beginning of the datagram (i.e. before
the source node ID and the seqno). Note that this only applies to SPTPS.
Thanks to this new field, it is now possible to send SPTPS datagrams to
nodes that are not the final recipient of the packets, thereby using
these nodes as relay nodes. Previously SPTPS was unable to relay packets
using UDP, and required a fallback to TCP if the final recipient could
not be contacted directly using UDP. In that sense it fixes a regression
that SPTPS introduced with regard to the legacy protocol.
This change also updates tinc's low-level routing logic (i.e.
send_sptps_data()) to automatically use this relaying facility if at all
possible. Specifically, it will relay packets if we don't have a
confirmed UDP link to the final recipient (but we have one with the next
hop node), or if IndirectData is specified. This is similar to how the
legacy protocol forwards packets.
When sending packets directly without any relaying, the sender node uses
a special value for the destination node ID: instead of setting the
field to the ID of the recipient node, it writes a zero ID instead. This
allows the recipient node to distinguish between a relayed packet and a
direct packet, which is important when determining the UDP address of
the sending node.
On the relay side, relay nodes will happily relay packets that have a
destination ID which is non-zero *and* is different from their own,
provided that the source IP address of the packet is known. This is to
prevent abuse by random strangers, since a node can't authenticate the
packets that are being relayed through it.
This change keeps the protocol number from the previous datagram format
change (source IDs), 17.4. Compatibility is still preserved with 1.0 and
with pre-1.1 releases. Note, however, that nodes running this code won't
understand datagrams sent from nodes that only use source IDs and
vice-versa (not that we really care).
There is one caveat: in the current state, there is no way for the
original sender to know what the PMTU is beyond the first hop, and
contrary to the legacy protocol, relay nodes can't apply MSS clamping
because they can't decrypt the relayed packets. This leads to
inefficient scenarios where a reduced PMTU over some link that's part of
the relay path will result in relays falling back to TCP to send packets
to their final destinations.
Another caveat is that once a packet gets sent over TCP, it will use
TCP over the entire path, even if it is technically possible to use UDP
beyond the TCP-only link(s).
Arguably, these two caveats can be fixed by improving the
metaconnection protocol, but that's out of scope for this change. TODOs
are added instead. In any case, this is no worse than before.
In addition, this change increases SPTPS datagram overhead by another
6 bytes for the destination ID, on top of the existing 6-byte overhead
from the source ID.
This commit changes the layout of UDP datagrams to include the 6-byte ID
(i.e. node name hash) of the node that crafted the packet at the very
beginning of the datagram (i.e. before the seqno). Note that this only
applies to SPTPS.
This is implemented at the lowest layer, i.e. in
handle_incoming_vpn_data() and send_sptps_data() functions. Source ID is
added and removed there, in such a way that the upper layers are unaware
of its presence.
This is the first stepping stone towards supporting UDP relaying in
SPTPS, by providing information about the original sender in the packet
itself. Nevertheless, even without relaying this commit already provides
a few benefits such as being able to reliably determine the source node
of a packet in the presence of an unknown source IP address, without
having to painfully go through all node keys. This makes tinc's behavior
much more scalable in this regard.
This change does not break anything with regard to the protocol: It
preserves compatibility with 1.0 and even with older pre-1.1 releases
thanks to a minor protocol version change (17.4). Source ID information
won't be included in packets sent to nodes with minor version < 4.
One drawback, however, is that this change increases SPTPS datagram
overhead by 6 bytes (the size of the source ID itself).
This introduces a new type of identifier for nodes, which complements
node names: node IDs. Node IDs are defined as the first 6 bytes of the
SHA-256 hash of the node name. They will be used in future code in lieu
of node names as unique node identifiers in contexts where space is at
a premium (such as VPN packets).
The semantics of node IDs is that they are supposed to be unique in a
tinc graph; i.e. two different nodes that are part of the same graph
should not have the same ID, otherwise things could break. This
solution provides this guarantee based on realistic probabilities:
indeed, according to the birthday problem, with a 48-bit hash, the
probability of at least one collision is 1e-13 with 10 nodes, 1e-11
with 100 nodes, 1e-9 with 1000 nodes and 1e-7 with 10000 nodes. Things
only start getting hairy with more than 1 million nodes, as the
probability gets over 0.2%.
The new local address based local discovery mechanism is technically
superior to the old broadcast-based one. In fact, the old algorithm
can technically make things worse by e.g. sending broadcasts over the
VPN itself and then selecting the VPN address as the node's UDP
address. This cannot happen with the new mechanism.
Note that this means old nodes that don't send their local addresses in
ADD_EDGE messages can't be discovered, because there is no address to
send discovery packets to. Old nodes can still discover new nodes by
sending them broadcasts, though.
tinc is using a separate thread to read from the TAP device on Windows.
The rationale was that the notification mechanism for packets arriving
on the virtual network device is based on Win32 events, and the event
loop did not support listening to these events.
Thanks to recent improvements, this event loop limitation has been
lifted. Therefore we can get rid of the separate thread and simply add
the Win32 "incoming packet" event to the event loop, just like a socket.
The result is cleaner code that's easier to reason about.
This adds a new DeviceStandby option; when it is disabled (the default),
behavior is unchanged. If it is enabled, tinc-up will not be called during
tinc initialization, but will instead be deferred until the first node is
reachable, and it will be closed as soon as no nodes are reachable.
This is useful because it means the device won't be set up until we are fairly
sure there is something listening on the other side. This is more user-friendly,
as one can check on the status of the tinc network connection just by checking
the status of the network interface. Besides, it prevents the OS from thinking
it is connected to some network when it is in fact completely isolated.
ListenAddress works the same as BindToAddress, except that from now on,
explicitly binding outgoing packets to the address of a socket is only done for
sockets specified with BindToAddress.
Tinc now strictly limits incoming connections from the same host to 1 per
second. For incoming connections from multiple hosts short bursts of incoming
connections are allowed (by default 100), but on average also only 1 connection
per second is allowed.
When an incoming connection exceeds the limit, tinc will keep the connection in
a tarpit; the connection will be kept open but it is ignored completely. Only
one connection is in a tarpit at a time to limit the number of useless open
connections.
When LocalDiscovery is enabled, tinc normally sends broadcast packets during
PMTU discovery to the broadcast address (255.255.255.255 or ff02::1). This
option lets tinc use a different address.
At the moment only one LocalDiscoveryAddress can be specified.
Normally all requests sent via the meta connections are checked so that they
cannot be larger than the input buffer. However, when packets are forwarded via
meta connections, they are copied into a packet buffer without checking whether
it fits into it. Since the packet buffer is allocated on the stack, this in
effect allows an authenticated remote node to cause a stack overflow.
This issue was found by Martin Schobert.
There are several reasons for this:
- MacOS/X doesn't support polling the tap device using kqueue, requiring a
workaround to fall back to select().
- On Windows only sockets are properly handled, therefore tinc uses a second
thread that does a blocking ReadFile() on the TAP-Win32/64 device. However,
this does not mix well with libevent.
- Libevent, event just the core, is quite large, and although it is easy to get
and install on many platforms, it can be a burden.
- Libev is more lightweight and seems technically superior, but it doesn't
abstract away all the platform differences (for example, async events are not
supported on Windows).
When set to a non-zero value, tinc will try to maintain exactly that number of
meta connections to other nodes. If there are not enough connections, it will
periodically try to set up an outgoing connection to a random node. If there
are too many connections, it will periodically try to remove an outgoing
connection.
Struct outgoing_ts and connection_ts were depending too much on each other,
causing lots of problems, especially the reuse of a connection_t. Now, whenever
a connection is closed it is immediately removed from the list of connections
and destroyed.
When two nodes which support SPTPS want to send packets to each other, they now
always use SPTPS. The node initiating the SPTPS session send the first SPTPS
packet via an extended REQ_KEY messages. All other handshake messages are sent
using ANS_KEY messages. This ensures that intermediate nodes using an older
version of tinc can still help with NAT traversal. After the authentication
phase is over, SPTPS packets are sent via UDP, or are encapsulated in extended
REQ_KEY messages instead of PACKET messages.
When the Proxy option is used, outgoing connections will be made via the
specified proxy. There is no support for authentication methods or for having
the proxy forward incoming connections, and there is no attempt to proxy UDP.
When the Name starts with a $, the rest will be interpreted as the name of an
environment variable containing the real Name. When Name is $HOST, but this
environment variable does not exist, gethostname() will be used to set the
Name. In both cases, illegal characters will be converted to underscores.
In this situation, the two nodes will start fighting over the edges they announced.
When we have to contradict both ADD_EDGE and DEL_EDGE messages, we log a warning,
and with 25% chance per PingTimeout we quit.
Since event.h is not part of tinc, we include it in have.h were all other
system header files are included. We also ensure -levent comes before -lgdi32
when compiling with MinGW, apparently it doesn't work when the order is
reversed.