The SPTPS code doesn't know about nodes, so when it logs an error about
a bad packet, it doesn't log which node the packet came from. Therefore,
add a log message with the node's name and hostname in
receive_udppacket().
Currently, when tinc sends UDP SPTPS datagrams through a relay, it
doesn't automatically start discovering PMTU with the relay. This means
that unless something else triggers PMTU discovery, tinc will keep using
TCP when sending packets through the relay.
This patch fixes the issue by explicitly establishing UDP tunnels with
relays.
This commit changes the layout of UDP datagrams to include a 6-byte
destination node ID at the very beginning of the datagram (i.e. before
the source node ID and the seqno). Note that this only applies to SPTPS.
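As an illustration, the datagram now starts with the two 6-byte IDs, followed
by the usual SPTPS data; the struct below is purely illustrative and is not
the actual type used in the code:

    #include <stdint.h>

    /* Illustrative layout of an SPTPS UDP datagram after this change. */
    typedef struct {
            uint8_t dst_id[6];  /* destination node ID; all zeroes for direct packets (see below) */
            uint8_t src_id[6];  /* source node ID, from the previous format change */
            /* ...followed by the SPTPS record itself, starting with the seqno... */
    } sptps_udp_header_t;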
Thanks to this new field, it is now possible to send SPTPS datagrams to
nodes that are not the final recipient of the packets, thereby using
these nodes as relay nodes. Previously SPTPS was unable to relay packets
using UDP, and required a fallback to TCP if the final recipient could
not be contacted directly using UDP. In that sense it fixes a regression
that SPTPS introduced with regard to the legacy protocol.
This change also updates tinc's low-level routing logic (i.e.
send_sptps_data()) to automatically use this relaying facility if at all
possible. Specifically, it will relay packets if we don't have a
confirmed UDP link to the final recipient (but we have one with the next
hop node), or if IndirectData is specified. This is similar to how the
legacy protocol forwards packets.
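As a sketch, the selection of the node the datagram is actually sent to looks
roughly like this; the field and option names (status.udp_confirmed, nexthop,
OPTION_INDIRECT) are assumptions about the internals, not verbatim code:

    /* Sketch only: decide whether to relay via the next hop. */
    node_t *relay = to;
    if((!to->status.udp_confirmed && to->nexthop->status.udp_confirmed)
       || (to->options & OPTION_INDIRECT))
            relay = to->nexthop;
    /* The datagram then carries `to`'s ID as its destination ID and is
       sent to `relay`'s UDP address. */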
When sending packets directly without any relaying, the sender node uses
a special value for the destination node ID: instead of setting the
field to the ID of the recipient node, it writes a zero ID instead. This
allows the recipient node to distinguish between a relayed packet and a
direct packet, which is important when determining the UDP address of
the sending node.
On the relay side, relay nodes will happily relay packets that have a
destination ID which is non-zero *and* is different from their own,
provided that the source IP address of the packet is known. This is to
prevent abuse by random strangers, since a node can't authenticate the
packets that are being relayed through it.
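In other words, the relay applies roughly the following acceptance rule; this
is a self-contained sketch with illustrative names, not the actual code:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Sketch: should a datagram with destination ID `dst` be relayed? */
    static bool should_relay(const uint8_t dst[6], const uint8_t own_id[6],
                             bool source_address_known) {
            static const uint8_t zero_id[6];
            return memcmp(dst, zero_id, 6) != 0   /* not a direct packet */
                && memcmp(dst, own_id, 6) != 0    /* not addressed to us */
                && source_address_known;          /* sender's address is known */
    }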
This change keeps the protocol number from the previous datagram format
change (source IDs), 17.4. Compatibility is still preserved with 1.0 and
with pre-1.1 releases. Note, however, that nodes running this code won't
understand datagrams sent from nodes that only use source IDs and
vice-versa (not that we really care).
There is one caveat: in the current state, there is no way for the
original sender to know what the PMTU is beyond the first hop, and
contrary to the legacy protocol, relay nodes can't apply MSS clamping
because they can't decrypt the relayed packets. This leads to
inefficient scenarios where a reduced PMTU over some link that's part of
the relay path will result in relays falling back to TCP to send packets
to their final destinations.
Another caveat is that once a packet gets sent over TCP, it will use
TCP over the entire path, even if it is technically possible to use UDP
beyond the TCP-only link(s).
Arguably, these two caveats can be fixed by improving the
metaconnection protocol, but that's out of scope for this change. TODOs
are added instead. In any case, this is no worse than before.
In addition, this change increases SPTPS datagram overhead by another
6 bytes for the destination ID, on top of the existing 6-byte overhead
from the source ID.
This commit changes the layout of UDP datagrams to include the 6-byte ID
(i.e. node name hash) of the node that crafted the packet at the very
beginning of the datagram (i.e. before the seqno). Note that this only
applies to SPTPS.
This is implemented at the lowest layer, i.e. in the
handle_incoming_vpn_data() and send_sptps_data() functions. The source ID is
added and removed there, in such a way that the upper layers are unaware
of its presence.
This is the first stepping stone towards supporting UDP relaying in
SPTPS, by providing information about the original sender in the packet
itself. Nevertheless, even without relaying this commit already provides
a few benefits such as being able to reliably determine the source node
of a packet in the presence of an unknown source IP address, without
having to painfully go through all node keys. This makes tinc's behavior
much more scalable in this regard.
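A rough sketch of the difference on the receiving side (the helper name is an
assumption made for illustration):

    /* Conceptually, in handle_incoming_vpn_data():
     *   before: try the packet against every known node's key to identify
     *           the sender (expensive with many nodes);
     *   after:  a single lookup keyed on the 6-byte source ID. */
    node_t *sender = lookup_node_id(packet_src_id);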
This change does not break anything with regard to the protocol: It
preserves compatibility with 1.0 and even with older pre-1.1 releases
thanks to a minor protocol version change (17.4). Source ID information
won't be included in packets sent to nodes with minor version < 4.
One drawback, however, is that this change increases SPTPS datagram
overhead by 6 bytes (the size of the source ID itself).
Currently tinc only uses type 2 MTU probe replies if the recipient uses
protocol version 17.3. It should of course support any higher minor
protocol version as well.
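The fix amounts to turning an equality check into a minimum-version check,
roughly as follows; the variable and helper names below are made up:

    /* Before: only peers at exactly protocol 17.3 got type 2 replies. */
    if(peer_minor_version == 3)
            send_type2_reply(n, len);

    /* After: any peer at minor version 3 or later gets them. */
    if(peer_minor_version >= 3)
            send_type2_reply(n, len);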
This fixes the following compiler warning when building for Windows:
    net_packet.c: In function ‘send_udppacket’:
    net_packet.c:633:6: error: unused variable ‘origpriority’ [-Werror=unused-variable]
      int origpriority = origpkt->priority;
      ^
The only places where connection_t::status.active is modified are in
ack_h() and terminate_connection(). In both cases, connection_t::edge is
added or removed at the same time, and those are the only places where
connection_t::edge is set. Therefore, the following holds at all times:

    !c->status.active == !c->edge
This commit removes the redundant state information by getting rid of
connection_t::status.active, and using connection_t::edge instead.
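In practice, checks of the flag can simply be replaced by checks of the
pointer, along these lines (the callee is illustrative):

    /* Before */
    if(c->status.active)
            forward_request(c);

    /* After: connection_t::edge carries the same information. */
    if(c->edge)
            forward_request(c);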
In receive_udppacket(), we initialize outpkt to a default value, but that
value is never read anywhere, as every read is preceded by a write.
This issue was found by the clang static analyzer tool:
http://clang-analyzer.llvm.org/
If choose_local_address() is unable to find a local address (e.g.
because of old nodes that don't send their local address information),
then send_sptps_data() ends up using uninitialized variables for the
socket and address.
This regression was introduced in
4159108971. The commit took care of
handling that case in send_udppacket() but was missing the same fix
for send_sptps_data().
This bug was found by the clang static analyzer tool:
http://clang-analyzer.llvm.org/
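A sketch of the guard that was needed, mirroring what send_udppacket()
already does (treat the exact function signatures as assumptions):

    /* Sketch: never use the socket/address unless they were actually set. */
    const sockaddr_t *sa = NULL;
    int sock = -1;

    choose_local_address(to, &sa, &sock);
    if(!sa)                       /* no usable local address: fall back */
            choose_udp_address(to, &sa, &sock);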
Recent improvements to the local discovery mechanism make it cheaper and
more network-friendly, and unlike the old mechanism it can no longer make
things worse. Thus there is no reason not to enable it by default.
The new local address based local discovery mechanism is technically
superior to the old broadcast-based one. In fact, the old algorithm
can technically make things worse by e.g. sending broadcasts over the
VPN itself and then selecting the VPN address as the node's UDP
address. This cannot happen with the new mechanism.
Note that this means old nodes that don't send their local addresses in
ADD_EDGE messages can't be discovered, because there is no address to
send discovery packets to. Old nodes can still discover new nodes by
sending them broadcasts, though.
This introduces a new way of doing local discovery: when tinc has
local address information for the recipient node, it will send local
discovery packets directly to the local address of that node, instead
of using broadcast packets.
This new way of doing local discovery provides numerous advantages compared to
using broadcasts:
- No broadcast packets "polluting" the local network;
- Reliable even if the sending host has multiple network interfaces (in
contrast, broadcasts will only be sent through one unpredictable
interface)
- Works even if the two hosts are not on the same broadcast domain. One
example is a large LAN where the two hosts might be on different local
subnets. In fact, thanks to UDP hole punching this might even work if
there is a NAT sitting in the middle of the LAN between the two nodes!
- Sometimes a node is reachable through its "normal" address, and via a
local subnet as well. One might think the local subnet is the best route
to the node in this case, but more often than not it's actually worse -
one example is where the local segment is a third party VPN running in
parallel, or ironically it can be the local segment formed by the tinc
VPN itself! Because this new algorithm only checks the addresses for
which an edge is already established, it is less likely to fall into
these traps.
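Conceptually, the probing side now does something like the following, using
the local address the peer advertised on one of its edges; splay_each(), the
field names and the probe helper below are assumptions based on the tinc 1.1
internals, not verbatim code:

    /* Sketch: probe the peer's advertised local address, if it has one. */
    for splay_each(edge_t, e, n->edge_tree) {
            if(e->reverse && e->reverse->local_address.sa.sa_family) {
                    send_mtu_probe_packet(n, &e->reverse->local_address);
                    break;
            }
    }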
When using socket functions, "sockerrno" is supposed to be used to
retrieve the error code as opposed to "errno", so that it is translated
to the correct call on Windows (WSAGetLastError() - Windows does not
update errno on socket errors). Unfortunately, the use of sockerrno is
inconsistent throughout the tinc codebase, as errno is often used
incorrectly on socket-related calls.
This commit fixes these oversights, which improves socket error
handling on Windows.
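The usual way to make this portable is a macro along these lines (a sketch of
the idea, not necessarily the exact definition in tinc's headers):

    #ifdef _WIN32
    #define sockerrno WSAGetLastError()
    #else
    #define sockerrno errno
    #endif

    /* Used wherever errno would otherwise be read after a socket call. */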
In send_sptps_data(), the len variable contains the length of the whole
datagram that needs to be sent to the peer, including the overhead from SPTPS
itself.
It seems like a lot of overhead to call access() for every possible extension
defined in PATHEXT, but apparently this is what Windows does itself too. At
least this avoids calling system() when the script one is looking for does not
exist at all.
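A self-contained sketch of the idea (not the actual tinc code): append each
extension from PATHEXT to the script name and test it with access().

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Return true if a script exists under any of the PATHEXT extensions. */
    static bool script_exists(const char *base) {
            const char *pathext = getenv("PATHEXT");
            if(!pathext)
                    pathext = ".COM;.EXE;.BAT;.CMD";
            char *exts = strdup(pathext);
            bool found = false;
            char fullname[1024];
            for(char *ext = strtok(exts, ";"); ext && !found; ext = strtok(NULL, ";")) {
                    snprintf(fullname, sizeof fullname, "%s%s", base, ext);
                    found = access(fullname, F_OK) == 0;
            }
            free(exts);
            return found;
    }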
Since the tinc utility also needs to call scripts, execute_script() is now
split off into its own source file.
This patch adds timestamp information to type 2 MTU probe replies. This
timestamp can then be used by the recipient to estimate bandwidth more
accurately, as jitter in the RX direction won't affect the results.
When replying to a PMTU probe, tinc sends a packet with the same length
as the PMTU probe itself, which is usually large (~1450 bytes). This is
not necessary: the other node wants to know the size of the PMTU probes
that have been received, but encoding this information as the actual
reply length is probably the most inefficient way to do it. It doubles
the bandwidth usage of the PMTU discovery process, and makes it less
reliable since large packets are more likely to be dropped.
This patch introduces a new PMTU probe reply type, encoded as type "2"
in the first byte of the packet, that indicates that the length of the
PMTU probe that is being replied to is encoded in the next two bytes of
the packet. Thus reply packets are only 3 bytes long.
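A sketch of the encoding (the byte order shown here is an assumption made for
illustration):

    #include <stdint.h>

    /* Build the 3-byte type 2 reply for a received probe of length `len`. */
    static void make_type2_reply(uint16_t len, uint8_t reply[3]) {
            reply[0] = 2;           /* probe reply type */
            reply[1] = len >> 8;    /* probe length, high byte */
            reply[2] = len & 0xff;  /* probe length, low byte */
    }

    /* On the receiving side, recover the probe length again. */
    static uint16_t parse_type2_reply(const uint8_t reply[3]) {
            return (uint16_t)(reply[1] << 8 | reply[2]);
    }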
(This also protects against very broken networks that drop very small
packets - yes, I've seen it happen on a subnet of a national ISP - in
such a case the PMTU probe replies will be dropped, and tinc won't
enable UDP communication, which is a good thing.)
Because legacy nodes won't understand type 2 probe replies, the minor
protocol number is bumped to 3.
Note that this also improves bandwidth estimation, as it is able to
measure bandwidth in both directions independently (the node receiving
the replies is measuring in the TX direction) and the use of smaller
reply packets might decrease the influence of jitter.
When LocalDiscovery is enabled, tinc normally sends broadcast packets during
PMTU discovery to the broadcast address (255.255.255.255 or ff02::1). This
option lets tinc use a different address.
At the moment only one LocalDiscoveryAddress can be specified.
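A hypothetical configuration example (the address is purely illustrative):

    LocalDiscoveryAddress = 192.168.1.255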
This gets rid of the rest of the symbolic links. However, as a consequence, the
crypto header files have now moved to src/, and can no longer contain
library-specific declarations. Therefore, cipher_t, digest_t, ecdh_t, ecdsa_t
and rsa_t are now all opaque types, and only pointers to those types can be
used.
Normally all requests sent via the meta connections are checked so that they
cannot be larger than the input buffer. However, when packets are forwarded via
meta connections, they are copied into a packet buffer without checking whether
it fits into it. Since the packet buffer is allocated on the stack, this in
effect allows an authenticated remote node to cause a stack overflow.
This issue was found by Martin Schobert.
Tinc never restarts PMTU discovery unless a node becomes unreachable. However,
it can be that the PMTU was very low during the initial discovery, but has
increased later. To detect this, tinc now tries to send an extra packet every
PingInterval, with a size slightly higher than the currently known PMTU. If
this packet is successfully received back, we partially restart PMTU discovery
to find out the new maximum.
Without adding any extra traffic, we can measure round trip times, estimate the
bandwidth and packet loss between nodes. The RTT and bandwidth can be measured
by timing the MTU probe packets. The RTT is the difference between the time a
burst of MTU probes was sent and when the first reply is received. The
bandwidth can be estimated by multiplying the size of the probe packets by the
time between successive received probe replies of the same burst. The packet
loss can be estimated for incoming traffic by comparing how many packets have
actually been received to the increase in the sequence numbers.
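As a self-contained sketch of the bandwidth arithmetic (names and units are
illustrative only):

    /* Estimate bandwidth (bytes/s) from two successive probe replies of the
       same burst, received t0 and t1 seconds after the burst was sent, for
       probe packets of `len` bytes. */
    static double estimate_bandwidth(double t0, double t1, int len) {
            return t1 > t0 ? len / (t1 - t0) : 0.0;
    }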
The estimates are not perfect. Bandwidth in particular is difficult to
measure; the only accurate way is to continuously send as much data as
possible, but
that is obviously not desirable. The packet loss rate is also almost always
a few percent when sending a lot of data over the VPN via TCP, since TCP
*needs* packet loss to work properly.
Keep track of the number of correct, non-replayed UDP packets that have been
received, regardless of their content. This can be compared to the sequence
number to determine the real packet loss.
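The loss estimate then follows directly; a sketch with illustrative names:

    /* Fraction of incoming packets lost over an interval, given how far the
       peer's sequence number advanced and how many valid, non-replayed
       packets were actually accepted in that interval. */
    static double estimate_loss(unsigned seqno_advance, unsigned received) {
            return seqno_advance ? 1.0 - (double)received / seqno_advance : 0.0;
    }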
There are several reasons for this:
- MacOS/X doesn't support polling the tap device using kqueue, requiring a
workaround to fall back to select().
- On Windows, only sockets are properly handled; therefore tinc uses a second
thread that does a blocking ReadFile() on the TAP-Win32/64 device. However,
this does not mix well with libevent.
- Libevent, even just the core, is quite large, and although it is easy to get
and install on many platforms, it can be a burden.
- Libev is more lightweight and seems technically superior, but it doesn't
abstract away all the platform differences (for example, async events are not
supported on Windows).
We don't need to search the whole edge tree, we can use the node's own edge
tree since each edge has a pointer to its reverse. Also, we do need to make
sure we try the reflexive address often.
Before it would always use the first socket, and always send an IPv4 broadcast packet. That
works fine in a lot of situations, but it is better to try all sockets, and to send IPv6 packets
on IPv6 sockets. This is especially important for users that are on IPv6-only networks or that
have multiple physical network interfaces, although in the latter case it probably requires
them to use the ListenAddress variable to create a separate socket for each interface.
Only the very first packet of an SPTPS session should be sent with REQ_KEY;
this signals the peer to abort any previous session and start a new one as
well.
The tree functions were never used on the connection_tree, a list is more appropriate.
Also be more paranoid about connections disappearing while traversing the list.
Similar to old style key exchange requests, keep track of whether a key
exchange is already in progress and how long it took. If no key is known yet
or if key exchange takes too long, (re)start a new key exchange.
When two nodes which support SPTPS want to send packets to each other, they now
always use SPTPS. The node initiating the SPTPS session sends the first SPTPS
packet via an extended REQ_KEY message. All other handshake messages are sent
using ANS_KEY messages. This ensures that intermediate nodes using an older
version of tinc can still help with NAT traversal. After the authentication
phase is over, SPTPS packets are sent via UDP, or are encapsulated in extended
REQ_KEY messages instead of PACKET messages.
When the "Broadcast = direct" option is used, broadcast packets are not sent
and forwarded via the Minimum Spanning Tree to all nodes, but are sent directly
to all nodes that can be reached in one hop.
One use for this is to allow running ad-hoc routing protocols, such as OLSR, on
top of tinc.
This allows tincctl to receive log messages from a running tincd,
independent of what is logged to syslog or to file. Tincctl can receive
debug messages with an arbitrary level.
Apart from the platform specific tun/tap driver, link with the dummy and
raw_socket devices, and optionally with support for UML and VDE devices.
At runtime, the DeviceType option can be used to select which driver to
use.
Probably due to a merge, the try_harder() function had duplicated the
rate-limiting code for detecting the sender node based on the HMAC of the
packet. This prevented this detection from running at all. The function is now
identical again to that in the 1.0 branch.
Because we don't want to keep track of that, and because this would cause the
node structure to be relinked into the node tree, which would result in myself
pointing to an invalid address.
When a UDP packet was received with an unknown source address/port, and if it
failed a HMAC check against known keys, it could still incorrectly assign that
UDP address to another node. This would temporarily cause outgoing UDP packets
to go to the wrong destination address, until packets from the correct address
were received again.
Before, if MTU probes failed, tinc would stop sending probes until the next
time keys were regenerated (by default, once every hour). Now it continues to
send them every PingInterval, so it recovers faster from temporary failures.
When we got a key request for or from a node we don't know, we disconnected the
node that forwarded us that request. However, especially in TunnelServer mode,
disconnecting does not help. We now ignore such requests, but since there is no
way of telling the original sender that the request was dropped, we now retry
sending REQ_KEY requests when we don't get an ANS_KEY back.
This wasn't working at all, since we didn't do HMAC but just a plain hash.
Also, verification of packets failed because it was checking the whole packet,
not the packet minus the HMAC.
If MTU probing discovered a node was not reachable via UDP, packets for it were
forwarded to the next hop, but always via TCP, even if the next hop was
reachable via UDP. This is now fixed by retrying to send the packet using
send_packet() if the destination is not the same as the nexthop.
This keeps NAT mappings for UDP alive, and will also detect when a node is not
reachable via UDP anymore or if the path MTU is decreasing. Tinc will fall back
to TCP if the node has become unreachable.
If UDP communication is impossible, we stop sending probes, but we retry if
the other node changes its keys.
We also decouple the UDP and TCP ping mechanisms completely, to ensure tinc
properly detects failure of either method.
This feature is not necessary anymore since we have tools like valgrind today
that can catch stack overflow errors before they make a backtrace in gdb
impossible.
During the path MTU discovery phase, we might not know the maximum MTU yet, but
we do know a safe minimum. If we encounter a packet that is larger than that
minimum, we now send it via TCP instead to ensure it arrives. We also
allow large packets that we cannot fragment or create ICMP replies for to be
sent via TCP.
We used both rand() and random() in our code. Since rand() returns an int, we
have to use %x in our format strings instead of %lx. This fixes a crash under
Windows when cross-compiling tinc with a recent version of MinGW.
Although we select() before we call recvfrom(), it sometimes happens that
select() tells us we can read but a subsequent read fails anyway. This is
harmless.
If there is an outstanding MTU probe event for a node which is not reachable
anymore, a UDP packet would be sent to that node, which caused a key request to
be sent to that node, which triggered a NULL pointer dereference. Probes and
other UDP packets to unreachable nodes are now dropped.
First of all, the idea behind the TunnelServer option is to hide all other
nodes from each other, so we shouldn't forward broadcast packets from them
anyway. The other reason is that since edges from other nodes are ignored, the
calculated minimum spanning tree might not be correct, which can result in
routing loops.
Since compression can either grow or shrink a packet, the size of an MTU probe
after decompression might not reflect the real path MTU. Now we use the size
before decompression, which is independent of the compression algorithm, and
subtract a safety margin such that the calculated path MTU will be safe even
for packets which grow as much as possible after compression.
Instead of a single, global decryption context, each node has its own context.
However, in send_ans_key(), the global context was initialised. This commit
fixes that and removes the global context completely.
Also only set status.validkey after all checks have been evaluated.
Previously, tinc used a fixed address and port for each node for UDP packet
exchange. The port was the one advertised by that node as its listening port.
However, due to NAT the port might be different. Now, tinc sends a different
session key to each node. This way, the sending node can be determined from
incoming packets by checking the MAC against all session keys. If a match is
found, the address and port for that node are updated.
When no session key is known for a node, or when it is doing PMTU discovery but
no MTU probes have returned yet, packets are sent via TCP. Some logic is added
to make sure intermediate nodes continue forwarding via TCP. The per-node
packet queue is now no longer necessary and has been removed.
(The new code is still segfaulting for me, and I'd like to proceed with other
work.)
This largely rolls back to the revision 1545 state of the existing code
(new crypto layer is still there with no callers), though I reintroduced
the segfault fix of revision 1562.