Full functionality of tinc mesh relays on having at least one node,
accessible, with known address to which all other nodes must connect
in order to exchange information about other peers.
Sometimes, however, in smaller networks or if two or more peers are
located in the same LAN segment without access to any of the nodes with
known address, there is no way of establishing a functional mesh
without manually changing the configuration.
SLPD addresses this problem utilizing multicast groups and autoconnect.
- Node sends periodically simple message to multicast group
(default 224.0.42.23 port 1655) in this format:
"sLPD 0 1 nodename port publickey"
"0 1" is the "major minior" version of the protocol
- Node listens to the multicast group for messages on all interfaces:
- if the nodename is known and the publickey matches the
node's public key the source address of the packet
will be stored as learned ip address
- at this point setup_outgoing_connection() will be able to
choose the learned ip for connect
Configarion example:
* Roadwarriors: SLPDInterval = 30
* Router on your home network or in your hackerspace:
- It should broadcast only in the direction of the LAN thus you should
set SLPDInterface = eth0 and SLPDInterval = 10
* Defaults:
SLPDGroup = "224.0.42.23"
SLPDPort = 1655
SLPDInterval = 0 (means SLPD is disabled)
The check of the publickey is not implemented yet. IPv6 support
must be implemented. This is the first commit - highly experimental.
When tinc is used in router mode with a TAP device, Ethernet (MAC)
headers are not present in packets flowing over the VPN; it is the
node's responsibility to fill out this header before handing the
packet over to the TAP interface (which expects such headers).
Currently, tinc fills out the destination MAC address of the packet
(otherwise the host would not recognize the packets, and nothing would
work), but it does not fill out the source MAC address. In practice this
doesn't seem to cause any real issues (the host doesn't care about the
source address), but it does look weird when looking at the packets with
a sniffer, and it also result in the following valgrind warning:
==13651== Syscall param write(buf) points to uninitialised byte(s)
==13651== at 0x5C4B620: __write_nocancel (syscall-template.S:81)
==13651== by 0x1445AA: write_packet (device.c:183)
==13651== by 0x118C7C: send_packet (net_packet.c:1259)
==13651== by 0x12B70A: route_ipv4 (route.c:443)
==13651== by 0x12D5F8: route (route.c:971)
==13651== by 0x1152BC: receive_packet (net_packet.c:250)
==13651== by 0x117E1B: receive_sptps_record (net_packet.c:904)
==13651== by 0x1309A8: sptps_receive_data_datagram (sptps.c:488)
==13651== by 0x130A90: sptps_receive_data (sptps.c:508)
==13651== by 0x115569: receive_udppacket (net_packet.c:286)
==13651== by 0x119856: handle_incoming_vpn_data (net_packet.c:1499)
==13651== by 0x10F3DA: event_loop (event.c:287)
==13651== Address 0xffeffea3a is on thread 1's stack
==13651== in frame #6, created by receive_sptps_record (net_packet.c:821)
==13651==
This commit fixes the issue by filling out the source MAC address. It is
generated by negating the last byte of the device MAC address, which is
consistent with what route_arp() does.
In addition, this commit stops route_arp() from filling out the Ethernet
header of the packet - this is the responsibility of send_packet(), not
route().
==27135== Use of uninitialised value of size 8
==27135== at 0x57BE17B: BN_num_bits_word (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57BE205: BN_num_bits (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57BADF7: BN_div (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C48FC: BN_mod_inverse (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C3647: BN_BLINDING_create_param (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x5812D44: RSA_setup_blinding (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x58095CB: rsa_get_blinding (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x580A64F: RSA_eay_private_decrypt (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x4E5D9BC: rsa_private_decrypt (rsa.c:97)
==27135== by 0x4E51E1B: metakey_h (protocol_auth.c:524)
==27135== by 0x4E505FD: receive_request (protocol.c:136)
==27135== by 0x4E46002: receive_meta (meta.c:290)
==27135== Uninitialised value was created by a heap allocation
==27135== at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27135== by 0x575DCD7: CRYPTO_malloc (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C24E1: BN_rand (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C216F: bn_rand_range (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C3630: BN_BLINDING_create_param (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x5812D44: RSA_setup_blinding (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x58095CB: rsa_get_blinding (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x580A64F: RSA_eay_private_decrypt (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x4E5D9BC: rsa_private_decrypt (rsa.c:97)
==27135== by 0x4E51E1B: metakey_h (protocol_auth.c:524)
==27135== by 0x4E505FD: receive_request (protocol.c:136)
==27135== by 0x4E46002: receive_meta (meta.c:290)
It is not unusual for tinc to receive SPTPS packets to be relayed to
nodes that just became unreachable, due to state propagation delays in
the metagraph.
Unfortunately, the current code doesn't handle that situation correctly,
and still tries to relay the packet to the unreachable node. This
typically ends up segfaulting.
This commit fixes the issue by checking for reachability before relaying
the packet.
timeout_handler() calls try_tx(c->node) when c->edge exists.
Unfortunately, the existence of c->edge is not enough to conclude that
the node is reachable.
In fact, during connection establishment, there is a short period of
time where we create an edge for the node at the other end of the
metaconnection, but we don't have one from the other side yet.
Unfortunately, if timeout_handler() runs during that short time
window, it will call try_tx() on an unreachable node, which makes
things explode because that function is not prepared to handle that
case.
A typical symptom of this race condition is a hard SEGFAULT while trying
to send packets using metaconnections that don't exist, due to
n->nexthop containing garbage.
This patch fixes the issue by making try_tx() check for reachability,
and then making all code paths use try_tx() instead of the more
specialized methods so that they go through the check.
This regression was introduced in
eb7a0db18e.
try_tx_sptps() gives up on UDP communication if the recipient doesn't
support relaying. This is too restrictive - we only need the other node
to support relaying if we actually want to relay through them. If the
packet is sent directly, it's fine to send it to an old pre-node-IDs
tinc-1.1 node.
Currently, tinc tries to parse node IDs for all SPTPS packets, including
ones sent from older, pre-node-IDs tinc-1.1 nodes, and therefore doesn't
recognize packets from these nodes. This commit fixes that.
It also makes code slightly clearer by reducing the amount of fiddling
around packet offset/length.
A condition in try_harder() is always evaluating to false when talking
to a SPTPS node because n->status.validkey_in is always false in that
case. Fix the condition so that the SPTPS status is correctly checked.
This prevented recent tinc-1.1 nodes from talking to older, pre-node-ID
tinc-1.1 nodes.
The regression was introduced in
6056f1c13b.
There are a number of ways a SPTPS tunnel can get into a corrupt state.
For example, during key regeneration, the KEX and SIG messages from
other nodes might arrive out of order, which confuses the hell out of
the SPTPS code. Another possible scenario is not noticing another node
crashed and restarted because there was no point in time where the node
was seen completely disconnected from *all* nodes; this could result in
using the wrong (old) key. There are probably other scenarios which have
not even been considered yet. Distributed systems are hard.
When SPTPS got confused by a packet, it used to crash the entire
process; fortunately that was fixed by commit
2e7f68ad2b. However, the error handling
(or lack thereof) leaves a lot to be desired. Currently, when SPTPS
encounters an error when receiving a packet, it just shrugs it off and
continues as if nothing happened. The problem is, sometimes getting
receive errors mean the tunnel is completely stuck and will not recover
on its own. In that case, the node will become unreachable - possibly
indefinitely.
The goal of this commit is to improve SPTPS error handling by taking
proactive action when an incoming packet triggers a failure, which is
often an indicator that the tunnel is stuck in some way. When that
happens, we simply restart SPTPS entirely, which should make the tunnel
recover quickly.
To prevent "storms" where two buggy nodes flood each other with invalid
packets and therefore spend all their time negotiating new tunnels, we
limit the frequency at which tunnel restarts happen to ten seconds.
It is likely this commit will solve the "Invalid KEX record length
during key regeneration" issue that has been seen in the wild. It is
difficult to be sure though because we do not have a full understanding
of all the possible conditions that can trigger this problem.
Commit 10c1f60c64 introduced a mechanism
by which a packet received by REQ_KEY could continue its journey over
UDP. This was based on the assumption that REQ_KEY messages would never
be used for handshake packets (which should never be sent over UDP,
because SPTPS currently doesn't handle lost handshake packets very
well).
Unfortunately, there is one case where handshake packets are sent using
REQ_KEY: when regenerating the SPTPS key for a pre-established channel.
With the current code, such packets risk getting relayed over UDP.
When processing a REQ_KEY message, it is impossible for the receiving
end to distinguish between a data SPTPS packet and a handshake packet,
because this information is stored in the type field which is encrypted
with the end-to-end key.
This commit fixes the issue by making tinc use ANS_KEY for all SPTPS
handshake messages. This works because ANS_KEY messages are never
forwarded using the SPTPS relay mechanisms, therefore they are
guaranteed to stick to TCP.
Currently, SPTPS packets are transported over TCP metaconnections using
extended REQ_KEY requests, in order for the packets to pass through
tinc-1.0 nodes unaltered. Unfortunately, this method presents two
significant downsides:
- An already encrypted SPTPS packet is decrypted and then encrypted
again every time it passes through a node, since it is transported
over the SPTPS channels of the metaconnections. This
double-encryption is unnecessary and wastes CPU cycles.
- More importantly, the only way to transport binary data over
standard metaconnection messages such as REQ_KEY is to encode it
in base64, which has a 33% encoding overhead. This wastes 25% of the
network bandwidth.
This commit introduces a new protocol message, SPTPS_PACKET, which can
be used to transport SPTPS packets over a TCP metaconnection in an
efficient way. The new message is appropriately protected through a
minor protocol version increment, and extended REQ_KEY messages are
still used with nodes that do not support the new message, as well as
for the intial handshake packets, for which efficiency is not a concern.
The way SPTPS_PACKET works is very similar to how the traditional PACKET
message works: after the SPTPS_PACKET message, the raw binary packet is
sent directly over the metaconnection. There is one important
difference, however: in the case of SPTPS_PACKET, the packet is sent
directly over the TCP stream completely bypassing the SPTPS channel of
the metaconnection itself for maximum efficiency. This is secure because
the SPTPS packet that is being sent is already encrypted with an
end-to-end key.
REQ_SPTPS implies the message has an ANS_ counterpart (like REQ_KEY,
ANS_KEY), but it doesn't. Therefore dropping the REQ_ seems more
appropriate, and we add a _PACKET suffix to reduce the likelihood of
naming conflicts.
net_packet doesn't actually use send_sptps_data(); it only uses
send_sptps_data_priv(). In addition, the only user of send_sptps_data()
is protocol_key. Therefore it makes sense to expose
send_sptps_data_priv() directly, and move send_sptps_data() (which is
basically just boilerplate) as a local function in protocol_key.
Currently, when relaying SPTPS UDP packets, the code uses the direct
sender as the originator, instead of preserving the original source ID.
This wouldn't cause any issues in most cases because the originator and
the sender are the same in simple one-hop relay chains, but this will
break as soon as there is more than one relay.