After first tests it came out, that RoadWarriors with multiple
active Interfaces hat problems with receiving on/ and sending SLPD
packets to specific interfaces.
Here the solution:
- Define SLPDInterface in your tinc.conf (multiple definitions are allowed)
On those interfaces tincd will send and receive SLPD packetes
- You have to have IPv6 support on - link-local addresses configured
- tincd must listen on IPv6 on your SLPDInterfaces
- Define SLPDGroup to something like ff02::42:1
- Define SLPDPort for your group
- Define SLPDInterval to some sane number of seconds (0 is default,
meaning SLPD is disabled, 30 seconds should be enough for average
users)
SLPDGroup and SLPDPort should be unique for your network.
Fingerprinting, message signing is yet to be implemented.
Discovered address should also expire periodically.
Full functionality of tinc mesh relays on having at least one node,
accessible, with known address to which all other nodes must connect
in order to exchange information about other peers.
Sometimes, however, in smaller networks or if two or more peers are
located in the same LAN segment without access to any of the nodes with
known address, there is no way of establishing a functional mesh
without manually changing the configuration.
SLPD addresses this problem utilizing multicast groups and autoconnect.
- Node sends periodically simple message to multicast group
(default 224.0.42.23 port 1655) in this format:
"sLPD 0 1 nodename port publickey"
"0 1" is the "major minior" version of the protocol
- Node listens to the multicast group for messages on all interfaces:
- if the nodename is known and the publickey matches the
node's public key the source address of the packet
will be stored as learned ip address
- at this point setup_outgoing_connection() will be able to
choose the learned ip for connect
Configarion example:
* Roadwarriors: SLPDInterval = 30
* Router on your home network or in your hackerspace:
- It should broadcast only in the direction of the LAN thus you should
set SLPDInterface = eth0 and SLPDInterval = 10
* Defaults:
SLPDGroup = "224.0.42.23"
SLPDPort = 1655
SLPDInterval = 0 (means SLPD is disabled)
The check of the publickey is not implemented yet. IPv6 support
must be implemented. This is the first commit - highly experimental.
When tinc is used in router mode with a TAP device, Ethernet (MAC)
headers are not present in packets flowing over the VPN; it is the
node's responsibility to fill out this header before handing the
packet over to the TAP interface (which expects such headers).
Currently, tinc fills out the destination MAC address of the packet
(otherwise the host would not recognize the packets, and nothing would
work), but it does not fill out the source MAC address. In practice this
doesn't seem to cause any real issues (the host doesn't care about the
source address), but it does look weird when looking at the packets with
a sniffer, and it also result in the following valgrind warning:
==13651== Syscall param write(buf) points to uninitialised byte(s)
==13651== at 0x5C4B620: __write_nocancel (syscall-template.S:81)
==13651== by 0x1445AA: write_packet (device.c:183)
==13651== by 0x118C7C: send_packet (net_packet.c:1259)
==13651== by 0x12B70A: route_ipv4 (route.c:443)
==13651== by 0x12D5F8: route (route.c:971)
==13651== by 0x1152BC: receive_packet (net_packet.c:250)
==13651== by 0x117E1B: receive_sptps_record (net_packet.c:904)
==13651== by 0x1309A8: sptps_receive_data_datagram (sptps.c:488)
==13651== by 0x130A90: sptps_receive_data (sptps.c:508)
==13651== by 0x115569: receive_udppacket (net_packet.c:286)
==13651== by 0x119856: handle_incoming_vpn_data (net_packet.c:1499)
==13651== by 0x10F3DA: event_loop (event.c:287)
==13651== Address 0xffeffea3a is on thread 1's stack
==13651== in frame #6, created by receive_sptps_record (net_packet.c:821)
==13651==
This commit fixes the issue by filling out the source MAC address. It is
generated by negating the last byte of the device MAC address, which is
consistent with what route_arp() does.
In addition, this commit stops route_arp() from filling out the Ethernet
header of the packet - this is the responsibility of send_packet(), not
route().
==27135== Use of uninitialised value of size 8
==27135== at 0x57BE17B: BN_num_bits_word (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57BE205: BN_num_bits (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57BADF7: BN_div (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C48FC: BN_mod_inverse (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C3647: BN_BLINDING_create_param (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x5812D44: RSA_setup_blinding (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x58095CB: rsa_get_blinding (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x580A64F: RSA_eay_private_decrypt (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x4E5D9BC: rsa_private_decrypt (rsa.c:97)
==27135== by 0x4E51E1B: metakey_h (protocol_auth.c:524)
==27135== by 0x4E505FD: receive_request (protocol.c:136)
==27135== by 0x4E46002: receive_meta (meta.c:290)
==27135== Uninitialised value was created by a heap allocation
==27135== at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27135== by 0x575DCD7: CRYPTO_malloc (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C24E1: BN_rand (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C216F: bn_rand_range (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C3630: BN_BLINDING_create_param (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x5812D44: RSA_setup_blinding (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x58095CB: rsa_get_blinding (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x580A64F: RSA_eay_private_decrypt (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x4E5D9BC: rsa_private_decrypt (rsa.c:97)
==27135== by 0x4E51E1B: metakey_h (protocol_auth.c:524)
==27135== by 0x4E505FD: receive_request (protocol.c:136)
==27135== by 0x4E46002: receive_meta (meta.c:290)
It is not unusual for tinc to receive SPTPS packets to be relayed to
nodes that just became unreachable, due to state propagation delays in
the metagraph.
Unfortunately, the current code doesn't handle that situation correctly,
and still tries to relay the packet to the unreachable node. This
typically ends up segfaulting.
This commit fixes the issue by checking for reachability before relaying
the packet.
timeout_handler() calls try_tx(c->node) when c->edge exists.
Unfortunately, the existence of c->edge is not enough to conclude that
the node is reachable.
In fact, during connection establishment, there is a short period of
time where we create an edge for the node at the other end of the
metaconnection, but we don't have one from the other side yet.
Unfortunately, if timeout_handler() runs during that short time
window, it will call try_tx() on an unreachable node, which makes
things explode because that function is not prepared to handle that
case.
A typical symptom of this race condition is a hard SEGFAULT while trying
to send packets using metaconnections that don't exist, due to
n->nexthop containing garbage.
This patch fixes the issue by making try_tx() check for reachability,
and then making all code paths use try_tx() instead of the more
specialized methods so that they go through the check.
This regression was introduced in
eb7a0db18e.
try_tx_sptps() gives up on UDP communication if the recipient doesn't
support relaying. This is too restrictive - we only need the other node
to support relaying if we actually want to relay through them. If the
packet is sent directly, it's fine to send it to an old pre-node-IDs
tinc-1.1 node.
Currently, tinc tries to parse node IDs for all SPTPS packets, including
ones sent from older, pre-node-IDs tinc-1.1 nodes, and therefore doesn't
recognize packets from these nodes. This commit fixes that.
It also makes code slightly clearer by reducing the amount of fiddling
around packet offset/length.
A condition in try_harder() is always evaluating to false when talking
to a SPTPS node because n->status.validkey_in is always false in that
case. Fix the condition so that the SPTPS status is correctly checked.
This prevented recent tinc-1.1 nodes from talking to older, pre-node-ID
tinc-1.1 nodes.
The regression was introduced in
6056f1c13b.
There are a number of ways a SPTPS tunnel can get into a corrupt state.
For example, during key regeneration, the KEX and SIG messages from
other nodes might arrive out of order, which confuses the hell out of
the SPTPS code. Another possible scenario is not noticing another node
crashed and restarted because there was no point in time where the node
was seen completely disconnected from *all* nodes; this could result in
using the wrong (old) key. There are probably other scenarios which have
not even been considered yet. Distributed systems are hard.
When SPTPS got confused by a packet, it used to crash the entire
process; fortunately that was fixed by commit
2e7f68ad2b. However, the error handling
(or lack thereof) leaves a lot to be desired. Currently, when SPTPS
encounters an error when receiving a packet, it just shrugs it off and
continues as if nothing happened. The problem is, sometimes getting
receive errors mean the tunnel is completely stuck and will not recover
on its own. In that case, the node will become unreachable - possibly
indefinitely.
The goal of this commit is to improve SPTPS error handling by taking
proactive action when an incoming packet triggers a failure, which is
often an indicator that the tunnel is stuck in some way. When that
happens, we simply restart SPTPS entirely, which should make the tunnel
recover quickly.
To prevent "storms" where two buggy nodes flood each other with invalid
packets and therefore spend all their time negotiating new tunnels, we
limit the frequency at which tunnel restarts happen to ten seconds.
It is likely this commit will solve the "Invalid KEX record length
during key regeneration" issue that has been seen in the wild. It is
difficult to be sure though because we do not have a full understanding
of all the possible conditions that can trigger this problem.
Commit 10c1f60c64 introduced a mechanism
by which a packet received by REQ_KEY could continue its journey over
UDP. This was based on the assumption that REQ_KEY messages would never
be used for handshake packets (which should never be sent over UDP,
because SPTPS currently doesn't handle lost handshake packets very
well).
Unfortunately, there is one case where handshake packets are sent using
REQ_KEY: when regenerating the SPTPS key for a pre-established channel.
With the current code, such packets risk getting relayed over UDP.
When processing a REQ_KEY message, it is impossible for the receiving
end to distinguish between a data SPTPS packet and a handshake packet,
because this information is stored in the type field which is encrypted
with the end-to-end key.
This commit fixes the issue by making tinc use ANS_KEY for all SPTPS
handshake messages. This works because ANS_KEY messages are never
forwarded using the SPTPS relay mechanisms, therefore they are
guaranteed to stick to TCP.