Commit graph

147 commits

Author SHA1 Message Date
thorkill
2ec9f1124d Merged with guus/1.1 2015-11-24 17:01:11 +01:00
Etienne Dechamps
613d586afd Don't unset validkey when receiving SPTPS handshakes over ANS_KEY.
This fixes a hairy race condition that was introduced in
1e89a63f16, which changed
the underlying transport of handshake packets from REQ_KEY to ANS_KEY.
Unfortunately, what I missed in that commit is, on the receiving side,
there is a slight difference between req_key_h() and ans_key_h():
indeed, the latter resets validkey to false.

The reason why this is not a problem during typical operation is
because the normal SPTPS key regeneration procedure looks like this:

    KEX ->
    <- KEX
    SIG ->
    <- SIG

All these messages are sent over ANS_KEY, therefore the receiving side
will unset validkey. However, that's typically not a problem in practice
because upon reception of the last message (SIG), SPTPS will call
sptps_receive_record(), which will set validkey to true again, and
everything works out fine in the end.

However, that was the *typical* scenario. Now let's assume that the
SPTPS channel is in active use at the same time key regeneration
happens. Specifically, let's assume a normal VPN data packet sneaks in
during the key regeneration procedure:

    KEX ->
    <- KEX
    <- (SPTPS packet, over TCP or UDP)
    <- KEX (wtf?)
    SIG -> (refused with Invalid packet seqno: XXX != 0)

At this point, both nodes are extremely confused and the SPTPS channel
becomes unusable with various errors being thrown on both sides. The
channel will stay down until automatic SPTPS channel restart kicks in
after 10 seconds.

(Note: the above is just an example - the race can occur on either side
whenever a packet is sent during the period of time between KEX and SIG
messages are received by the node sending the packet.)

I've seen this race occur in the wild - it is very likely to occur if
key regeneration occurs on a heavily loaded channel. It can be
reproduced fairly easily by setting KeyExpire to a short value (a few
seconds) and then running something like ping -f foobar -i 0.01.

The reason why this occurs is because tinc's TX code path triggers the
following:

 - send_packet()
 - try_tx()
 - try_tx_sptps()
 - validkey is false because we just received an ANS_KEY message
 - waitingforkey is false because it's not used for key regeneration
 - send_req_key()
 - SPTPS channel restart (sptps_stop(), sptps_start()).

Obviously, it all goes downhill from there and the two nodes get very
confused quickly (for example the seqno gets reset, hence the error
messages).

This commit fixes the issue by keeping validkey set when SPTPS data is
received over ANS_KEY messages.
2015-11-22 17:53:52 +00:00
thorkill
f1a9a40c90 Marked all unsued parameters found by -Werror=unused-parameter with UNUSED() 2015-07-02 18:37:08 +02:00
thorkill
dca3558d05 Leave a notice in the log when aborting 2015-07-01 19:01:42 +02:00
thorkill
fe99eb02df Revert "Still hunting down uninitialized variables"
This reverts commit 46b9578cad.
2015-06-30 18:08:31 +02:00
thorkill
46b9578cad Still hunting down uninitialized variables 2015-06-30 02:04:16 +02:00
thorkill
23eff91634 resolved conflict 2015-05-17 23:13:43 +02:00
Guus Sliepen
5c32bd1578 Merge remote-tracking branches 'dechamps/sptpsrestart' and 'dechamps/keychanged' into 1.1 2015-05-17 21:07:45 +02:00
Etienne Dechamps
2cb216d83d Don't send KEY_CHANGED messages if we don't support the legacy protocol.
KEY_CHANGED messages are only useful to invalidate keys for non-SPTPS nodes;
SPTPS nodes use a different internal mechanism (forced KEX) for that purpose.
Therefore, if we know we can't talk to legacy nodes, there's no point in
sending them these messages.
2015-05-17 19:27:20 +01:00
Etienne Dechamps
1a7a9078c0 Proactively restart the SPTPS tunnel if we get receive errors.
There are a number of ways a SPTPS tunnel can get into a corrupt state.
For example, during key regeneration, the KEX and SIG messages from
other nodes might arrive out of order, which confuses the hell out of
the SPTPS code. Another possible scenario is not noticing another node
crashed and restarted because there was no point in time where the node
was seen completely disconnected from *all* nodes; this could result in
using the wrong (old) key. There are probably other scenarios which have
not even been considered yet. Distributed systems are hard.

When SPTPS got confused by a packet, it used to crash the entire
process; fortunately that was fixed by commit
2e7f68ad2b. However, the error handling
(or lack thereof) leaves a lot to be desired. Currently, when SPTPS
encounters an error when receiving a packet, it just shrugs it off and
continues as if nothing happened. The problem is, sometimes getting
receive errors mean the tunnel is completely stuck and will not recover
on its own. In that case, the node will become unreachable - possibly
indefinitely.

The goal of this commit is to improve SPTPS error handling by taking
proactive action when an incoming packet triggers a failure, which is
often an indicator that the tunnel is stuck in some way. When that
happens, we simply restart SPTPS entirely, which should make the tunnel
recover quickly.

To prevent "storms" where two buggy nodes flood each other with invalid
packets and therefore spend all their time negotiating new tunnels, we
limit the frequency at which tunnel restarts happen to ten seconds.

It is likely this commit will solve the "Invalid KEX record length
during key regeneration" issue that has been seen in the wild. It is
difficult to be sure though because we do not have a full understanding
of all the possible conditions that can trigger this problem.
2015-05-17 19:21:50 +01:00
thorkill
35af740537 Merge branch '1.1' of github.com:gsliepen/tinc into thkr-1.1-ponyhof 2015-05-12 17:28:29 +02:00
Etienne Dechamps
de14308840 Rename REQ_SPTPS to SPTPS_PACKET.
REQ_SPTPS implies the message has an ANS_ counterpart (like REQ_KEY,
ANS_KEY), but it doesn't. Therefore dropping the REQ_ seems more
appropriate, and we add a _PACKET suffix to reduce the likelihood of
naming conflicts.
2015-05-10 21:08:57 +01:00
Etienne Dechamps
10c1f60c64 Try to use UDP to relay SPTPS packets received over TCP.
Currently, when tinc receives a SPTPS packet over TCP via the REQ_KEY
encapsulation mechanism, it forwards it like any other TCP request. This
is inefficient, because even though we received the packet over TCP,
we might have an UDP link with the next hop, which means the packet
could be sent over UDP.

This commit removes that limitation by making sure SPTPS data packets
received through REQ_KEY requests are not forwarded as-is but passed
to send_sptps_data() instead, thereby using the same code path as if
the packet was received over UDP.
2015-05-10 21:08:57 +01:00
Etienne Dechamps
1296f715b5 Expose the raw SPTPS send interface from net_packet.
net_packet doesn't actually use send_sptps_data(); it only uses
send_sptps_data_priv(). In addition, the only user of send_sptps_data()
is protocol_key. Therefore it makes sense to expose
send_sptps_data_priv() directly, and move send_sptps_data() (which is
basically just boilerplate) as a local function in protocol_key.
2015-05-10 21:08:57 +01:00
thorkill
0fc873a161 Merge branch '1.1' into thkr-1.1-ponyhof 2015-04-16 10:44:01 +02:00
thorkill
9910f8f2d1 Fixed a SIGABRT in send_ans_key().
In some cases the remote host does not know our key but we have got theirs.
So we send him our key but send_ans_key() aborted on this point.
2015-04-04 20:16:42 +02:00
Etienne Dechamps
b1421b9190 Add MTU_INFO protocol message.
In this commit, nodes use MTU_INFO messages to provide MTU information.

The issue this code is meant to address is the non-trivial problem of
finding the proper MTU when UDP SPTPS relays are involved. Currently,
tinc has no idea what the MTU looks like beyond the first relay, and
will arbitrarily use the first relay's MTU as the limit. This will fail
miserably if the MTU decreases after the first relay, forcing relays to
fall back to TCP. More generally, one should keep in mind that relay
paths can be arbitrarily complex, resulting in packets taking "epic
journeys" through the graph, switching back and forth between UDP (with
variable MTUs) and TCP multiple times along the path.

A solution that was considered consists in sending standard MTU probes
through the relays. This is inefficient (if there are 3 nodes on one
side of relay and 3 nodes on the other side, we end up with 3*3=9 MTU
discoveries taking place at the same time, while technically only
3+3=6 are needed) and would involve eyebrow-raising behaviors such as
probes being sent over TCP.

This commit implements an alternative solution, which consists in
the packet receiver sending MTU_INFO messages to the packet sender.
The message contains an MTU value which is set to maximum when the
message is originally sent. The message gets altered as it travels
through the metagraph, such that when the message arrives to the
destination, the MTU value contained in the message can be used to
send packets while making sure no relays will be forced to fall back to
TCP to deliver them.

The operating principles behind such a protocol message are similar to
how the UDP_INFO message works, but there is a key difference that
prevents us from simply reusing the same message: the UDP_INFO message
only cares about relay-to-relay links (i.e. it is sent between static
relays and the information it contains only makes sense between two
adjacent static relays), while the MTU_INFO cares about the end-to-end
MTU, including the entire relay path. Therefore, UDP_INFO messages stop
when they encounter static relays, while MTU_INFO messages don't stop
until they get to the original packet sender.

Note that, technically, the MTU that is obtained through this mechanism
can be slightly pessimistic, because it can be lowered by an
intermediate node that is not being used as a relay. Since nodes have no
way of knowing whether they'll be used as dynamic relays or not (and
have no say in the matter), this is not a trivial problem. That said,
this is highly unlikely to result in noticeable issues in realistic
scenarios.
2015-03-14 13:39:05 +00:00
Etienne Dechamps
9bb230f30f Add UDP_INFO protocol message.
In this commit, nodes use UDP_INFO messages to provide UDP address
information. The basic principle is that the node that receives packets
sends UDP_INFO messages to the node that's sending the packets. The
message originally contains no address information, and is (hopefully)
updated with relevant address information as it gets relayed through the
metagraph - specifically, each intermediate node will update the message
with its best guess as to what the address is while forwarding it.

When a node receives an UDP_INFO message, and it doesn't have a
confirmed UDP tunnel with the originator node, it will update its
records with the new address for that node, so that it always has the
best possible guess as to how to reach that node. This applies to the
destination node of course, but also to any intermediate nodes, because
there's no reason they should pass on the free intel, and because it
results in nice behavior in the presence of relay chains (multiple nodes
in a path all trying to reach the same destination).

If, on the other hand, the node does have a confirmed UDP tunnel, it
will ignore the address information contained in the message.

In all cases, if the node that receives the message is not the
destination node specified in the message, it will forward the message
but not before overriding the address information with the one from its
own records. If the node has a confirmed UDP tunnel, that means the
message is updated with the address of the confirmed tunnel; if not,
the message simply reflects the records of the intermediate node, which
just happen to be the contents of the UDP_INFO message it just got, so
it's simply forwarded with no modification.

This is similar to the way ANS_KEY messages are currently
overloaded to provide UDP address information, with two differences:

 - UDP_INFO messages are sent way more often than ANS_KEY messages,
   thereby keeping the address information fresh. Previously, if the UDP
   situation were to change after the ANS_KEY message was sent, the
   sender would virtually never get the updated information.

 - Once a node puts address information in an ANS_KEY message, it is
   never changed again as the message travels through the metagraph; in
   contrast, UDP_INFO messages behave the opposite way, as they get
   rewritten every time they travel through a node with a confirmed UDP
   tunnel. The latter behavior seems more appropriate because UDP tunnel
   information becomes more relevant as it moves closer to the
   destination node. The ANS_KEY behavior is not satisfactory in some
   cases such as multi-layered graphs where the first hop is located
   before a NAT.

Ultimately, the rationale behind this whole process is to improve UDP
hole punching capabilities when port translation is in effect, and more
generally, to make tinc more reliable in (very) hostile network
conditions (such as multi-layered NAT).
2015-03-14 13:39:05 +00:00
Etienne Dechamps
69d4ccc437 Fix typo in logging statement.
This was introduced in cfe9285adf.
2015-01-11 00:04:01 +01:00
Guus Sliepen
6056f1c13b Remember whether we sent our key to another node.
In tinc 1.0.x, this was tracked in node->inkey, however in tinc 1.1 we have an abstraction layer for
the legacy cipher and digest, and we don't keep an explicit copy of the key around. We cannot use
cipher_active() or digest_active(), since it is possible to set both to the null algorithm. So add a bit to
node_status_t.
2015-01-10 22:26:33 +01:00
Etienne Dechamps
98716a227e Move PMTU discovery code into the TX path.
Currently, the PMTU discovery code is run by a timeout callback,
independently of tunnel activity. This commit moves it into the TX
path, meaning that send_mtu_probe_handler() is only called if a
packet is about to be sent. Consequently, it has been renamed to
try_mtu() for consistency with try_tx(), try_udp() and try_sptps().

Running PMTU discovery code only as part of the TX path prevents
PMTU discovery from generating unreasonable amounts of traffic when
the "real" traffic is negligible. One extreme example is sending one
real packet and then going silent: in the current code this one little
packet will result in the entire PMTU discovery algorithm being run
from start to finish, resulting in absurd write traffic amplification.
With this patch, PMTU discovery stops as soon as "real" packets stop
flowing, and will be no more aggressive than the underlying traffic.

Furthermore, try_mtu() only runs if there is confirmed UDP
connectivity as per the UDP discovery mechanism. This prevents
unnecessary network chatter - previously, the PMTU discovery code
would send bursts of (potentially large) probe packets every second
even if there was nothing on the other side. With this patch, the
PMTU code only does that if something replied to the lightweight UDP
discovery pings.

These inefficiencies were made even worse when the node is not a
direct neighbour, as tinc will use PMTU discovery both on the
destination node *and* the relay. UDP discovery is more lightweight for
this purpose.

As a bonus, this code simplifies overall code somewhat - state is
easier to manage when code is run in predictable contexts as opposed
to "surprise callbacks". In addition, there is no need to call PMTU
discovery code outside of net_packet.c anymore, thereby simplifying
module boundaries.
2015-01-01 17:40:15 +00:00
Guus Sliepen
cfe9285adf Allow tinc to be compiled without OpenSSL.
The option "--disable-legacy-protocol" was added to the configure
script. The new protocol does not depend on any external crypto
libraries, so when the option is used tinc is no longer linked to
OpenSSL's libcrypto.
2014-12-29 22:57:18 +01:00
Guus Sliepen
107d9c7da5 Use void pointers for opaque data blobs in the SPTPS code. 2014-12-24 22:15:40 +01:00
Etienne Dechamps
63daebcd1e Don't send MTU probes to nodes we can't reach directly.
Currently, we send MTU probes to each node we receive a key for, even if
we know we will never send UDP packets to that node because of
indirection. This commit disables MTU probing between nodes that have
direct communication disabled, otherwise MTU probes end up getting sent
through relays.

With the legacy protocol this was never a problem because we would never
request the key of a node with indirection enabled; with SPTPS this was
not a problem until we introduced relaying because send_sptps_data()
would simply ignore indirections, but this is not the case anymore.

Note that the fix is implemented in a quick and dirty way, by disabling
the call to send_mtu_probe() in ans_key_h(); this is not a clean fix
because there's no code to resume sending MTU probes in case the
indirection disappears because of a graph change.
2014-10-04 15:11:46 +01:00
Etienne Dechamps
111040d7d1 Add UDP datagram relay support to SPTPS.
This commit changes the layout of UDP datagrams to include a 6-byte
destination node ID at the very beginning of the datagram (i.e. before
the source node ID and the seqno). Note that this only applies to SPTPS.

Thanks to this new field, it is now possible to send SPTPS datagrams to
nodes that are not the final recipient of the packets, thereby using
these nodes as relay nodes. Previously SPTPS was unable to relay packets
using UDP, and required a fallback to TCP if the final recipient could
not be contacted directly using UDP. In that sense it fixes a regression
that SPTPS introduced with regard to the legacy protocol.

This change also updates tinc's low-level routing logic (i.e.
send_sptps_data()) to automatically use this relaying facility if at all
possible. Specifically, it will relay packets if we don't have a
confirmed UDP link to the final recipient (but we have one with the next
hop node), or if IndirectData is specified. This is similar to how the
legacy protocol forwards packets.

When sending packets directly without any relaying, the sender node uses
a special value for the destination node ID: instead of setting the
field to the ID of the recipient node, it writes a zero ID instead. This
allows the recipient node to distinguish between a relayed packet and a
direct packet, which is important when determining the UDP address of
the sending node.

On the relay side, relay nodes will happily relay packets that have a
destination ID which is non-zero *and* is different from their own,
provided that the source IP address of the packet is known. This is to
prevent abuse by random strangers, since a node can't authenticate the
packets that are being relayed through it.

This change keeps the protocol number from the previous datagram format
change (source IDs), 17.4. Compatibility is still preserved with 1.0 and
with pre-1.1 releases. Note, however, that nodes running this code won't
understand datagrams sent from nodes that only use source IDs and
vice-versa (not that we really care).

There is one caveat: in the current state, there is no way for the
original sender to know what the PMTU is beyond the first hop, and
contrary to the legacy protocol, relay nodes can't apply MSS clamping
because they can't decrypt the relayed packets. This leads to
inefficient scenarios where a reduced PMTU over some link that's part of
the relay path will result in relays falling back to TCP to send packets
to their final destinations.

Another caveat is that once a packet gets sent over TCP, it will use
TCP over the entire path, even if it is technically possible to use UDP
beyond the TCP-only link(s).

Arguably, these two caveats can be fixed by improving the
metaconnection protocol, but that's out of scope for this change. TODOs
are added instead. In any case, this is no worse than before.

In addition, this change increases SPTPS datagram overhead by another
6 bytes for the destination ID, on top of the existing 6-byte overhead
from the source ID.
2014-10-04 14:37:15 +01:00
Etienne Dechamps
daf65919d1 Preemptively mirror REQ_PUBKEY messages from nodes with unknown keys.
In this commit, if a node receives a REQ_PUBKEY message from a node it
doesn't have the key for, it will send a REQ_PUBKEY message in return
*before* sending its own key.

The rationale is to prevent delays when establishing communication
between two nodes that see each other for the first time. These delays
are caused by the first SPTPS packet being dropped on the floor, as
shown in the following typical exchange:

	node1: No Ed25519 key known for node2
	REQ_PUBKEY ->
	<- ANS_PUBKEY
	node1: Learned Ed25519 public key from node2
	REQ_SPTPS_START ->
	node2: No Ed25519 key known for zyklos
	<- REQ_PUBKEY
	ANS_PUBKEY ->
	node2: Learned Ed25519 public key from node1
	-- 10-second delay --
	node1: No key from node2 after 10 seconds, restarting SPTPS
	REQ_SPTPS_START ->
	<- SPTPS ->
	node1: SPTPS key exchange with node2 succesful
	node2: SPTPS key exchange with node1 succesful

With this patch, the following happens instead:

	node1: No Ed25519 key known for node2
	REQ_PUBKEY ->
	node2: Preemptively requesting Ed25519 key for node1
	<- REQ_PUBKEY
	<- ANS_PUBKEY
	ANS_PUBKEY ->
	node2: Learned Ed25519 public key from node1
	node1: Learned Ed25519 public key from node2
	REQ_SPTPS_START ->
	<- SPTPS ->
	node1: SPTPS key exchange with node2 succesful
	node2: SPTPS key exchange with node1 succesful
2014-09-22 10:10:57 +02:00
Etienne Dechamps
b23bf13283 Remove redundant connection_t::status.active field.
The only places where connection_t::status.active is modified is in
ack_h() and terminate_connection(). In both cases, connection_t::edge
is added and removed at the same time, and that's the only places
connection_t::edge is set. Therefore, the following is true at all
times:

    !c->status.active == !c->edge

This commit removes the redundant state information by getting rid of
connection_t::status.active, and using connection_t::edge instead.
2014-07-12 14:21:48 +02:00
Guus Sliepen
b0d80c7f28 Allow Cipher and Digest "none".
This is for backwards compatibility with tinc 1.0, it has no effect on
the SPTPS protocol.
2014-05-18 21:51:42 +02:00
Guus Sliepen
f0e7e6b03e Rename ECDSA to Ed25519. 2014-05-18 20:47:04 +02:00
Guus Sliepen
51bddfd4dd Allow "none" for Cipher and Digest again. 2013-11-28 14:28:18 +01:00
Etienne Dechamps
e3a4672afb Disable PMTU discovery when TCPOnly is set.
Obviously, PMTU discovery doesn't make much sense when we know we'll be
using TCP anyway.
2013-07-21 00:36:28 +02:00
Guus Sliepen
c3d357af6c Improve base64 encoding/decoding, add URL-safe variant.
b64decode() now returns length 0 when an invalid character was encountered.
2013-05-28 13:39:15 +02:00
Guus Sliepen
214060ef20 Fix warnings for functions marked __attribute((warn_unused_result)). 2013-05-10 20:30:47 +02:00
Guus Sliepen
9b9230a0a7 Use conditional compilation for cryptographic functions.
This gets rid of the rest of the symbolic links. However, as a consequence, the
crypto header files have now moved to src/, and can no longer contain
library-specific declarations. Therefore, cipher_t, digest_t, ecdh_t, ecdsa_t
and rsa_t are now all opaque types, and only pointers to those types can be
used.
2013-05-01 17:17:22 +02:00
Guus Sliepen
4c30004cb6 Avoid calling time(NULL).
In most cases we can use the cached time.
2013-03-08 14:11:15 +01:00
Guus Sliepen
cc3c69c892 Releasing 1.1pre5. 2013-01-20 21:03:22 +01:00
Guus Sliepen
eef25266cb Count the number of correctly received UDP packets.
Keep track of the number of correct, non-replayed UDP packets that have been
received, regardless of their content. This can be compared to the sequence
number to determine the real packet loss.
2013-01-15 13:33:16 +01:00
Guus Sliepen
44a24f63ac Fix handling of initial datagram SPTPS packet.
Only the very first packet of an SPTPS session should be send with REQ_KEY,
this signals the peer to abort any previous session and start a new one as
well.
2012-10-14 14:33:54 +02:00
Guus Sliepen
d917c8cb6b Fix whitespace. 2012-10-10 17:17:49 +02:00
Guus Sliepen
d1ec010660 Fix memory leaks found by valgrind. 2012-10-09 16:27:28 +02:00
Guus Sliepen
0b8b23e0dd C99 extravaganza. 2012-10-08 00:35:38 +02:00
Guus Sliepen
ff306f0cda Replace the connection_tree with a connection_list.
The tree functions were never used on the connection_tree, a list is more appropriate.
Also be more paranoid about connections disappearing while traversing the list.
2012-10-07 21:59:53 +02:00
Guus Sliepen
c2a9ed9e98 Handle packets encrypted via SPTPS that need to be forwarded via TCP. 2012-10-07 14:03:50 +02:00
Guus Sliepen
bb6b97ce34 Make datagram SPTPS key exchange more robust.
Similar to old style key exchange requests, keep track of whether a key
exchange is already in progress and how long it took. If no key is known yet
or if key exchange takes too long, (re)start a new key exchange.
2012-10-07 13:31:19 +02:00
Guus Sliepen
6dfdb32361 Merge branch 'master' into 1.1
Conflicts:
	lib/utils.c
	src/net_setup.c
	src/process.c
	src/protocol_auth.c
	src/protocol_key.c
	src/utils.h
2012-09-30 15:00:47 +02:00
Guus Sliepen
c4940a5c88 Add strict checks to hex to binary conversions.
The main goal is to catch misuse of the obsolete PrivateKey and PublicKey
statements.
2012-09-30 13:45:47 +02:00
Guus Sliepen
9e76c464b2 Remove some debugging messages. 2012-09-28 17:51:48 +02:00
Guus Sliepen
8af2f3f5a4 Optionally compress and/or strip Ethernet header from SPTPS packets. 2012-08-02 17:44:59 +02:00
Guus Sliepen
91937812bd Clear struct sptps before reusing it. 2012-08-02 17:23:51 +02:00
Guus Sliepen
7a71d48009 Use a status bit to track which nodes use SPTPS. 2012-07-31 21:43:49 +02:00