Commit graph

287 commits

Author SHA1 Message Date
thorkill
ef4a0848ca Merge branch '1.1' of github.com:gsliepen/tinc into thkr-1.1-ponyhof 2015-05-19 17:59:03 +02:00
Etienne Dechamps
a196e9b0fd Fix direct UDP communciation with pre-relaying 1.1 nodes.
try_tx_sptps() gives up on UDP communication if the recipient doesn't
support relaying. This is too restrictive - we only need the other node
to support relaying if we actually want to relay through them. If the
packet is sent directly, it's fine to send it to an old pre-node-IDs
tinc-1.1 node.
2015-05-18 21:08:43 +01:00
Etienne Dechamps
fef29d0193 Don't parse node IDs if the sending node doesn't support them.
Currently, tinc tries to parse node IDs for all SPTPS packets, including
ones sent from older, pre-node-IDs tinc-1.1 nodes, and therefore doesn't
recognize packets from these nodes. This commit fixes that.

It also makes code slightly clearer by reducing the amount of fiddling
around packet offset/length.
2015-05-18 20:56:16 +01:00
Etienne Dechamps
643149b449 Fix SPTPS condition in try_harder().
A condition in try_harder() is always evaluating to false when talking
to a SPTPS node because n->status.validkey_in is always false in that
case. Fix the condition so that the SPTPS status is correctly checked.

This prevented recent tinc-1.1 nodes from talking to older, pre-node-ID
tinc-1.1 nodes.

The regression was introduced in
6056f1c13b.
2015-05-18 20:38:01 +01:00
thorkill
23eff91634 resolved conflict 2015-05-17 23:13:43 +02:00
Etienne Dechamps
1a7a9078c0 Proactively restart the SPTPS tunnel if we get receive errors.
There are a number of ways a SPTPS tunnel can get into a corrupt state.
For example, during key regeneration, the KEX and SIG messages from
other nodes might arrive out of order, which confuses the hell out of
the SPTPS code. Another possible scenario is not noticing another node
crashed and restarted because there was no point in time where the node
was seen completely disconnected from *all* nodes; this could result in
using the wrong (old) key. There are probably other scenarios which have
not even been considered yet. Distributed systems are hard.

When SPTPS got confused by a packet, it used to crash the entire
process; fortunately that was fixed by commit
2e7f68ad2b. However, the error handling
(or lack thereof) leaves a lot to be desired. Currently, when SPTPS
encounters an error when receiving a packet, it just shrugs it off and
continues as if nothing happened. The problem is, sometimes getting
receive errors mean the tunnel is completely stuck and will not recover
on its own. In that case, the node will become unreachable - possibly
indefinitely.

The goal of this commit is to improve SPTPS error handling by taking
proactive action when an incoming packet triggers a failure, which is
often an indicator that the tunnel is stuck in some way. When that
happens, we simply restart SPTPS entirely, which should make the tunnel
recover quickly.

To prevent "storms" where two buggy nodes flood each other with invalid
packets and therefore spend all their time negotiating new tunnels, we
limit the frequency at which tunnel restarts happen to ten seconds.

It is likely this commit will solve the "Invalid KEX record length
during key regeneration" issue that has been seen in the wild. It is
difficult to be sure though because we do not have a full understanding
of all the possible conditions that can trigger this problem.
2015-05-17 19:21:50 +01:00
Etienne Dechamps
1e89a63f16 Prevent SPTPS key regeneration packets from entering an UDP relay path.
Commit 10c1f60c64 introduced a mechanism
by which a packet received by REQ_KEY could continue its journey over
UDP. This was based on the assumption that REQ_KEY messages would never
be used for handshake packets (which should never be sent over UDP,
because SPTPS currently doesn't handle lost handshake packets very
well).

Unfortunately, there is one case where handshake packets are sent using
REQ_KEY: when regenerating the SPTPS key for a pre-established channel.
With the current code, such packets risk getting relayed over UDP.

When processing a REQ_KEY message, it is impossible for the receiving
end to distinguish between a data SPTPS packet and a handshake packet,
because this information is stored in the type field which is encrypted
with the end-to-end key.

This commit fixes the issue by making tinc use ANS_KEY for all SPTPS
handshake messages. This works because ANS_KEY messages are never
forwarded using the SPTPS relay mechanisms, therefore they are
guaranteed to stick to TCP.
2015-05-17 17:09:56 +01:00
Guus Sliepen
fd1cff6df2 Fix receiving UDP packets from tinc 1.0.x nodes.
In try_mac(), the wrong offsets were used into the packet buffer,
causing the digest verification to always fail.
2015-05-15 00:21:48 +02:00
thorkill
35af740537 Merge branch '1.1' of github.com:gsliepen/tinc into thkr-1.1-ponyhof 2015-05-12 17:28:29 +02:00
Etienne Dechamps
7e6b2dd1ea Introduce raw TCP SPTPS packet transport.
Currently, SPTPS packets are transported over TCP metaconnections using
extended REQ_KEY requests, in order for the packets to pass through
tinc-1.0 nodes unaltered. Unfortunately, this method presents two
significant downsides:

 - An already encrypted SPTPS packet is decrypted and then encrypted
   again every time it passes through a node, since it is transported
   over the SPTPS channels of the metaconnections. This
   double-encryption is unnecessary and wastes CPU cycles.

 - More importantly, the only way to transport binary data over
   standard metaconnection messages such as REQ_KEY is to encode it
   in base64, which has a 33% encoding overhead. This wastes 25% of the
   network bandwidth.

This commit introduces a new protocol message, SPTPS_PACKET, which can
be used to transport SPTPS packets over a TCP metaconnection in an
efficient way. The new message is appropriately protected through a
minor protocol version increment, and extended REQ_KEY messages are
still used with nodes that do not support the new message, as well as
for the intial handshake packets, for which efficiency is not a concern.

The way SPTPS_PACKET works is very similar to how the traditional PACKET
message works: after the SPTPS_PACKET message, the raw binary packet is
sent directly over the metaconnection. There is one important
difference, however: in the case of SPTPS_PACKET, the packet is sent
directly over the TCP stream completely bypassing the SPTPS channel of
the metaconnection itself for maximum efficiency. This is secure because
the SPTPS packet that is being sent is already encrypted with an
end-to-end key.
2015-05-10 21:08:57 +01:00
Etienne Dechamps
de14308840 Rename REQ_SPTPS to SPTPS_PACKET.
REQ_SPTPS implies the message has an ANS_ counterpart (like REQ_KEY,
ANS_KEY), but it doesn't. Therefore dropping the REQ_ seems more
appropriate, and we add a _PACKET suffix to reduce the likelihood of
naming conflicts.
2015-05-10 21:08:57 +01:00
Etienne Dechamps
1296f715b5 Expose the raw SPTPS send interface from net_packet.
net_packet doesn't actually use send_sptps_data(); it only uses
send_sptps_data_priv(). In addition, the only user of send_sptps_data()
is protocol_key. Therefore it makes sense to expose
send_sptps_data_priv() directly, and move send_sptps_data() (which is
basically just boilerplate) as a local function in protocol_key.
2015-05-10 21:08:57 +01:00
Etienne Dechamps
8e43a2fc74 Use the correct originator node when relaying SPTPS UDP packets.
Currently, when relaying SPTPS UDP packets, the code uses the direct
sender as the originator, instead of preserving the original source ID.

This wouldn't cause any issues in most cases because the originator and
the sender are the same in simple one-hop relay chains, but this will
break as soon as there is more than one relay.
2015-05-10 18:46:47 +01:00
Etienne Dechamps
9d223cb7e7 When relaying, send probes to the destination, not the source.
This seems to be a typo from c23e50385d.
Achievement unlocked: got a one-line commit wrong.
2015-05-10 18:37:30 +01:00
thorkill
0fc873a161 Merge branch '1.1' into thkr-1.1-ponyhof 2015-04-16 10:44:01 +02:00
Guus Sliepen
95921696a4 Always call res_init() before getaddrinfo().
Unfortunately, glibc assumes that /etc/resolv.conf is a static file that
never changes. Even on servers, /etc/resolv.conf might be a dynamically
generated file, and we never know when it changes. So just call
res_init() every time, so glibc uses up-to-date nameserver information.

Conflicts:
	src/have.h
	src/net.c
	src/net_setup.c
2015-04-12 15:42:48 +02:00
Guus Sliepen
9e71b74ed8 Merge remote-tracking branch 'dechamps/staticfix' into 1.1 2015-04-12 15:34:50 +02:00
thorkill
0cd387fd90 This commit implements average RTT estimation based on PING-PONG between active TCP connections.
Average RTT can be used to update edge weight and propagate it to the network.
tinc dump edges has been also extended to give the current RTT.
New edge weight will change only if the config has EdgeUpdateInterval set to other value than 0.

- Ignore local configuration for editors
- Extended manpage with informations about EdgeUpdateInterval
- Added clone_edge and fixed potential segfault when b->from not defined
- Compute avg_rtt based on the time values we got back in PONG
- Add avg_rtt on dump edge
- Send current time on PING and return it on PONG
- Changed last_ping_time to struct timeval
- Extended edge_t with avg_rtt
2015-04-11 15:27:28 +02:00
thorkill
157bc90e64 Temporal fix for 'unknown source' and broken direct UDP links. 2015-03-18 14:54:45 +01:00
Etienne Dechamps
7027bba541 Increase the ReplayWindow default from 16 to 32.
As a rule, it seems reasonable to make sure that tinc operates correctly
on at least 1G links, since these are pretty common. However, I have
observed replay window issues when operating at speeds of 600 Mbit/s and
above, especially when the receiving end is a Windows system (not sure
why). This commit increases the default so that this won't occur on
fresh setups.
2015-03-15 18:04:58 +00:00
Etienne Dechamps
fa432426df Don't send UDP probes past static relays.
Ironically, commit 0f8e2cc78c introduced
a regression on its own, since it accidently removed a return statement
that prevented try_tx_sptps() from sending UDP/MTU probes to nodes that
are past static relays.
2015-03-14 14:04:50 +00:00
Etienne Dechamps
b1421b9190 Add MTU_INFO protocol message.
In this commit, nodes use MTU_INFO messages to provide MTU information.

The issue this code is meant to address is the non-trivial problem of
finding the proper MTU when UDP SPTPS relays are involved. Currently,
tinc has no idea what the MTU looks like beyond the first relay, and
will arbitrarily use the first relay's MTU as the limit. This will fail
miserably if the MTU decreases after the first relay, forcing relays to
fall back to TCP. More generally, one should keep in mind that relay
paths can be arbitrarily complex, resulting in packets taking "epic
journeys" through the graph, switching back and forth between UDP (with
variable MTUs) and TCP multiple times along the path.

A solution that was considered consists in sending standard MTU probes
through the relays. This is inefficient (if there are 3 nodes on one
side of relay and 3 nodes on the other side, we end up with 3*3=9 MTU
discoveries taking place at the same time, while technically only
3+3=6 are needed) and would involve eyebrow-raising behaviors such as
probes being sent over TCP.

This commit implements an alternative solution, which consists in
the packet receiver sending MTU_INFO messages to the packet sender.
The message contains an MTU value which is set to maximum when the
message is originally sent. The message gets altered as it travels
through the metagraph, such that when the message arrives to the
destination, the MTU value contained in the message can be used to
send packets while making sure no relays will be forced to fall back to
TCP to deliver them.

The operating principles behind such a protocol message are similar to
how the UDP_INFO message works, but there is a key difference that
prevents us from simply reusing the same message: the UDP_INFO message
only cares about relay-to-relay links (i.e. it is sent between static
relays and the information it contains only makes sense between two
adjacent static relays), while the MTU_INFO cares about the end-to-end
MTU, including the entire relay path. Therefore, UDP_INFO messages stop
when they encounter static relays, while MTU_INFO messages don't stop
until they get to the original packet sender.

Note that, technically, the MTU that is obtained through this mechanism
can be slightly pessimistic, because it can be lowered by an
intermediate node that is not being used as a relay. Since nodes have no
way of knowing whether they'll be used as dynamic relays or not (and
have no say in the matter), this is not a trivial problem. That said,
this is highly unlikely to result in noticeable issues in realistic
scenarios.
2015-03-14 13:39:05 +00:00
Etienne Dechamps
9bb230f30f Add UDP_INFO protocol message.
In this commit, nodes use UDP_INFO messages to provide UDP address
information. The basic principle is that the node that receives packets
sends UDP_INFO messages to the node that's sending the packets. The
message originally contains no address information, and is (hopefully)
updated with relevant address information as it gets relayed through the
metagraph - specifically, each intermediate node will update the message
with its best guess as to what the address is while forwarding it.

When a node receives an UDP_INFO message, and it doesn't have a
confirmed UDP tunnel with the originator node, it will update its
records with the new address for that node, so that it always has the
best possible guess as to how to reach that node. This applies to the
destination node of course, but also to any intermediate nodes, because
there's no reason they should pass on the free intel, and because it
results in nice behavior in the presence of relay chains (multiple nodes
in a path all trying to reach the same destination).

If, on the other hand, the node does have a confirmed UDP tunnel, it
will ignore the address information contained in the message.

In all cases, if the node that receives the message is not the
destination node specified in the message, it will forward the message
but not before overriding the address information with the one from its
own records. If the node has a confirmed UDP tunnel, that means the
message is updated with the address of the confirmed tunnel; if not,
the message simply reflects the records of the intermediate node, which
just happen to be the contents of the UDP_INFO message it just got, so
it's simply forwarded with no modification.

This is similar to the way ANS_KEY messages are currently
overloaded to provide UDP address information, with two differences:

 - UDP_INFO messages are sent way more often than ANS_KEY messages,
   thereby keeping the address information fresh. Previously, if the UDP
   situation were to change after the ANS_KEY message was sent, the
   sender would virtually never get the updated information.

 - Once a node puts address information in an ANS_KEY message, it is
   never changed again as the message travels through the metagraph; in
   contrast, UDP_INFO messages behave the opposite way, as they get
   rewritten every time they travel through a node with a confirmed UDP
   tunnel. The latter behavior seems more appropriate because UDP tunnel
   information becomes more relevant as it moves closer to the
   destination node. The ANS_KEY behavior is not satisfactory in some
   cases such as multi-layered graphs where the first hop is located
   before a NAT.

Ultimately, the rationale behind this whole process is to improve UDP
hole punching capabilities when port translation is in effect, and more
generally, to make tinc more reliable in (very) hostile network
conditions (such as multi-layered NAT).
2015-03-14 13:39:05 +00:00
Etienne Dechamps
c23e50385d Fix UDP/MTU discovery in intermediate SPTPS UDP relays.
Refactoring commit 81578484dc seems to
have introduced a regression as it moved discovery code away from
send_sptps_data_priv() and within send_packet(). The issue is,
send_packet() is not called when the node is simply relaying an UDP
SPTPS packet: indeed, send_sptps_data_priv() is called directly from
handle_incoming_vpn_data() in that case.

As a result, try_tx_sptps() is not called in the relaying case, which in
practice means that a relay doesn't initiate UDP/MTU discovery with the
next relay (unless some other activity compels it to do so). This can
result in packets getting sent over TCP instead of UDP from the relay.
2015-03-08 14:40:27 +00:00
Etienne Dechamps
0f8e2cc78c Fix dynamic UDP SPTPS relaying.
Refactoring commit 0e65326047 broke UDP
SPTPS relaying by accidently removing try_tx_sptps() logic related to
establishing connectivity to so-called "dynamic" relays (i.e. relays
that are not specified by IndirectData configuration statements, but
are used on-the-fly to circumvent loss of direct UDP connectivity).

Specifically, the TX path was not trying to establish a tunnel to
dynamic relays (nexthop) anymore. This meant that MTU was not being
discovered with dynamic relays, which basically meant that all packets
being sent to dynamic relays went over TCP, thereby defeating the whole
purpose of SPTPS UDP relaying.

Note that this bug could easily go unnoticed if a tunnel was established
with the dynamic tunnel for some other reason (i.e. exchanging actual
data packets with the relay node).
2015-03-08 14:28:07 +00:00
xentec
537c352886 Fix compile errors introduced in cfe9285adf
Compiling with `--disable-legacy-protocol` resulted in failure caused by the missing exclusion of some symbols in net_packet.c.
2015-02-17 04:02:35 +01:00
Guus Sliepen
a95e182d9c Improve packet source detection.
When no UDP communication has been done yet, tinc establishes a guess
for the UDP address+port of each node. However, when there are multiple nodes
behind a NAT, tinc will guess the exact same address+port combination
for them, because it doesn't know about the NAT mappings yet. So when
receiving a packet, don't trust that guess unless we have confirmed UDP
communication.

This ensures try_harder() is called in such cases. However, this
function was actually very inefficient, trying to verify packets
multiple times for nodes with multiple edges. Only call try_mac() at
most once per node.
2015-01-12 14:43:32 +01:00
Guus Sliepen
ae5b56c03d Send gratuitous type 2 probe replies.
If we receive any traffic from another node, we periodically send back a
gratuitous type 2 probe reply with the maximum received packet length.
On the other node, this causes the udp and perhaps mtu probe timers to
be reset, so it does not need to send a probe request. Gratuitous probe
replies from another node also count as received traffic for this
purpose, so for nodes that also have a meta-connection, UDP keepalive
packets in principle can now solely be type 2 replies. This reduces the
amount of probe traffic even more.

To work, gratuitous replies should be sent slightly more often than
udp_discovery_keepalive_interval, so probe requests won't be triggered.
This also means that the timer resolution must be smaller than the
difference between the two, and at the moment it's kind of a hack.
2015-01-11 17:44:50 +01:00
Guus Sliepen
7b76b7ac35 Send the size of the largest recently received packets in type 2 probe replies. 2015-01-11 16:14:05 +01:00
Guus Sliepen
79b6adb489 Move UDP probe reply code into its own function.
This reduces the level of indentation, and prepares for sending gratuitous type 2 probe replies.
2015-01-11 16:12:57 +01:00
Guus Sliepen
f0afde0467 Keep track of the largest UDP packet size received from a node. 2015-01-11 16:10:58 +01:00
Guus Sliepen
d639415937 Move detection of PMTU decrease to try_mtu().
When we have fixed the PMTU, n->mtuprobes == -1. When we send MTU probes
when mtuprobes == -1, decrease mtuprobes, and reset it back to -1 in
mtu_probe_h(). If mtuprobes < -1, send MTU probes every second, until
mtuprobes <= -4, in which case we will restart MTU discovery.
2015-01-11 15:38:56 +01:00
Guus Sliepen
e97e9b22cb Send MTU probes only once every PingInterval. 2015-01-11 14:44:27 +01:00
Guus Sliepen
088b5fd9ee Remove RTT and packet loss estimation code.
This is not working at all anymore. Just remove it, and we'll do another
attempt at RTT, bandwidth and packet loss estimation after the new
probing code stabilizes.
2015-01-11 14:44:15 +01:00
Guus Sliepen
ce7079f4af Only send small packets during UDP probes.
We are trying to decouple UDP probing from MTU probing, so only send
very small packets during UDP probing. This significantly reduces the
amount of traffic sent (54 to 67 bytes per probe instead of 1500 bytes).

This means the MTU probing code takes over sending PMTU sized probes,
but this commit does not take care of detecting PMTU decreases.
2015-01-11 13:53:16 +01:00
Guus Sliepen
eb7a0db18e Always keep UDP mappings alive for nodes that also have a meta-connection.
This is necessary for assisting with UDP hole punching. But we don't
need to know the PMTU for this, so only send UDP probes.
2015-01-11 13:31:01 +01:00
Guus Sliepen
6fcfe763aa Don't send probe replies if we don't have the other's key.
This can happen with the legacy protocol. Don't try to send anything
back in this case, otherwise it will be sent via TCP, which is silly.
2015-01-10 23:58:35 +01:00
Guus Sliepen
f3801cb543 Proactively send our own key when we request another node's key. 2015-01-10 23:52:23 +01:00
Guus Sliepen
c26bb47af1 Fix size of type 2 probe replies.
Type 2 replies should be as small as possible. The minimum payload size
for probe packets is 14 bytes, otherwise they won't be recognized as
such.
2015-01-10 23:33:55 +01:00
Guus Sliepen
0209f12d27 Correctly estimate the initial MTU for legacy packets. 2015-01-10 23:00:51 +01:00
Guus Sliepen
0e65326047 Try to clarify the new code in net_packet.c a bit.
Mainly by trying to reduce complex if statements, by splitting try_tx() into try_tx_legacy() and
try_tx_sptps(), since they don't share a lot of code.
2015-01-10 22:28:47 +01:00
Guus Sliepen
6056f1c13b Remember whether we sent our key to another node.
In tinc 1.0.x, this was tracked in node->inkey, however in tinc 1.1 we have an abstraction layer for
the legacy cipher and digest, and we don't keep an explicit copy of the key around. We cannot use
cipher_active() or digest_active(), since it is possible to set both to the null algorithm. So add a bit to
node_status_t.
2015-01-10 22:26:33 +01:00
Guus Sliepen
f1f2df0738 Use global "now" in try_udp() and try_mtu(). 2015-01-04 16:00:02 +01:00
Etienne Dechamps
07108117ce Use a different UDP discovery interval if the tunnel is established.
This introduces a new configuration option,
UDPDiscoveryKeepaliveInterval, which is used as the UDP discovery
interval once the UDP tunnel is established. The pre-existing option,
UDPDiscoveryInterval, is therefore only used before UDP connectivity
is established.

The defaults are set so that tinc sends UDP pings more aggressively
if the tunnel is not established yet. This is appropriate since the
size of probes in that scenario is very small (16 bytes).
2015-01-03 10:12:36 +00:00
Etienne Dechamps
06345f89b9 Recalculate and resend MTU probes if they are too large for the system.
Currently, if a MTU probe is sent and gets rejected by the system
because it is too large (i.e. send() returns EMSGSIZE), the MTU
discovery algorithm is not aware of it and still behaves as if the probe
was actually sent.

This patch makes the MTU discovery algorithm recalculate and send a new
probe when this happens, so that the probe "slot" does not go to waste.
2015-01-02 09:56:50 +00:00
Etienne Dechamps
f89319f981 Fine-tune the MTU discovery multiplier for the maxmtu < MTU case.
The original multiplier constant for the MTU discovery algorithm, 0.97,
assumes a somewhat pessmistic scenario where we don't get any help from
the OS - i.e. maxmtu never changes. This can happen if IP_MTU is not
usable and the OS doesn't reject overly large packets.

However, in most systems the OS will, in fact, contribute to the MTU
discovery process. In these situations, an actual MTU equal to maxmtu
is quite likely (as opposed to the maxmtu = 1518 case where that is
highly unlikely, unless the physical network supports jumbo frames).
It therefore makes sense to use a multiplier of 1 - that will make the
first probe length equal to maxmtu.

The best results are obtained if the OS supports the getsockopt(IP_MTU)
call, and its result is accurate. In that case, tinc will typically fix
the MTU after one single probe(!), like so:

    Using system-provided maximum tinc MTU for foobar (1.2.3.4 port 655): 1442
    Sending UDP probe length 1442 to foobar (1.2.3.4 port 655)
    Got type 2 UDP probe reply 1442 from foobar (1.2.3.4 port 655)
    Fixing MTU of foobar (1.2.3.4 port 655) to 1442 after 1 probes
2015-01-02 09:55:54 +00:00
Etienne Dechamps
bce17c83e8 Add IP_MTU-based maxmtu estimation.
Linux provides a getsockopt() option, IP_MTU, to get the kernel's best
guess at a connection MTU. In practice, it seems to return the MTU of
the physical interface the socket is using.

This patch uses this option to initialize maxmtu to a better value when
MTU discovery starts.

Unfortunately, this is not supported on Windows. Winsock has options
such as SO_MAX_MSG_SIZE, SO_MAXDG and SO_MAXPATHDG but they seem useless
as they always return absurdly large values (typically, 65507), as
confirmed by http://support.microsoft.com/kb/822061/
2015-01-02 09:55:54 +00:00
Etienne Dechamps
c1532035e2 Don't send MTU probes smaller than 512 bytes.
If MTU discovery comes up with an MTU smaller than 512 bytes (e.g. due
to massive packet loss), it's pretty much guaranteed to be wrong. Even
if it's not, most Internet applications assume the MTU will be at least
512, so fixing the MTU to a small value is likely to cause trouble
anyway.

This also makes the discovery algorithm converge even faster, since the
interval it has to consider is smaller.
2015-01-02 09:55:54 +00:00
Etienne Dechamps
172cbe6771 Adjust MTU probe counts.
The recently introduced new MTU discovery algorithm converges much
faster than the previous one, which allows us to reduce the number
of probes required before we can confidently fix the MTU. This commit
reduces the number of initial discovery probes from 90 to 20. With the
new algorithm this is more than enough to get to the precise (byte-level
accuracy) MTU value; in cases of packet loss or weird MTU values for
which the algorithm is not optimized, we should get close to the actual
value, and then we rely on MTU increase detection (steady state probes)
to fine-tune it later if the need arises.

This patch also triggers MTU increase detection even if the MTU we have
is off by only one byte. Previously we only did that if it was off by at
least 8 bytes. Considering that (1) this should happen less often,
(2) restarting MTU discovery is cheaper than before and (3) having MTUs
that are subtly off from their intended values by just a few bytes
sounds like trouble, this sounds like a good idea.
2015-01-02 09:55:54 +00:00
Etienne Dechamps
24d28adf64 Use a smarter algorithm for choosing MTU discovery probe sizes.
Currently, tinc uses a naive algorithm for choosing MTU discovery probe
sizes, picking a size at random between minmtu and maxmtu.

This is of course suboptimal - since the behavior of probes is
deterministic (assuming no packet loss), it seems likely that using a
non-deterministic discovery algorithm will not yield the best results.
Furthermore, the randomness introduces a lot of variation in convergence
times.

The random solution also suffers from pathological cases - since it's
using a uniform distribution, it doesn't take into account the fact that
it's often more interesting to send small probes rather than large ones,
because getting replies is the only way we can make progress (assuming
the worst case scenario in which the OS doesn't know anything, therefore
keeping maxmtu constant). This can lead to absurd situations where the
discovery algorithm is close to the real MTU, but can't get to it
because the random number generator keeps generating numbers that are
past it.

The algorithm implemented in this patch aims to improve on the naive
random algorithm. It is organized around "cycles" of 8 probes; the sizes
of the probes decrease as we go through the cycle, thus making sure the
algorithm can cover lots of ground quickly (in case we're far from
actual MTU), but also examining the local area (in case we're close to
actual MTU). Using cycles ensures that the algorithm will "go back" to
large probes to better cover the new interval and to protect against
packet loss.

For the probe size itself, various mathematical models were simulated in
an attempt to find the one that converges the fastest; it has been
determined that using an exponential based on the size of the remaining
interval was the most effective option. The exponential is adjusted with
a magic multiplier fine-tuned to make tinc jump to the "most
interesting" (i.e. 1400+) section as soon as discovery starts.

Simulations indicate that assuming no packet loss and no help from the
OS (i.e. maxmtu stays constant), this algorithm will typically converge
to the *exact* MTU value in less than 10 probes, and will get within 8
bytes in less than 5 probes, for actual MTUs between 1417 and ~1450
(which is the range the algorithm is fine-tuned for). In contrast, the
previous algorithm gives results all over the place, sometimes taking
30+ probes to get in the ballpark. Because of the issues with the
distribution, the previous algorithm sometimes never gets to the precise
MTU value within any reasonable amount of time - in contrast, the new
algorithm will always get to the precise value in less than 30 probes,
even if the actual MTU is completely outside the optimized range.
2015-01-02 09:55:52 +00:00