j3d1/tinc - Neuland Labor e.V.: Git

j3d1/tinc

Author	SHA1	Message	Date
Etienne Dechamps	1a7a9078c0	Proactively restart the SPTPS tunnel if we get receive errors. There are a number of ways a SPTPS tunnel can get into a corrupt state. For example, during key regeneration, the KEX and SIG messages from other nodes might arrive out of order, which confuses the hell out of the SPTPS code. Another possible scenario is not noticing another node crashed and restarted because there was no point in time where the node was seen completely disconnected from all nodes; this could result in using the wrong (old) key. There are probably other scenarios which have not even been considered yet. Distributed systems are hard. When SPTPS got confused by a packet, it used to crash the entire process; fortunately that was fixed by commit `2e7f68ad2b`. However, the error handling (or lack thereof) leaves a lot to be desired. Currently, when SPTPS encounters an error when receiving a packet, it just shrugs it off and continues as if nothing happened. The problem is, sometimes getting receive errors means the tunnel is completely stuck and will not recover on its own. In that case, the node will become unreachable - possibly indefinitely. The goal of this commit is to improve SPTPS error handling by taking proactive action when an incoming packet triggers a failure, which is often an indicator that the tunnel is stuck in some way. When that happens, we simply restart SPTPS entirely, which should make the tunnel recover quickly. To prevent "storms" where two buggy nodes flood each other with invalid packets and therefore spend all their time negotiating new tunnels, we limit tunnel restarts to at most one every ten seconds. It is likely this commit will solve the "Invalid KEX record length during key regeneration" issue that has been seen in the wild. It is difficult to be sure though because we do not have a full understanding of all the possible conditions that can trigger this problem.	2015-05-17 19:21:50 +01:00
Etienne Dechamps	aa52300b2b	Trivial: make sptps_receive_data_datagram() a little more readable. The new code updates variables as stuff is being consumed, so that the reader doesn't have to do that in his head.	2015-05-17 17:52:15 +01:00
Guus Sliepen	30e839b0a1	Don't send local_address in ADD_EDGE messages if it's AF_UNSPEC.	2015-05-17 18:44:09 +02:00
Sven-Haegar Koch	23fda4db6d	Let sockaddr2hostname() handle AF_UNSPEC addresses.	2015-05-17 18:43:34 +02:00
Etienne Dechamps	1e89a63f16	Prevent SPTPS key regeneration packets from entering an UDP relay path. Commit `10c1f60c64` introduced a mechanism by which a packet received by REQ_KEY could continue its journey over UDP. This was based on the assumption that REQ_KEY messages would never be used for handshake packets (which should never be sent over UDP, because SPTPS currently doesn't handle lost handshake packets very well). Unfortunately, there is one case where handshake packets are sent using REQ_KEY: when regenerating the SPTPS key for a pre-established channel. With the current code, such packets risk getting relayed over UDP. When processing a REQ_KEY message, it is impossible for the receiving end to distinguish between a data SPTPS packet and a handshake packet, because this information is stored in the type field which is encrypted with the end-to-end key. This commit fixes the issue by making tinc use ANS_KEY for all SPTPS handshake messages. This works because ANS_KEY messages are never forwarded using the SPTPS relay mechanisms, therefore they are guaranteed to stick to TCP.	2015-05-17 17:09:56 +01:00
Guus Sliepen	eecfeadeb4	Let sockaddr2str() handle AF_UNSPEC addresses.	2015-05-16 02:01:54 +02:00
Guus Sliepen	613c121cdc	Try all addresses for the hostname in an invitation URL.	2015-05-15 23:35:46 +02:00
Guus Sliepen	54a8bd78e3	Be more liberal accepting ADD_EDGE messages with conflicting local address information. If the ADD_EDGE is for one of the edges we own, and if it is not the same as we actually have, send a correcting ADD_EDGE back. Otherwise, if the ADD_EDGE contains new information, update our idea of the local address for that edge. If the ADD_EDGE does not contain local address information, then we never make a correction nor log a warning.	2015-05-15 23:08:53 +02:00
Guus Sliepen	8028e01100	Use AF_UNSPEC instead of AF_UNKNOWN for unspecified local address in add_edge_h(). AF_UNKNOWN is reserved for valid addresses that the local node cannot parse, but remote nodes possibly can.	2015-05-15 23:01:06 +02:00
Guus Sliepen	fd1cff6df2	Fix receiving UDP packets from tinc 1.0.x nodes. In try_mac(), the wrong offsets were used into the packet buffer, causing the digest verification to always fail.	2015-05-15 00:21:48 +02:00
Guus Sliepen	44e9f1e1d8	Fix invitations. These were broken due to a change in behaviour of sptps_receive_data() introduced in commit `d237efd325`.	2015-05-13 14:28:28 +02:00
Etienne Dechamps	7e6b2dd1ea	Introduce raw TCP SPTPS packet transport. Currently, SPTPS packets are transported over TCP metaconnections using extended REQ_KEY requests, in order for the packets to pass through tinc-1.0 nodes unaltered. Unfortunately, this method presents two significant downsides: - An already encrypted SPTPS packet is decrypted and then encrypted again every time it passes through a node, since it is transported over the SPTPS channels of the metaconnections. This double-encryption is unnecessary and wastes CPU cycles. - More importantly, the only way to transport binary data over standard metaconnection messages such as REQ_KEY is to encode it in base64, which has a 33% encoding overhead. This wastes 25% of the network bandwidth. This commit introduces a new protocol message, SPTPS_PACKET, which can be used to transport SPTPS packets over a TCP metaconnection in an efficient way. The new message is appropriately protected through a minor protocol version increment, and extended REQ_KEY messages are still used with nodes that do not support the new message, as well as for the initial handshake packets, for which efficiency is not a concern. The way SPTPS_PACKET works is very similar to how the traditional PACKET message works: after the SPTPS_PACKET message, the raw binary packet is sent directly over the metaconnection. There is one important difference, however: in the case of SPTPS_PACKET, the packet is sent directly over the TCP stream completely bypassing the SPTPS channel of the metaconnection itself for maximum efficiency. This is secure because the SPTPS packet that is being sent is already encrypted with an end-to-end key.	2015-05-10 21:08:57 +01:00
Etienne Dechamps	d237efd325	Only read one record at a time in sptps_receive_data(). sptps_receive_data() always consumes the entire buffer passed to it, which is somewhat inflexible. This commit improves the interface so that sptps_receive_data() consumes at most one record. The goal is to allow non-SPTPS stuff to be interleaved with SPTPS records in a single TCP stream.	2015-05-10 21:08:57 +01:00
Etienne Dechamps	de14308840	Rename REQ_SPTPS to SPTPS_PACKET. REQ_SPTPS implies the message has an ANS_ counterpart (like REQ_KEY, ANS_KEY), but it doesn't. Therefore dropping the REQ_ seems more appropriate, and we add a _PACKET suffix to reduce the likelihood of naming conflicts.	2015-05-10 21:08:57 +01:00
Etienne Dechamps	10c1f60c64	Try to use UDP to relay SPTPS packets received over TCP. Currently, when tinc receives a SPTPS packet over TCP via the REQ_KEY encapsulation mechanism, it forwards it like any other TCP request. This is inefficient, because even though we received the packet over TCP, we might have an UDP link with the next hop, which means the packet could be sent over UDP. This commit removes that limitation by making sure SPTPS data packets received through REQ_KEY requests are not forwarded as-is but passed to send_sptps_data() instead, thereby using the same code path as if the packet was received over UDP.	2015-05-10 21:08:57 +01:00
Etienne Dechamps	1296f715b5	Expose the raw SPTPS send interface from net_packet. net_packet doesn't actually use send_sptps_data(); it only uses send_sptps_data_priv(). In addition, the only user of send_sptps_data() is protocol_key. Therefore it makes sense to expose send_sptps_data_priv() directly, and move send_sptps_data() (which is basically just boilerplate) as a local function in protocol_key.	2015-05-10 21:08:57 +01:00
Etienne Dechamps	8e43a2fc74	Use the correct originator node when relaying SPTPS UDP packets. Currently, when relaying SPTPS UDP packets, the code uses the direct sender as the originator, instead of preserving the original source ID. This wouldn't cause any issues in most cases because the originator and the sender are the same in simple one-hop relay chains, but this will break as soon as there is more than one relay.	2015-05-10 18:46:47 +01:00
Etienne Dechamps	9d223cb7e7	When relaying, send probes to the destination, not the source. This seems to be a typo from `c23e50385d`. Achievement unlocked: got a one-line commit wrong.	2015-05-10 18:37:30 +01:00
Etienne Dechamps	13f9bc1ff1	Add support for out-of-tree ("VPATH") builds. This fixes some issues with the build system when building out of tree. With this commit, it is now possible to do the following: $ cd /tmp/build $ /path/to/tinc/configure $ make	2015-05-09 16:41:48 +01:00
Etienne Dechamps	462e9892ae	Remove explicit distribution rules for m4 scripts. It turns out Automake is smart enough to include these files in the distribution by itself.	2015-05-09 16:17:39 +01:00
Guus Sliepen	362b791764	Really remove "release-" from the git-derived version string.	2015-05-09 15:41:37 +02:00
Etienne Dechamps	b109e8b164	Use git describe to populate autoconf's VERSION. This uses the output of "git describe" directly in configure.ac to determine the version number to use, instead of hardcoding it. With this change, current version information is completely removed from the codebase itself, and is always fetched on-the-fly from git as the single source of truth. In order to ensure make dist always uses the current version number in the contents of the packaged configure script as well as the package name, a dependency is added to the dist target such that autoconf is always run before dist to regenerate the version number. If this wasn't the case, make dist would use the version number from when autoconf was originally run, not the version number that make dist is running from. That said, errors from that rule are ignored so that people can still run make dist without a working autoconf. In addition, the NEWS check is dropped, as it would then become annoying because it would force make dist users to always have a line for the current commit in the NEWS file.	2015-05-09 12:14:31 +01:00
Pierre Emeriaud	1c77069064	Fix typo in tincctl help.	2015-05-09 00:03:51 +02:00
Guus Sliepen	54554cc276	Don't include build-time generated version_git.h in the tarball.	2015-05-05 23:05:22 +02:00
Guus Sliepen	c46bdbde18	Remove "release-" from displayed git version. Also make sure that version_git.h is only written to if the "git describe" command succeeds.	2015-05-05 23:03:41 +02:00
Etienne Dechamps	120e0567cb	Use git description as the tinc version. Instead of using the hardcoded version number in configure.ac, this makes tinc use the live version reported by "git describe", queried on-the-fly during the build process and regenerated for every build. This makes tinc version output more useful, as tinc will now display the number of commits since the last tag as well as the commit the binary is built from, following the format described in git-describe(1). Here's an example of tincd --version output: tinc version release-1.1pre10-48-gc149315 (built Jun 29 2014 15:21:10, protocol 17.3) When building directly from a release tag, this will look like the following: tinc version release-1.1pre10 (built Jun 29 2014 15:21:10, protocol 17.3) (Note that the format is slightly different - because of the way the tags are named, it says "release-1.1pre10" instead of just "1.1pre10") If git describe fails (for example when building from a release tarball), the build automatically falls back to the autoconf-provided VERSION macro (i.e. the old behavior).	2015-05-04 21:38:23 +01:00
Guus Sliepen	95594f4738	Fix typo in `0fda572c88` that prevented some errors from being logged.	2015-04-24 23:51:29 +02:00
Guus Sliepen	0fda572c88	Don't log an error message when receiving a TERMREQ.	2015-04-24 23:43:58 +02:00
Guus Sliepen	ea1e815223	Fix a possible segmentation fault during key upgrades. read_rsa_public_key() was bailing out early if the given node already has an Ed25519 key, and returned true even though c->rsa was NULL. The early bailout code isn't necessary anymore, so just remove it.	2015-04-24 23:43:19 +02:00
Guus Sliepen	2059814238	Allow one-sided upgrades to Ed25519. This deals with the case where one node knows the Ed25519 key of another node, but not the other way around. This was blocked by an overly paranoid check in id_h(). The upgrade_h() function already handled this case, and the node that already knows the other's Ed25519 key checks that it has not been changed, otherwise the connection will be aborted.	2015-04-24 23:40:20 +02:00
Guus Sliepen	3def9d2ad8	Merge remote-tracking branch 'dechamps/wintapver' into 1.1	2015-04-12 15:43:05 +02:00
Guus Sliepen	95921696a4	Always call res_init() before getaddrinfo(). Unfortunately, glibc assumes that /etc/resolv.conf is a static file that never changes. Even on servers, /etc/resolv.conf might be a dynamically generated file, and we never know when it changes. So just call res_init() every time, so glibc uses up-to-date nameserver information. Conflicts: src/have.h src/net.c src/net_setup.c	2015-04-12 15:42:48 +02:00
Guus Sliepen	f500a3d4e6	Merge remote-tracking branch 'dechamps/windevice' into 1.1	2015-04-12 15:36:50 +02:00
Guus Sliepen	417981462a	Merge remote-tracking branch 'dechamps/winmtu' into 1.1	2015-04-12 15:35:50 +02:00
Guus Sliepen	11effab85b	Merge remote-tracking branch 'dechamps/fsckwin' into 1.1	2015-04-12 15:35:37 +02:00
Guus Sliepen	9e71b74ed8	Merge remote-tracking branch 'dechamps/staticfix' into 1.1	2015-04-12 15:34:50 +02:00
Etienne Dechamps	0c010ff9fe	Warn about performance if using TAP-Windows >=9.21. Testing has revealed that the newer series of Windows TAP drivers (i.e. 9.0.0.21 and later, also known as NDIS6, tap-windows6) suffer from serious performance issues in the write path. Write operations seem to take a very long time to complete, resulting in massive packet loss even for throughputs as low as 10 Mbit/s. I've made some attempts to alleviate the problem using parallelism. By using custom code that allows up to 256 write operations at the same time the results are much better, but it's still about 2 times worse than the traditional 9.0.0.9 driver. We need to investigate more and file a bug against tap-windows6, but in the meantime, let's inform the user that he might not want to use the latest drivers.	2015-03-15 18:37:58 +00:00
Etienne Dechamps	0f328d9d28	Log TAP-Windows driver version on startup. This is generally useful. We've seen issues that are specific to some version of these drivers (especially the newer 9.0.0.21 version), so it's relevant to log it, especially since that means it will be copy-pasted by people posting their logs asking for help.	2015-03-15 18:36:37 +00:00
Etienne Dechamps	7027bba541	Increase the ReplayWindow default from 16 to 32. As a rule, it seems reasonable to make sure that tinc operates correctly on at least 1G links, since these are pretty common. However, I have observed replay window issues when operating at speeds of 600 Mbit/s and above, especially when the receiving end is a Windows system (not sure why). This commit increases the default so that this won't occur on fresh setups.	2015-03-15 18:04:58 +00:00
Etienne Dechamps	94f49a163a	Set the default for UDPRcvBuf and UDPSndBuf to 1M. It may not be obvious, but due to the way tinc operates (single-threaded control loop with no intermediate packet buffer), UDP send and receive buffers can have a massive impact on performance. It is therefore of paramount importance that the buffers be large enough to prevent packet drops that could occur while tinc is processing a packet. Leaving that value to the OS default could be reasonable if we weren't relying on it so much. Instead, this makes performance somewhat unpredictable. In practice, the worst case scenario occurs on Windows, where Microsoft had the brilliant idea of making the buffers 8K in size by default, no matter what the link speed is. Considering that 8K flies past in a matter of microseconds on >1G links, this is extremely inappropriate. On these systems, changing the buffer size to 1M results in obscene raw throughput improvements; I have observed a 10X jump from 40 Mbit/s to 400 Mbit/s on my system. In this commit, we stop trusting the OS to get this right and we use a fixed 1M value instead, which should be enough for <=1G links.	2015-03-15 18:04:55 +00:00
Etienne Dechamps	89715454c0	Fix Windows device asynchronous write behavior. Write operations to the Windows device do not necessarily complete immediately; in fact, with the latest TAP-Win32 drivers, this never seems to be the case. write_packet() does not handle that case correctly, because the OVERLAPPED structure and the packet data go out of scope before the write operation completes, resulting in race conditions. This commit fixes the issue by making sure these data structures are kept in global scope, and by dropping any packets that may arrive while the previous write operation is still pending.	2015-03-15 10:34:40 +00:00
Etienne Dechamps	675142c7d8	When disabling the Windows device, wait for pending reads to complete. On Windows, when disabling the device, tinc uses CancelIo() to cancel the pending read operation, and then proceeds to delete the event handle immediately. This assumes that CancelIo() blocks until the pending read request is completely torn down and no references to it remain. While MSDN is not completely clear on that subject, it does suggest that this is not the case: http://msdn.microsoft.com/en-us/library/windows/desktop/aa363791.aspx If the function succeeds [...] the cancel operation for all pending I/O operations issued by the calling thread for the specified file handle was successfully requested. This implies that cancellation was merely "requested", and that there are no guarantees as to the state of the operation when CancelIo() returns. Therefore, care must be taken not to close event handles prematurely. While I'm not aware of this potential race condition causing any problems in practice, I don't want to take any chances.	2015-03-15 10:32:18 +00:00
Etienne Dechamps	176ee01526	Make sure packet header structures are correctly packed on Windows. Modern versions of GCC handle structure packing differently when compiling for Windows, as reported in the following GCC bug report: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52991 In practice, this affects tinc because it uses packed structs as a convenient way to populate packet headers. "struct ip" is especially affected - on Linux, sizeof(struct ip) returns 20 as expected, while on Windows, it returns 24 because of the broken alignment. This in turn completely breaks code that has to populate an IP header. Specifically, this breaks route_ipv4_unreachable() which is responsible, among other things, for the generation of ICMP Fragmentation Needed messages. On Windows, these messages are corrupted beyond hope because of this alignment issue. For TCP connections that are established before tinc obtains a fix on the MTU (and thus are not MSS clamped), this can result in massive disruption. This commit fixes the issue by forcing GCC to use standard alignment for all packed structures in the tinc codebase instead of the MSVC alignment.	2015-03-15 10:12:18 +00:00
Etienne Dechamps	43b41e9095	Fix HAVE_DECL_RES_INIT conditionals. HAVE_DECL_RES_INIT is generated using AC_CHECK_DECLS. tinc checks this symbol using #ifdef, which is wrong because (according to autoconf docs) the symbol is always defined, it's just set to zero if the check failed. This broke the Windows build starting from `0b310bf406`, because it introduced this conditional in code that's not excluded from the Windows build.	2015-03-14 16:22:26 +00:00
Etienne Dechamps	4989362300	Fix invalid getuid() call on Windows. This is breaking the Windows build. Regression was introduced in `268e3ffca7`.	2015-03-14 16:07:54 +00:00
Etienne Dechamps	fa432426df	Don't send UDP probes past static relays. Ironically, commit `0f8e2cc78c` introduced a regression on its own, since it accidentally removed a return statement that prevented try_tx_sptps() from sending UDP/MTU probes to nodes that are past static relays.	2015-03-14 14:04:50 +00:00
Etienne Dechamps	76a9be5bce	Throttle the rate of MTU_INFO messages. This makes sure MTU_INFO messages are only sent at the maximum rate of 5 per second (by default). As usual with these "probe" mechanisms, the rate of these messages cannot be higher than the rate of data packets themselves, since they are sent from the RX path.	2015-03-14 13:39:05 +00:00
Etienne Dechamps	467397f25d	Throttle the rate of UDP_INFO messages. This makes sure UDP_INFO messages are only sent at the maximum rate of 5 per second (by default). As usual with these "probe" mechanisms, the rate of these messages cannot be higher than the rate of data packets themselves, since they are sent from the RX path.	2015-03-14 13:39:05 +00:00
Etienne Dechamps	b1421b9190	Add MTU_INFO protocol message. In this commit, nodes use MTU_INFO messages to provide MTU information. The issue this code is meant to address is the non-trivial problem of finding the proper MTU when UDP SPTPS relays are involved. Currently, tinc has no idea what the MTU looks like beyond the first relay, and will arbitrarily use the first relay's MTU as the limit. This will fail miserably if the MTU decreases after the first relay, forcing relays to fall back to TCP. More generally, one should keep in mind that relay paths can be arbitrarily complex, resulting in packets taking "epic journeys" through the graph, switching back and forth between UDP (with variable MTUs) and TCP multiple times along the path. A solution that was considered consists in sending standard MTU probes through the relays. This is inefficient (if there are 3 nodes on one side of relay and 3 nodes on the other side, we end up with 3*3=9 MTU discoveries taking place at the same time, while technically only 3+3=6 are needed) and would involve eyebrow-raising behaviors such as probes being sent over TCP. This commit implements an alternative solution, which consists in the packet receiver sending MTU_INFO messages to the packet sender. The message contains an MTU value which is set to maximum when the message is originally sent. The message gets altered as it travels through the metagraph, such that when the message arrives to the destination, the MTU value contained in the message can be used to send packets while making sure no relays will be forced to fall back to TCP to deliver them. The operating principles behind such a protocol message are similar to how the UDP_INFO message works, but there is a key difference that prevents us from simply reusing the same message: the UDP_INFO message only cares about relay-to-relay links (i.e. it is sent between static relays and the information it contains only makes sense between two adjacent static relays), while the MTU_INFO cares about the end-to-end MTU, including the entire relay path. Therefore, UDP_INFO messages stop when they encounter static relays, while MTU_INFO messages don't stop until they get to the original packet sender. Note that, technically, the MTU that is obtained through this mechanism can be slightly pessimistic, because it can be lowered by an intermediate node that is not being used as a relay. Since nodes have no way of knowing whether they'll be used as dynamic relays or not (and have no say in the matter), this is not a trivial problem. That said, this is highly unlikely to result in noticeable issues in realistic scenarios.	2015-03-14 13:39:05 +00:00
Etienne Dechamps	9bb230f30f	Add UDP_INFO protocol message. In this commit, nodes use UDP_INFO messages to provide UDP address information. The basic principle is that the node that receives packets sends UDP_INFO messages to the node that's sending the packets. The message originally contains no address information, and is (hopefully) updated with relevant address information as it gets relayed through the metagraph - specifically, each intermediate node will update the message with its best guess as to what the address is while forwarding it. When a node receives an UDP_INFO message, and it doesn't have a confirmed UDP tunnel with the originator node, it will update its records with the new address for that node, so that it always has the best possible guess as to how to reach that node. This applies to the destination node of course, but also to any intermediate nodes, because there's no reason they should pass on the free intel, and because it results in nice behavior in the presence of relay chains (multiple nodes in a path all trying to reach the same destination). If, on the other hand, the node does have a confirmed UDP tunnel, it will ignore the address information contained in the message. In all cases, if the node that receives the message is not the destination node specified in the message, it will forward the message but not before overriding the address information with the one from its own records. If the node has a confirmed UDP tunnel, that means the message is updated with the address of the confirmed tunnel; if not, the message simply reflects the records of the intermediate node, which just happen to be the contents of the UDP_INFO message it just got, so it's simply forwarded with no modification. This is similar to the way ANS_KEY messages are currently overloaded to provide UDP address information, with two differences: - UDP_INFO messages are sent way more often than ANS_KEY messages, thereby keeping the address information fresh. Previously, if the UDP situation were to change after the ANS_KEY message was sent, the sender would virtually never get the updated information. - Once a node puts address information in an ANS_KEY message, it is never changed again as the message travels through the metagraph; in contrast, UDP_INFO messages behave the opposite way, as they get rewritten every time they travel through a node with a confirmed UDP tunnel. The latter behavior seems more appropriate because UDP tunnel information becomes more relevant as it moves closer to the destination node. The ANS_KEY behavior is not satisfactory in some cases such as multi-layered graphs where the first hop is located before a NAT. Ultimately, the rationale behind this whole process is to improve UDP hole punching capabilities when port translation is in effect, and more generally, to make tinc more reliable in (very) hostile network conditions (such as multi-layered NAT).	2015-03-14 13:39:05 +00:00
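
The per-node UDP_INFO rules described in commit `9bb230f30f` can be sketched in C roughly as follows. This is an illustration only, not tinc's actual code: the struct names, their fields and the `handle_udp_info()` function are all hypothetical, and addresses are reduced to plain strings. One case the commit message does not spell out is treated as an assumption here: a node without a confirmed tunnel that receives a message carrying no address yet simply keeps its existing record.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified view of what one node knows about the originator. */
struct peer_record {
	bool udp_confirmed;  /* true if a UDP tunnel to the originator is confirmed */
	char address[64];    /* best known UDP address, "" if unknown */
};

/* Hypothetical UDP_INFO message, reduced to the field this sketch needs. */
struct udp_info_msg {
	char address[64];    /* "" when the message is first sent */
};

/*
 * Apply the UDP_INFO rules at one node.
 * Returns true if the message must be forwarded, i.e. this node is not
 * the destination named in the message.
 */
static bool handle_udp_info(struct peer_record *rec, struct udp_info_msg *msg, bool is_destination) {
	/* No confirmed tunnel: adopt the address carried by the message.
	 * Assumption: an empty address leaves the existing record untouched. */
	if(!rec->udp_confirmed && msg->address[0] != '\0')
		snprintf(rec->address, sizeof rec->address, "%s", msg->address);

	/* A node with a confirmed tunnel ignores the message contents entirely. */

	if(is_destination)
		return false;

	/* Before forwarding, overwrite the message with this node's own records.
	 * With a confirmed tunnel this injects the confirmed address; otherwise
	 * the message leaves with the contents it arrived with. */
	snprintf(msg->address, sizeof msg->address, "%s", rec->address);
	return true;
}
```

With these rules, an intermediate node that has a confirmed tunnel rewrites the address on the way through, while unconfirmed nodes pass along whatever they just learned. That is the contrast with ANS_KEY drawn in the commit message.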


2589 commits