On Windows, the event loop io tree uses the Windows Event handle to
differentiate between io_t objects. Unfortunately, there is a bug in
the io_add_event() function (introduced in
2f9a1d4ab5) as it sets the event after
inserting the object into the tree, resulting in objects appearing in
io_tree out of order.
This can lead to crashes on Windows as the event loop is unable to
determine which events fired.
Setting the Port configuration variable to zero can be used to make tinc
listen on a system-assigned port. Unfortunately, in this scenario myport
will be zero, which means that tinc won't transmit its actual UDP
listening port to other nodes. This breaks UDP hole punching and local
discovery.
Commit 611217c96e introduced a regression
because it accidentally reordered the timeout handler calls and the
fdset setup code. This means that any io_add(), io_del() or io_set()
calls in timeout handlers would be ignored in the current event loop
iteration, resulting in erratic behavior.
The most visible symptom is when a metaconnection timeout occurs and the
connection is closed; the timeout handler closes the socket but it still
ends up in the select() call, typically resulting in the following
crash:
Error while waiting for input: Bad file descriptor
Currently we don't do any shortening on IPv6 addresses (aside from
removing trailing zeroes) before printing them. This commit makes
textual addresses smaller by shortening them according to the rules
described in RFC 5952. This is also the canonical textual representation
for IPv6 addresses, thus making them easier to compare.
This commit suppresses subnet prefix length output (/xx) for subnets
that only contain one address (/32 for IPv4, /128 for IPv6). It also
suppresses weight information if the subnet is using the default
weight. This improves readability of net2str() output in the majority
of cases.
tinc currently prints MAC addresses without trailing zeroes, for example:
1:2:3:4:5:6
This looks weird and is inconsistent with how MAC addresses are
displayed everywhere else. This commit adds trailing zeroes, so the
above address will be printed as the following:
01:02:03:04:05:06
This is a complete rewrite of the str2net() function. Besides
refactoring duplicate code, this new code brings the following fixes
and improvements:
- Fixes handling of leading/trailing double colon in IPv6 addresses.
For example, with the previous code the address
2001:0db8:85a3:0000:0000:8a2e:0370:: is interpreted as a MAC address,
and ::0db8:85a3:0000:0000:8a2e:0370:7334 is rejected.
- Catches more invalid cases, such as garbage at the end of the string.
- Adds support for dotted quad notation in IPv6 (e.g. ::1.2.3.4).
See RFC 4291, section 2.2 for details on the textual format of IPv6
addresses.
Instead of using a hardcoded version number in configure.ac, this makes
tinc use the live version reported by "git describe", queried on-the-fly
during the build process and regenerated for every build.
This provides several advantages:
- Less redundancy: git is now the source of truth for version
information, no need to store it in the repository itself.
- Simpler release process: just creating a git tag automatically
updates the version. No need to change files.
- More useful version information: tinc will now display the number of
commits since the last tag as well as the commit the binary is built
from, following the format described in git-describe(1).
Here's an example of tincd --version output:
tinc version release-1.1pre10-48-gc149315 (built Jun 29 2014 15:21:10, protocol 17.3)
When building directly from a release tag, this would like the following:
tinc version release-1.1pre10 (built Jun 29 2014 15:21:10, protocol 17.3)
(Note that the format is slightly different - because of the way the
tags are named, it says "release-1.1pre10" instead of just "1.1pre10")
This prevents the date and time shown in version information from
getting stale because of partial builds. With these changes, date and
time information is written to a dedicated object file that gets rebuilt
every time make is run, even if there are no changes.
This adds a new option, BroadcastSubnet, that allows the user to
declare broadcast subnets, i.e. subnets which are considered broadcast
addresses by the tinc routing layer. Previously only the global IPv4
and IPv6 broadcast addresses were supported by virtue of being
hardcoded.
This is useful when using tinc in router mode with Ethernet virtual
devices, as it can be used to provide broadcast support for a local
broadcast address (e.g. 10.42.255.255) instead of just the global
address (255.255.255.255).
This is implemented by removing hardcoded broadcast addresses and
introducing "broadcast subnets", which are subnets with a NULL owner.
By default, behavior is unchanged; this is accomplished by adding
the global broadcast addresses for Ethernet, IPv4 and IPv6 at start
time.
Implementation of sptps_verify_datagram() was left as a TODO. This
causes problems when using SPTPS in tinc, because this function is
used in try_mac(), which itself is used in try_harder() to locate
nodes sending UDP packets from unexpected addresses. In the current
state this function always returns true, resulting in UDP addresses
of random nodes getting changed which makes UDP communication
fragile and unreliable. In addition, this makes UDP communication
impossible through port translation and local discovery.
This commit adds the missing implementation, thus fixing the issue.
Recent improvements to the local discovery mechanism makes it cheaper,
more network-friendly, and now it cannot make things worse (as opposed
to the old mechanism). Thus there is no reason not to enable it by
default.
The new local address based local discovery mechanism is technically
superior to the old broadcast-based one. In fact, the old algorithm
can technically make things worse by e.g. sending broadcasts over the
VPN itself and then selecting the VPN address as the node's UDP
address. This cannot happen with the new mechanism.
Note that this means old nodes that don't send their local addresses in
ADD_EDGE messages can't be discovered, because there is no address to
send discovery packets to. Old nodes can still discover new nodes by
sending them broadcasts, though.
This introduces a new way of doing local discovery: when tinc has
local address information for the recipient node, it will send local
discovery packets directly to the local address of that node, instead
of using broadcast packets.
This new way of doing local discovery provides numerous advantages compared to
using broadcasts:
- No broadcast packets "polluting" the local network;
- Reliable even if the sending host has multiple network interfaces (in
contrast, broadcasts will only be sent through one unpredictable
interface)
- Works even if the two hosts are not on the same broadcast domain. One
example is a large LAN where the two hosts might be on different local
subnets. In fact, thanks to UDP hole punching this might even work if
there is a NAT sitting in the middle of the LAN between the two nodes!
- Sometimes a node is reachable through its "normal" address, and via a
local subnet as well. One might think the local subnet is the best route
to the node in this case, but more often than not it's actually worse -
one example is where the local segment is a third party VPN running in
parallel, or ironically it can be the local segment formed by the tinc
VPN itself! Because this new algorithm only checks the addresses for
which an edge is already established, it is less likely to fall into
these traps.
In addition to the remote address, each edge now stores the local address from
the point of view of the "from" node. This information is then made available
to other nodes through a backwards-compatible extension to ADD_EDGE messages.
This information can be used in future code to improve packet routing.
tinc is using a separate thread to read from the TAP device on Windows.
The rationale was that the notification mechanism for packets arriving
on the virtual network device is based on Win32 events, and the event
loop did not support listening to these events.
Thanks to recent improvements, this event loop limitation has been
lifted. Therefore we can get rid of the separate thread and simply add
the Win32 "incoming packet" event to the event loop, just like a socket.
The result is cleaner code that's easier to reason about.
Currently, when the tinc service handler callback (which runs in a
separate thread) receives a service shutdown request, it calls
event_exit() to request the event loop to exit.
This approach has a few issues:
- The event loop will only notice the exit request when the next event
fires. This slows down tinc service shutdown. In some extreme cases
(DeviceStandby enabled, long PingTimeout and no connections),
shutdown can take ages.
- Strictly speaking, because of the absence of memory barriers, there
is no guarantee that the event loop will even notice an exit request
coming from another thread. I suppose marking the "running" variable
as "volatile" is supposed to alleviate that, but it's unclear whether
that provides any guarantees with modern systems and compilers.
This commit fixes the issue by leveraging the new event loop Windows
interface, using a custom Windows event that is manually set when
shutdown is requested.
This commit changes the event loop to use WSAEventSelect() and
WSAWaitForMultipleEvents() on Windows. This paves the way for making the
event loop more flexible on Windows by introducing the required
infrastructure to make the event loop wait on any Win32 event.
This commit only affects the internal implementation of the event
module. Externally visible behavior remains strictly unchanged (for
now).
Commit 86a99c6b99 changed the way we
handle connection events to protect against spurious event loop
callbacks. Unfortunately, it turns out that calling connect() twice on
the same socket results in different behaviors depending on the platform
(even though it seems well defined in POSIX). On Windows this resulted
in the connection handling code being unable to react to connection
errors (such as connection refused), always hitting the timeout; on
Linux this resulted in spurious error messages about connect() returning
success.
In POSIX and on Linux, using connect() on a socket where the previous
attempt failed will attempt to connect again, resulting in unnecessary
network activity. Using getsockopt(SO_ERROR) before connect() solves
that, but introduces a race condition if a connection failure happens
between the two calls.
For this reason, this commit switches from connect() to a zero-sized
send() call, which is more consistent (though not completely, see the
truth table in the comments) and simpler to use for that purpose. Note
that Windows explictly support empty send() calls; POSIX says nothing
on the subject, but testing shows it works at least on Linux.
(Surprisingly enough, Windows seems more POSIX-compliant than Linux on
this one!)
The event loop does not guarantee that spurious write I/O events do not
happen; in fact, they are guaranteed to happen on Windows when
event_flush_output() is called. Because handle_meta_io() does not check
for spurious events, a metaconnection socket might appear connected even
though it's not, and will fail immediately when sending the ID request.
This commit fixes this issue by making handle_meta_io() check the
connection status before assuming the socket is connected. It seems that
the only reliable way to do that is to try to call connect() again and
look at the error code, which will be EISCONN if the socket is
connected, or EALREADY if it's not.
When using socket functions, "sockerrno" is supposed to be used to
retrieve the error code as opposed to "errno", so that it is translated
to the correct call on Windows (WSAGetLastError() - Windows does not
update errno on socket errors). Unfortunately, the use of sockerrno is
inconsistent throughout the tinc codebase, as errno is often used
incorrectly on socket-related calls.
This commit fixes these oversights, which improves socket error
handling on Windows.
These Windows include lines are capitalized, which causes the build to fail
when cross-compiling from Linux to Windows using MinGW as the MinGW headers
are entirely lower case.
Besides controlling when tinc-up and tinc-down get called, this commit makes
DeviceStandby control when the virtual network interface "cable" is "plugged"
on Windows. This is more user-friendly as the status of the tinc network can
be seen just by looking at the state of the network interface, and it makes
Windows behave better when isolated.
This adds a new DeviceStandby option; when it is disabled (the default),
behavior is unchanged. If it is enabled, tinc-up will not be called during
tinc initialization, but will instead be deferred until the first node is
reachable, and it will be closed as soon as no nodes are reachable.
This is useful because it means the device won't be set up until we are fairly
sure there is something listening on the other side. This is more user-friendly,
as one can check on the status of the tinc network connection just by checking
the status of the network interface. Besides, it prevents the OS from thinking
it is connected to some network when it is in fact completely isolated.
In send_sptps_data(), the len variable contains the length of the whole
datagram that needs to be sent to the peer, including the overhead from SPTPS
itself.
When tinc runs the graph algorithms and updates the nexthop and via pointers,
it uses a breadth-first search, but it can sometimes revisit nodes that have
already been visited if the previous path is marked as being indirect, and
there is a longer path that is "direct". The via pointer should be updated in
this case, because this points to the closest hop to the destination that can
be reached directly. However, the nexthop pointer should not be updated.
This fixes a bug where there could potentially be a routing loop if a node in
the graph has an edge with the indirect flag set, and some other edge without
that flag, the indirect edge is part of the minimum spanning tree, and a
broadcast packet is being sent.
The main reason to switch from AES-256-GCM to ChaCha-Poly1305 is to remove a
dependency on OpenSSL, whose behaviour of the AES-256-GCM decryption function
changes between versions. The source code for ChaCha-Pol1305 is small and in
the public domain, and can therefore be easily included in tinc itself.
Moreover, it is very fast even without using any optimized assembler, easily
outperforming AES-256-GCM on platforms that don't have special AES instructions
in hardware.
This uses the portable Ed25519 library made by Orson Peters, which in turn uses
the reference implementation made by Daniel J. Bernstein.
This implementation also allows Ed25519 keys to be used for key exchange, so
there is no need to add a separate implementation of Curve25519.
- Try to prevent SIGPIPE from being sent for errors sending to the control
socket. We don't outright block the SIGPIPE signal because we still want the
tinc CLI to exit when its output is actually sent to a real (broken) pipe.
- Don't call exit() from top(), and properly detect when the control socket is
closed by the tincd.
Before, the tapreader thread would just exit immediately after encountering the
first error, without notifying the main thread. Now, the tapreader thead never
exits itself, but tells the main thread to stop when more than ten errors are
encountered in a row.
Before, when making a meta-connection to a node (either because of a ConnectTo
or because AutoConnect is set), tinc required one or more Address statements
in the corresponding host config file. However, tinc learns addresses from
other nodes that it uses for UDP connections. We can use those just as well for
TCP connections.
When creating invitations or using them to join a VPN, and the tinc command is
not run interactively (ie, when stdin and stdout are not connected or
redirected to/from a file), don't ask questions. If normally tinc would ask for
a confirmation, just assume the default answer instead. If tinc really needs
some input, just print an error message instead.
In case an invitation is used for a VPN which uses a netname that is already in
use on the local host, tinc will store the configuration in a temporary
directory. Normally it asks for an alternative netname and then renames the
temporary directory, but when not run interactively, it now just prints the
location of the unchanged temporary directory.
ListenAddress works the same as BindToAddress, except that from now on,
explicitly binding outgoing packets to the address of a socket is only done for
sockets specified with BindToAddress.
If the Port statement is not used, there are two other ways to let tinc listen
on a non-default port: either by specifying one or more BindToAddress
statements including port numbers, or by starting it from systemd with socket
activation. Tinc announces its own port to other nodes, but before it only
announced what was set using the Port statement.
The restriction of accepting only 1 connection per second from a single address
is a bit too much, especially if one wants to join a VPN using an invitation,
which requires two connections.
It now defers reading from stdin until after the authentication phase is
completed. Furthermore, it supports the -q, -r, -w options similar to those of
Jürgen Nickelsen's socket.
The tinc utility defered calling WSAStartup() until it tried to connect to a
running tinc daemon. However, socket functions are now also used for other
things (like joining another VPN using an invitation). Now we just
unconditionally call WSAStartup() early in main().
It seems like a lot of overhead to call access() for every possible extension
defined in PATHEXT, but apparently this is what Windows does itself too. At
least this avoids calling system() when the script one is looking for does not
exist at all.
Since the tinc utility also needs to call scripts, execute_script() is now
split off into its own source file.
Since filenames could potentially leak to unprivileged users (for example,
because of locatedb), it should not contain the cookie used for invitations.
Instead, tinc now uses the hash of the cookie and the invitation key as the
filename to store pending invitations in.
Commit cff5a84 removed the feature of binding outgoing TCP sockets to a local
address. We now call bind() again, but only if there is exactly one listening
socket with the same address family as the destination address of the outgoing
socket.
The order in which tinc initialized things was not completely correct. Now, it
is done as follows:
- Load and parse configuration files.
- Create all TCP and UDP listening sockets.
- Create PID file and UNIX socket.
- Run the tinc-up script.
- Drop privileges.
- Start outgoing connections.
- Run the main loop.
The PID file can only be created correctly if the listening sockets have been
set up ,as it includes the address and port of the first listening socket. The
tinc-up script has to be run after the PID file and UNIX socket have been
created so it can change their permissions if necessary. Outgoing connections
should only be started right before the main loop, because this is not really
part of the initialization.
The PID file was created before tinc-up was called, but the UNIX socket was
created afterwards, which meant one could not change the UNIX socket's owner or
permissions from the tinc-up script.
Automake finds the files in the subdirectories of src/ now that they are
properly declared in the _SOURCES variables. Using EXTRA_DIST would now cause
.o files to be included in the tarball.
When reloading the configuration file via the tinc command, the user will get
an error message if reloading has failed. However, no such warning exists when
sending a HUP signal. Previously, tincd would exit in both cases, but with a
zero exit code. Now it will exit with code 1 when reloading fails after a
SIGHUP, but tincd will keep running if it is signaled via the tinc command.
Instead, the tinc command will exit with a non-zero exit code.
The retry() function would only abort connections that were in progress of
being made, it wouldn't reschedule the outgoing connections that had been
sleeping.
As mentioned by Erik Tews, calling fchmod() after fopen() leaves a small window
for exploits. As long as tinc is single-threaded, we can use umask() instead to
reduce file permissions. This also works when creating the AF_UNIX control socket.
The umask of the user running tinc(d) is used for most files, except for the
private keys, invitation files, PID file and control socket.
In case no explicit netname of configuration directory is specified when
accepting an invitation, the netname specified in the invitation data is
used. However, this new netname is only known after making the connection
to the server. If the new netname conflicts with an existing one at the
client, we ask the user for a netname that doesn't conflict. However, we
should first finish accepting the invitation, so we don't run into the
problem that the server times out and cancels the invitation. So, we create
a random netname and store the files there, and only after we finish
accepting the invitation we ask the user for a better netname, and then
just rename the temporary directory to the final name.
If port 655 cannot be bound to when using the init command, tinc will try to
find a random port number that can be bound to, and will add the appropriate
Port variable to its host config file. A warning will be printed as well.
During the init command, tinc changed the umask to 077 when writing the public
and private key files, to prevent the temporary copies from being world
readable. However, subsequently created files would therefore also be
unreadable for others. Now we don't change the umask anymore, therefore
allowing the user to choose whether the files are world readable or not by
setting the umask as desired. The private key files are still made unreadable
for others of course. Temporary files now inherit the permissions of the
original, and the tinc-up script's permissions now also honour the umask.
This patch adds timestamp information to type 2 MTU probe replies. This
timestamp can then be used by the recipient to estimate bandwidth more
accurately, as jitter in the RX direction won't affect the results.
When replying to a PMTU probe, tinc sends a packet with the same length
as the PMTU probe itself, which is usually large (~1450 bytes). This is
not necessary: the other node wants to know the size of the PMTU probes
that have been received, but encoding this information as the actual
reply length is probably the most inefficient way to do it. It doubles
the bandwidth usage of the PMTU discovery process, and makes it less
reliable since large packets are more likely to be dropped.
This patch introduces a new PMTU probe reply type, encoded as type "2"
in the first byte of the packet, that indicates that the length of the
PMTU probe that is being replied to is encoded in the next two bytes of
the packet. Thus reply packets are only 3 bytes long.
(This also protects against very broken networks that drop very small
packets - yes, I've seen it happen on a subnet of a national ISP - in
such a case the PMTU probe replies will be dropped, and tinc won't
enable UDP communication, which is a good thing.)
Because legacy nodes won't understand type 2 probe replies, the minor
protocol number is bumped to 3.
Note that this also improves bandwidth estimation, as it is able to
measure bandwidth in both directions independently (the node receiving
the replies is measuring in the TX direction) and the use of smaller
reply packets might decrease the influence of jitter.
The hashing function that tinc uses is currently broken as it only looks
at the first 4 bytes of data.
This leads to interesting bugs, like the node UDP address cache being
subtly broken because two addresses with the same protocol and port (but
not the same IP address) will override each other. This is because
the first four bytes of sockaddr_in contains the IP protocol and port,
while the IP address itself is contained in the four remaining bytes
that are never used when the hash is computed.
Windows doesn't actually support it, but MinGW provides it. However, with some versions of
MinGW it doesn't work correctly. Instead, we vsnprintf() to a local buffer and xstrdup() the
results.
I believe I have found a bug in tinc on Linux when it is used with
Mode = router and DeviceType = tap. This combination is useful because
it allows global broadcast packets to be used in router mode. However,
when tinc receives a packet in this situation, it needs to make sure its
destination MAC address matches the address of the TAP adapter, which is
typically not the case since the sending node doesn't know the MAC
address of the recipient. Unfortunately, this is not the case on Linux,
which breaks connectivity.