Besides controlling when tinc-up and tinc-down get called, this commit makes
DeviceStandby control when the virtual network interface "cable" is "plugged"
on Windows. This is more user-friendly as the status of the tinc network can
be seen just by looking at the state of the network interface, and it makes
Windows behave better when isolated.
This adds a new DeviceStandby option; when it is disabled (the default),
behavior is unchanged. If it is enabled, tinc-up will not be called during
tinc initialization, but will instead be deferred until the first node is
reachable, and it will be closed as soon as no nodes are reachable.
This is useful because it means the device won't be set up until we are fairly
sure there is something listening on the other side. This is more user-friendly,
as one can check on the status of the tinc network connection just by checking
the status of the network interface. Besides, it prevents the OS from thinking
it is connected to some network when it is in fact completely isolated.
In send_sptps_data(), the len variable contains the length of the whole
datagram that needs to be sent to the peer, including the overhead from SPTPS
itself.
When tinc runs the graph algorithms and updates the nexthop and via pointers,
it uses a breadth-first search, but it can sometimes revisit nodes that have
already been visited if the previous path is marked as being indirect, and
there is a longer path that is "direct". The via pointer should be updated in
this case, because this points to the closest hop to the destination that can
be reached directly. However, the nexthop pointer should not be updated.
This fixes a bug where there could potentially be a routing loop if a node in
the graph has an edge with the indirect flag set, and some other edge without
that flag, the indirect edge is part of the minimum spanning tree, and a
broadcast packet is being sent.
The main reason to switch from AES-256-GCM to ChaCha-Poly1305 is to remove a
dependency on OpenSSL, whose behaviour of the AES-256-GCM decryption function
changes between versions. The source code for ChaCha-Pol1305 is small and in
the public domain, and can therefore be easily included in tinc itself.
Moreover, it is very fast even without using any optimized assembler, easily
outperforming AES-256-GCM on platforms that don't have special AES instructions
in hardware.
This uses the portable Ed25519 library made by Orson Peters, which in turn uses
the reference implementation made by Daniel J. Bernstein.
This implementation also allows Ed25519 keys to be used for key exchange, so
there is no need to add a separate implementation of Curve25519.
- Try to prevent SIGPIPE from being sent for errors sending to the control
socket. We don't outright block the SIGPIPE signal because we still want the
tinc CLI to exit when its output is actually sent to a real (broken) pipe.
- Don't call exit() from top(), and properly detect when the control socket is
closed by the tincd.
Before, the tapreader thread would just exit immediately after encountering the
first error, without notifying the main thread. Now, the tapreader thead never
exits itself, but tells the main thread to stop when more than ten errors are
encountered in a row.
Before, when making a meta-connection to a node (either because of a ConnectTo
or because AutoConnect is set), tinc required one or more Address statements
in the corresponding host config file. However, tinc learns addresses from
other nodes that it uses for UDP connections. We can use those just as well for
TCP connections.
When creating invitations or using them to join a VPN, and the tinc command is
not run interactively (ie, when stdin and stdout are not connected or
redirected to/from a file), don't ask questions. If normally tinc would ask for
a confirmation, just assume the default answer instead. If tinc really needs
some input, just print an error message instead.
In case an invitation is used for a VPN which uses a netname that is already in
use on the local host, tinc will store the configuration in a temporary
directory. Normally it asks for an alternative netname and then renames the
temporary directory, but when not run interactively, it now just prints the
location of the unchanged temporary directory.
ListenAddress works the same as BindToAddress, except that from now on,
explicitly binding outgoing packets to the address of a socket is only done for
sockets specified with BindToAddress.
If the Port statement is not used, there are two other ways to let tinc listen
on a non-default port: either by specifying one or more BindToAddress
statements including port numbers, or by starting it from systemd with socket
activation. Tinc announces its own port to other nodes, but before it only
announced what was set using the Port statement.
The restriction of accepting only 1 connection per second from a single address
is a bit too much, especially if one wants to join a VPN using an invitation,
which requires two connections.
It now defers reading from stdin until after the authentication phase is
completed. Furthermore, it supports the -q, -r, -w options similar to those of
Jürgen Nickelsen's socket.
The tinc utility defered calling WSAStartup() until it tried to connect to a
running tinc daemon. However, socket functions are now also used for other
things (like joining another VPN using an invitation). Now we just
unconditionally call WSAStartup() early in main().
It seems like a lot of overhead to call access() for every possible extension
defined in PATHEXT, but apparently this is what Windows does itself too. At
least this avoids calling system() when the script one is looking for does not
exist at all.
Since the tinc utility also needs to call scripts, execute_script() is now
split off into its own source file.
Since filenames could potentially leak to unprivileged users (for example,
because of locatedb), it should not contain the cookie used for invitations.
Instead, tinc now uses the hash of the cookie and the invitation key as the
filename to store pending invitations in.
Commit cff5a84 removed the feature of binding outgoing TCP sockets to a local
address. We now call bind() again, but only if there is exactly one listening
socket with the same address family as the destination address of the outgoing
socket.
The order in which tinc initialized things was not completely correct. Now, it
is done as follows:
- Load and parse configuration files.
- Create all TCP and UDP listening sockets.
- Create PID file and UNIX socket.
- Run the tinc-up script.
- Drop privileges.
- Start outgoing connections.
- Run the main loop.
The PID file can only be created correctly if the listening sockets have been
set up ,as it includes the address and port of the first listening socket. The
tinc-up script has to be run after the PID file and UNIX socket have been
created so it can change their permissions if necessary. Outgoing connections
should only be started right before the main loop, because this is not really
part of the initialization.
The PID file was created before tinc-up was called, but the UNIX socket was
created afterwards, which meant one could not change the UNIX socket's owner or
permissions from the tinc-up script.
Automake finds the files in the subdirectories of src/ now that they are
properly declared in the _SOURCES variables. Using EXTRA_DIST would now cause
.o files to be included in the tarball.
When reloading the configuration file via the tinc command, the user will get
an error message if reloading has failed. However, no such warning exists when
sending a HUP signal. Previously, tincd would exit in both cases, but with a
zero exit code. Now it will exit with code 1 when reloading fails after a
SIGHUP, but tincd will keep running if it is signaled via the tinc command.
Instead, the tinc command will exit with a non-zero exit code.
The retry() function would only abort connections that were in progress of
being made, it wouldn't reschedule the outgoing connections that had been
sleeping.
As mentioned by Erik Tews, calling fchmod() after fopen() leaves a small window
for exploits. As long as tinc is single-threaded, we can use umask() instead to
reduce file permissions. This also works when creating the AF_UNIX control socket.
The umask of the user running tinc(d) is used for most files, except for the
private keys, invitation files, PID file and control socket.
In case no explicit netname of configuration directory is specified when
accepting an invitation, the netname specified in the invitation data is
used. However, this new netname is only known after making the connection
to the server. If the new netname conflicts with an existing one at the
client, we ask the user for a netname that doesn't conflict. However, we
should first finish accepting the invitation, so we don't run into the
problem that the server times out and cancels the invitation. So, we create
a random netname and store the files there, and only after we finish
accepting the invitation we ask the user for a better netname, and then
just rename the temporary directory to the final name.
If port 655 cannot be bound to when using the init command, tinc will try to
find a random port number that can be bound to, and will add the appropriate
Port variable to its host config file. A warning will be printed as well.
During the init command, tinc changed the umask to 077 when writing the public
and private key files, to prevent the temporary copies from being world
readable. However, subsequently created files would therefore also be
unreadable for others. Now we don't change the umask anymore, therefore
allowing the user to choose whether the files are world readable or not by
setting the umask as desired. The private key files are still made unreadable
for others of course. Temporary files now inherit the permissions of the
original, and the tinc-up script's permissions now also honour the umask.
This patch adds timestamp information to type 2 MTU probe replies. This
timestamp can then be used by the recipient to estimate bandwidth more
accurately, as jitter in the RX direction won't affect the results.
When replying to a PMTU probe, tinc sends a packet with the same length
as the PMTU probe itself, which is usually large (~1450 bytes). This is
not necessary: the other node wants to know the size of the PMTU probes
that have been received, but encoding this information as the actual
reply length is probably the most inefficient way to do it. It doubles
the bandwidth usage of the PMTU discovery process, and makes it less
reliable since large packets are more likely to be dropped.
This patch introduces a new PMTU probe reply type, encoded as type "2"
in the first byte of the packet, that indicates that the length of the
PMTU probe that is being replied to is encoded in the next two bytes of
the packet. Thus reply packets are only 3 bytes long.
(This also protects against very broken networks that drop very small
packets - yes, I've seen it happen on a subnet of a national ISP - in
such a case the PMTU probe replies will be dropped, and tinc won't
enable UDP communication, which is a good thing.)
Because legacy nodes won't understand type 2 probe replies, the minor
protocol number is bumped to 3.
Note that this also improves bandwidth estimation, as it is able to
measure bandwidth in both directions independently (the node receiving
the replies is measuring in the TX direction) and the use of smaller
reply packets might decrease the influence of jitter.
The hashing function that tinc uses is currently broken as it only looks
at the first 4 bytes of data.
This leads to interesting bugs, like the node UDP address cache being
subtly broken because two addresses with the same protocol and port (but
not the same IP address) will override each other. This is because
the first four bytes of sockaddr_in contains the IP protocol and port,
while the IP address itself is contained in the four remaining bytes
that are never used when the hash is computed.
Windows doesn't actually support it, but MinGW provides it. However, with some versions of
MinGW it doesn't work correctly. Instead, we vsnprintf() to a local buffer and xstrdup() the
results.
I believe I have found a bug in tinc on Linux when it is used with
Mode = router and DeviceType = tap. This combination is useful because
it allows global broadcast packets to be used in router mode. However,
when tinc receives a packet in this situation, it needs to make sure its
destination MAC address matches the address of the TAP adapter, which is
typically not the case since the sending node doesn't know the MAC
address of the recipient. Unfortunately, this is not the case on Linux,
which breaks connectivity.
Tinc now strictly limits incoming connections from the same host to 1 per
second. For incoming connections from multiple hosts short bursts of incoming
connections are allowed (by default 100), but on average also only 1 connection
per second is allowed.
When an incoming connection exceeds the limit, tinc will keep the connection in
a tarpit; the connection will be kept open but it is ignored completely. Only
one connection is in a tarpit at a time to limit the number of useless open
connections.
When LocalDiscovery is enabled, tinc normally sends broadcast packets during
PMTU discovery to the broadcast address (255.255.255.255 or ff02::1). This
option lets tinc use a different address.
At the moment only one LocalDiscoveryAddress can be specified.
Some options can take an optional argument. However, in this case GNU getopt
requires that the optional argument is right next to the option without
whitespace inbetween. If there is whitespace, getopt will treat it as a
non-option argument, but tincd ignored those without a warning. Now tincd will
allow optional arguments with whitespace inbetween, and will give an error when
it encounters any other non-option arguments.
The tinc binary now requires that all options for itself are given before the
command.
Using the tinc command, an administrator of an existing VPN can generate
invitations for new nodes. The invitation is a small URL that can easily
be copy&pasted into email or live chat. Another person can have tinc
automatically setup the necessary configuration files and exchange keys
with the server, by only using the invitation URL.
The invitation protocol uses temporary ECDSA keys. The invitation URL
consists of the hostname and port of the server, a hash of the server's
temporary ECDSA key and a cookie. When the client wants to accept an
invitation, it also creates a temporary ECDSA key, connects to the server
and says it wants to accept an invitation. Both sides exchange their
temporary keys. The client verifies that the server's key matches the hash
in the invitation URL. After setting up an SPTPS connection using the
temporary keys, the client gives the cookie to the server. If the cookie
is valid, the server sends the client an invitation file containing the
client's new name and a copy of the server's host config file. If everything
is ok, the client will generate a long-term ECDSA key and send it to the
server, which will add it to a new host config file for the client.
The invitation protocol currently allows multiple host config files to be
send from the server to the client. However, the client filters out
most configuration variables for its own host configuration file. In
particular, it only accepts Name, Mode, Broadcast, ConnectTo, Subnet and
AutoConnect. Also, at the moment no tinc-up script is generated.
When an invitation has succesfully been accepted, the client needs to start
the tinc daemon manually.
Most important is the annotation of xasprintf() with the format attribute,
which allows the compiler to give warnings about the format string and
arguments.
ecdh_compute_shared() was changed to immediately delete the ephemeral key after
the shared secret was computed. Therefore, the pointer to the ecdh_t struct
should be zeroed so it won't be freed again when a struct sptps_t is freed.
At this point, c->config_tree may or may not be NULL, but this does not tell us whether it is an
outgoing connection or not. For incoming connections, we do not know the peer's name yet,
so we always have to claim ECDSA support. For outgoing connections, we always need to check
whether we have the peer's ECDSA public key, so that if we don't, we correctly tell the peer that
we want to upgrade.
This gets rid of the rest of the symbolic links. However, as a consequence, the
crypto header files have now moved to src/, and can no longer contain
library-specific declarations. Therefore, cipher_t, digest_t, ecdh_t, ecdsa_t
and rsa_t are now all opaque types, and only pointers to those types can be
used.
Normally all requests sent via the meta connections are checked so that they
cannot be larger than the input buffer. However, when packets are forwarded via
meta connections, they are copied into a packet buffer without checking whether
it fits into it. Since the packet buffer is allocated on the stack, this in
effect allows an authenticated remote node to cause a stack overflow.
This issue was found by Martin Schobert.
We were checking only for readability, which is not a problem for normal
connections, since the server side of a connection will always send an ID
request. But when using a proxy, the proxy server doesn't send anything before
the client, so tinc would not see that its connection to the proxy had already
been established.
Tinc never restarts PMTU discovery unless a node becomes unreachable. However,
it can be that the PMTU was very low during the initial discovery, but has
increased later. To detect this, tinc now tries to send an extra packet every
PingInterval, with a size slightly higher than the currently known PMTU. If
this packet is succesfully received back, we partially restart PMTU discovery
to find out the new maximum.
Conflicts:
src/net_packet.c
Commit dd07c9fc1f broke the reception of datagram
SPTPS packets, by undoing the conversion of the sequence number to host byte
order before comparison. This caused error messages like "Packet is 16777215
seqs in the future, dropped (1)".
Tinc uses Kruskal's algorithm to calculate a MST. However, this was broken in
commit 6e80da3370. Revert back to the working
algorithm from tinc 1.0.
Thanks to Cheng LI for spotting the problem.
Without adding any extra traffic, we can measure round trip times, estimate the
bandwidth and packet loss between nodes. The RTT and bandwidth can be measured
by timing the MTU probe packets. The RTT is the difference between the time a
burst of MTU probes was sent and when the first reply is received. The
bandwidth can be estimated by multiplying the size of the probe packets by the
time between succesive received probe replies of the same burst. The packet
loss can be estimated for incoming traffic by comparing how many packets have
actually been received to the increase in the sequence numbers.
The estimates are not perfect. Especially bandwidth is difficult to measure,
the only accurate way is to continuously send as much data as possible, but
that is obviously not desirable. The packet loss rate is also almost always
a few percent when sending a lot of data over the VPN via TCP, since TCP
*needs* packet loss to work properly.
Keep track of the number of correct, non-replayed UDP packets that have been
received, regardless of their content. This can be compared to the sequence
number to determine the real packet loss.
There are several reasons for this:
- MacOS/X doesn't support polling the tap device using kqueue, requiring a
workaround to fall back to select().
- On Windows only sockets are properly handled, therefore tinc uses a second
thread that does a blocking ReadFile() on the TAP-Win32/64 device. However,
this does not mix well with libevent.
- Libevent, event just the core, is quite large, and although it is easy to get
and install on many platforms, it can be a burden.
- Libev is more lightweight and seems technically superior, but it doesn't
abstract away all the platform differences (for example, async events are not
supported on Windows).
We don't need to search the whole edge tree, we can use the node's own edge
tree since each edge has a pointer to its reverse. Also, we do need to make
sure we try the reflexive address often.
Before it would always use the first socket, and always send an IPv4 broadcast packet. That
works fine in a lot of situations, but it is better to try all sockets, and to send IPv6 packets
on IPv6 sockets. This is especially important for users that are on IPv6-only networks or that
have multiple physical network interfaces, although in the latter case it probably requires
them to use the ListenAddress variable to create a separate socket for each interface.
Before, when tinc saw a packet larger than the PMTU with a VLAN tag, it would
not know what to do with it, and would just forward it via TCP. Now, tinc
handles 802.1q packets correctly, as long as there is only one tag.
When set to a non-zero value, tinc will try to maintain exactly that number of
meta connections to other nodes. If there are not enough connections, it will
periodically try to set up an outgoing connection to a random node. If there
are too many connections, it will periodically try to remove an outgoing
connection.
Only the very first packet of an SPTPS session should be send with REQ_KEY,
this signals the peer to abort any previous session and start a new one as
well.
Most of the code doesn't care whether requests are terminated with a newline or
not, except that when requests are forwarded, it is assumed they do not have
one and a newline is added. When a node using SPTPS receives a request from
another SPTPS-using node, and forwards it to a non-SPTPS-using node, this will
result in two consecutive newlines, which the latter node will see as an empty,
and thus invalid, request.
The tree functions were never used on the connection_tree, a list is more appropriate.
Also be more paranoid about connections disappearing while traversing the list.
Struct outgoing_ts and connection_ts were depending too much on each other,
causing lots of problems, especially the reuse of a connection_t. Now, whenever
a connection is closed it is immediately removed from the list of connections
and destroyed.
Similar to old style key exchange requests, keep track of whether a key
exchange is already in progress and how long it took. If no key is known yet
or if key exchange takes too long, (re)start a new key exchange.
Most fields should be zero when reusing a connection. In particular, when an
outgoing connection to a node which is reachable on more than one address is
made, the second connection to that node will have status.encryptout set but
outctx will be NULL, causing a NULL pointer dereference when
EVP_EncryptUpdate() is called in send_meta() when it shouldn't.
When starting tincd, tincctl now strips non-options from the command line, and
sets argv[0] to the name of the tincd command instead of copying its own
command name.
When stopping a running tincd, tincctl now waits for it to terminate.
Internally, tinc maintains a directed graph of the meta connections between
nodes. However, this causes graphviz to draw two lines between nodes, which is
not always desirable. The "dump graph" command now defaults to dumping an
undirected graph, the "dump digraph" command will dump a directed graph.
This new setting allows choosing a custom script interpreter used for the various tinc callbacks.
If none is specified, the script itself is called as executable (as before).
This is particularly useful when storing tinc configuration and script on a mount point with no-exec attribute.
Commented non-existing functions in android NDK.
Prefix scripts execution with shell binary to allow execution on no-exec mount points.
Everyything is currently hard coded, while it should use pre-compiler variables...
When two nodes which support SPTPS want to send packets to each other, they now
always use SPTPS. The node initiating the SPTPS session send the first SPTPS
packet via an extended REQ_KEY messages. All other handshake messages are sent
using ANS_KEY messages. This ensures that intermediate nodes using an older
version of tinc can still help with NAT traversal. After the authentication
phase is over, SPTPS packets are sent via UDP, or are encapsulated in extended
REQ_KEY messages instead of PACKET messages.
Otherwise, the call to daemon() could close filedescriptors in use by libevent
itself; for example if it uses kqueue or epoll instead of a select() or poll()
backend.
In retry() the function do_outgoing_connection() is called, which can delete
items from the connection_tree, so when walking the tree we must first save the
pointer to the next item.
The used remote protocol can change between two reconnects, aka if
the remote side has enabled/disabled for example their ExperimentalProtocols
setting.
Proxy type "exec" can be used to have an external script or binary set
up an outgoing connection. Standard input and output will be used to
exchange data with the external command. The variables REMOTEADDRESS and
REMOTEPORT are set to the intended destination address and port.
When the Proxy option is used, outgoing connections will be made via the
specified proxy. There is no support for authentication methods or for having
the proxy forward incoming connections, and there is no attempt to proxy UDP.
When the "Broadcast = direct" option is used, broadcast packets are not sent
and forwarded via the Minimum Spanning Tree to all nodes, but are sent directly
to all nodes that can be reached in one hop.
One use for this is to allow running ad-hoc routing protocols, such as OLSR, on
top of tinc.
Most likeley the error is that there just is no valid key inside the used
host file, and in this case errno just contains a random value from the
last previously failed call.
When the Name starts with a $, the rest will be interpreted as the name of an
environment variable containing the real Name. When Name is $HOST, but this
environment variable does not exist, gethostname() will be used to set the
Name. In both cases, illegal characters will be converted to underscores.
If the LISTEN_FDS environment variable is set and tinc is run in the
foreground, tinc will use filedescriptors 3 to 3 + LISTEN_FDS for its listening
TCP sockets. For now, tinc will create matching listening UDP sockets itself.
There is no dependency on systemd or on libsystemd-daemon.
DeviceType = multicast allows one to specify a multicast address and port with
a Device statement. Tinc will then read/send packets to that multicast group
instead of to a tun/tap device. This allows interaction with UML, QEMU and KVM
instances that are listening on the same group.
When making outgoing connections, tinc goes through the list of Addresses and
tries all of them until one succeeds. However, before it would consider
establishing a TCP connection a success, even when the authentication failed.
This would be a problem if the first Address would point to a hostname and port
combination that belongs to the wrong tinc node, or perhaps even to a non-tinc
service, causing tinc to endlessly try this Address instead of moving to the
next one.
Problem found by Delf Eldkraft.
* Everything is identical except the headers of the records.
* Instead of sending explicit message length and having an implicit sequence
number, datagram mode has an implicit message length and an explicit sequence
number.
* The sequence number is used to set the most significant bytes of the counter.
Seeking in files and rewriting parts of them does not seem to work properly on
Windows. Instead, when old RSA keys are found when generating new ones, the
file containing the old keys is copied to a temporary file where the changes
are made, and that file is renamed back to the original filename. On Windows,
we cannot atomically replace files with a rename(), so we need to move the
original file out of the way first. If anything fails, the new code will warn
that the user has to solve the problem by hand.
This allows tincctl to receive log messages from a running tincd,
independent of what is logged to syslog or to file. Tincctl can receive
debug messages with an arbitrary level.
This allows administrators who frequently want to work with one tinc
network to omit the -n option. Since the NETNAME variable is set by
tincd when executing scripts, this makes it slightly easier to use
tincctl from within scripts.
The Broadcast option can be used to cause tinc to drop all broadcast and
multicast packets. This option might be expanded in the future to selectively
allow only some broadcast packet types.
The code introduced in commit 41a05f59ba is not
needed anymore, since tinc has been able to handle UDP packets from a different
source address than those of the TCP packets since 1.0.10. When using multiple
BindToAddress statements, this code does not make sense anymore, we do want the
kernel to choose the source address on its own.
Tinc will now, by default, decrement the TTL field of incoming IPv4 and IPv6
packets, before forwarding them to the virtual network device or to another
node. Packets with a TTL value of zero will be dropped, and an ICMP Time
Exceeded message will be sent back.
This behaviour can be disabled using the DecrementTTL option.
Scripts called by tinc would inherit its open filedescriptors. This could
be a problem if other long-running daemons are started from those scripts,
if those daemons would not close all filedescriptors before going into the
background.
Problem found and solution suggested by Nick Hibma.
Apart from the platform specific tun/tap driver, link with the dummy and
raw_socket devices, and optionally with support for UML and VDE devices.
At runtime, the DeviceType option can be used to select which driver to
use.
* Exchange nonce and ECDH public key first, calculate the ECDSA signature
over the complete key exchange.
* Make an explicit distinction between client and server in the signatures.
* Add more comments and replace some magic numbers by #defines.
Thanks to Erik Tews for very helpful hints and comments!
In case the config file could not be opened a new but unitialized RSA structure
would be returned, causing a segmentation fault later on. This would only
happen in the case that the config file could be opened before, but not when
read_rsa_public_key() was called. This situation could occur when the --user
option was used, and the config files were not readable by the specified user.
Probably due to a merge, the try_harder() function had duplicated the
rate-limiting code for detecting the sender node based on the HMAC of the
packet. This prevented this detection from running at all. The function is now
identical again to that in the 1.0 branch.
sometimes argv[0] will have directory-less name (when the
command is started by shell searching in $PATH for example).
For tincctl start we want the same rules to run tincd as for
tincctl itself (having full path is better but if shell does
not provide one we've no other choice). Previous code tried
to run ./tincd in this case, which is obviously wrong.
This is a fix for the previous commit.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
For tincctl start, run tincd from dirname($0) not SBINDIR -
this allows painless alternative directory installation and
running from build directory too.
Also while at it, pass the rest of command line to tincd, not
only options before "start" argument. This way it's possible
to pass options to tincd like this:
tincctl -n net start -- -d 1 -R -U tincuser ...
And also add missing newline at the end of error message there.
Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
Encryption and authentication of the meta connection is spread out over
meta.c and protocol_auth.c. The new protocol was added there as well,
leading to spaghetti code. To improve things, the new protocol will now
be implemented in sptps.[ch].
The goal is to have a very simplified version of TLS. There is a record
layer, and there are only two record types: application data and
handshake messages. The handshake message contains a random nonce, an
ephemeral ECDH public key, and an ECDSA signature over the former. After
the ECDH public keys are exchanged, a shared secret is calculated, and a
TLS style PRF is used to generate the key material for the cipher and
HMAC algorithm, and further communication is encrypted and authenticated.
A lot of the simplicity comes from the fact that both sides must have
each other's public keys in advance, and there are no options to choose.
There will be one fixed cipher suite, and both peers always authenticate
each other. (Inspiration taken from Ian Grigg's hypotheses[0].)
There might be some compromise in the future, to enable or disable
encryption, authentication and compression, but there will be no choice
of algorithms. This will allow SPTPS to be built with a few embedded
crypto algorithms instead of linking with huge crypto libraries.
The API is also kept simple. There is a start and a stop function. All
data necessary to make the connection work is passed in the start
function. Instead having both send- and receive-record functions, there
is a send-record function and a receive-data function. The latter will
pass protocol data received from the peer to the SPTPS implementation,
which will in turn call a receive-record callback function when
necessary. This hides all the handshaking from the application, and is
completely independent from any event loop or socket characteristics.
[0] http://iang.org/ssl/hn_hypotheses_in_secure_protocol_design.html
This is mainly important for Windows, where the select() call in the
main thread is not being woken up when the tapreader thread calls
route(), causing a delay of up to 1 second before the output buffer is
flushed. This would cause bad performance when UDP communication is not
possible.
First of all, if there really are two nodes with the same name, much
more than 10 contradicting ADD_EDGE and DEL_EDGE messages will be sent.
Also, we forgot to reset the counters when nothing happened.
In case there is a ADD_EDGE/DEL_EDGE storm, we do not shut down, but
sleep an increasing amount of time, allowing tinc to recover gracefully
from temporary failures.
The length parameter for the encoding functions is the length of the
binary input, and for the decoding functions it is the maximum size of
the binary output.
The return value is always the length of the resulting output, excluding
the terminating NULL character for the encoding routines.
All functions can encode and decode in-place. The encoding functions
will always write a terminating NULL character, and the decoding
functions will stop at a NULL character.
If we don't have ECDSA keys for the node we connect to, set protocol_minor
to 1, to indicate this to the other end. This will first complete the
old way of authentication with RSA keys, and will then exchange ECDSA keys.
The connection will be terminated right afterwards, and the next attempt
will use ECDSA keys.
The generate-keys command now generates both an RSA and an ECDSA keypair,
but one can generate-rsa-keys or generate-ecdsa-keys to just generate one type.
It is modelled after the pseudorandom function from RFC4346 (TLS 1.1), the only
significant change is the use of SHA512 and Whirlpool instead of MD5 and SHA1.
REQ_KEY requests have an extra field indicating key exchange version.
If it is present and > 0, the sender supports ECDH. If the receiver also
does, then it will generate a new keypair and sends the public key in a
ANS_KEY request with "ECDH:" prefixed. The ans_key_h() function will
compute the shared secret, which, at the moment,is used as is to set the
cipher and HMAC keys. However, this must be changed to use a proper KDF.
In the future, the ECDH key exchange must also be signed.
The pid is now written first, so that a version 1.0.x tincd can be used to stop
a running version 1.1 tincd. Getsockname() is used to determine the address of
the first listening socket, so that tincctl can connect to the local tincd even
if AddressFamily = ipv6, or if BindToAddress or BindToInterface is used.
Instead of UNIX time, the log messages now start with the time in RFC3339
format, which human-readable and still easy for the computer to parse and sort.
The HUP signal will also cause the log file to be closed and reopened, which is
useful when log rotation is used. If there is an error while opening the log
file, this is logged to stderr.
But we do ignore SIGPIPE, and tinc 1.0.x signals that are no longer used
(SIGUSR1 and SIGUSR2), since the default handler of these signals is to
terminate tincd immediately.
Although we use qsort(), which is not guaranteed to be stable, resorting the
previously sorted array is more stable than recreating and resorting the array
each time.
We live in the 21st century, and we require C99 semantics, so we do not need to
work around buggy libcs. The xmalloc() and related functions are now static
inline functions.
We don't override any signal handlers anymore except those for SIGPIPE and
SIGCHLD. Fatal signals (SIGSEGV, SIGBUS etc.) will terminate tincd and
optionally dump core. The previous behaviour was to terminate gracefully and
try to restart, but that usually failed and made any core dump useless.
This would allow tincctl to connect to a remote tincd, or to a local tincd that
isn't listening on localhost, for example if it is using the BindToInterface or
BindToAddress options.
Also log an error when the input buffer contains more than MAXBUFSIZE bytes
already, instead of silently claiming the other side closed the connection.
Libevent 2.0's buffer code is not completely backward compatible with 1.4's.
In order to not (mis)use it anymore, we implement it ourselves. The buffers
are automatically expanding when necessary. When consuming data from the
buffer, no memmove()s are performed. Only when adding to the buffer would
write past the end do we shift everything back to the start.
In commit 4a21aabada, code was added to detect
contradicting ADD_EDGE and DEL_EDGE messages being sent, which is an indication
of two nodes with the same Name connected to the same VPN. However, these
contradictory messages can also happen when there is a network partitioning. In
the former case a loop happens which causes many contradictory message, while
in the latter case only a few of those messages will be sent. So, now we
increase the threshold to at least 10 of both ADD_EDGE and DEL_EDGE messages.
Since tinc now handles UDP packets with a different source address and port
than used for TCP connections, the heuristic to treat edges as indirect when
tinc could detect that multiple addresses were used does not make sense
anymore, and can actually reduce performance.
Because we don't want to keep track of that, and this will cause the node
structure from being relinked into the node tree, which results in myself
pointing to an invalid address.
When a UDP packet was received with an unknown source address/port, and if it
failed a HMAC check against known keys, it could still incorrectly assign that
UDP address to another node. This would temporarily cause outgoing UDP packets
to go to the wrong destination address, until packets from the correct address
were received again.
Found by cppcheck, which complained about lenin not being initialized, but the
real problem is that reading packets would fail when using code compiled with
--tunemu on a normal tun device.
Before, if MTU probes failed, tinc would stop sending probes until the next
time keys were regenerated (by default, once every hour). Now it continues to
send them every PingInterval, so it recovers faster from temporary failures.
Although transient errors sometimes happen on the tun/tap device (for example,
if the kernel is temporarily out of buffer space), there are situations where
the tun/tap device becomes permanently broken. Instead of endlessly spamming
the syslog, we now sleep an increasing amount of time between consecutive read
errors, and if reads still fail after 10 attempts (approximately 3 seconds),
tinc will quit.
Treat netname "." in a special way as if there was no netname
specified. Before, f.e. tincd -n. -k didn't work as it tried
to open /var/run/tinc-.pid. Now -n. works as if there was no
-n option is specified.
Signed-Off-By: Michael Tokarev <mjt@tls.msk.ru>
With some exceptions, tinc only accepted host configuration options for the
local node from the corresponding host configuration file. Although this is
documented, many people expect that they can also put those options in
tinc.conf. Tinc now internally merges the contents of both tinc.conf and the
local host configuration file.
Options given on the command line have precedence over configuration from files.
This can be useful, for example, for a roaming node, for which 'ConnectTo' and
<host>.Address depends on its location.
In this situation, the two nodes will start fighting over the edges they announced.
When we have to contradict both ADD_EDGE and DEL_EDGE messages, we log a warning,
and with 25% chance per PingTimeout we quit.
If one uses a symbolic name for the Port option, tinc will send that name
literally to other nodes. However, it is not guaranteed that all nodes have
the same contents in /etc/services, or have such a file at all.
When forwarding a metadata request through forward_request() we were
adding the required newline char to our buffer, but then sending the
data without it - this results in the forwarded request and the next one
to be garbled together.
Additionally while at it add a warning comment that request string is
not zero terminated anymore after a call to the forward_request()
function - for now this is ok as it is not used by any caller after this.
If a node is unreachable, and not connected to an edge anymore, it gets
deleted. When this happens its subnets are also removed, which should
not happen with StrictSubnets=yes.
Solution:
- do not remove subnets in src/net.c::purge(), we know that all subnets
in the list came from our hosts files.
I think here you got the check wrong by looking at the tunnelserver
code below it - with strictsubnets we still inform others but do not
remove the subnet from our data.
- do not remove nodes in net.c::purge() that still have subnets
attached.
When this option is enabled, packets that cannot be sent directly to the destination node,
but which would have to be forwarded by an intermediate node, are dropped instead.
When combined with the IndirectData option,
packets for nodes for which we do not have a meta connection with are also dropped.
This determines if and how incoming packets that are not meant for the local
node are forwarded. It can either be off, internal (tinc forwards them itself,
as in previous versions), or kernel (packets are always sent to the TUN/TAP
device, letting the kernel sort them out).
When this option is enabled, tinc will not accept dynamic updates of Subnets
from other nodes, but will only use Subnets read from local host config files
to build its routing table.
Instead of allocating storage for each line read, we now read into fixed-size
buffers on the stack. This fixes a case where a malformed configuration file
could crash tinc.
Every operating system seems to have its own, slightly different way to disable
packet fragmentation. Emit a compiler warning when no suitable way is found.
On OpenBSD, it seems impossible to do it for IPv4.
To help peers that are behind NAT connect to each other directly via UDP, they
need to know the exact external address and port that they use. Keys exchanged
between NATted peers necessarily go via a third node, which knows this address
and port, and can append this information to the keys, which is in turned used
by the peers.
Since PMTU discovery will immediately trigger UDP communication from both sides
to each other, this should allow direct communication between peers behind
full, address-restricted and port-restricted cone NAT.
When we got a key request for or from a node we don't know, we disconnected the
node that forwarded us that request. However, especially in TunnelServer mode,
disconnecting does not help. We now ignore such requests, but since there is no
way of telling the original sender that the request was dropped, we now retry
sending REQ_KEY requests when we don't get an ANS_KEY back.
Commit 052ff8b2c5 contained a bug that causes
scripts to be called with an empty, or possibly corrupted SUBNET variable when
a Subnet is added or removed while the owner is still online. In router mode,
this normally does not happen, but in switch mode this is normal.
Before, we immediately retried select() if it returned -1 and errno is EAGAIN
or EINTR, and if it returned 0 it would check for network events even if we
know there are none. Now, if -1 or 0 is returned we skip checking network
events, but we do check for timer and signal events.
One reason to send the ALRM signal is to let tinc immediately try to connect to
outgoing nodes, for example when PPP or DHCP configuration of the outgoing
interface finished. Conversely, when the outgoing interface goes down one can
now send this signal to let tinc quickly detect that links are down too.
Some ISPs block the ICMP Fragmentation Needed packets that tinc sends. We
clamp the MSS of IPv4 SYN packets to prevent hosts behind those ISPs from
sending too large packets.
The utility functions in the lib/ directory do not really form a library.
Also, now that we build two binaries, tincctl does not need everything that was
in libvpn.a, so it is wasteful to link to it.
For IPv6, the minimum MTU is 1280 (RFC 2460), for IPv4 the minimum is actually
68, but this is such a low limit that it will probably hurt performance, so we
do as if it is 576 (the minimum packet size hosts should be able to handle, RFC
791). If we detect a path MTU smaller than those minima, and we have to handle
a packet that is bigger than the PMTU but smaller than those minima, we forward
them via TCP instead of fragmenting or returning ICMP packets.
If the result of an RSA encryption or decryption operation can be represented
in less bytes than given, gcry_mpi_print() will not add leading zero bytes. Fix
this by adding those ourself.
This wasn't working at all, since we didn't do HMAC but just a plain hash.
Also, verification of packets failed because it was checking the whole packet,
not the packet minus the HMAC.