This contains the current debug level used by tinc. Scripts can use it
to decide whether to log debugging information of their own.
Closes #138 on GitHub.
We know what struct addrinfo looks like, but the standard says nothing
about how it is allocated. So we cannot trust freeaddrinfo() to work
correctly on the struct addrinfo list we allocated ourselves in
get_known_addresses(). To distinguish allocations made by the latter
from those made by str2addrinfo(), we keep two pointers (*ai and *kai) in
struct outgoing, and use the freeing function that is appropriate for
each.
This is an attempt at making the control flow through this function
easier to understand by rearranging branches and cutting back on
indentation levels.
This is a pure refactoring; there is no change in behavior.
This commit fixes a logic bug in the edge update code where local
address changes are not taken into account if they are bundled in with
other changes. This bug breaks local discovery in some scenarios.
The regression was introduced by commit
e4670fc4a0576eb76f1807ce29fa9455dd247632.
I have observed cases where disable_device() can get stuck on the
GetOverlappedResult() call, especially when the computer is waking up
from sleep. This is problematic when combined with DeviceStandby=yes:

    other_side (1.2.3.4 port 655) didn't respond to PING in 5 seconds
    Closing connection with other_side (1.2.3.4 port 655)
    Disabling Windows tap device
    <STUCK>

gdb reveals the following stack trace:

    #0 0x77c7dd3c in ?? ()
    #1 0x7482aad0 in KERNELBASE!GetOverlappedResult () from C:\WINDOWS\SysWoW64\KernelBase.dll
    #2 0x0043c343 in disable_device () at mingw/device.c:244
    #3 0x0040fcee in device_disable () at net_setup.c:759
    #4 0x00405bb5 in check_reachability () at graph.c:292
    #5 0x00405be2 in graph () at graph.c:301
    #6 0x004088db in terminate_connection (c=0x4dea5c0, report=true) at net.c:108
    #7 0x00408aed in timeout_handler (data=0x5af0c0 <pingtimer>) at net.c:168
    #8 0x00403af8 in get_time_remaining (diff=0x2a8fd64) at event.c:239
    #9 0x00403b6c in event_loop () at event.c:303
    #10 0x00409904 in main_loop () at net.c:461
    #11 0x00424a95 in main2 (argc=6, argv=0x2b42a60) at tincd.c:489
    #12 0x00424788 in main (argc=6, argv=0x2b42a60) at tincd.c:416

This is with TAP-Win32 9.0.0.9. I suspect driver bugs related to sleep.
In any case, this commit fixes the issue by cancelling I/O only when the
entire tinc process is being gracefully shut down, as opposed to every
time the device is disabled. Thankfully, the driver seems to be
perfectly fine with this code issuing TAP_IOCTL_SET_MEDIA_STATUS ioctls
while there are I/O operations inflight.
Currently, if both write and read events fire at the same time on a
socket, the Windows-specific event loop will call both the write and
read callbacks, in that order. The problem is that the write callback
could have deleted the io handle, which makes the subsequent call to the
read callback a use-after-free, typically resulting in a hard crash.
In practice, this issue is triggered quite easily by putting the
computer to sleep, which basically freezes the tinc process. When the
computer wakes up and the process resumes, all TCP connections are
suddenly gone; as a result, the following sequence of events might
appear in the logs:

    Metadata socket read error for node1 (1.2.3.4 port 655): (10054) An existing connection was forcibly closed by the remote host.
    Closing connection with node1 (1.2.3.4 port 655)
    Sending DEL_EDGE to everyone (BROADCAST): 13 4bf6 mynode node1
    Sending 43 bytes of metadata to node2 (5.6.7.8 port 655)
    Could not send 10891 bytes of data to node2 (5.6.7.8 port 655): (10054) An existing connection was forcibly closed by the remote host.
    Closing connection with node2 (5.6.7.8 port 655)
    <CRASH>

In this example the crash occurs because the socket to node2 was
signaled for reading *in addition* to writing, but since the connection
was terminated, the attempt to call the read callback crashed the
process.
This commit fixes the problem by not even attempting to fire the write
callback when the write event on the socket is signaled - instead, we
just rely on the part of the event loop that simulates level-triggered
write events. Arguably that's even cleaner and faster, because the code
being removed was technically redundant - we have to go through that
write check loop anyway.
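The shape of the fix can be illustrated with a portable toy model. This is
not tinc's Windows event loop: every name here (io_sketch_t, io_add, io_del,
handle_io, loop_iteration) is hypothetical, and the second pass merely
stands in for waiting on socket events. What it shows is that writes are
handled only in the level-triggered pass, so a handle deleted by its write
callback is never touched again.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define IO_READ  1
#define IO_WRITE 2
#define MAX_IO   4

/* Hypothetical io handle: which events it wants, and whether writing fails. */
typedef struct io_sketch {
	int flags;
	bool write_fails;
} io_sketch_t;

static io_sketch_t *ios[MAX_IO];    /* registry of live handles */
static int read_calls, write_calls; /* counters so behavior can be observed */

static io_sketch_t *io_add(int flags, bool write_fails) {
	for(size_t i = 0; i < MAX_IO; i++) {
		if(!ios[i]) {
			io_sketch_t *io = calloc(1, sizeof(*io));

			if(!io) {
				return NULL;
			}

			io->flags = flags;
			io->write_fails = write_fails;
			ios[i] = io;
			return io;
		}
	}

	return NULL;
}

/* Unregister and free a handle; the pointer must not be used afterwards. */
static void io_del(io_sketch_t *io) {
	for(size_t i = 0; i < MAX_IO; i++) {
		if(ios[i] == io) {
			ios[i] = NULL;
		}
	}

	free(io);
}

/* A failed write tears the connection down, which deletes the handle. */
static void handle_io(io_sketch_t *io, int event) {
	if(event == IO_WRITE) {
		write_calls++;

		if(io->write_fails) {
			io_del(io);
		}

		return;
	}

	read_calls++;
}

static void loop_iteration(void) {
	/* Pass 1: simulated level-triggered writes, re-reading the registry each time. */
	for(size_t i = 0; i < MAX_IO; i++) {
		io_sketch_t *io = ios[i];

		if(io && (io->flags & IO_WRITE)) {
			handle_io(io, IO_WRITE);
		}
	}

	/* Pass 2: stand-in for signaled events. Only the read callback fires here,
	   and only for handles that are still registered. */
	for(size_t i = 0; i < MAX_IO; i++) {
		io_sketch_t *io = ios[i];

		if(io && (io->flags & IO_READ)) {
			handle_io(io, IO_READ);
		}
	}
}
```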
At the start of the decade, there were still distributions that shipped
with versions of OpenSSL that did not support these algorithms. By now
everyone should support them. The old defaults were Blowfish and SHA1,
neither of which is considered secure anymore.
The meta-protocol now always uses AES in CFB mode, but the key length
will adapt to the one specified by the Cipher option. The digest for the
meta-protocol is hardcoded to SHA256.
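One way the key length adaptation could look is sketched below. The helper
names and the exact thresholds are assumptions made for illustration, not
taken from this commit; the returned strings are the OpenSSL names of the
AES-CFB variants and of SHA256.

```c
#include <stddef.h>

/* Hypothetical helper: choose an AES-CFB variant from the key length (in
   bytes) of the cipher selected by the Cipher option. The rounding-up
   thresholds are an assumption. */
static const char *meta_cipher_name(size_t keylen) {
	if(keylen <= 16) {
		return "aes-128-cfb";
	}

	if(keylen <= 24) {
		return "aes-192-cfb";
	}

	return "aes-256-cfb";
}

/* The meta-protocol digest does not depend on configuration. */
static const char *meta_digest_name(void) {
	return "sha256";
}
```

For example, a configured cipher with a 16-byte key would map to
aes-128-cfb, and one with a 32-byte key to aes-256-cfb.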