==27135== Use of uninitialised value of size 8
==27135== at 0x57BE17B: BN_num_bits_word (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57BE205: BN_num_bits (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57BADF7: BN_div (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C48FC: BN_mod_inverse (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C3647: BN_BLINDING_create_param (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x5812D44: RSA_setup_blinding (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x58095CB: rsa_get_blinding (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x580A64F: RSA_eay_private_decrypt (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x4E5D9BC: rsa_private_decrypt (rsa.c:97)
==27135== by 0x4E51E1B: metakey_h (protocol_auth.c:524)
==27135== by 0x4E505FD: receive_request (protocol.c:136)
==27135== by 0x4E46002: receive_meta (meta.c:290)
==27135== Uninitialised value was created by a heap allocation
==27135== at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27135== by 0x575DCD7: CRYPTO_malloc (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C24E1: BN_rand (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C216F: bn_rand_range (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x57C3630: BN_BLINDING_create_param (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x5812D44: RSA_setup_blinding (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x58095CB: rsa_get_blinding (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x580A64F: RSA_eay_private_decrypt (in /usr/lib/libcrypto.so.1.0.0)
==27135== by 0x4E5D9BC: rsa_private_decrypt (rsa.c:97)
==27135== by 0x4E51E1B: metakey_h (protocol_auth.c:524)
==27135== by 0x4E505FD: receive_request (protocol.c:136)
==27135== by 0x4E46002: receive_meta (meta.c:290)
The definition of the splay_each() macro is somewhat complicated for
syntactic reasons. Here's what it does in a more readable way:
for (splay_node_t* node = tree->head; node;) {
type* item = node->data;
splay_node_t* next = node->next;
// RUN USER BLOCK with (item)
node = next;
}
list_each() works in the same way. Since node->next is saved before the
user block runs, this construct supports removing the current item from
within the user block. However, what it does *not* support is removing
*other items* from within the user block, especially the next item.
Indeed, that will invalide the next pointer in the above loop and
therefore result in an invalid pointer dereference.
Unfortunately, there is at least one code path where that unsupported
operation happens. It is located in ack_h(), where the authentication
protocol code detects a double connection (i.e. being connected to
another node twice). Running in the context of a socket read event, this
code will happily terminate the *other* metaconnection, resulting in its
socket being removed from the io tree. If, by misfortune, this other
metaconnection happened to have the next socket FD number (which is
quite possible due to FD reuse - albeit unlikely), and was part of the
io tree (which is quite likely because if that connection is stuck, it
will most likely have pending writes) then this will result in the next
pending io item being destroyed. Invalid pointer dereference ensues.
I did a quick audit of other uses of splay_each() and list_each() and
I believe this is the only scenario in which this "next pointer
invalidation" problem can occur in practice. While this bug has been
there since at least 6bc5d626a8 (November
2012), if not sooner, it happens quite rarely due to the very specific
set of conditions required to trigger it. Nevertheless, it does manage
to crash my central production nodes every other week or so.
Unfortunately, sptps_logger() cannot know if s->handle is pointing to a
connection_t or a node_t. But it needs to print name and hostname in
both cases. So make sure both types have name and hostname fields at the
start with the same offset.
The sptps_receive_data() was changed in commit d237efd to only process
one SPTPS record from a stream input. So now we have to put a loop
around it to ensure we process everything.
In some harmless places, checks for the return value of ECDSA and RSA
key generation and verification was omitted. Add them to keep the
compiler happy and to warn end users in case something is wrong.
GCC warns when a function attribute has no effect. The autoconf check
turns warnings about attributes into errors, therefore thinking that
they did not work. The reason was that the test function returned void,
which is not suitable for checking both __malloc__ and
__warn_unused_result__.
The definition of the splay_each() macro is somewhat complicated for
syntactic reasons. Here's what it does in a more readable way:
for (splay_node_t* node = tree->head; node;) {
type* item = node->data;
splay_node_t* next = node->next;
// RUN USER BLOCK with (item)
node = next;
}
list_each() works in the same way. Since node->next is saved before the
user block runs, this construct supports removing the current item from
within the user block. However, what it does *not* support is removing
*other items* from within the user block, especially the next item.
Indeed, that will invalide the next pointer in the above loop and
therefore result in an invalid pointer dereference.
Unfortunately, there is at least one code path where that unsupported
operation happens. It is located in ack_h(), where the authentication
protocol code detects a double connection (i.e. being connected to
another node twice). Running in the context of a socket read event, this
code will happily terminate the *other* metaconnection, resulting in its
socket being removed from the io tree. If, by misfortune, this other
metaconnection happened to have the next socket FD number (which is
quite possible due to FD reuse - albeit unlikely), and was part of the
io tree (which is quite likely because if that connection is stuck, it
will most likely have pending writes) then this will result in the next
pending io item being destroyed. Invalid pointer dereference ensues.
I did a quick audit of other uses of splay_each() and list_each() and
I believe this is the only scenario in which this "next pointer
invalidation" problem can occur in practice. While this bug has been
there since at least 6bc5d626a8 (November
2012), if not sooner, it happens quite rarely due to the very specific
set of conditions required to trigger it. Nevertheless, it does manage
to crash my central production nodes every other week or so.
Unfortunately, sptps_logger() cannot know if s->handle is pointing to a
connection_t or a node_t. But it needs to print name and hostname in
both cases. So make sure both types have name and hostname fields at the
start with the same offset.