This file documents non-portable functions and other issues.

Non-portable functions included in pthreads-win32
-------------------------------------------------

BOOL
pthread_win32_test_features_np(int mask)

This routine allows an application to check which
run-time auto-detected features are available within
the library.

The possible features are:

    PTW32_SYSTEM_INTERLOCKED_COMPARE_EXCHANGE
        Return TRUE if the native version of
        InterlockedCompareExchange() is being used.
        This feature is not meaningful in recent
        library versions as MSVC builds only support
        the system-implemented ICE. Note that all MinGW
        builds use inlined asm versions of all the
        Interlocked routines.

    PTW32_ALERTABLE_ASYNC_CANCEL
        Return TRUE if the QueueUserAPCEx package
        QUSEREX.DLL is available and the AlertDrv.sys
        driver is loaded into Windows, providing
        alertable (pre-emptive) asynchronous thread
        cancelation. If this feature returns FALSE
        then the default async cancel scheme is in
        use, which cannot cancel blocked threads.

Features may be OR'ed into the mask parameter, in which case
the routine returns TRUE if any of the OR'ed features would
return TRUE. At this stage it doesn't make sense to OR features,
but it may some day.

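As an illustration only (not part of the library documentation), an
application might check at startup whether alertable async cancelation
is available. The routine and constant are the ones described above;
everything else is a plain C sketch:

#include <pthread.h>
#include <stdio.h>

int main(void)
{
  if (pthread_win32_test_features_np(PTW32_ALERTABLE_ASYNC_CANCEL))
    printf("Alertable async cancelation is available.\n");
  else
    printf("Default async cancel scheme; blocked threads cannot be cancelled.\n");
  return 0;
}
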
void *
pthread_timechange_handler_np(void *)

To improve tolerance against operator or time service
initiated system clock changes.

This routine can be called by an application when it
receives a WM_TIMECHANGE message from the system. At
present it broadcasts all condition variables so that
waiting threads can wake up and re-evaluate their
conditions and restart their timed waits if required.

It has the same return type and argument type as a
thread routine so that it may be called directly
through pthread_create(), i.e. as a separate thread.

Parameters

Although a parameter must be supplied, it is ignored.
The value NULL can be used.

Return values

It can return the error EAGAIN to indicate that not
all condition variables were broadcast for some reason.
Otherwise, 0 is returned.

If run as a thread, the return value is returned
through pthread_join().

The return value should be cast to an integer.

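For example (a sketch only), an application's window procedure could run
the handler as a short-lived thread when WM_TIMECHANGE arrives, retrieving
the result through pthread_join() and casting it to an integer as
described above:

#include <pthread.h>

void on_time_change(void)
{
  pthread_t t;
  void *result;

  if (pthread_create(&t, NULL, pthread_timechange_handler_np, NULL) == 0)
  {
    pthread_join(t, &result);
    if ((int)(size_t) result != 0)
    {
      /* EAGAIN: not all condition variables could be broadcast */
    }
  }
}
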
HANDLE
pthread_getw32threadhandle_np(pthread_t thread);

Returns the Win32 thread handle that the POSIX
thread "thread" is running as.

Applications can use the Win32 handle to set
Win32-specific attributes of the thread.

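For example (an illustrative sketch only), the handle can be passed to an
ordinary Win32 call such as SetThreadPriority(); note that doing so
bypasses the library's own POSIX priority handling described later in
this file:

#include <windows.h>
#include <pthread.h>

void boost_posix_thread(pthread_t t)
{
  HANDLE h = pthread_getw32threadhandle_np(t);
  SetThreadPriority(h, THREAD_PRIORITY_ABOVE_NORMAL);
}
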
DWORD
pthread_getw32threadid_np (pthread_t thread)

Returns the Windows native thread ID that the POSIX
thread "thread" is running as.

Only valid when the library is built where
! (defined(__MINGW64__) || defined(__MINGW32__)) || defined (__MSVCRT__) || defined (__DMC__)
and otherwise returns 0.

int
pthread_mutexattr_setkind_np(pthread_mutexattr_t * attr, int kind)

int
pthread_mutexattr_getkind_np(pthread_mutexattr_t * attr, int *kind)

These two routines are included for Linux compatibility
and are direct equivalents to the standard routines
    pthread_mutexattr_settype
    pthread_mutexattr_gettype

pthread_mutexattr_setkind_np accepts the following
mutex kinds:
    PTHREAD_MUTEX_FAST_NP
    PTHREAD_MUTEX_ERRORCHECK_NP
    PTHREAD_MUTEX_RECURSIVE_NP

These are really just equivalent to (respectively):
    PTHREAD_MUTEX_NORMAL
    PTHREAD_MUTEX_ERRORCHECK
    PTHREAD_MUTEX_RECURSIVE

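For example (a sketch only), the two calls shown in the comment below are
interchangeable when preparing a recursive mutex attribute:

#include <pthread.h>

int make_recursive_attr(pthread_mutexattr_t *attr)
{
  int rc = pthread_mutexattr_init(attr);
  if (rc == 0)
    rc = pthread_mutexattr_setkind_np(attr, PTHREAD_MUTEX_RECURSIVE_NP);
    /* equivalent: pthread_mutexattr_settype(attr, PTHREAD_MUTEX_RECURSIVE); */
  return rc;
}
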
int
pthread_delay_np (const struct timespec *interval);

This routine causes a thread to delay execution for a specific period of time.
This period ends at the current time plus the specified interval. The routine
will not return before the end of the period is reached, but may return an
arbitrary amount of time after the period has gone by. This can be due to
system load, thread priorities, and system timer granularity.

Specifying an interval of zero (0) seconds and zero (0) nanoseconds is
allowed and can be used to force the thread to give up the processor or to
deliver a pending cancelation request.

This routine is a cancelation point.

The timespec structure contains the following two fields:

    tv_sec  is an integer number of seconds.
    tv_nsec is an integer number of nanoseconds.

Return Values

If an error condition occurs, this routine returns an integer value
indicating the type of error. Possible return values are as follows:

    0         Successful completion.
    [EINVAL]  The value specified by interval is invalid.

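For example (a sketch only), a thread could sleep for 1.5 seconds, or pass
a zero interval simply to yield and allow a pending cancelation request to
be delivered:

#include <pthread.h>

void pause_then_yield(void)
{
  struct timespec interval = { 1, 500000000 };   /* 1 s + 500,000,000 ns */
  struct timespec zero = { 0, 0 };

  pthread_delay_np(&interval);   /* sleeps ~1.5 s; also a cancelation point */
  pthread_delay_np(&zero);       /* gives up the processor */
}
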
int
pthread_num_processors_np (void)

This routine (found on HPUX systems) returns the number of processors
in the system. This implementation actually returns the number of
processors available to the process, which can be a lower number
than the system's number, depending on the process's affinity mask.

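A typical (illustrative) use is sizing a worker pool to the processors
actually available to the process:

#include <pthread.h>

int choose_pool_size(void)
{
  int n = pthread_num_processors_np();
  return (n > 0) ? n : 1;
}
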
BOOL
pthread_win32_process_attach_np (void);

BOOL
pthread_win32_process_detach_np (void);

BOOL
pthread_win32_thread_attach_np (void);

BOOL
pthread_win32_thread_detach_np (void);

These functions contain the code normally run via DllMain
when the library is used as a DLL, but which needs to be
called explicitly by an application when the library
is statically linked. As of version 2.9.0 of the library, static
builds using either MSC or GCC will call pthread_win32_process_*
automatically at application startup and exit respectively.

Otherwise, you will need to call pthread_win32_process_attach_np()
before you can call any pthread routines when statically linking.
You should call pthread_win32_process_detach_np() before
exiting your application to clean up.

pthread_win32_thread_attach_np() is currently a no-op, but
pthread_win32_thread_detach_np() is needed to clean up
the implicit pthread handle that is allocated to a Win32 thread if
it calls any pthreads routines. Call this routine when the
Win32 thread exits.

Threads created through pthread_create() do not need to call
pthread_win32_thread_detach_np().

These functions invariably return TRUE except for
pthread_win32_process_attach_np(), which will return FALSE
if pthreads-win32 initialisation fails.

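A statically linked application built against a pre-2.9.0 library would
therefore look roughly like the sketch below; worker() is a hypothetical
application routine, not part of the library:

#include <pthread.h>

extern void *worker(void *arg);   /* hypothetical application routine */

int main(void)
{
  pthread_t t;

  if (!pthread_win32_process_attach_np())
    return 1;                          /* library initialisation failed */

  pthread_create(&t, NULL, worker, NULL);
  pthread_join(t, NULL);

  pthread_win32_process_detach_np();   /* clean up before exiting */
  return 0;
}
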
int
pthreadCancelableWait (HANDLE waitHandle);

int
pthreadCancelableTimedWait (HANDLE waitHandle, DWORD timeout);

These two functions provide hooks into the pthread_cancel
mechanism that will allow you to wait on a Windows handle
and make it a cancellation point. Both functions block
until either the given Win32 handle is signalled or
pthread_cancel has been called. They are implemented using
WaitForMultipleObjects on 'waitHandle' and a manually
reset Win32 event used to implement pthread_cancel.

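For example (a sketch only), a POSIX thread can wait on a Win32 event
created elsewhere with CreateEvent() in a way that pthread_cancel() can
interrupt:

#include <windows.h>
#include <pthread.h>

void *wait_for_event(void *arg)
{
  HANDLE hEvent = (HANDLE) arg;   /* assumed created with CreateEvent() */

  /* Returns when hEvent is signalled; acts on any pending or
     subsequent pthread_cancel() aimed at this thread. */
  pthreadCancelableTimedWait(hEvent, 5000 /* ms */);
  return NULL;
}
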
Non-portable issues
-------------------

Thread priority

POSIX defines a single contiguous range of numbers that determine a
thread's priority. Win32 defines priority classes and priority
levels relative to these classes. Classes are simply priority base
levels that the defined priority levels are relative to such that,
changing a process's priority class will change the priority of all
of its threads, while the threads retain the same relativity to each
other.

A Win32 system defines a single contiguous monotonic range of values
that define system priority levels, just like POSIX. However, Win32
restricts individual threads to a subset of this range on a
per-process basis.

The following table shows the base priority levels for combinations
of priority class and priority value in Win32.

Base  Process Priority Class              Thread Priority Level
------------------------------------------------------------------
   1  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_IDLE
   1  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_IDLE
   1  NORMAL_PRIORITY_CLASS               THREAD_PRIORITY_IDLE
   1  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_IDLE
   1  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_IDLE
   2  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_LOWEST
   3  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_BELOW_NORMAL
   4  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_NORMAL
   4  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_LOWEST
   5  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_ABOVE_NORMAL
   5  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_BELOW_NORMAL
   5  Background NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_LOWEST
   6  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_HIGHEST
   6  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_NORMAL
   6  Background NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_BELOW_NORMAL
   7  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_ABOVE_NORMAL
   7  Background NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_NORMAL
   7  Foreground NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_LOWEST
   8  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_HIGHEST
   8  NORMAL_PRIORITY_CLASS               THREAD_PRIORITY_ABOVE_NORMAL
   8  Foreground NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_BELOW_NORMAL
   8  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_LOWEST
   9  NORMAL_PRIORITY_CLASS               THREAD_PRIORITY_HIGHEST
   9  Foreground NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_NORMAL
   9  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_BELOW_NORMAL
  10  Foreground NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_ABOVE_NORMAL
  10  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_NORMAL
  11  Foreground NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_HIGHEST
  11  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_ABOVE_NORMAL
  11  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_LOWEST
  12  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_HIGHEST
  12  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_BELOW_NORMAL
  13  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_NORMAL
  14  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_ABOVE_NORMAL
  15  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_HIGHEST
  15  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_TIME_CRITICAL
  15  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_TIME_CRITICAL
  15  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_TIME_CRITICAL
  15  NORMAL_PRIORITY_CLASS               THREAD_PRIORITY_TIME_CRITICAL
  15  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_TIME_CRITICAL
  16  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_IDLE
  17  REALTIME_PRIORITY_CLASS             -7
  18  REALTIME_PRIORITY_CLASS             -6
  19  REALTIME_PRIORITY_CLASS             -5
  20  REALTIME_PRIORITY_CLASS             -4
  21  REALTIME_PRIORITY_CLASS             -3
  22  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_LOWEST
  23  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_BELOW_NORMAL
  24  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_NORMAL
  25  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_ABOVE_NORMAL
  26  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_HIGHEST
  27  REALTIME_PRIORITY_CLASS             3
  28  REALTIME_PRIORITY_CLASS             4
  29  REALTIME_PRIORITY_CLASS             5
  30  REALTIME_PRIORITY_CLASS             6
  31  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_TIME_CRITICAL

Windows NT: Values -7, -6, -5, -4, -3, 3, 4, 5, and 6 are not supported.

As you can see, the real priority levels available to any individual
Win32 thread are non-contiguous.

An application using pthreads-win32 should not make assumptions about
the numbers used to represent thread priority levels, except that they
are monotonic between the values returned by sched_get_priority_min()
and sched_get_priority_max(). E.g. Windows 95, 98, NT, 2000 and XP make
available a non-contiguous range of numbers between -15 and 15, while
at least one version of WinCE (3.0) defines the minimum priority
(THREAD_PRIORITY_LOWEST) as 5, and the maximum priority
(THREAD_PRIORITY_HIGHEST) as 1.

Internally, pthreads-win32 maps any priority levels between
THREAD_PRIORITY_IDLE and THREAD_PRIORITY_LOWEST to THREAD_PRIORITY_LOWEST,
or between THREAD_PRIORITY_TIME_CRITICAL and THREAD_PRIORITY_HIGHEST to
THREAD_PRIORITY_HIGHEST. Currently, this also applies to
REALTIME_PRIORITY_CLASS even if levels -7, -6, -5, -4, -3, 3, 4, 5, and 6
are supported.

If it wishes, a Win32 application using pthreads-win32 can use the Win32
defined priority macros THREAD_PRIORITY_IDLE through
THREAD_PRIORITY_TIME_CRITICAL.

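The portable way to select a priority is therefore to query the range at
run time rather than hard-coding numbers, as in this sketch:

#include <pthread.h>
#include <sched.h>

int set_max_priority(pthread_t t)
{
  struct sched_param param;
  int policy;
  int rc = pthread_getschedparam(t, &policy, &param);

  if (rc == 0)
  {
    param.sched_priority = sched_get_priority_max(policy);
    rc = pthread_setschedparam(t, policy, &param);
  }
  return rc;
}
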
The opacity of the pthread_t datatype
-------------------------------------
and possible solutions for portable null/compare/hash, etc
----------------------------------------------------------

Because pthread_t is an opaque datatype an implementation is permitted to define
pthread_t in any way it wishes. That includes defining some bits, if it is
scalar, or members, if it is an aggregate, to store information that may be
extra to the unique identifying value of the ID. As a result, pthread_t values
may not be directly comparable.

If you want your code to be portable you must adhere to the following constraints:

1) Don't assume it is a scalar data type, e.g. an integer or pointer value. There
are several other implementations where pthread_t is also a struct. See our FAQ
Question 11 for our reasons for defining pthread_t as a struct.

2) You must not compare them using relational or equality operators. You must use
the API function pthread_equal() to test for equality (see the sketch after this
list).

3) Never attempt to reference individual members.

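A minimal sketch that respects all three constraints:

#include <pthread.h>

int is_owned_by_current_thread(pthread_t owner)
{
  /* pthread_equal() returns non-zero if the two IDs name the same thread;
     never use == and never peek at members of pthread_t. */
  return pthread_equal(owner, pthread_self());
}
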
The problem

Certain applications would like to be able to access only the 'pure' pthread_t
id values, primarily to use as keys into data structures to manage threads or
thread-related data, but this is not possible in a maximally portable and
standards compliant way for current POSIX threads implementations.

For implementations that define pthread_t as a scalar, programmers often employ
direct relational and equality operators on pthread_t. This code will break when
ported to an implementation that defines pthread_t as an aggregate type.

For implementations that define pthread_t as an aggregate, e.g. a struct,
programmers can use memcmp etc., but then face the prospect that the struct may
include alignment padding bytes or bits as well as extra implementation-specific
members that are not part of the unique identifying value.

[While this is not currently the case for pthreads-win32, opacity also
means that an implementation is free to change the definition, which should
generally only require that applications be recompiled and relinked, not
rewritten.]

Doesn't the compiler take care of padding?

The C89 and later standards only effectively guarantee element-by-element
equivalence following an assignment or pass by value of a struct or union,
therefore undefined areas of any two otherwise equivalent pthread_t instances
can still compare differently, e.g. attempting to compare two such pthread_t
variables byte-by-byte, e.g. memcmp(&t1, &t2, sizeof(pthread_t)), may give an
incorrect result. In practice I'm reasonably confident that compilers routinely
also copy the padding bytes, mainly because assignment of unions would be far
too complicated otherwise. But it just isn't guaranteed by the standard.

Illustration:

We have two thread IDs t1 and t2:

    pthread_t t1, t2;

In an application we create the threads and intend to store the thread IDs in an
ordered data structure (linked list, tree, etc) so we need to be able to compare
them in order to insert them initially and also to traverse.

Suppose pthread_t contains undefined padding bits and our compiler copies our
pthread_t [struct] element-by-element, then for the assignment:

    pthread_t temp = t1;

temp and t1 will be equivalent and correct, but a byte-for-byte comparison such as
memcmp(&temp, &t1, sizeof(pthread_t)) == 0 may not return true as we expect because
the undefined bits may not have the same values in the two variable instances.

Similarly if passing by value under the same conditions.

If, on the other hand, the undefined bits are at least constant through every
assignment and pass-by-value then the byte-for-byte comparison
memcmp(&temp, &t1, sizeof(pthread_t)) == 0 will always return the expected result.

How can we force the behaviour we need?

Solutions

Adding new functions to the standard API or as non-portable extensions is
the only reliable and portable way to provide the necessary operations.
Remember also that POSIX is not tied to the C language. The most common
functions that have been suggested are:

    pthread_null()
    pthread_compare()
    pthread_hash()

A single more general purpose function could also be defined as a
basis for at least the last two of the above functions.

First we need to list the freedoms and constraints with respect
to pthread_t so that we can be sure our solution is compatible with the
standard.

What is known or may be deduced from the standard:

1) pthread_t must be able to be passed by value, so it must be a single object.

2) from (1) it must be copyable so cannot embed thread-state information, locks
or other volatile objects required to manage the thread it associates with.

3) pthread_t may carry additional information, e.g. for debugging or to manage
itself.

4) there is an implicit requirement that the size of pthread_t is determinable
at compile-time and size-invariant, because it must be able to copy the object
(i.e. through assignment and pass-by-value). Such copies must be genuine
duplicates, not merely a copy of a pointer to a common instance such as
would be the case if pthread_t were defined as an array.

Suppose we define the following function:

    /* This function shall return its argument */
    pthread_t* pthread_normalize(pthread_t* thread);

For scalar or aggregate pthread_t types this function would simply zero any bits
within the pthread_t that don't uniquely identify the thread, including padding,
such that client code can return consistent results from operations done on the
result. If the additional bits are a pointer to an associated structure then
this function would ensure that the memory used to store that associated
structure does not leak. After normalization the following compare would be
valid and repeatable:

    memcmp(pthread_normalize(&t1), pthread_normalize(&t2), sizeof(pthread_t))

Note 1: such comparisons are intended merely to order and sort pthread_t values
and allow them to index various data structures. They are not intended to reveal
anything about the relationships between threads, like startup order.

Note 2: the normalized pthread_t is also a valid pthread_t that uniquely
identifies the same thread.

Advantages:

1) In most existing implementations this function would reduce to a no-op that
emits no additional instructions, i.e. after in-lining or optimisation, or if
defined as a macro:

    #define pthread_normalize(tptr) (tptr)

2) This single function allows an application to portably derive
application-level versions of any of the other required functions.

3) It is a generic function that could enable unanticipated uses.

Disadvantages:

1) Less efficient than dedicated compare or hash functions for implementations
that include significant extra non-id elements in pthread_t.

2) Still need to be concerned about padding if copying normalized pthread_t.
See the later section on defining pthread_t to neutralise padding issues.

Generally a pthread_t may need to be normalized every time it is used,
which could have a significant impact. However, this is a design decision
for the implementor in a competitive environment. An implementation is free
to define a pthread_t in a way that minimises or eliminates padding or
renders this function a no-op.

Hazards:

1) Pass-by-reference directly modifies 'thread' so the application must
synchronise access or ensure that the pointer refers to a copy. The alternative
of pass-by-value/return-by-value was considered, but then this requires two copy
operations, disadvantaging implementations where this function is not a no-op
in terms of speed of execution. This function is intended to be used in high
frequency situations and needs to be efficient, or at least not unnecessarily
inefficient. The alternative also sits awkwardly with functions like memcmp.

2) [Non-compliant] code that uses relational and equality operators on
arithmetic or pointer style pthread_t types would need to be rewritten, but it
should be rewritten anyway.

C implementation of null/compare/hash functions using pthread_normalize():

/* In pthread.h */
pthread_t* pthread_normalize(pthread_t* thread);

/* In user code */

/* User-level bitclear function - clear bits in loc corresponding to mask */
void* bitclear (void* loc, void* mask, size_t count);

typedef unsigned int hash_t;

/* User-level hash function */
hash_t hash(void* ptr, size_t count);

/*
 * User-level pthr_null function - modifies the origin thread handle.
 * The concept of a null pthread_t is highly implementation dependent
 * and this design may be far from the mark. For example, in an
 * implementation "null" may mean setting a special value inside one
 * element of pthread_t to mean "INVALID". However, if that value was zero and
 * formed part of the id component then we may get away with this design.
 */
pthread_t* pthr_null(pthread_t* tp)
{
  /*
   * This should have the same effect as memset(tp, 0, sizeof(pthread_t)).
   * We're just showing that we can do it.
   */
  void* p = (void*) pthread_normalize(tp);
  return (pthread_t*) bitclear(p, p, sizeof(pthread_t));
}

/*
 * Safe user-level pthr_compare function - modifies temporary thread handle copies
 */
int pthr_compare_safe(pthread_t thread1, pthread_t thread2)
{
  return memcmp(pthread_normalize(&thread1), pthread_normalize(&thread2), sizeof(pthread_t));
}

/*
 * Fast user-level pthr_compare function - modifies the origin thread handles
 */
int pthr_compare_fast(pthread_t* thread1, pthread_t* thread2)
{
  return memcmp(pthread_normalize(thread1), pthread_normalize(thread2), sizeof(pthread_t));
}

/*
 * Safe user-level pthr_hash function - modifies a temporary thread handle copy
 */
hash_t pthr_hash_safe(pthread_t thread)
{
  return hash((void *) pthread_normalize(&thread), sizeof(pthread_t));
}

/*
 * Fast user-level pthr_hash function - modifies the origin thread handle
 */
hash_t pthr_hash_fast(pthread_t* thread)
{
  return hash((void *) pthread_normalize(thread), sizeof(pthread_t));
}

/* User-level bitclear function - modifies the origin array */
void* bitclear(void* loc, void* mask, size_t count)
{
  unsigned char* l = (unsigned char*) loc;
  unsigned char* m = (unsigned char*) mask;
  size_t i;

  for (i = 0; i < count; i++) {
    l[i] &= (unsigned char) ~m[i];   /* clear the bits that are set in mask */
  }
  return loc;
}

/* Donald Knuth hash */
hash_t hash(void* ptr, size_t count)
{
  unsigned char* str = (unsigned char*) ptr;
  hash_t hash = (hash_t) count;
  size_t i;

  for (i = 0; i < count; str++, i++)
  {
    hash = ((hash << 5) ^ (hash >> 27)) ^ (*str);
  }
  return hash;
}

/* Example of advantage point (3) - split a thread handle into its id and non-id values */
pthread_t id = thread, nonid = thread;
bitclear((void*) &nonid, (void*) pthread_normalize(&id), sizeof(pthread_t));

A pthread_t type change proposal to neutralise the effects of padding

Even if pthread_normalize() is available, padding is still a problem because
the standard only guarantees element-by-element equivalence through
copy operations (assignment and pass-by-value). So padding bit values can
still change randomly after calls to pthread_normalize().

[I suspect that most compilers take the easy path and always byte-copy anyway,
partly because it becomes too complex to do otherwise (e.g. unions that contain
sub-aggregates), but also because programmers can easily design their aggregates
to minimise and often eliminate padding.]

How can we eliminate the problem of padding bytes in structs? Could
defining pthread_t as a union rather than a struct provide a solution?

In fact, the Linux pthread.h defines most of its pthread_*_t objects (but not
pthread_t itself) as unions, possibly for this and/or other reasons. We'll
borrow some element naming from there but the ideas themselves are well known
- the __align element used to force alignment of the union comes from K&R's
storage allocator example.

/* Essentially our current pthread_t renamed */
typedef struct {
  struct thread_state_t * __p;
  long __x;    /* sequence counter */
} thread_id_t;

Ensuring that the last element in the above struct is a long ensures that the
overall struct size is a multiple of sizeof(long), so there should be no trailing
padding in this struct or the union we define below.
(Later we'll see that we can handle internal but not trailing padding.)

/* New pthread_t */
typedef union {
  char __size[sizeof(thread_id_t)];   /* array as the first element */
  thread_id_t __tid;
  long __align;                       /* ensure the union starts on a long boundary */
} pthread_t;

This guarantees that, during an assignment or pass-by-value, the compiler copies
every byte in our thread_id_t because the compiler guarantees that the __size
array, which we have ensured is the equal-largest element in the union, retains
equivalence.

This means that pthread_t values stored, assigned and passed by value will at least
carry the value of any undefined padding bytes along and therefore ensure that
those values remain consistent. Our comparisons will return consistent results and
our hashes of [zero initialised] pthread_t values will also return consistent
results.

We have also removed the need for a pthread_null() function; we can initialise
at declaration time or easily create our own const pthread_t to use in assignments
later:

    const pthread_t null_tid = {0};   /* braces are required */

    pthread_t t;
    ...
    t = null_tid;

Note that we don't have to explicitly make use of the __size array at all. It's
there just to force the compiler behaviour we want.

Partial solutions without a pthread_normalize function

An application-level pthread_null and pthread_compare proposal
(and pthread_hash proposal by extension)

In order to deal with the problem of scalar/aggregate pthread_t type disparity in
portable code I suggest using an old-fashioned union, e.g.:

Constraints:
- there is no padding, or padding values are preserved through assignment and
  pass-by-value (see above);
- there are no extra non-id values in the pthread_t.

Example 1: A null initialiser for pthread_t variables...

typedef union {
  unsigned char b[sizeof(pthread_t)];
  pthread_t t;
} init_t;

const init_t initial = {0};

pthread_t tid = initial.t;   /* init tid to all zeroes */

Example 2: A comparison function for pthread_t values

typedef union {
  unsigned char b[sizeof(pthread_t)];
  pthread_t t;
} pthcmp_t;

int pthcmp(pthread_t left, pthread_t right)
{
  /*
   * Compare two pthread handles in a way that imposes a repeatable but arbitrary
   * ordering on them.
   * I.e. given the same set of pthread_t handles the ordering should be the same
   * each time but the order has no particular meaning other than that. E.g.
   * the ordering does not imply the thread start sequence, or any other
   * relationship between threads.
   *
   * Return values are:
   *  1 : left is greater than right
   *  0 : left is equal to right
   * -1 : left is less than right
   */
  int i;
  pthcmp_t L, R;
  L.t = left;
  R.t = right;
  for (i = 0; i < sizeof(pthread_t); i++)
  {
    if (L.b[i] > R.b[i])
      return 1;
    else if (L.b[i] < R.b[i])
      return -1;
  }
  return 0;
}

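A usage sketch (not from the original text): because pthcmp() imposes a
repeatable ordering, it can back ordered containers or simple lookups; a
linear scan is shown here for brevity:

int find_thread(pthread_t *table, int n, pthread_t key)
{
  int i;
  for (i = 0; i < n; i++)
    if (pthcmp(table[i], key) == 0)
      return i;
  return -1;   /* not found */
}
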
It has been pointed out that the C99 standard allows for the possibility that
integer types also may include padding bits, which could invalidate the above
method. This addition to C99 was specifically included after it was pointed
out that there was one, presumably not particularly well known, architecture
that included a padding bit in its 32-bit integer type. See section 6.2.6.2
of both the standard and the rationale, specifically the paragraph starting at
line 16 on page 43 of the rationale.

An aside

Certain compilers, e.g. gcc and one of the IBM compilers, include a feature
extension: provided the union contains a member of the same type as the
object then the object may be cast to the union itself.

We could use this feature to speed up the pthcmp() function from example 2
above by casting rather than assigning the pthread_t arguments to the union, e.g.:

int pthcmp(pthread_t left, pthread_t right)
{
  /*
   * Compare two pthread handles in a way that imposes a repeatable but arbitrary
   * ordering on them.
   * I.e. given the same set of pthread_t handles the ordering should be the same
   * each time but the order has no particular meaning other than that. E.g.
   * the ordering does not imply the thread start sequence, or any other
   * relationship between threads.
   *
   * Return values are:
   *  1 : left is greater than right
   *  0 : left is equal to right
   * -1 : left is less than right
   */
  int i;
  for (i = 0; i < sizeof(pthread_t); i++)
  {
    if (((pthcmp_t)left).b[i] > ((pthcmp_t)right).b[i])
      return 1;
    else if (((pthcmp_t)left).b[i] < ((pthcmp_t)right).b[i])
      return -1;
  }
  return 0;
}

Result thus far

We can't remove undefined bits if they are there in pthread_t already, but we have
attempted to render them inert for comparison and hashing functions by making them
consistent through assignment, copy and pass-by-value.

Note: Hashing pthread_t values requires that all pthread_t variables be initialised
to the same value (usually all zeros) before being assigned a proper thread ID, i.e.
to ensure that any padding bits are zero, or at least the same value for all
pthread_t. Since all pthread_t values are generated by the library in the first
instance this need not be an application-level operation.

Conclusion

I've attempted to resolve the multiple issues of type opacity and the possible
presence of undefined bits and bytes in pthread_t values, which prevent
applications from comparing or hashing pthread handles.

Two complementary partial solutions have been proposed: one an application-level
scheme to handle both scalar and aggregate pthread_t types equally, plus a
definition of pthread_t itself that neutralises padding bits and bytes by
coercing semantics out of the compiler to eliminate variations in the values of
padding bits.

I have not provided any solution to the problem of handling extra values embedded
in pthread_t, e.g. debugging or trap information that an implementation is entitled
to include. Therefore none of this replaces the portability and flexibility of API
functions. But what functions are needed? The threads standard is unlikely to
include functions that can be implemented by a combination of existing features
and more generic functions (several references in the threads rationale suggest
this). Therefore I propose that the following function could replace the several
functions that have been suggested in conversations:

    pthread_t * pthread_normalize(pthread_t * handle);

For most existing pthreads implementations this function, or macro, would reduce to
a no-op with zero call overhead.