In the planning stage of the library, it was believed that being able
to migrate a coroutine from one thread to another was a desirable
property, and even a necessary one to take full advantage of the
completion port abstraction provided by the Win32 API. During
the implementation stage it became apparent that guaranteeing this
property was going to be a considerable challenge.
In the end, the decision was taken to prohibit migration. This section shows why it is infeasible, with current compilers and standard libraries, to allow coroutine migration.
One of the problems with migrating coroutines is the handling of thread local
storage. Whenever a thread local object is accessed, the copy that belongs to
the thread currently executing the code is the one that is used. Consider the
following code (it is plain C to simplify the generated assembler output, but
the problem is by no means restricted to C):
#include <stdio.h>
/* one instance of test per thread */
__thread int test;
void bar();
int foo () {
  while(1) {
    bar();
    printf("%p", &test);
  }
}
The __thread storage class is a GCC extension to mark a global
object as having thread specific storage. Most compilers that support
threaded applications have similar facilities, albeit with slightly
different syntaxes.
Let us suppose that every time bar() is invoked, foo() is suspended
and then resumed in another thread. We would expect that at every
iteration printf() will print a different address for test, as
every thread has its own specific instance. For this function GCC
generates the following assembler output (irrelevant parts have been
omitted):
.L2:
call bar
movl %gs:0, %eax
leal test@NTPOFF(%eax), %eax
pushl %eax
pushl $.LC0
call printf
popl %eax
popl %edx
jmp .L2
This is straightforward. The first instruction calls bar. The second
loads the address of the TLS area from the thread register (GCC uses the
GS segment register as a thread register), then the third loads the
address of the current thread's instance of test into EAX. The fourth
and fifth instructions push the parameters for printf onto the stack
($.LC0 is the symbol that labels the string "%p"). The sixth calls
printf. The seventh and eighth pop the arguments off the stack, and
finally the last instruction jumps back to the first.
This code does the right thing: at every iteration it prints a new
value for the address of test. If we compile at a higher
optimization level, things are no longer fine:
movl %gs:0, %eax
leal test@NTPOFF(%eax), %ebx
.L2:
call bar
pushl %ebx
pushl $.LC0
call printf
popl %eax
popl %edx
jmp .L2
Even at an optimization level as low as -O1 (usually considered
safe), the compiler hoists the load of the address of test out of
the loop. Now the loop will always print the same value: the address
of the instance that belongs to the thread on which foo() was first entered.
Unfortunately GCC provides no switch to disable this specific optimization. Other compilers might do the same thing. The only compiler we know of that provides a switch to explicitly disable this optimization is Visual C++ (the /GT option, fiber-safe thread local storage), as it is often used with code that uses fibers.
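The effect of a hoisted address can be imitated without any coroutine machinery. The following is a minimal sketch, assuming GCC's __thread extension and POSIX threads; struct probe, second_thread and cached_tls_address_survives_migration are hypothetical names used only for illustration and are not part of Boost.Coroutine. The address of test is taken on one thread, as the hoisted load would do, and is then compared on a second thread with that thread's own instance:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static __thread int test;   /* one instance per thread (GCC extension) */

/* Hypothetical helper type: carries a cached address to another thread. */
struct probe {
    int *cached;    /* address of test taken on the first thread */
    int  matches;   /* set to 1 if it is also the second thread's instance */
};

static void *second_thread(void *arg) {
    struct probe *p = arg;
    /* &test is evaluated here, so it names this thread's instance. */
    p->matches = (p->cached == &test);
    return NULL;
}

/* Returns 1 if an address cached on the calling thread still refers to
   the right instance on another thread, 0 otherwise. */
int cached_tls_address_survives_migration(void) {
    struct probe p = { &test, 0 };   /* "hoisted" load of the address */
    pthread_t t;
    int rc = pthread_create(&t, NULL, second_thread, &p);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return p.matches;
}
```

Because both threads are alive at the same time, their instances of test must live at different addresses, so the function returns 0. A coroutine resumed on the second thread with the cached address would therefore read and write the first thread's instance.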
It might be argued that __thread is not part of the C++
standard, so its handling is undefined anyway. Putting aside
the fact that something similar to __thread is likely to be part of
the next release of the standard, abstaining from using it is not a
solution. For example, on many systems the errno macro expands to a
symbol declared with the equivalent of __thread. Thread local
variables might also be used in standard library facilities (memory
allocation is a very likely candidate), and an optimizer capable of
inlining library functions might hoist loads of those variables
outside loops or at least move them across yield points.
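That errno is a per-thread object can be checked directly. The sketch below assumes a POSIX system, where each thread is required to have its own errno; struct errno_probe, errno_worker and errno_is_shared_between_threads are hypothetical names chosen for this example. It compares the address of the creating thread's errno with the address seen by a second thread:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper type: carries the creator's errno address. */
struct errno_probe {
    int *creator_errno;   /* &errno as seen by the creating thread */
    int  same_object;     /* set to 1 if the worker sees the same object */
};

static void *errno_worker(void *arg) {
    struct errno_probe *p = arg;
    /* errno is expanded here, on the worker thread. */
    p->same_object = (p->creator_errno == &errno);
    return NULL;
}

/* Returns 1 if both threads share a single errno object, 0 otherwise. */
int errno_is_shared_between_threads(void) {
    struct errno_probe p = { &errno, 0 };
    pthread_t t;
    int rc = pthread_create(&t, NULL, errno_worker, &p);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return p.same_object;
}
```

On such a system the function returns 0. Code that never mentions __thread but does test errno after a yield point is therefore exposed to the same hazard as the loop in foo().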
Fixing compilers is unfortunately not enough. Operating systems might need to be fixed too; consider the following code:
mutex mtx;
void bar();
void foo() {
  lock(mtx);
  bar();
  unlock(mtx);
}
Here mutex is some synchronization primitive, and bar() is a
function that may migrate the current coroutine to another thread. Aside
from the fact that it is bad practice to hold a lock across a yield point,
many operating systems require a mutex to be unlocked by the same
thread that locked it, which breaks the code above.
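The ownership rule can be observed with POSIX threads. This is a minimal sketch, not the library's code: it assumes a POSIX system and deliberately uses a mutex of type PTHREAD_MUTEX_ERRORCHECK, because unlocking a default mutex from a thread that does not own it is undefined behaviour and may fail silently. The names unlock_from_other_thread and unlock_result_on_foreign_thread are hypothetical. A second thread stands in for the thread on which a migrated coroutine would be resumed:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t mtx;

/* Plays the role of the thread on which foo() would be resumed. */
static void *unlock_from_other_thread(void *result) {
    *(int *)result = pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Locks mtx on the calling thread, attempts the unlock on another
   thread, and returns the error code of that attempt. */
int unlock_result_on_foreign_thread(void) {
    pthread_mutexattr_t attr;
    pthread_t t;
    int result = 0;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&mtx, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&mtx);              /* lock(mtx) */
    assert(rc == 0);
    rc = pthread_create(&t, NULL, unlock_from_other_thread, &result);
    assert(rc == 0);
    rc = pthread_join(t, NULL);                 /* unlock(mtx) ran elsewhere */
    assert(rc == 0);

    rc = pthread_mutex_unlock(&mtx);            /* the owner may unlock */
    assert(rc == 0);
    pthread_mutex_destroy(&mtx);
    (void)rc;
    return result;
}
```

With an error-checking mutex the foreign unlock is rejected with EPERM and the mutex stays locked until its owner releases it.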
The above scenarios are just two examples. There are many other ways in which coroutine migration could break otherwise perfectly fine code. For reference, see this blog about using fibers in .NET code and the MSDN article about the perils of fiber mode in SQL Server.
In the end, Boost.Coroutine provides only those thread safety guarantees that are believed to be safe on all systems. Note that, as coroutines are not to be shared between threads, internal reference counting is not thread safe (it doesn't necessarily use atomic operations).
Copyright © 2006 Giovanni P. Deretta