User-space implementations of coroutines and green threads need to be able to switch the CPU's call stack pointer between different memory regions. Since this is inherently CPU- and OS-specific, I'll limit my discussion to CPUs and platforms that Factor supports.
System APIs for switching call stacks
- Windows has an API for creating and switching between contexts, which it calls "fibers". The main functions to look at are CreateFiber() and SwitchToFiber(). On Windows, by far the easiest way to switch call stacks is to just use fibers.
- Most Unix systems have a set of functions operating on
ucontext
s, such asmakecontext()
,swapcontext()
and so on. On some systems, these APIs are poorly implemented and documented. - The C standard library functions
setjmp()
andlongjmp()
can be (ab)used to switch contexts. Thejmpbuf
structure stored to bysetjmp()
and read from bylongjmp()
contains a snapshot of all registers, including the call stack pointer. Once you've allocated your own call stack, you can capture ajmp_buf
withsetjmp()
, change the call stack pointer, and then resume execution withlongjmp()
. As far as I know, this is completely undocumented and unsupported with every major C compiler. - The most direct way is to write some assembly to switch the relevant registers directly. Paradoxically, this approach is the most portable out of all of these, because while it is CPU-specific, the details are pretty much the same across most OSes, Windows being the exception. Switching call stacks on Windows is a bit more involved than just changing the
ESP
register, and requires updating some Windows-specific in-memory structures. However, it is still easy enough to do directly, if you do not wish to use the fiber API. I will describe the details at the end of this post.
High-level libraries for switching call stacks
A couple of existing libraries implement high-level, cross-platform abstractions using some combination of the above mechanisms.
- libcoroutine uses fibers on Windows, and ucontext and longjmp on Unix.
- libcoro uses a handful of Unix-specific APIs if they're available, or falls back onto some hand-coded assembly routines. The latter works on Windows.
NetBSD/x86 limitation
There is no realiable way to switch call stacks on NetBSD/x86, because of a limitation in the implementation of libpthread
. Even if your program doesn't use pthreads, it might link with a library that does, such as Glib. The pthread library replaces various C standard library functions with its own versions, and since this includes some extremely common calls such as malloc()
, almost nothing will work as a result.
The reason is that the NetBSD pthread library uses a silly trick to implement thread-local storage, one which unfortunately assumes that every native thread has exactly one call stack. The trick is to always allocate thread call stacks on multiples of the maximum call stack size. Then, each thread's unique ID can just be the result of masking the call stack pointer by the maximum call stack size.
Windows-specific concerns
So far, I've only worked out the details of switching call stacks on 32-bit Windows. I'll talk about 64-bit Windows in another post.
Windows has a mechanism known as Structured Exception Handling, which is a sort of scoped exception handling mechanism with kernel support. Windows uses SEH to deliver processor traps to user-space applications (illegal memory access, division by zero, etc). Some C++ compilers also use SEH to implement C++ exceptions.
On Windows, simply changing the ESP
register to point at another memory region is insufficient because structured exception handling must know the current call stack's beginning and end, as well as a pointer to the innermost exception handler record.
These three pieces of information are stored in a per-thread information block, known as the TIB. The fiber switching APIs update the TIB for you, which is why Microsoft recommends their use.
The TIB
The TIB is defined by the NT_TIB
struct in winnt.h
:
typedef struct _NT_TIB {
struct _EXCEPTION_REGISTRATION_RECORD *ExceptionList;
PVOID StackBase;
PVOID StackLimit;
PVOID SubSystemTib;
_ANONYMOUS_UNION union {
PVOID FiberData;
DWORD Version;
} DUMMYUNIONNAME;
PVOID ArbitraryUserPointer;
struct _NT_TIB *Self;
} NT_TIB,*PNT_TIB;
The relevant fields that must be updated when switching call stacks are ExceptionList
, StackBase
and StackLimit
.
StackLimit
is the start of the call stack's memory region and StackBase
is the end. Structured exception handlers
Structured exception handlers are stored in a linked list on the call stack, with the head of the list pointed at by the ExceptionList
field of the TIB:
struct _EXCEPTION_REGISTRATION_RECORD
{
PEXCEPTION_REGISTRATION_RECORD Next;
PEXCEPTION_DISPOSITION Handler;
} EXCEPTION_REGISTRATION_RECORD, *PEXCEPTION_REGISTRATION_RECORD;
The exception list should be saved and restored when switching between call stacks, and a new call stack should begin with an empty list (ExceptionList
set to NULL
).
If the ExceptionList
field of the TIB or the Next
field of an exception record point outside the call stack, then the handler in question will not be called at all.
Accessing the TIB
On x86-32, the current thread's TIB is stored starting at address 0 in the segment pointed at by the FS segment register. The Self
field always points at the struct itself. Since Windows uses flat addressing, the segment storing the TIB begins somewhere in the linear 32-bit address space, so an assembly routine to fetch the TIB and return a pointer to it in EAX looks like this:
fetch_tib:
mov eax,[fs:24]
ret
This assembly routine can then be called from C code, which can manipulate the TIB like any other structure.
Sample code for updating Windows-specific structures
The basis/cpu/x86/32/winnt/bootstrap.factor file defines assembly subroutines for updating the TIB when switching call stacks. These routines are invoked by the x86-32 non-optimizing compiler backend:
2 comments:
I did similar research in producing libconcurrency. Unfortunately, recent versions of glibc swizzle the jmp_buf structure, so it won't work on glibc on x86 at least. My next planned step was just to take the asm setjmp implementations from netbsd, and use those instead of the glibc setjmp, but I haven't got around to it yet.
http://www.reddit.com/r/programming/comments/bmgo8/switching_call_stacks_on_different_platforms/ has more comments.
Post a Comment