How I hacked a vintage C++ compiler to support exceptions before they were standard

Posted on Jul 19 2024

TL;DR I hacked a 28-year-old C++ compiler to support exceptions. That’s two years before exceptions were even part of the first C++ standard!

The compiler

The compiler is the Watcom C++32 Optimizing Compiler Version 11.0, from 1996, by Sybase, Inc. Around that time, some friends and I started a real-time graphics and audio library for DOS, mainly for games and demos. This compiler was very advanced for its time, but not advanced enough to fully support exceptions, as we will see.

Fig. 1: The Watcom C++32 Optimizing Compiler Version 11.0 running on DOSBox.

Years later, I resumed maintaining the library. Real-time code was mostly written in assembly, but I started adding C++ support to integrate faster. However, with error checking everywhere, the code started to look... verbose:

Fig. 2: Before implementing exception support, code could end up being pretty verbose.

Lots of “do this or fail gracefully” patterns, so RAII + exceptions seemed like a good way to simplify the source while still giving the program a chance to course-correct if an error was not critical. So, I started writing some RAII classes for images that would throw upon error, but surprisingly my exceptions were not being caught anywhere in the program!

It turns out that the compiler can compile a program with try, catch and throw statements, but the generated code calls certain hooks that do not exist in any library shipped with the compiler in 1996. I searched retro-computing forums on the Internet to no avail. It looked like the folks at Sybase were getting ready to support exceptions but had not fully released that support yet.

With no compiler documentation, I was working in the dark. How could I pull it off? I reverse-engineered the compiler by writing dozens of programs and disassembling the generated binaries. I kept formulating hypotheses about how the compiler worked until I found one that explained every generated binary from its source.

 

Reverse-engineering exception handling

Without further ado, here’s how C++ exceptions translate to assembly in the Watcom C++32 11.0 compiler:

When a function has at least one try-catch or throw statement, the compiler generates a data structure in its stack: a “stack frame”. When the function runs, it first stores in its stack frame a pointer to the enclosing stack frame, forming a linked list, and makes its stack frame the list’s head by making a global symbol point at it. This global is called __wint_thread_data and our library must define it.

Fig. 3: Disassembly of a simple C++ program with a try-catch pattern. I wrote lots of program variations to reverse-engineer the exception mechanism that Watcom engineers envisioned.

This is what a stack frame looks like in memory:

Offset (bytes)  Size (bytes)  Description
0x0             4             Pointer to previous stack frame (innermost pointed to by __wint_thread_data).
0x4             4             Pointer to __wcpp_4_fs_handler_rtn__.
0x8             4             Possibly, pointer to some sort of class info (unsure).
0xC             4             Try-catch block state.

Table I: A stack frame generated in a function's stack when it has at least one try-catch or throw statement.
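To make Table I easier to picture, here is a minimal sketch of the frame as a C++ struct, plus a small host-side model of how each function links its frame into the list. The field names, FrameModel, thread_data_head and enter_function are my own hypothetical names for illustration; only the offsets, the sizes and the linking behavior come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layout from Table I. Field names are hypothetical; pointers are stored as
// 32-bit values because the compiler targets 32-bit code.
struct StackFrame {
    std::uint32_t prev;        // 0x0: address of the previous stack frame
    std::uint32_t handler;     // 0x4: address of __wcpp_4_fs_handler_rtn__
    std::uint32_t class_info;  // 0x8: possibly class info (unsure)
    std::uint32_t state;       // 0xC: try-catch block state
};
static_assert(offsetof(StackFrame, state) == 0xC, "state must sit at 0xC");
static_assert(sizeof(StackFrame) == 16, "four 4-byte fields");

// Host-side model of the linking, using real pointers so it runs anywhere.
struct FrameModel {
    FrameModel* prev;
    std::uint32_t state;
};

// Stands in for the global __wint_thread_data (assumption: a plain pointer).
FrameModel* thread_data_head = nullptr;

// Models what a function does on entry: remember the enclosing frame, then
// become the head of the list.
void enter_function(FrameModel& frame) {
    assert(&frame != thread_data_head);  // a frame is only pushed once
    frame.prev = thread_data_head;
    thread_data_head = &frame;
}
```

Calling enter_function for an outer and then an inner frame leaves thread_data_head pointing at the inner one, whose prev points at the outer one.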

As the disassembly in Fig. 3 shows, before a try-catch statement the compiler calls _setjmp_, passing it a pointer to at least 52 bytes of stack space. I call the point right after this call the fork point: if the call returns 1 in eax, the catch block is executed; otherwise, the try block is. At the end of a catch block, the compiler calls __wcpp_4_catch_done__. A throw statement calls __wcpp_4_throw__ with a pointer to the exception in eax.

All these entry points must be provided by our library to support exceptions. Ideally, _setjmp_ will track its fork point and return eax!=1 to execute the try block. If an exception is thrown in it, __wcpp_4_throw__ can jump to that point after _setjmp_ with eax==1 so the catch block is run, after which __wcpp_4_catch_done__ would untrack the fork point after _setjmp_ so the catch block does not run again with a new throw.
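The fork-point idea is the same one behind the standard setjmp/longjmp pair, so it can be sketched with those instead of the Watcom hooks. This is only an analogy: fork_point, throw_like and run_guarded are hypothetical names, and the standard setjmp returns 0 on the direct call, which satisfies the "anything other than 1" convention described above.

```cpp
#include <cassert>
#include <csetjmp>

static std::jmp_buf fork_point;       // CPU state saved at the fork point
static volatile int caught_code = 0;  // what the "catch" path will see

// Plays the role of a throw statement: jump back to the fork point and make
// setjmp appear to return 1 there.
static void throw_like(int code) {
    assert(code != 0);
    caught_code = code;
    std::longjmp(fork_point, 1);
}

// Returns 0 when the "try" path completes, or the thrown code otherwise.
int run_guarded(bool fail) {
    caught_code = 0;
    if (setjmp(fork_point) == 1) {
        // "catch" path: reached only through throw_like
        return caught_code;
    } else {
        // "try" path: reached on the direct call
        if (fail) {
            throw_like(42);
        }
        return 0;
    }
}
```

Note that longjmp skips destructors, so a sketch like this is only safe while nothing with a non-trivial destructor lives between the two points.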

But what if we have multiple nested try-catch statements in the same function? Unfortunately, the compiler won’t create one stack frame per nested try-catch, so how will throw know which catch block to jump back to? For that, we need _setjmp_ to track every nested fork point. It can do so by maintaining its own linked list of try-catch nodes, each holding its own fork address.

That’s where the 52-byte stack space given to _setjmp_ by the compiler comes into play: it can store one new node of the try-catch list! And we can store the head of the list at the second field of the stack frame. Remember that field points at __wcpp_4_fs_handler_rtn__ by default, so we can initialize that symbol with a terminal node that will mark the end of the list. 

Let’s recap: the compiler maintains a linked list of stack frames, and, for each stack frame, we maintain a linked list of try-catch nodes. The compiler’s stack frames are associated with functions with try-catch or throw statements, and our try-catch nodes represent multiple try-catch blocks in the same function, either sequential or nested. 

This is what our try-catch list node looks like:

Offset (bytes)  Size (bytes)  Description
0x0             4             Pointer to next node, or 0 if none.
0x4             4             Try-catch state when _setjmp_ was called.
0x8             4             Code address after call to _setjmp_.
0xC             4             Pointer to the exception thrown in this try-catch, or 0 if none.
0x10            24            Values of CPU registers (except for eax) before call to _setjmp_.

Table II: A try-catch node in our linked list. These nodes represent multiple try-catch blocks in the same function, either sequential or nested.
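As a quick sanity check of Table II, the node can be written as a struct and its size compared against the 52 bytes the compiler hands to _setjmp_. The field names, kSetjmpSpace and spare_bytes are hypothetical, and treating the 24-byte register area as six 32-bit slots is my assumption; which register goes in which slot is not specified here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layout from Table II. Field names are hypothetical.
struct TryCatchNode {
    std::uint32_t next;       // 0x00: address of the next node, or 0
    std::uint32_t state;      // 0x04: try-catch state when _setjmp_ was called
    std::uint32_t fork_addr;  // 0x08: code address after the call to _setjmp_
    std::uint32_t exception;  // 0x0C: address of the thrown exception, or 0
    std::uint32_t regs[6];    // 0x10: 24 bytes of saved register values
};

// Minimum stack space the compiler passes to _setjmp_.
const std::size_t kSetjmpSpace = 52;

static_assert(offsetof(TryCatchNode, regs) == 0x10, "registers start at 0x10");
static_assert(sizeof(TryCatchNode) == 40, "16 bytes of fields + 24 of registers");
static_assert(sizeof(TryCatchNode) <= kSetjmpSpace, "node must fit in the space");

// Bytes left unused in the 52-byte area after one node is stored.
constexpr std::size_t spare_bytes() {
    return kSetjmpSpace - sizeof(TryCatchNode);
}
```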

Look at the second field of the try-catch node. _setjmp_ compares the previous node’s try-catch state with the current stack frame’s try-catch state to see if the new try-catch is inside the previous try-catch or after it; then, it links the new node to the previous one or entirely replaces it, respectively. This try-catch state stored in the stack frame is updated by the compiler with a unique identifier at each stage of a try-catch block. 

The logic generating the try-catch state was especially painful to reverse-engineer. In a nested try-catch structure, the compiler walks the try blocks from outermost to innermost, then walks catch blocks from innermost to outermost.

Along that walk, each try adds 1 to the try-catch state, if there’s no throw in the containing try, or 2 otherwise. Each catch block adds 2, and each throw adds 1.

Fig. 4: Evolution of a stack frame's try-catch state in nested try-catch with throws.

The try-catch node must also store a pointer to the exception. If an exception is thrown inside a try block that is inside a catch block, we must be able to access the first exception after handling the second one. This is only possible if each try-catch stores the exception thrown in it.

Lastly, the try-catch node stores the values of all registers, so that the CPU state can be restored before jumping back to the catch block. This is crucial because the compiler may access local variables using esp or ebp as base, which could be different at __wcpp_4_throw__ since it may run from a different stack frame. If not restored, the catch block code would be accessing local variables at a different location and probably fault. 

Having to restore the registers before jumping to a catch block, including esp, means that we lose all local variables in the throwing stack frame, including our thrown object! Therefore, __wcpp_4_throw__ must deep-copy the object before jumping to the catch block, so it can be safely accessed from it.
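The need for the copy can be illustrated on any modern compiler. The sketch below is not the library's code: SavedException, save_copy, release and ImageError are hypothetical names, and it relies on the copy constructor of a known type, whereas the real hook only receives a pointer in eax.

```cpp
#include <cassert>
#include <string>

// Type-erased owner of an exception copy that outlives the throwing frame.
struct SavedException {
    void* object;
    void (*destroy)(void*);
};

template <typename T>
void destroy_as(void* p) {
    delete static_cast<T*>(p);
}

// Copy the thrown object to the heap through its copy constructor, so that
// members such as strings are duplicated too.
template <typename T>
SavedException save_copy(const T& thrown) {
    SavedException saved = { new T(thrown), &destroy_as<T> };
    return saved;
}

// Destroy the copy once the catch block is done with it.
void release(SavedException& saved) {
    if (saved.object != nullptr) {
        saved.destroy(saved.object);
    }
    saved.object = nullptr;
}

struct ImageError {
    std::string message;
};

// The local object dies when this function returns; the copy survives.
SavedException failing_function() {
    ImageError local = { "out of memory" };
    return save_copy(local);
}
```

The caller of failing_function can still read the message from the copy, and release plays the part that the end of the catch block plays in the real mechanism.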

 

Implementing exception support

As we have seen, there are three hooks that our library must provide to support exceptions. Here's what each one of them must do.

_setjmp_

  1. Create a new try-catch node in the provided stack space and make it the new list’s head
  2. If this try-catch is inside the previous try-catch, link it to the previous node
  3. Copy the try-catch state, fork address, and all register values to the node
  4. Return eax!=1, so the try block runs
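The list handling in these steps can be modeled on the host as follows. It is a sketch under several assumptions: Node, Frame and track_try are hypothetical names, register saving is left out, the nesting decision is passed in as a parameter because the exact state comparison is not spelled out here, and "replacing" the previous node is interpreted as taking over its link.

```cpp
#include <cassert>
#include <cstdint>

struct Node {
    Node* next;
    std::uint32_t state;    // try-catch state when the node was created
    std::uintptr_t fork;    // fork address; 0 marks the terminal node
    const void* exception;  // exception thrown in this try-catch, if any
};

struct Frame {
    Node* head;           // second field of the compiler's stack frame
    std::uint32_t state;  // try-catch state maintained by the compiler
};

// Models _setjmp_: builds a node in the provided space and makes it the head.
// 'nested' says whether the new try-catch is inside the previous one.
int track_try(Frame& frame, Node* space, std::uintptr_t fork, bool nested) {
    assert(space != nullptr && frame.head != nullptr);
    Node* prev = frame.head;
    // Assumption: the terminal node is never replaced, only linked to.
    bool link = nested || prev->fork == 0;
    space->next = link ? prev : prev->next;  // link to, or replace, prev
    space->state = frame.state;
    space->fork = fork;
    space->exception = nullptr;
    frame.head = space;
    return 0;  // anything other than 1, so the try block runs
}
```

With a terminal node as the initial head, a first try-catch links to the terminal node, a nested one links to the first, and a later sequential one takes the nested node's place.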

__wcpp_4_throw__

  1. Deep-copy the thrown object
  2. Find the try-catch node in the current stack frame that handles the exception
  3. If not found, unwind the stack frame and go back to 2
  4. If found, unlink the try-catch node, restore the CPU state, set eax=1 and jump back to the fork point; or jump to the default exception handler if the fork point is NULL (terminal node).
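Steps 2 to 4 amount to a search over two linked lists, which can be sketched like this. Node, Frame and dispatch_throw are hypothetical names, and the model simplifies two things: it returns the fork address instead of restoring registers and jumping, and it reports the default handler with 0 once no frames remain. Type matching is left out because, as the Limitations section explains, my implementation always picks the immediately enclosing try-catch.

```cpp
#include <cassert>
#include <cstdint>

struct Node {
    Node* next;
    std::uintptr_t fork;  // fork address; 0 marks the terminal node
};

struct Frame {
    Frame* prev;  // enclosing stack frame
    Node* head;   // head of this frame's try-catch list
};

// Returns the fork address to jump to, or 0 if the default exception handler
// must run. 'current' models __wint_thread_data and is updated while unwinding.
std::uintptr_t dispatch_throw(Frame*& current) {
    while (current != nullptr) {
        Node* node = current->head;
        if (node != nullptr && node->fork != 0) {
            current->head = node->next;  // unlink the handling node
            return node->fork;
        }
        current = current->prev;  // no handler here: unwind this frame
    }
    return 0;
}
```

A throw from a frame that only holds the terminal node unwinds to the enclosing frame and returns the fork address of that frame's try-catch.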

__wcpp_4_catch_done__

  1. Make the previous try-catch node the list’s head
  2. Destroy the deep copy of the exception (this is also run at the end of the program)

 

Limitations

Interestingly, there is no hook called at the end of a try block. This limits the cases we can support: when a throw statement comes right after a try-catch block, the library cannot tell that the try block has already finished, so it is not possible to call the right catch block. Hopefully, this was addressed in later versions of the compiler with proper C++98 support.

Another thing the compiler does not do is keep track of where the catch block receives the thrown object. The catch block reads it from an arbitrary place within the local variable space, but since __wcpp_4_throw__ does not get that information, it cannot copy the object to where the catch block expects it. My solution was to provide a GetLastException() function that the catch block can use to retrieve the object.
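To show how a catch block might use that function, here is a host-side sketch that runs on a modern compiler. The signature of GetLastException (returning a const void*) is my assumption for illustration, and g_last_exception, ImageError and load_image_or_report are hypothetical names; a global pointer stands in for the copy that the throw hook would keep.

```cpp
#include <cassert>
#include <string>

struct ImageError {
    std::string message;
};

// Stand-in for the exception copy kept by the throw hook.
static const void* g_last_exception = nullptr;

// Hypothetical signature: returns the most recently thrown object.
const void* GetLastException() {
    return g_last_exception;
}

std::string load_image_or_report() {
    static const ImageError error = { "file not found" };
    try {
        g_last_exception = &error;  // what the throw hook would record
        throw error;
    } catch (const ImageError&) {
        // The catch parameter is ignored on purpose: with the hooks described
        // in this post it would not hold the object, so we ask the library.
        const ImageError* e =
            static_cast<const ImageError*>(GetLastException());
        return e->message;
    }
}
```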

Another limitation of my implementation comes from the fact that I have not yet found where the compiler stores the info about the object classes a catch block handles (it might not even exist in this version!). So, my implementation can only handle thrown objects of the same base class. An exception is always caught by the immediately enclosing try-catch, regardless of what object type it specifies.

 

Results

The code for all exception handling hooks is here

Thanks to exceptions and RAII, the verbose code of figure 2 now looks like this. The framework wraps everything in an outer try-catch that handles exceptions and resource deallocation automatically. The code should become more concise as the Image class wraps more of the underlying functionality of the SLI plain structure.

Fig. 5: The same code as in figure 2, but simplified greatly thanks to exceptions.
