Close

Register spill

A project log for F-CPU

The Freedom CPU project has a log here too now :-)

yann-guidon-ygdesYann Guidon / YGDES 04/05/2024 at 14:180 Comments

In earlier logs, I was confident and convinced that the control stack should not be used to store general registers, for many reasons, such as :

But then I looked deeper at the IPC/IPE/IPR instructions and I'm reconsidering and adapting my previous position: yes there is a good case to save/restore a limited amount of registers on the control stack, and the scheduling issue is not an actual problem in this case.

The key is that the register spill on the control stack is only permitted for limited types of operations. The user code must still provide its own register storage space to save/restore the registers, which would be faster anyway (since FC1 has 8 read/write ports, it would take 6 cycles).

IPC has a few convoluted things to do, though it can be reduced to a jump where the 16 LSB come from the instruction's immediate field, and others (the new MID, the target module) from a register (<<16). The MSB are set such that the new address points to the "trampoline" address space. IPC writes a pair of values on the control stack: the frame pointer and the address of the IPC instruction itself. Then the processor goes into the "IPC_call" state, loads the new cache line and must check that the target instruction is IPE.

IPE is where more magic happens. If the target is not the expected opcode, it traps (nice, some registers are already saved), otherwise IPE is executed and pushes a new pair of values on the control stack: the Module ID and some metadata (such as general capabilities).

Then the target code must be able to check metadata from the caller: somehow the processor must write data to a given register, which needs more scratch space to setup a new environment. This can easily be solved by an ABI so the target registers are known to be overwriteable, but the control stack is progressively used for more and more functions like exception handling, traps, and more: the ABI can not be always enforced and the called code should not overwrite the state of the caller (or exceptions might fail to recover).

A scratch register area (like Itanium) is possible but moves more complexity to the SW and opens the gates of race conditions. So the stack itself is the best place, as it provides flexible, local, on-demand space. Traps can be nested without microcode or compromises.

So here comes the new class of instructions : SPILL. It writes a register to the stack (along with its number) and optionally writes metadata from the caller's state (such as TID, MID, metadata, it might evolve). So we get the SPILL imm, reg instruction that pushes the register on the stack while also overwriting it with the requested data. SPILL 0, R1 will save R1 and overwrite it with 0, but other numbers will write a specific/useful/relevant value (probably up to 255 possible info).

Why not combine this with the SR read instruction ? Well, the access rights, timing and datapath might be different. SPILL will focus only on a few context-specific information (a subset of the SR that will not trap) so a trampoline can filter requests hassle-free, almost immediately.

Once all the metadata are checked, the code can jump out of the trampoline area and into the module. 

Then before IPR, a corresponding number of UNSPILL instructions must pop the registers and restore them from the stack. If IPR is executed before, the spilled values are skipped from the stack and lost.

IPR will restore the module ID and capabilities, but another cycle is required to restore the instruction pointer and the frame pointer. And it gets a bit complex and quirky because several cycles are needed. The first step is to enable/select the proper addressing space, while the second step loads the caller's original IPC instruction. This could take "a while" because a page miss could occur... Then when the instruction is loaded, the processor knows which register was used as a frame pointer, which can finally be restored.

Important note: the IPC instruction is executed twice !

So depending on the state of execution, call or return, the instruction behaves differently. This also double-checks that the return has not been tampered with.

Furthermore, execution of SPILL and UNSPILL can be limited to the trampoline area.

...

And now that I think of it, the stack layout and the instructions could be modified, such that the return is faster, more convenient.

This way, the IP is available at the same time as the MID for a faster return, in case the target is still in the instruction cache.

Discussions