Code space structure and management

A project log for POSEVEN

A programming model for languages and "application processors" that emphasises speed, safety, security and modularity.

Yann Guidon / YGDES 07/16/2023 at 18:28 • 0 Comments

Here is a more formal definition of one aspect of NPM, which was first imagined with the YASEP architecture (see Log#30: Fast and secure InterProcess Communications and Log#55: More about the IPC instructions).

So here we go (again).

In memory, the computer contains a collection of separate addressing spaces, each with its own size and each corresponding to a given code module.

These addressing spaces are instruction-grained: each address points to one individual, sequential instruction, regardless of its size. Thus every index gives a valid instruction and the program counter increments normally. Instruction size or packing is out of the equation for the program itself (fixed instruction sizes are recommended).
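A tiny sketch in C of what instruction-grained fetching means: the program counter indexes whole instructions, so it always advances by one, whatever a real encoding packs inside each word. The types and the 32-bit width are assumptions for illustration only.

```c
#include <stdint.h>
#include <stddef.h>

/* A code module is just an array of fixed-size instruction words:   */
/* address i is the i-th instruction, nothing smaller.                */
typedef struct {
    const uint32_t *code;
    size_t          length;   /* number of instructions, not bytes    */
} code_space_t;

/* Fetch the instruction at *pc and advance by exactly one,           */
/* regardless of what the instruction encodes internally.             */
static uint32_t fetch(const code_space_t *cs, size_t *pc)
{
    uint32_t insn = cs->code[*pc];
    *pc += 1;                 /* instruction-grained: always +1        */
    return insn;
}
```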

Each module could also have its own dedicated shared data space (in the negative range? for constants, semaphores, configuration, whatever, but it must be "safe"), but the actual data must be held somewhere else, in the local thread's data context (that's another story for another log).

The addressing space for the code modules can be any (positive) length, but only the first 64Ki instructions can contain entry points reachable from other modules (through the IPE instruction described below).

One module can CALL its own code but must go through the trampoline to call any other module, using a dedicated hardware stack and a trio of instructions that enforce safety and access rights (through user code and strict separation) while maintaining performance (through static and dynamic caching).

The trampoline contains 64Ki instructions that can be addressed only by a specific instruction (called IPC, for InterProcess Call), and it may only jump to another specific instruction (called IPE, for InterProcess Entry) that manages a dedicated, hardware-enforced stack. IPE can only exist in the first 64K of the code space (or it will trigger a fault), and any IPC to an instruction other than IPE will fault as well. The third instruction, IPR (InterProcess Return), completes the collection and jumps back to the previous module (indicated by the dedicated hardware stack).
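Below is a minimal, emulator-style sketch in C of how the IPC/IPR pair and the dedicated stack could behave. The 32-bit encoding, the opcode value, the 64-frame stack depth and the fact that the stack push is folded into the IPC step (instead of being managed by IPE) are all simplifying assumptions for illustration, not the actual POSEVEN design.

```c
#include <stdint.h>
#include <stdbool.h>

#define OP_IPE          0x02u          /* assumed opcode value           */
#define IPC_STACK_DEPTH 64             /* assumed depth                  */

typedef struct {                       /* one code module                */
    const uint32_t *code;              /* fixed-size instructions        */
    uint32_t        length;            /* instruction count              */
} module_t;

typedef struct {                       /* one frame of the IPC stack     */
    uint16_t caller_module;
    uint32_t return_pc;
} ipc_frame_t;

typedef struct {
    uint16_t     module;               /* currently running module       */
    uint32_t     pc;                   /* instruction-grained counter    */
    ipc_frame_t  stack[IPC_STACK_DEPTH];  /* dedicated, protected stack  */
    int          sp;
    module_t    *modules;              /* indexed by module number       */
} cpu_t;

static uint8_t opcode_of(uint32_t insn) { return (uint8_t)(insn >> 24); }

/* IPC: jump to instruction 'index' of module 'target'.  The 16-bit     */
/* index inherently restricts the landing site to the first 64Ki        */
/* instructions; the destination must also be an IPE, or we fault.      */
static bool exec_ipc(cpu_t *cpu, uint16_t target, uint16_t index)
{
    const module_t *m = &cpu->modules[target];
    if (index >= m->length)                  return false;  /* fault     */
    if (opcode_of(m->code[index]) != OP_IPE) return false;  /* fault     */
    if (cpu->sp >= IPC_STACK_DEPTH)          return false;  /* overflow  */
    cpu->stack[cpu->sp].caller_module = cpu->module;        /* push on   */
    cpu->stack[cpu->sp].return_pc     = cpu->pc + 1;        /* IPC stack */
    cpu->sp++;
    cpu->module = target;
    cpu->pc     = index;
    return true;
}

/* IPR: pop the dedicated stack and resume the previous module.         */
static bool exec_ipr(cpu_t *cpu)
{
    if (cpu->sp == 0) return false;                         /* underflow */
    cpu->sp--;
    cpu->module = cpu->stack[cpu->sp].caller_module;
    cpu->pc     = cpu->stack[cpu->sp].return_pc;
    return true;
}
```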

The trampoline is meant to speed up repeated accesses to libraries (for example), so the IPC instruction contains the 16-bit constant index of the desired trampoline entry (while the module number may be stored in a register). However, a program cannot know this index in advance, so it first encodes the entry point number, which is later translated by a kernel-driven lookup at module load time.

A module cannot be stored and distributed with the IPC instruction itself, because the entry points will change over time, independently from the program. So it uses the IPL (InterProcess Lookup) instruction instead, which traps:

  1. The handler checks that the module number is valid.
  2. The handler checks that the entry point number exists (not out of range).
  3. The handler changes the opcode from IPL to IPC and stores the actual trampoline index in the 16-bit constant field of the opcode.
  4. The handler re-runs the instruction that trapped.

Thus on the second run, the cached entry point has been integrated into the running code and saves some time. The interprocess call should be almost as fast as a regular CALL instruction.
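As a rough picture of that load-time/run-time translation, here is a sketch in C of what the kernel's IPL trap handler could look like. The instruction layout (opcode in the top 8 bits, constant in the low 16 bits), the table structure and the opcode value are assumptions made for the example, not the real encoding.

```c
#include <stdint.h>
#include <stdbool.h>

#define OP_IPC 0x01u                   /* assumed opcode value            */

typedef struct {                       /* one table per callee module:    */
    const uint16_t *trampoline_index;  /* entry point number ->           */
    uint32_t        entry_count;       /*   actual trampoline index       */
} entry_table_t;

/* 'insn' points at the 32-bit word that trapped, 'entry' is the entry   */
/* point number it encodes, 'module' is the register-held module number. */
/* Returning false lets the kernel flag the offending caller.            */
static bool ipl_trap_handler(uint32_t *insn, uint16_t module, uint16_t entry,
                             const entry_table_t *tables, uint16_t module_count)
{
    if (module >= module_count)               return false; /* 1. module valid?   */
    if (entry  >= tables[module].entry_count) return false; /* 2. entry in range? */

    uint16_t index = tables[module].trampoline_index[entry];
    *insn = (OP_IPC << 24) | index;    /* 3. patch IPL -> IPC + constant  */
    return true;                       /* 4. the trapped instruction is   */
}                                      /*    simply re-run on return      */
```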

If the called module has been taken down or updated, the calling module can simply be "reloaded" into the uncached state, to re-trigger the dynamic translation.

The numbers of the modules called by a given module are also translated at load time. Let's say a computer can load up to 64K modules simultaneously; each module may be allocated a different, essentially random number after a system restart. OTOH, a program or module encodes these numbers in its own way.

In particular: each module contains a list of the "human-readable names" of its dependencies, starting at index 1 (constant index 0 is the microkernel's fundamental functions). This helps "match" the program's dependencies with the loaded modules: the kernel translates module indices => textual names => running module indices and updates the module's internal tables.

The program loader must extract the list of names of all the called modules, in the same order as in the calling module's list, and, for each name:

  1. Check that the name is valid and available.
  2. Check that the module is loaded in memory. If not:
    1. the module is read and installed into memory,
    2. its initialisation code (entry point 0) is executed (this could recursively trigger loading more modules),
    3. the trampoline's lookup table is built by scanning the first 64K for IPE instructions and storing their indices in the lookup table.
Thus, each module can get the current list of the other installed modules from its own lookup table, and load these numbers into registers.
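A sketch of that load-time matching in C, assuming each module carries a table of the human-readable names of its dependencies and that entry point 0 is its init code. All structures and helpers (find_loaded, read_and_install, module_number_of, run_entry) are invented for the example, and the trampoline scan is done just before the init call so the callee's entry points exist when its init runs.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

#define OP_IPE 0x02u                     /* assumed opcode value           */

typedef struct module module_t;
struct module {
    const char     *name;                /* human-readable module name     */
    const char    **dep_names;           /* names of the called modules    */
    size_t          dep_count;
    uint16_t       *dep_ids;             /* filled in by the loader        */
    const uint32_t *code;                /* fixed-size instructions        */
    uint32_t        length;
    uint16_t       *trampoline;          /* entry number -> IPE index      */
    size_t          trampoline_entries;
};

/* Assumed kernel services, declared here only to make the sketch read.  */
extern module_t *find_loaded(const char *name);
extern module_t *read_and_install(const char *name);
extern uint16_t  module_number_of(const module_t *m);
extern void      run_entry(module_t *m, uint16_t entry);  /* entry 0 = init */

/* Build the trampoline lookup table: scan the first 64Ki instructions   */
/* for IPE and record their indices in order of appearance.              */
static void build_trampoline(module_t *m)
{
    uint32_t limit = m->length < 0x10000u ? m->length : 0x10000u;
    m->trampoline = malloc(limit * sizeof(uint16_t));  /* error handling omitted */
    m->trampoline_entries = 0;
    for (uint32_t i = 0; i < limit; i++)
        if ((m->code[i] >> 24) == OP_IPE)
            m->trampoline[m->trampoline_entries++] = (uint16_t)i;
}

/* Resolve the dependencies of 'prog', loading, scanning and initialising */
/* any module that is not yet in memory (which may recurse).              */
static void resolve_dependencies(module_t *prog)
{
    for (size_t i = 0; i < prog->dep_count; i++) {
        module_t *dep = find_loaded(prog->dep_names[i]);     /* steps 1 & 2 */
        if (dep == NULL) {
            dep = read_and_install(prog->dep_names[i]);      /* 2.1         */
            resolve_dependencies(dep);                       /* recursion   */
            build_trampoline(dep);                           /* 2.3         */
            run_entry(dep, 0);                               /* 2.2 init    */
        }
        prog->dep_ids[i] = module_number_of(dep);  /* update internal table */
    }
}
```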

There is a potential problem here though:

IPC requires 2 numbers: the index of the entry point to call (a constant stored in the instruction) and the number of the module (which does not fit in the instruction and is usually held in a register). The lookup happens with both numbers but only one is stored in the IPC instruction, so a mismatch is possible.

The first solution is to encode the module number as a constant in the instruction: 255 callable modules is already a lot, so 8 bits would be enough, but this would require the processor to contain a lookup table for each module... which is not conceivable.

In the worst case, the multi-level protection mechanisms ensure that any spurious call to an invalid entry point gets reported and the offending calling module gets flagged by the kernel to be shut down.

With these mechanisms, modules and threads can be filtered to form protected herds of modules that build up a larger modular program, shielded from the others.

An even better solution would be to "sign" the IPC instructions to link them with the (un?)translated module number, but this requires more room inside the instruction... So maybe the entry index could be hashed with the register-held value (a 32- or 64-bit value).
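One way to picture that signing idea, as a hedged sketch only: derive a short tag from the entry index and the register-held module number when the kernel patches the IPC, store it in the instruction (hence the need for more room), and let the hardware recompute and compare it at call time. The mixing function, the 16-bit tag and the field widths below are pure assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

/* Cheap illustrative mixer, not a real or recommended hash.            */
static uint16_t mix16(uint16_t entry_index, uint32_t module_number)
{
    uint32_t h = (uint32_t)entry_index * 0x9E3779B1u
               ^ module_number * 0x85EBCA6Bu;
    return (uint16_t)(h ^ (h >> 16));
}

/* At patch time the kernel would store tag = mix16(index, module) in    */
/* the IPC instruction; at call time the hardware recomputes it from the */
/* register-held module number and faults on a mismatch, catching an     */
/* index that was translated for a different module.                     */
static bool ipc_tag_matches(uint16_t stored_tag, uint16_t entry_index,
                            uint32_t module_number_reg)
{
    return stored_tag == mix16(entry_index, module_number_reg);
}
```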

Conclusion:

This system strikes a balance between HW and SW overheads. The hardware features should be easy to schedule without microcode; all the rest is software-assisted, hence 1) light on transistors, 2) easy to debug and update, 3) pipelineable and non-blocking...

It should remain true to the RISC principles and avoid all the pitfalls of the iAPX432 or i386. The ISA must implement the 64K trampoline zone, the 4 dedicated opcodes, some trapping entries and, more delicate, the hardware stack (which is different from the normal internal calling mechanism and tightly protected). Each module should also keep a list of its callees and callers (and when the number of callers drops to 0, the module can be flagged for flushing from memory).

It is also independent from all the paged memory mechanisms, which could also be used to enforce some of the above protection mechanisms (thus moving even more protection to the software side).

But with only a single, secure, flexible and lightweight mechanism, it implements what is required to make an Operating System: the kernel (module n°0), the libraries and the programs. There is no real distinction between them, except module n°0, which has all the rights.
