
Executable files' structure

A project log for POSEVEN

A programming model for languages and "application processors", that emphasises speed, safety, security and modularity.

Yann Guidon / YGDES, 07/17/2023 at 03:24

The same file structure is used for every executable module: the kernel, the libraries, the applications... It does not use ELF or other existing relocatable formats because it does not need the same features. For example, there is no symbol relocation as on POSIX systems.

The file contains 3 main sections :

  1. The list of required modules, in full text and official form, in a precise order.
  2. The data, relocated/moved to the negative range of the private addressable data space
  3. The code itself, moved to the private instruction address space
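
As a rough illustration of this three-part layout, here is a minimal Python sketch. The `ModuleImage` class and its field names are hypothetical, and this log does not define any on-disk encoding, so the sketch only models the logical content. Placing the data so that it ends just below address 0 is also an assumption: the text only says "negative range".

```python
from dataclasses import dataclass, field


@dataclass
class ModuleImage:
    """Logical content of one executable module (hypothetical model)."""

    # Section 1: required modules, by full official name; order matters.
    dependencies: list[str] = field(default_factory=list)
    # Section 2: initial data, destined for the negative range of the
    # module's private data space.
    data: bytes = b""
    # Section 3: instructions, destined for the private instruction space.
    code: list[str] = field(default_factory=list)

    def data_base_address(self) -> int:
        """Lowest data address, ASSUMING the block ends at address -1."""
        return -len(self.data)


# A minimalistic kernel or a basic library can have an empty first section.
int2str_lib = ModuleImage(dependencies=[], data=bytes(16), code=["IPE", "IPR"])
print(int2str_lib.data_base_address())  # -16
```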

Loading such a new module is simple:

  1. create the list of module dependencies: scan the textual list, look each name up and sign the dependency (with the module ID and such) to get a 64-bit hash for each called module. See the previous log, "Code space structure and management". This may become recursive, so beware of the depth.
  2. copy the data to the appropriate location and map the data space
  3. copy the instructions to the new private instruction space
  4. scan the first 64Ki instructions to create the lookup table of the entry points.
  5. call the first entry point to run initialisation.
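
The loading steps above can be sketched in Python as follows. This is only a model under stated assumptions: the `repository` dictionary, the sequential module IDs, the depth limit of 8 and the BLAKE2 hash are all hypothetical stand-ins (the real signature scheme is described in the other log), and entry points are assumed to be the addresses of `IPE` instructions. Step 5 is left out because the sketch has no instruction interpreter.

```python
import hashlib

MAX_DEPTH = 8            # ASSUMPTION: arbitrary limit on dependency recursion
SCAN_LIMIT = 64 * 1024   # only the first 64Ki instructions are scanned


def dependency_hash(name: str, caller_id: int) -> int:
    """Stand-in 64-bit signature of a dependency (not the real scheme)."""
    digest = hashlib.blake2b(f"{caller_id}:{name}".encode(), digest_size=8)
    return int.from_bytes(digest.digest(), "big")


def find_entry_points(code: list[str]) -> list[int]:
    """Step 4: addresses of IPE instructions inside the scanned window."""
    return [addr for addr, insn in enumerate(code[:SCAN_LIMIT])
            if insn.split()[:1] == ["IPE"]]


def load(name: str, repository: dict, loaded: dict, depth: int = 0) -> dict:
    """Load `name` and, recursively, every module it requires."""
    if depth > MAX_DEPTH:
        raise RecursionError(f"dependency chain too deep at {name!r}")
    if name in loaded:
        return loaded[name]
    image = repository[name]
    for dep in image["requires"]:            # step 1: resolve dependencies
        load(dep, repository, loaded, depth + 1)
    module_id = len(loaded) + 1              # ASSUMPTION: sequential IDs
    module = {
        "id": module_id,
        "imports": [dependency_hash(d, module_id) for d in image["requires"]],
        "data": bytes(image["data"]),        # step 2: private copy of the data
        "code": list(image["code"]),         # step 3: private copy of the code
    }
    module["entries"] = find_entry_points(module["code"])
    loaded[name] = module
    return module


repository = {
    "int2str": {"requires": [], "data": b"\0",
                "code": ["IPE ThrID, ModID", "IPR", "IPE ThrID, ModID", "IPR"]},
    "app": {"requires": ["int2str"], "data": b"",
            "code": ["IPE ThrID, ModID", "IPR"]},
}
loaded = {}
app = load("app", repository, loaded)
print(app["entries"], list(loaded))  # [0] ['int2str', 'app']
```

In this sketch a circular dependency is not detected as such: it simply runs into the depth limit and raises an error, which is one blunt way to "beware of the depth".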

The first list can be empty, for example for a minimalistic kernel or a basic library. Such modules can be called by any other module or thread, and it's up to each module to select which rights or capabilities are protected and how. Thus a program can be built from a group of modules that call each other, registering each other's thread or module IDs to enable richer functions without the risk of interference from other programs.

A number of entry points are to be allocated for basic enumeration and general management.

When an entry point is called, the module can choose :

The rejection can be either a polite one (IPR) or a TRAP (to signal the kernel that something fishy is happening and the caller must be flagged as intrusive).

If the kernel must load external libs, then the bootloader must provide the appropriate access and functions to load more than one module. Ideally, the kernel should provide its own loader, but it's not a critical consideration yet.

The module can use 2 methods, maybe simultaneously:

The entry points would ideally be aligned to 8-instruction boundaries so that the jump targets line up with the cache lines, but it's not a requirement.
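
The alignment arithmetic is simple enough to show in a few lines of Python. This is a small sketch, assuming that a cache line holds exactly 8 instructions; the helper names `align_up` and `padding_before` are made up for the illustration.

```python
ALIGN = 8  # ASSUMPTION: instructions per cache line, as suggested in the text


def align_up(address: int, align: int = ALIGN) -> int:
    """Round an instruction address up to the next multiple of `align`."""
    return (address + align - 1) // align * align


def padding_before(address: int, align: int = ALIGN) -> int:
    """Number of filler instructions needed to reach the next boundary."""
    return align_up(address, align) - address


print(align_up(5), padding_before(5))    # 8 3
print(align_up(16), padding_before(16))  # 16 0
```

With such padding, entry number n of a module made only of short stubs would sit at instruction address 8 × n, at the cost of some unused instruction slots.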

___________________

So let's imagine a simple library that provides a single int2str (convert integer to string) function:

Each module loads at logical address 0 in its own address space. There is no need for ASLR because a pointer in one address space can't address another address space: you must go through the IPC/IPE duo to jump, and returns can't be forced from outside. The code will look like this:

IPE ThrID, ModID // entry 0 : init
// Save the calling Thread ID to local variable (register) ThrID and the module's ID to ModID
// but they are not required here so just ignore the values

If Loaded_Flag != 0
  then TRAP_FLAG  // already loaded, complain to the kernel
// More initialisation can be done here
Loaded_Flag = 1
IPR   // Module is loaded, go on with your day

IPE ThrID, ModID // entry 1  : suspend
IPR   // who cares, not implemented

IPE ThrID, ModID // entry 2 : resume
IPR   // who cares, nothing to restore

IPE ThrID, ModID // entry 3 : shutdown
IPR   // who cares, nothing to save or close

IPE ThrID, ModID // entry 4 : enum
IPR   // who cares, there is only one function to provide, but this could be updated later

IPE ThrID, ModID // entry 5 : the actual useful function
// no filtering on ThrID or ModID because it's harmless
// (do the actual computation and string stuff here)
IPR

In other situations, the filtering of ThrID and ModID can be evaluated in the trampoline zone, then the code jumps outside the trampoline to have more room to do its job.
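
Here is a minimal Python sketch of such a filtering trampoline. Everything in it is an assumption made for the illustration: the ID values, the `ALLOWED_MODULES` and `BANNED_THREADS` sets, and the policy itself (polite IPR for unknown modules, TRAP for threads already considered hostile). The actual work is delegated to a separate function, which plays the role of the code located outside the trampoline.

```python
from enum import Enum


class Outcome(Enum):
    SERVED = "served"
    REJECTED = "IPR"    # polite refusal: return to the caller right away
    TRAPPED = "TRAP"    # tell the kernel that the caller looks intrusive


ALLOWED_MODULES = {0x1234}   # hypothetical module IDs registered earlier
BANNED_THREADS = {0xBAD}     # hypothetical thread IDs known to be hostile


def body(value: int) -> str:
    """The real work, outside the trampoline where there is more room."""
    return str(value)


def entry(thr_id: int, mod_id: int, value: int):
    """Trampoline: filter on the IDs delivered by IPE, then jump out."""
    if thr_id in BANNED_THREADS:
        return Outcome.TRAPPED, None
    if mod_id not in ALLOWED_MODULES:
        return Outcome.REJECTED, None
    return Outcome.SERVED, body(value)


print(entry(1, 0x1234, 42))      # (<Outcome.SERVED: 'served'>, '42')
print(entry(1, 0x9999, 42))      # (<Outcome.REJECTED: 'IPR'>, None)
print(entry(0xBAD, 0x1234, 42))  # (<Outcome.TRAPPED: 'TRAP'>, None)
```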

Nothing forces the first instruction to be IPE, but anything else wouldn't make much sense. It is not possible to read the instruction space, only to execute instructions, so there could be other code there, but why? So maybe the first entry point is "implicit": no need for an IPE, and the kernel calls it directly?

Anyway, the above could easily be mapped to an object-oriented language, with a bit more technical stuff that could be made visible to the programmer.
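
To give an idea of that mapping, here is a hedged Python sketch where each method stands for one IPE ... IPR entry point of the listing above. The class name, the `call` dispatcher and the `Trap` exception are inventions for this example and not part of the programming model.

```python
class Trap(Exception):
    """Stand-in for the TRAP instruction (reported to the kernel)."""


class Int2StrModule:
    """One method per entry point, in the same order as the listing."""

    def __init__(self):
        self.loaded = False  # plays the role of Loaded_Flag
        self.entries = [self.init, self.suspend, self.resume,
                        self.shutdown, self.enum, self.int2str]

    def call(self, entry, thr_id, mod_id, *args):
        """Model of an inter-module call: dispatch on the entry number."""
        return self.entries[entry](thr_id, mod_id, *args)

    def init(self, thr_id, mod_id):          # entry 0
        if self.loaded:
            raise Trap("already loaded")
        self.loaded = True

    def suspend(self, thr_id, mod_id):       # entry 1: not implemented
        pass

    def resume(self, thr_id, mod_id):        # entry 2: nothing to restore
        pass

    def shutdown(self, thr_id, mod_id):      # entry 3: nothing to save
        pass

    def enum(self, thr_id, mod_id):          # entry 4: empty for now
        pass

    def int2str(self, thr_id, mod_id, value):  # entry 5: the useful one
        return str(value)


lib = Int2StrModule()
lib.call(0, thr_id=1, mod_id=2)             # run the initialisation once
print(lib.call(5, 1, 2, 1234))              # 1234
```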

----------------

Important note: all the modules, programs and entry points MUST be totally re-entrant, so any use of shared values must be protected by semaphores!
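
As an illustration of this rule, the following Python sketch runs the init entry point and a counting entry point from many threads at once. It uses Python's `threading.Semaphore` as a stand-in for whatever semaphore primitive the platform will provide; the function and variable names are hypothetical.

```python
import threading

guard = threading.Semaphore(1)  # binary semaphore protecting shared values
loaded_flag = 0
call_count = 0


def entry_init() -> str:
    """Entry 0: the test and the update of the flag form one critical section."""
    global loaded_flag
    with guard:
        if loaded_flag != 0:
            return "TRAP"
        loaded_flag = 1
        return "IPR"


def entry_count() -> None:
    """A re-entrant entry point that updates a shared counter."""
    global call_count
    with guard:
        call_count += 1


results = []
threads = [threading.Thread(target=lambda: results.append(entry_init()))
           for _ in range(8)]
threads += [threading.Thread(target=entry_count) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count("IPR"), results.count("TRAP"), call_count)  # 1 7 100
```

Without the semaphore, two callers could both see the flag at 0 and both believe they performed the initialisation.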
