Close
0%
0%

POSEVEN

A programming model for languages and "application processors", that emphasises speed, safety, security and modularity.

Public Chat
Similar projects worth following
Because POSIX (at the lower levels) is totally tied to the C language which descends from 60's era DEC PDP's idiosynchrasies, it too has become unsuitable for modern computing, holding it back more than helping. It survives from its tight integration with countless other software domains and the huge installed base, but the fundamentals have never been seriously challenged.
So let's take a deep breath and make an abstract definition for a system architecture, both hardware and software, revisit old classic and recombine them in hopefully more convenient ways.

This is a medium-high-level discussion about general computer program organisation. For lower-level CPU design tricks, dos and don'ts, please check it out: #PDP - Processor Design Principles :-)


The way we are used to program computers is a long evolution, and selective pressure has favoured bad habits, among others.

Safety and security are not guaranteed, ASLR and NoExecute flags don't prevent  the constant stream of dumb and preventable vulnerabilities (out of range access, buffer overflows, stack smashing, various leaks, Return-Oriented-Programming and so many exploits). I'm not speaking about cache and buffer coherency/timing issues here, which are detailed architectural issues.

Of course, no program is flawless and "there will always be bugs" but so many of them depend on the platform allowing them. This project will not revolutionise software engineering but tries to define techniques, structures and features that prevent many of the above-mentioned exploitable side-effects of the current generation of monolithic kernels (which are themselves architectured around existing CPU's programming models).

This project also tries to enable a new generation of capability-based microkernels (quite close to the "actor model"), which influence this work. I tried to generalise their structure and keep the most fundamental features for inclusion as processor features.


Logs:
1. POSIX is a curse
2. Read as Zero
3. The cellular allegory
4. Introduction to the new model
5. The watered-down OO Paradigm
6. Code space structure and management
7. Executable files' structure
8. SSS : the Single-Stack Syndrome
9. Name change
10.
11.
12.

  • Name change

    Yann Guidon / YGDES12/16/2023 at 03:06 0 comments

    NPM is no more, welcome the new flashy and witty name :

    POSEVEN.

    Yes it's a deliberate pun, which seems to have been overlooked for decades, and the domain names seem uncluttered (yet). So I got .com as well. You never know.

    NPM was never intended to remain a definitive name for this projects/development/specification. Due to my ADHD, the name collision was a deliberate personal incentive to find a way better one, just like all those "project code names" names either Aurora, Nebula or Eclipse you see in the industry. I wanted to start the design without being hindered by the lack of a definitive umbrella name, and the ideas matter more than the label, just a few letters, put on them. But this lack was becoming a pain recently. Inspiration never warns anyway, and when it arrives, it strikes with its obviousness.

    It's seven characters long and I'd have loved a much shorter one but they are all taken already.

    What's after POSEVEN ? POEIGHT. So a system written using these principles would naturally be a POEM. Sweet, but now it's become totally corny.

    Anyway, I'm still considering this development in the background, while I work on other things like the #YGREC8. Stay tuned.

  • SSS : the Single-Stack Syndrome

    Yann Guidon / YGDES09/08/2023 at 17:20 0 comments

    I remember as a teen, knowing BASIC and PASCAL, I explored the weird world of the 8086 so I could program it efficiently. And I was, and still am, scratching my head over (among other things) the concept of the stack as it was implemented by this CPU...

    I knew about stacks already, being familiar with the 6809. Stacks are cool and great. But the 8086 implements frames and contrived, uselessly elaborated. And you know what ? Stacks are one of the main weaknesses of x86, particularly when coupled with C's unbound and unchecked array, it's a highway to bugs.

    Stacks are one of the most fundamental structures in computer science and has been studied to death. Having a curiosity of FORTH, I know that a single stack is not a fatality as there are other programming languages with more than one. Indeed FORTH was creating on a computer featuring two stacks. Some languages also implement their own split stack under the hood. And that's very fine. So what happened ?

    Well, C happened more than 50 years ago and started on a PDP-7. Belonging to a certain tradition and having to run on tightly limited resources, having only one stack was a compromise that worked well enough for the time, the constraints, so it spread wildly and even overcame PASCAL.

    The single stack has one major problem in C : you can't return composite variables on the stack so function must resort to pointers and other stupid workarounds, which makes C a development hell once you go beyond the basic examples. The standard libs are a freak show. And I won't rant again about the insanity that comes from mixing control and variables data.

    The thing is : C structures (and the SSS) crystalized in many subsequent designs. Worse : it underpins modern compilers and processor architectures ! The syndrome turns into a plague.


    So...

    The NPM defines a strict split structure that separates data from control values.

    • The Control Stack is tightly controlled and only affected by the program's instructions (CALL, RETURN, IPC/IPE/IPR, THROW/CATCH etc.). The data reside in a dedicated memory space to enforce security. Nothing bad should happen as long as the program and the hardware's integrity are ensured.
    • The data stack is user-defined in the user's data space and can be placed anywhere there, in fact there could be any number of data stacks. It's up to the programmer's tastes and needs, who could shoot himself in the foot but this will not break everything. Anyway, one main data stack seems indispensable at least to support "frames" that common languages expect.

    This is expanded in the sub-project #A stack.

    This is a way to get both safety (the control stack can't be tampered with by data injection, only by modifying the read-only program) and performance (since separate access increases the ILP and there are fewer indirections due to programming kludges). The user can optimise the data stack(s) at will and even not use one when unneeded. The user stacks can grow up or down, be allocated dynamically...

    OTOH the control stack enforces security by using tagged values that define the type of the stack's contents. This is discussed at https://hackaday.io/project/8774-f-cpu/log/222000-tagged-control-stack and will certainly expand. This tagged control stack (TCS ?) can mix critical information of the program's flow state, including error handling, inter- and intra- modules calls, or even outer loops. However registers and other states must be saved through the normal datapath : apart from the Instruction Pointer, there is no provision to save the other states on the TCS, whose name implies it's only for control.

    Saving the state of the other data can be performed explicitly by code sequences and/or specific instructions. NPM/POSEVEN does not specify context switches.

  • Executable files' structure

    Yann Guidon / YGDES07/17/2023 at 03:24 0 comments

    The same file format/structure is used for every executable module : the kernel, the libraries, the applications... It does not use ELF or other existing relocatable formats because it does not need the same features. There is no symbol relocation like with POSIX systems, for example.

    The file contains 3 main sections :

    1. The list of required modules, in full text and official form, in a precise order.
    2. The data, relocated/moved to the negative range of the private addressable data space
    3. The code itself, moved to the private instruction address space

    Loading such a new module is simple:

    1. create the list of module dependencies by scanning the textual lists, look the names up and sign the dependency (with the module ID and such) to get a 64-bit hash for each called module. See the last log Code space structure and management. This may become recursive, beware of the depth.
    2. copy the data to the appropriate location and map the data space
    3. copy the instructions to the new private instruction space
    4. scan the first 64Ki instructions to create the lookup table of the entry points.
    5. call the first entry point to run initialisation.

    The first list can be empty: for example a minimalistic kernel or a basic library. These codes can be called by any other modules and threads and it's up to each module to select which rights or capabilities are protected and how. Thus a program can be built with a group of modules that call each other, registering each other's thread or module IDs to enable richer functions without the risk of interference from other programs.

    A number of entry points are to be allocated for basic enumeration and general management.

    • Entry 0 is for start/init : it's run after all the dependencies have been loaded, it can return immediately (for a kernel or lib) or go on until ... whatever (a program, a daemon, you name it)
    • Entry 1 is for suspend
    • Entry 2 is for restore/resume
    • Entry 3 is called just before shutting down/exiting/unloading the module
    • Entry 4 for enumeration of available entry points and getting the versions, name, max. num. of entries, configuration...

    When an entry point is called, the module can choose :

    • Accept anything anyway
    • Check a semaphore to accept to go further
    • Check the thread ID (given by the IPE instruction)
    • Check the module ID (also given by the IPE instruction)
    • reject everything

    The rejection can be either a polite one (IPR) or a TRAP (to signal the kernel that something fishy is happening and the caller must be flagged as intrusive).

    If the kernel must load external libs, then the bootloader must provide the appropriate access and functions to load more than one module. Ideally, the kernel must provide its own loader though but it's not a critical consideration yet.

    The module can use 2 methods, maybe simultaneously:

    • mix the code with the entry points. Compact, fast and suitable for few entry points.
    • jump from the trampoline zone almost immediately : suitable when many entry points are provided in the 64K instruction zone

    The entry points would ideally be aligned to 8 instructions boundaries to align the jump target to the cache lines, but it's not a requirement.

    ___________________

    So let's imagine a simple library that provides a single int2str (convert integer to string) function:

    • File header (could be "PM2\n") [32 bits]
    • File checksum [32 bits]
    • Some sort of specifying the ISA/CPU/word and instruction sizes
    • Size of the dependency list [32 bits] (0 in this case because no external module is required).
      • For each module name in ASCII, a PascalZ with a 8-bit prefix and NULL terminator
    • Size of constant data section [32 bits] (if larger than 31 bits, it's clearly a problem)
    • Constant section
      • "loaded" flag (cleared)
      • some constant array such as "0123456789ABCDEF" (the lookup table for ASCII chars)
    • Size of variable data section (initialised to 0 and placed below the already negative data range of the above section)
    • Size of the code
    • Code section

    The code will look like this and will load at logical...

    Read more »

  • Code space structure and management

    Yann Guidon / YGDES07/16/2023 at 18:28 0 comments

    Here is a more formal definition for one aspect of NPM, which was first imagined with the YASEP architecture (see logs 30. Log#30 : Fast and secure InterProcess Communications and 55. Log#55 : More about the IPC instructions)

    So here we go (again).

    In memory, the computer contains a collection of separate addressing spaces, each of their own size and corresponding to a given code module.

    These addressing spaces are instruction-grained: each address points to one individual, sequential instruction, regardless of its size. Thus every index gives a valid instruction and the program counter increments normally. Instruction size or packing is out of the equation for the program itself (fixed instructions sizes are recommended).

    Each module could also have their own dedicated shared data space (in the negative range ? for constants, semaphores, configuration, whatever, but it must be "safe") but the actual data must be held somewhere else, in the local thread's data context (that's another story for another log).

    The addressing space for the code modules can be any (positive) length, but

    • The shorter the better (to prevent feature creep and bug-prone cruft accumulation)
    • The first 64Ki is a trampoline zone that could be jumped to arbitrarily from anywhere.

    One module can CALL its own code but must go through the trampoline to call any other module, using a dedicated hardware stack and a trio of instructions, that enforce safety and access rights (through user code and strict separation) while maintaining performance (through static and dynamic  caching).

    The trampoline contains 64Ki instructions that could be addressed by a specific instruction only (called IPC for InterProcess Call), and it may only jump to another specific instruction (called IPE InterProcess Entry) that manages a dedicated, hardware-enforced stack. IPE can only exist in the first 64K of the code space (or it will trigger a fault) and any IPC to an instruction other than IPE will fault as well. The third instruction IPR (InterProcess Return) completes the collection and jumps back to the previous module (indicated by the dedicated hardware stack).

    The trampoline is meant to speed up repeated accesses to libraries (for example) so the IPC instruction contains the 16-bit constant index of the desired trampoline entry (while the module number maybe be stored in a register). However a program can not know in advance this index so it must first encode the entry point number, which is later translated by a kernel-driven lookup at module load time.

    Each module can not be stored and distributed with the IPC instruction itself, because the entry points will change with time independently from the program, so it uses the IPL (InterProcess Lookup) instruction which traps :

    1. The handler checks that the module number is valid
    2. Check that the entry point number exists (not out of range)
    3. The handler changes the opcode from IPL to IPC and stores the actual trampoline index in the 16-bit constant field of the opcode.
    4. The handler re-runs the instruction that trapped

    Thus on the second run, the cached entry point has been integrated into the running code and saves some time. The interprocess call should be almost as fast as a regular CALL instruction.

    If the called module has been taken down or updated, the calling module can simply be "reloaded" into the uncached state, to re-trigger the dynamic translation.

    The numbers of the modules called by a given module are also translated at load time. Let's say a computer can load up to 64K modules simultaneously, each module may be allocated a different random number after a system restart. OTOH a program or module encodes the number in its own way. 

    In particular : each module contains a list of the "human readable names" starting at index 1 (constant index 0 is the microkernel's fundamental functions). This helps "match" the program's dependencies with the loaded modules, which are translated by the kernel...

    Read more »

  • The watered-down OO Paradigm

    Yann Guidon / YGDES06/16/2023 at 03:09 0 comments

    How can I easily and quickly describe the programming model ?

    It originates from a study of microkernels but now I realise it features "Object-Oriented" traits. So it's a sort of object-oriented microkernel, or whatever...

    I was really interested to learn the ideas of GNU HURD, in particular that "everything is a server" and the user/root split is replaced by credentials. Today is looks great to me as I poke holes into the POSIX paradigms, but it was not obvious back in the days, despite the claims of the benefits. It takes time to balance them with the drawbacks, the most prominent one being to re-learn to code, structure, manage this concept, which will percolate slowly, very slowly through a software collection. So slowly in fact that GNU simply provides a POSIX compatibility layer, making the whole effort moot...

    OTOH, basic OO programming has some simple concepts: an object is a module that bundles data and code, of which there are private and public ones. Inheritance provides enhancements, and a few more clever concepts that I understand but find no use in my projects. I have written a Java program for a project 25 years ago and I still dislike it, though the necessary boilerplate seems to be more or less accepted, while the similar verboseness of Ada/VHDL is often criticised by newcomers. Whatever.

    The convergence of OOp and capabilities-based microkernels however provides something really interesting. The overall structure of NPM can be used to implement a strict OOp system but we're not going for a iAPX432 remake, the ABI is very simple and requires little complexity on the platform side.

    The similarity became apparent when I saw that the module's entry points could be seen as an object's public functions. Data are private unless shared with a specific mechanism (zero-copy becomes tricky though). The "init" entry point is close to the "new" or constructor of an object. The more I look, the more I see the parallels and it appears that object-oriented languages could be ported, with some modifications, which would lower the barrier of entry in this new world.

    Does that mean I'm making a Java or C++ OS and processor ? No because it will remain language agnostic, though OO code would be quite easy to program and port there. Pure procedural code is probably easier to write but will require the understanding and knowledge of the paradigm, but if I say "it's OOp-like" then people will adopt many prerequisite coding habits. It might even justify the limitation of the size of the entry point area because people will see and understand why a program should not be overly complex. However there is no support for things like virtual tables or anything dynamic.

    The BIG problem will be with the data pointers, as there is no "system-level heap". Modules/objects must communicate via private, peer-to-peer channels with some kind of address space right delegation. Most OO programs don't have to deal with different address spaces or access rights tied to an object, because classical OO programs share a single executable's virtual space and the OS does the policing.

    OO programs could be made a bit more robust by adopting some of the proposed mechanisms but the classical OO structure is already stringent enough to help with adoption.

  • Introduction to the new model

    Yann Guidon / YGDES05/30/2020 at 18:59 0 comments

    In the previous log (3. The cellular allegory), we find that there is some degree of similarity between eukaryote cells and the model I'm describing in this project.

    Differences are :

    • Binary coding is used, instead of quaternary (i'm nitpicking, I know)
    • Programs communicate with direct calls, register values and transient memory blocks, instead of mRNA
    • fine-grained rights are enforced by an extra sub-level, under control of a specific promgram that enforces protections, as well as housekeeping for the thread ID and other essential features (like allowing code to load in the executable area)
    • Programs are inherently parallel and can be executed by any number of threads simultaneously (whereas DNA transcription of one gene can only be physically performed once at a time, as in the following GIF
      Obviously a RNA or DNA strand can only be "sensed" by a single molecule at a time).

    I will now try to describe the elements of the programming model :

    • The rights
    • The programs
    • The threads
    • The data memory

    The rights

    are properties and/or credentials that enable or inhibit access to a critical resource, such as

    • the input-output ports, or communications outside de execution context
    • the paging mechanism
    • the program memory (loading and/or reading the code of programs)
    • the properties of the programs
    • etc.

    There is one rule here, inspired by other OSes : it is only possible to drop/lose rights ! Otherwise, any program could get access to resources it shouldn't, by mistake and/or malevolently. So the whole system is designed in a "top-down" fashion where a first/initial program starts with all the possible rights, dispatches them to other sub-programs, with each of them having only the minimum required rights to perform their job.

    Surrogate programs can serve as gatekeepers : they perform the I/O taks for example while filtering data and enforcing protocols. They have their own filters for who can use which provided service. This allows dynamic, fine-grained access to necessary features, and even cascading "server programs" while keeping the system "flat" (no "privileged program" because no program has all the rights).


    ..
    ..
    ..
    ..

  • The cellular allegory

    Yann Guidon / YGDES05/30/2020 at 05:06 0 comments

    Here is one way to explain the "vision" of the programming model I try to define.

    Initially I wanted to use robots in a factory, but a more natural example abounds on Earth : cells. More specifically : Eukaryotes. They have a nucleus that contains, among others, chromosomes, each made of genes, each written with codons of three nucleotides.

    The analogy would then be :

    • Each nucleotide equivalent to 2 binary bits, so it makes a quaternary "computer"
    • Each codon has 3 quaternary values, or 64 codes, vaguely equivalent to a processor instruction
    • Codons are assembled in sequences to make genes, or "computer routines"
    • The genes are packed in chromosomes, like a computer's program
    • The cell contains several chromosomes, or program, that can "work" in parallel...
    • Some extra functions are provided by Mitochondria, such as peripheral management or energy processing...
    • Communication is provided by "messenger RNA" (simplified)

    .


    Our "new computer" can perform parallel execution but requires of course synchronisation and semaphores. The number of execution threads is not specified and can vary wildly, as it is hypothetical so far.

    Our computer has "threads" : this is an active instance of a program. Any number of instances could be running at any time... It depends on the implementation. The program starts from one "chromosome" and can then ask services to other programs/chromosomes. The callee can refuse or accept, depending on its own policy. This is performed with the "IPC/IPE/IPR" mechanism.

    Communication is essential and "zero-copy" is required. How is our mRNA implemented ? Small chunks of data would fit in the registers, larger blocks require memory. The paging system ensures that a block of data has only one owner and the sender "yields" a data block ownership to the callee. To prevent dangling and zombie blocks, in case the callee has an issue, the callee must "accept" the new block (otherwise the block is garbage-collected). This block could then be yielded again to another program, and passed along a string of "genes" to perform any required processing.

    Paging ensures that any access outside of the message traps. It's not foolproof because data will rarely fill a whole page... Smaller and bigger pages are required (with F-CPU we talked about 512, 4096, 32768 and 262144 bytes)

    This also means that there must be a sort of fast and simple page allocator to yield pages of data at high rates, and also provide some garbage collection and TLB cleanup.

    There is a shared space and a private area for all the threads. The private area implements the stack(s) and resides in the negative addresses. The positive addresses are shared though only one thread can "own" a block at once. So there are fewer chances of stack disruption.

    As described before, the "yield" operation marks a block as candidate to belonging to the callee. The "accept" operation can then validate and acknowledge the reception and eventually remap the block into its private space.

    Processor support is required for the best performance but these features could be implemented on existing systems through emulation. It would be slow (yes the Hurd was slow too) but will help refine the API.

    Similarly : no mention of a "kernel" can be found : any "program" is structurally identical. All of them provide a function of some sort and the only difference is the access rights they provide. Each "thread" either inherit these rights from the thread that creates it, or can selectively "drop" some of those rights. This is not even a "microkernel" approach if there is no kernel !

    However in the first implementations, no hardware provides those features and a nanokernel is necessary to perform these while we develop the software stack.

  • Read as Zero

    Yann Guidon / YGDES05/30/2020 at 03:36 7 comments

    I'm not sure why it's not widespread yet, or why it's not implemented or even spoken about but...

    Imagine you free some memory, and the kernel reclaims it for other purposes. It can never be sure the physical data in RAM can be read by another thread to exfiltrate precious information. So the kernel spends some of its precious time clearing pages after pages, just in case, writing 0s all over the place to clean up after the patrons.

    It's something the hardware could do, by marking a page as "read as zero" or "trap on read dirty" in the TLB. Writes would not trap and you can read your own  freshly written data. In fact it's as if you allocated a new cache line without reading the cache...

    The cache knows about the dirty bits, and a coarser "dirty map" could be stored to help with cache lines. That's 128 bits, one for each cache line if lines are 256 bits wide.

    It's still preliminary but I'm sure people have worked on this already... Because scrubbing data was a thing for a long time.

  • POSIX is a curse

    Yann Guidon / YGDES05/30/2020 at 03:14 0 comments

    I'm sorry, dear Mr Stallman, but POSIX is defective by design and even though it was an amazing project 30+ years ago, it is now biting us E V E R Y - S I N G L E - D A Y. What was the very best in the mini- and micro-computers era is now totally unadapted and no hardware tweak or supervisory tool could ever solve the fundamental problems. And all the new language (even the "safe" ones) perpetuate the misconceptions that you could make an unsafe system safe by using a safe language. You should already know that the weakest link breaks first, right ?

    POSIX was never meant to be safe by design. It is inherently tied to the standard libraries, which are inherently designed for and by the C language, which is well known to not be safe...

    • You can change the application programming language : you keep the standard libraries.
    • OK let's change the standard libraries : nobody will want to program for your platform, AND the core OS will remain tied to the standard interfaces...
    • Let's make a novel interface and kernel then ? Oh, others have tried and... they are stuck by performance issues because their HW platform is designed to run POSIX fast.

    This co-evolution creates a macabre dance and there is no solution. The whole thing is rotten to the core but held in place because everybody uses it !


    Of course the "new programming model" requires an adapted platform to run smoothly. However this is not a hard requirement and the system can be emulated at some level or another, to help bootstrap the transition.

    Let's simply start with the most common source of vulnerabilities : buffer overflows. These should NEVER happen. Yet they do. All the time.

    The i286 introduced the BOUND opcode in 1982... Where is it used ???

    Oh yes, segmented and paginated memory is the solution...

    But wait, flat space is sexier and segment registers are PITA.

    Let's make fast SW and throw all the compiler warnings through the window...

    IF you use a language that implements boundary checks. Some are more stringent than others... but speed always ends up being the excuse for indulging in the safety-sin again and again, and processors are tired and tired of checking the same pointers for nothing. Because it can never occur right ?

    iAPX432 wanted to solve this with a micro-managed system with complex properties and it was slow. I understand why... but it would have been safer, right ?


    But wait !

    We're in 2020 and in hindsight (hahaha) 70% of all security bugs are memory safety issues and half of them "are use-after-free vulnerabilities, a type of security issue that arises from incorrect management of memory pointers"

    So you use a block, free it and ... it can still be read ? Why would that be possible ?

    Oh wait : we are bound to use the standard libs, even if our programming language has its own system.


    "There will always be bugs" is an old refrain, and it is not false.

    It is however a lousy pretext to accept preventable bugs and incapable platforms.

    I see a kind of bugs that is not completely dependent on the tools and the programming languages : check the goddamned access rights. Check which process requests what service and in which context.

    This is not the perfect solution but any system call should be easily traced by the callee to the root of the process tree. The whole history and tree of who calls what and with which access rights must be visible by the whole computer. So it's another hardware-assisted feature that must be etched in silicon (or whatever).


    I am not blaming anyone here, and I know extensive efforts are being made, for example with Rust. However this is not a complete solution, as long as the whole computer system co-evolves around old premises dating from the 60s, when things were so simple and laid back.

View all 9 project logs

Enjoy this project?

Share

Discussions

Yann Guidon / YGDES wrote 03/16/2024 at 10:51 point

So much is going on right now !!!
#A stack is progressing, I am currently writing an article on this subject.
I am also working on the addressing model, the MSB of the pointers in particular.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 07/03/2023 at 13:37 point

https://www.youtube.com/watch?v=lc5Np9OqDHU Even Douglas Crockford agrees.

  Are you sure? yes | no

Yann Guidon / YGDES wrote 06/25/2023 at 18:31 point

https://chompie.rip/Blog+Posts/Put+an+io_uring+on+it+-+Exploiting+the+Linux+Kernel LOL.

  Are you sure? yes | no

Similar Projects

Does this project spark your interest?

Become a member to follow this project and never miss any updates