Mirror Mirror On the Wall

If you haven't guessed already, this log entry will discuss, among other things, the subject of "reflection", and how we will try to get there from here, and hopefully back, all in one piece. In an earlier project, I was working with the identifier and structure record types that the UCSD Pascal compiler uses to store information about identifiers and structures. While debugging a C++ version of that compiler, which I ported from the original UCSD source, I found it necessary to implement, as I have previously mentioned, my own versions of the memory allocation routines using placement new. The goal was a kind of sandbox that more accurately represents the environment available to the original 16-bit p-machine, while also suggesting the environment that we might want on an eventual modern microcontroller implementation, even though we are not limited to 16 bits, of course.
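To make the sandbox idea a bit more concrete, here is a minimal sketch of placement new on top of a fixed 64 KiB arena. It is not the allocator from my port. The names Arena, Ident, make_in and offset_of are all made up for illustration, and the bump-allocation strategy is simply the easiest thing to show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical fixed-size arena standing in for the p-machine's 16-bit heap.
// Every offset into it fits in 16 bits, so links can be stored as uint16_t.
class Arena {
public:
    static constexpr std::size_t kSize = 65536;  // 64 KiB address space

    // Bump-allocate `bytes` with a power-of-two alignment no larger than
    // alignof(std::max_align_t). Returns nullptr when the arena is exhausted.
    void *allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::size_t start = (top_ + align - 1) & ~(align - 1);
        if (start > kSize || bytes > kSize - start) return nullptr;
        top_ = start + bytes;
        return storage_ + start;
    }

    // Convert a pointer that lies inside the arena to a 16-bit offset.
    std::uint16_t offset_of(const void *p) const {
        auto diff = static_cast<const unsigned char *>(p) - storage_;
        assert(diff >= 0 && static_cast<std::size_t>(diff) < kSize);
        return static_cast<std::uint16_t>(diff);
    }

    std::size_t used() const { return top_; }

private:
    alignas(std::max_align_t) unsigned char storage_[kSize];
    std::size_t top_ = 0;
};

// Illustrative record type; NOT the real identifier class.
struct Ident {
    char name[8];
    std::uint16_t llink;  // 16-bit offsets instead of native pointers
    std::uint16_t rlink;
};

// Construct a value-initialized T inside the arena using placement new.
template <class T>
T *make_in(Arena &arena) {
    void *raw = arena.allocate(sizeof(T), alignof(T));
    return raw ? new (raw) T() : nullptr;
}
```

With something like this, make_in<Ident>(arena) hands back an object that lives entirely inside the sandbox, and offset_of gives the 16-bit "address" that a p-machine style link field would hold. Widening the offsets to 32 or 64 bits would only mean changing the offset type and the arena size.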

This led to the need for a debugging environment with a mostly complete set of heap-walking tools, which I then built so that the compiler could, in effect, help debug itself. That has worked with some success, I might add. Even if the compiler is not quite ready for generating production code, it is nonetheless "mostly done" and can compile simple programs.
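For anyone wondering what a heap walk amounts to, here is a rough sketch, assuming every allocation is preceded by a small header that records its size and a type tag. BlockHeader, WalkableHeap and the tag values are hypothetical names for this example, not the tools from my compiler port.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical header written in front of every allocation.
struct BlockHeader {
    std::uint16_t size;  // payload bytes that follow this header
    std::uint16_t tag;   // caller-chosen type tag, e.g. 1 = identifier
};

// Minimal heap that records a header per block so it can be walked later.
class WalkableHeap {
public:
    explicit WalkableHeap(std::size_t bytes) : storage_(bytes), top_(0) {}

    // Returns the payload offset, or -1 if there is no room left.
    long allocate(std::uint16_t size, std::uint16_t tag) {
        std::size_t need = sizeof(BlockHeader) + size;
        if (need > storage_.size() - top_) return -1;
        BlockHeader h{size, tag};
        std::memcpy(&storage_[top_], &h, sizeof h);
        long payload = static_cast<long>(top_ + sizeof h);
        top_ += need;
        return payload;
    }

    // Visit every block in allocation order by hopping from header to
    // header; returns the number of blocks visited.
    template <class Visitor>
    std::size_t walk(Visitor visit) const {
        std::size_t count = 0;
        for (std::size_t pos = 0; pos < top_; ++count) {
            assert(pos + sizeof(BlockHeader) <= top_);
            BlockHeader h;
            std::memcpy(&h, &storage_[pos], sizeof h);
            visit(pos + sizeof h, h);
            pos += sizeof h + h.size;
        }
        return count;
    }

private:
    std::vector<unsigned char> storage_;
    std::size_t top_;
};
```

The visitor receives the payload offset and the header of each block, so a debugging tool could dump tags, total up sizes per type, or check that the chain of headers ends exactly at the top of the heap. Headers are copied with memcpy here so that payload sizes do not need to be aligned.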

Now obviously, this suggests something to me. First of all, the original specification of the MODEL and TREE data structures in the original C code for MegaHal looks like this:

typedef struct NODE {
    BYTE2 symbol;
    BYTE4 usage;
    BYTE2 count;
    BYTE2 branch;
    struct NODE **tree;
} TREE;

typedef struct {
    BYTE1 order;
    TREE *forward;
    TREE *backward;
    TREE **context;
    DICTIONARY *dictionary;
} MODEL;

Now let's take a look at how the NODE and TREE types are shaping up in the C++ version.

class TREE;

class NODE
{
public:
    BYTE2 symbol;
    BYTE4 usage;
    BYTE2 count;
    BYTE2 branch;
    NODE **tree;
    operator TREE* () { return reinterpret_cast<TREE*>(this); }
};

class TREE: public NODE
{
public:    
    static TREE *allocate ();
    static void free_tree(TREE *);
    void load_tree(FILE *file);
    int search_node(int symbol, bool *found_symbol);
    TREE *find_symbol(int symbol);
    TREE *add_symbol(BYTE2 symbol);
    void add_node(TREE *node, int position);
    TREE *find_symbol_add(BYTE2 symbol);
    operator NODE* () { return reinterpret_cast<NODE*>(this); }
};

Now as it turns out, UCSD Pascal also used its own tree structures to store information about identifiers and structures, and that data structure in my C++ implementation has taken on a whole new life, something like this:

class identifier:
    public pascal_id<0>,
    public bNodeType<identifier>,
    public pascal_data
{
friend class pascal_id<0>;
public:
    void set_packing (bool pk) { FISPACKED = pk; }
    bool packing () { return FISPACKED; }
    void attach (CTP ptr)
    {
        markov = static_cast<bNodeType<identifier>*>(ptr);
    }
    CTP next ()
    {
        CTP ptr;
        ptr = static_cast<CTP>(markov);
        return ptr;
    }
    
protected:
    void *operator new (size_t,void*);
    identifier();
    
public:
    CTP LLINK() {return static_cast<CTP>(branch1); }
    CTP RLINK() {return static_cast<CTP>(branch2); }
};

Now look carefully, and you will see that the identifier class publicly inherits some data from something called a pascal_id<0>, and then there is a bNodeType<identifier> base. That means the identifier class in effect inherits the properties of a binary node, which is the building block of a binary tree type. Through the magic of something called the curiously recurring template pattern (CRTP), the identifier class inherits the properties of a binary node or tree from a template that is itself parameterized on the identifier class, which is still being defined at that point. Weird, but true! Now the entire definition of structures and identifiers is actually rather mind-boggling, so I won't copy and paste the entire thing here.
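If the pattern is new to you, a stripped-down sketch might help. This is not the real bNodeType or identifier code. BinaryNode, Ident and key() are simplified, hypothetical stand-ins that only show how a base class template can call into the very class that derives from it.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-in for bNodeType<X>: the base is told the derived type X.
template <class X>
class BinaryNode {
public:
    X *left = nullptr;
    X *right = nullptr;

    // Insert `node` below this one, ordered by X's own key().
    // The downcast is resolved at compile time, so no virtual calls are needed.
    void insert(X *node) {
        assert(node != nullptr);
        X *self = static_cast<X *>(this);
        X *&slot = (node->key() < self->key()) ? left : right;
        if (slot) slot->insert(node); else slot = node;
    }

    // Return the node whose key equals `k`, or nullptr if there is none.
    const X *find(const std::string &k) const {
        const X *self = static_cast<const X *>(this);
        if (k == self->key()) return self;
        const X *next = (k < self->key()) ? left : right;
        return next ? next->find(k) : nullptr;
    }
};

// Illustrative identifier: it inherits tree behaviour from a base that is
// parameterized on Ident itself, which is the "curiously recurring" part.
class Ident : public BinaryNode<Ident> {
public:
    explicit Ident(std::string n) : name_(std::move(n)) {}
    const std::string &key() const { return name_; }

private:
    std::string name_;
};
```

The links are typed as Ident* rather than as pointers to some generic node, which is why the real class can get away with static_cast in LLINK() and RLINK() instead of anything more dangerous.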

Yet what if I try to modify MegaHal so that "under the hood", so to speak, we could give it some kind of awareness of things like C/C++ structs and classes, JavaScript Object Notation, or even Lisp-style property lists? Then we could throw in a ton of heap-walking and debugging tools on top of that, along with a very aggressive and much-improved memory management model, one that is designed not only to be very efficient on 16-bit systems but which is just as easily made 64-bit aware for other applications.

Remember this, from back in the day?

template <class X>
class bNodeType
{
public:
    bNodeType<X> *root;
    bNodeType<X> *branch1;
    bNodeType<X> *branch2;
    bNodeType<X> *markov;
    
    bNodeType ();
    ~bNodeType ();
    void *operator new (size_t,void*);

    bNodeType<X> *find_node (X &arg);
    char *get_data ()
    {
        char *result = NULL;
        return result;
    }
    bNodeType<X> *add_node (X &arg);
    void put_node (bNodeType<X> *(&));
    void del_tree (bNodeType<X> *);
    void trace_root (bNodeType<X> *(&found));
};

We can inherit from that, and use the modified memory model directly, almost right out of the box!

Oh, La, La La La!

But for whatever it's worth, what if MegaHal's AI model could also do something with this?

STRUCTURE = RECORD
    SIZE: ADDRRANGE;
    CASE FORM: STRUCTFORM OF
    SCALAR:   (CASE SCALKIND: DECLKIND OF
        DECLARED: (FCONST: CTP));
    SUBRANGE: (RANGETYPE: STP; MIN,MAX: VALU);
    POINTER:  (ELTYPE: STP);
    POWER:    (ELSET: STP);
    ARRAYS:   (AELTYPE,INXTYPE: STP;
        CASE AISPACKD: BOOLEAN OF
            TRUE: (ELSPERWD,ELWIDTH: BITRANGE;
                CASE AISSTRNG: BOOLEAN OF
                    TRUE: (MAXLENG: 1..STRGLGTH)));
    RECORDS:  (FSTFLD: CTP; RECVAR: STP);
    FILES:    (FILTYPE: STP);
    TAGFLD:   (TAGFIELDP: CTP; FSTVAR: STP);
    VARIANT:  (NXTVAR,SUBVAR: STP; VARVAL: VALU)
END;

Now that, of course, is how the original Pascal compiler described its types, including the so-called RECORD type, and the same representation could just as easily describe C-style structs, C++ classes, Lisp-like property lists, JSON, SQL row sets, or other types of hierarchies, such as the well-known example that the common pet typically referred to as a "dog" is also known by the scientific name Canis familiaris. Pretty much any taxonomy or other hierarchical system can be represented as such, right?
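One way to picture that variant record in modern C++ is as a std::variant (C++17), where the active alternative plays the role of the FORM tag. The sketch below covers only four of the forms, and the C++ names (Structure, Scalar, Subrange, Pointer, Array, describe) are my own assumptions for illustration, not code from the port.

```cpp
#include <cassert>
#include <string>
#include <variant>

struct Structure;  // forward declaration so the forms can point at it

// One struct per variant arm; only a subset of the Pascal forms is shown.
struct Scalar {};
struct Subrange { const Structure *rangetype; int min; int max; };
struct Pointer  { const Structure *eltype; };
struct Array    { const Structure *aeltype; const Structure *inxtype; };

// SIZE is common to every form; the variant's active alternative is the tag.
struct Structure {
    int size;
    std::variant<Scalar, Subrange, Pointer, Array> form;
};

// Walk a type description recursively and render it as text.
std::string describe(const Structure &s) {
    if (std::holds_alternative<Scalar>(s.form)) return "scalar";
    if (auto *r = std::get_if<Subrange>(&s.form))
        return std::to_string(r->min) + ".." + std::to_string(r->max);
    if (auto *p = std::get_if<Pointer>(&s.form))
        return "^" + describe(*p->eltype);
    auto *a = std::get_if<Array>(&s.form);
    assert(a != nullptr);  // the only alternative left
    return "array[" + describe(*a->inxtype) + "] of " + describe(*a->aeltype);
}
```

The same recursive walk is what a taxonomy or a JSON document needs: each node says what kind of thing it is, and then points at the things it is made of.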

So what would happen if we gave MegaHal the ability to generate complete ASTs (abstract syntax trees), on the one hand, and then let it actually chow down on its own source, on the other? We could take an approach that also allows multiple instances of MegaHal to run at the same time, for example, by simply calling new MODEL in C++, possibly running in multiple threads. Then we find a way to send a prompt to one MODEL based on one training set and feed its output into another MODEL. One model might be set up to generate a lot of output, that is to say, it could be quite verbose. Another model could be trained on the fly, and a final model could be used to do a final spelling and grammar check, and so on. This is quite easy if you have functions like create_thread and create_pipe.
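Here is a rough sketch of the chaining idea, with the threads and pipes deliberately left out so that only the data flow remains. Each stage stands in for a separately trained MODEL. Stage, run_pipeline, verbose_stage and dedupe_stage are hypothetical names, and the two toy stages are nothing like real MegaHal models.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// A stage is anything that maps a prompt to a reply. In a real system each
// stage would wrap its own MODEL; here they are plain functions.
using Stage = std::function<std::string(const std::string &)>;

// Feed the output of each stage into the next one, in order.
std::string run_pipeline(const std::vector<Stage> &stages, std::string text) {
    for (const Stage &stage : stages) {
        assert(static_cast<bool>(stage));  // an empty std::function would throw
        text = stage(text);
    }
    return text;
}

// Toy "verbose" generator: pads the prompt with some filler words.
std::string verbose_stage(const std::string &in) {
    return in + " indeed indeed";
}

// Toy clean-up pass: drops any word that repeats the word just before it.
std::string dedupe_stage(const std::string &in) {
    std::string out, prev, word;
    std::istringstream words(in);
    while (words >> word) {
        if (word != prev) {
            if (!out.empty()) out += ' ';
            out += word;
        }
        prev = word;
    }
    return out;
}
```

Running the two toy stages over "mirror mirror" gives "mirror mirror indeed indeed" after the first stage and "mirror indeed" after the second. Putting each stage on its own thread with a queue or pipe in between would change how the text moves, but not this basic shape.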

Now what may not immediately be obvious is that those changes can be made, interestingly enough, without adding a lot of code. Well, who knows? Maybe another twenty to fifty thousand lines or so, total. Easy weekend project. Not quite. But pretty close, since quite a lot of very heavy lifting has already been done.

Yet that makes me wonder: what will happen when I try training MegaHal on heap walks of its own data structures? At some point, this becomes recursive, or else it just might be a proper form of what is referred to as "reflection". That is not the same thing as sentience, although some people might like to think of it as such, even though at the end of the day, it DOES NOT actually have feelings.

Mirror Mirror On The Wall. Indeed.

Back to the Salt Mines, or else "Somewhere out there?"

Meet the New Bot, Same as the Old Bot?
