
Translated the Pascal Code to C Code

A project log for Simple Compiler

A very simple compiler for minimalist home brew CPUs

agpcooper (agp.cooper), 06/22/2017 at 16:18

C Code

The code has been translated from Pascal to C, with quite a bit of cleanup along the way.

The current version of the tool tranin has been uploaded.

Understanding the Code

The compiler code in most of the minimalist compilers I reviewed is very similar, even to the point where the same instructions are missing (e.g. ">") and the constant names are the same (e.g. SETLSS rather than SETLT, and SETLEA instead of SETLE).
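A missing ">" instruction is less of a problem than it looks, because a > b is the same test as b < a. The sketch below assumes, hypothetically, that only "less than" and "less or equal" set instructions exist (the SETLSS/SETLEA pair mentioned above) and compiles ">" and ">=" by swapping the operands. The names `RelPlan`, `plan_relation` and `run_plan` are illustrative, not taken from the compiler.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: the CPU only offers "less than" (SETLSS) and
   "less or equal" (SETLEA). ">" and ">=" are compiled by swapping
   the operands, since a > b == b < a and a >= b == b <= a. */
typedef enum { REL_LT, REL_LE } BaseRel;

typedef struct {
    BaseRel rel;   /* which set instruction to emit */
    int swap;      /* 1 if the operands must be exchanged first */
} RelPlan;

/* Fills *plan for <, <=, > and >= and returns 1; returns 0 otherwise. */
int plan_relation(const char *op, RelPlan *plan) {
    if (strcmp(op, "<") == 0)  { *plan = (RelPlan){REL_LT, 0}; return 1; }
    if (strcmp(op, "<=") == 0) { *plan = (RelPlan){REL_LE, 0}; return 1; }
    if (strcmp(op, ">") == 0)  { *plan = (RelPlan){REL_LT, 1}; return 1; }
    if (strcmp(op, ">=") == 0) { *plan = (RelPlan){REL_LE, 1}; return 1; }
    return 0;
}

/* Simulates what the emitted code computes for "left op right":
   1 for true, 0 for false. */
int run_plan(RelPlan plan, int left, int right) {
    int a = plan.swap ? right : left;
    int b = plan.swap ? left : right;
    return plan.rel == REL_LT ? a < b : a <= b;
}
```

The same operand-swap trick works at code-generation time: the compiler simply evaluates the right-hand side first, so no extra instruction is needed.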

A good (though incomplete) site on compiler construction is http://zserge.com/blog/cucu-part1.html. I will likely refer back to it when I add functions to my code.

A good PDF on compiler construction, by Jack W. Crenshaw, is at http://compilers.iecc.com/crenshaw/ (I have uploaded his book as compiler.pdf).

But the place to go is "Recursive-Descent Parsing", Section 6.6 of the AWK book (pp. 147-152). I have uploaded a PDF of the book and the programs from Chapter 6.


The NewCPU Model

The CPU model for nearly all the compilers I have reviewed is:

Usually all data is kept on the stack. The Simple Compiler instead uses a separate Data Pointer (DP), modelling Pascal-style "heap memory". I will keep this approach.
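The register model can be pictured as a small C struct. This is a minimal sketch under stated assumptions: 16-bit words, one flat memory holding both the variable area and a downward-growing stack, and hypothetical names (`Cpu`, `cpu_reset`, `load_ax`, `store_ax`, `push_ax`, `pop_bx`). It shows the split described above: variables are addressed relative to DP, while the stack only holds temporaries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register model: Ax/Bx working registers, a flag
   register Fx, a stack pointer SP and a separate data pointer DP. */
enum { MEM_WORDS = 256 };

typedef struct {
    int16_t ax, bx;            /* working registers */
    int     fx;                /* flags from the last comparison */
    int     sp;                /* stack pointer; stack grows down from the top */
    int     dp;                /* data pointer: base address of the variables */
    int16_t mem[MEM_WORDS];    /* flat memory: data area low, stack high */
} Cpu;

void cpu_reset(Cpu *c, int data_base) {
    *c = (Cpu){0};
    c->sp = MEM_WORDS;         /* empty stack */
    c->dp = data_base;
}

/* Variables live at DP + offset, not on the stack.
   The asserts are a crude check that data and stack do not collide. */
void load_ax(Cpu *c, int offset) {
    assert(offset >= 0 && c->dp + offset < c->sp);
    c->ax = c->mem[c->dp + offset];
}

void store_ax(Cpu *c, int offset) {
    assert(offset >= 0 && c->dp + offset < c->sp);
    c->mem[c->dp + offset] = c->ax;
}

/* The stack only holds temporaries, e.g. a saved left operand. */
void push_ax(Cpu *c) { assert(c->sp > c->dp); c->mem[--c->sp] = c->ax; }
void pop_bx(Cpu *c)  { assert(c->sp < MEM_WORDS); c->bx = c->mem[c->sp++]; }
```

Keeping variables at fixed DP offsets means the compiler can resolve every identifier to a constant address at compile time, which suits a minimalist CPU.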

The assembler code therefore revolves around Ax and Bx.

There are two basic jumps.
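Two jumps are enough for every control structure in the grammar: an unconditional jump, and a jump taken when Ax holds false (0). The sketch below shows how "while" and "if ... else" can be built from just those two. The mnemonics `JMP` and `JZ`, the `Ln:` label format and the functions `emit_while`/`emit_if_else` are hypothetical; the condition and body arrive as already-generated code fragments that leave their result in Ax.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mnemonics: JMP always jumps; JZ jumps when Ax == 0 (false).
   Both functions append to the code buffer `out` of capacity `cap`;
   *label is the next free label number. */

void emit_while(char *out, size_t cap, int *label,
                const char *cond, const char *body) {
    int top = (*label)++;      /* loop head: re-test the condition */
    int done = (*label)++;     /* loop exit */
    size_t used = strlen(out);
    int n = snprintf(out + used, cap - used,
                     "L%d:\n%s  JZ L%d\n%s  JMP L%d\nL%d:\n",
                     top, cond, done, body, top, done);
    assert(n >= 0 && (size_t)n < cap - used);   /* not truncated */
}

void emit_if_else(char *out, size_t cap, int *label, const char *cond,
                  const char *then_code, const char *else_code) {
    int other = (*label)++;    /* start of the else branch */
    int done = (*label)++;     /* end of the whole statement */
    size_t used = strlen(out);
    int n = snprintf(out + used, cap - used,
                     "%s  JZ L%d\n%s  JMP L%d\nL%d:\n%sL%d:\n",
                     cond, other, then_code, done, other, else_code, done);
    assert(n >= 0 && (size_t)n < cap - used);   /* not truncated */
}
```

For example, `emit_while` with condition `"  COND\n"` and body `"  BODY\n"` produces a label, the condition, a `JZ` to the exit, the body and a `JMP` back to the top.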

The remaining instructions map the language symbols ('+', '-', '*', '/', '=', '<>', '<', '<=', '>', '>=').

The set commands set (for true) or reset (for false) Ax depending on the flag register. True is defined as Ax=1 and false as Ax=0.

A separate group of commands sets the flag register (Fx).
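The split between flag-setting commands and set commands can be sketched as two small functions: a compare that records the relation between Bx (left operand) and Ax (right operand) in Fx, and a set that turns Fx into 1 or 0 in Ax. The encoding of Fx as -1, 0 or +1 and the names `SetOp`, `compare` and `set_ax` are assumptions for illustration; a real CPU would more likely use zero/negative/overflow bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical set instructions (the compilers above name the
   less-than pair SETLSS and SETLEA). */
typedef enum { SET_EQ, SET_NE, SET_LT, SET_LE, SET_GT, SET_GE } SetOp;

/* Compare: Fx = sign of (Bx - Ax) as -1, 0 or +1. Computed with
   comparisons so that, unlike a 16-bit subtraction, it cannot overflow. */
int compare(int16_t bx, int16_t ax) {
    return (bx > ax) - (bx < ax);
}

/* Set: Ax = 1 (true) if the condition holds for Fx, else 0 (false). */
int16_t set_ax(SetOp op, int fx) {
    switch (op) {
    case SET_EQ: return fx == 0;
    case SET_NE: return fx != 0;
    case SET_LT: return fx < 0;
    case SET_LE: return fx <= 0;
    case SET_GT: return fx > 0;
    case SET_GE: return fx >= 0;
    }
    assert(0 && "unknown set instruction");
    return 0;
}
```

Comparing 32767 with -32768 shows why the overflow-free compare matters: the 16-bit difference would wrap to a negative number and report the wrong order.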

Finally, there is a HALT and a couple of I/O commands.

Language Construct

Languages can be modelled in Backus–Naur Form (BNF). Before extending the language, I need to map the existing one. I use the syntax from the AWK book for the language constructs:

// Factor         -> Identifier
//                -> Integer
//                -> ( BoolExpression )
// Term           -> Factor
//                -> Factor * Factor+
//                -> Factor / Factor+
// Expression     -> Term
//                -> + Term
//                -> - Term
//                -> Term + Term+
//                -> Term - Term+
// BoolExpression -> Expression
//                -> Expression = Expression
//                -> Expression <> Expression
//                -> Expression < Expression
//                -> Expression <= Expression
//                -> Expression > Expression
//                -> Expression >= Expression
// Statement      -> Begin
//                -> While
//                -> If
//                -> Write
//                -> Read
//                -> Assignment
// Begin          -> Statement    // This is where a statement separator (";") should be added!
//                -> End
// While          -> BoolExpression Statement
// If             -> BoolExpression then Statement
//                -> BoolExpression then Statement "else" Statement
// Write          -> BoolExpression , BoolExpression+
// Read           -> Input
// Assignment     -> Identifier = BoolExpression

Note: a "+" at the end of a line means the preceding item can be repeated.
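The grammar maps almost one-to-one onto recursive-descent procedures: one function per rule, with each "+" repetition becoming a loop. The sketch below covers Factor, Term, Expression and BoolExpression, but, as a simplification, it evaluates the integers directly instead of emitting code, and it leaves out identifiers. All function names are illustrative, not the Simple Compiler's own.

```c
#include <assert.h>
#include <ctype.h>

/* Recursive-descent evaluator for the expression rules of the grammar.
   Simplifications: integers only (no identifiers), values computed
   directly instead of generating code. */

static const char *cursor;   /* current position in the source text */

static void skip_spaces(void) {
    while (isspace((unsigned char)*cursor)) cursor++;
}

static int bool_expression(void);

/* Factor -> Integer | ( BoolExpression ) */
static int factor(void) {
    skip_spaces();
    if (*cursor == '(') {
        cursor++;
        int v = bool_expression();
        skip_spaces();
        assert(*cursor == ')');
        cursor++;
        return v;
    }
    assert(isdigit((unsigned char)*cursor));
    int v = 0;
    while (isdigit((unsigned char)*cursor)) v = v * 10 + (*cursor++ - '0');
    return v;
}

/* Term -> Factor { (* | /) Factor } */
static int term(void) {
    int v = factor();
    for (;;) {
        skip_spaces();
        if (*cursor == '*') { cursor++; v *= factor(); }
        else if (*cursor == '/') {
            cursor++;
            int d = factor();
            assert(d != 0);
            v /= d;
        }
        else return v;
    }
}

/* Expression -> [+ | -] Term { (+ | -) Term } */
static int expression(void) {
    skip_spaces();
    int neg = 0;
    if (*cursor == '+' || *cursor == '-') neg = (*cursor++ == '-');
    int v = term();
    if (neg) v = -v;                  /* unary sign applies to the first Term */
    for (;;) {
        skip_spaces();
        if (*cursor == '+') { cursor++; v += term(); }
        else if (*cursor == '-') { cursor++; v -= term(); }
        else return v;
    }
}

/* BoolExpression -> Expression [ relop Expression ]; true = 1, false = 0.
   Two-character operators are tested before their one-character prefixes. */
static int bool_expression(void) {
    int left = expression();
    skip_spaces();
    if (cursor[0] == '<' && cursor[1] == '>') { cursor += 2; return left != expression(); }
    if (cursor[0] == '<' && cursor[1] == '=') { cursor += 2; return left <= expression(); }
    if (cursor[0] == '>' && cursor[1] == '=') { cursor += 2; return left >= expression(); }
    if (cursor[0] == '<') { cursor++; return left < expression(); }
    if (cursor[0] == '>') { cursor++; return left > expression(); }
    if (cursor[0] == '=') { cursor++; return left == expression(); }
    return left;
}

int evaluate(const char *src) {
    cursor = src;
    int v = bool_expression();
    skip_spaces();
    assert(*cursor == '\0');          /* the whole input was consumed */
    return v;
}
```

Because Term is called from inside Expression's loop, "*" and "/" bind tighter than "+" and "-" without any precedence table, and the loops make each level left-associative.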

It's okay, but I think a program should begin with "begin" and end with "end", which means modifying the Statement procedure. There probably also needs to be a semicolon (";") as a statement separator; without a statement delimiter, I think things might get messy later on.
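The proposed change amounts to a rule like Block -> "begin" Statement { ";" Statement } "end", where a Statement may itself be a nested Block. The sketch below parses a pre-split token array under that rule. It is hypothetical throughout: the `Parser` type and function names are illustrative, any token other than "begin", "end" or ";" stands in for a simple statement, and empty statements (as Pascal allows before "end") are rejected to keep it short.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical parser for: Block -> "begin" Statement { ";" Statement } "end" */
typedef struct {
    const char **toks;
    int n;      /* number of tokens */
    int pos;    /* index of the next token */
} Parser;

/* Returns the next token, or "" at end of input (EOF). */
static const char *peek(const Parser *ps) {
    return ps->pos < ps->n ? ps->toks[ps->pos] : "";
}

/* Consumes the next token if it equals t. */
static int accept(Parser *ps, const char *t) {
    if (strcmp(peek(ps), t) == 0) { ps->pos++; return 1; }
    return 0;
}

static int parse_block(Parser *ps);

/* Statement -> Block | simple statement (any other single token).
   Returns the number of simple statements parsed, or -1 on error. */
static int parse_statement(Parser *ps) {
    const char *t = peek(ps);
    if (strcmp(t, "begin") == 0) return parse_block(ps);
    if (*t == '\0' || strcmp(t, "end") == 0 || strcmp(t, ";") == 0)
        return -1;                    /* EOF or misplaced delimiter */
    ps->pos++;
    return 1;
}

static int parse_block(Parser *ps) {
    if (!accept(ps, "begin")) return -1;
    int count = 0;
    do {
        int c = parse_statement(ps);
        if (c < 0) return -1;
        count += c;
    } while (accept(ps, ";"));        /* ";" separates statements */
    return accept(ps, "end") ? count : -1;   /* EOF here is an error */
}

/* A program is one block with no trailing tokens. */
int parse_program(const char **toks, int n) {
    Parser ps = { toks, n, 0 };
    int c = parse_block(&ps);
    return (c >= 0 && ps.pos == n) ? c : -1;
}
```

Since `peek` returns an empty string at end of input and every loop exit requires a real token, a missing "end" is reported as an error instead of leaving the parser waiting for more input.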

Bit-wise operations (shl, shr, or, and, xor and not) are also missing.
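The missing bit-wise operators would slot into the same Ax/Bx scheme as the arithmetic ones. This sketch assumes 16-bit words, the left operand in Bx and the right in Ax, and hypothetical names (`BitOp`, `bit_binary`, `bit_not`). Treating the word as unsigned makes shr a logical shift, and shift counts of 16 or more give 0; that last point is a design choice, since some CPUs mask the shift count instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical semantics for the missing operators on a 16-bit word.
   Left operand in bx, right operand in ax; the result goes to Ax. */
typedef enum { OP_SHL, OP_SHR, OP_OR, OP_AND, OP_XOR } BitOp;

uint16_t bit_binary(BitOp op, uint16_t bx, uint16_t ax) {
    switch (op) {
    case OP_SHL: return ax >= 16 ? 0 : (uint16_t)(bx << ax);  /* bits above 15 are dropped */
    case OP_SHR: return ax >= 16 ? 0 : (uint16_t)(bx >> ax);  /* logical: zeros shifted in */
    case OP_OR:  return bx | ax;
    case OP_AND: return bx & ax;
    case OP_XOR: return bx ^ ax;
    }
    assert(0 && "unknown bit-wise operator");
    return 0;
}

/* "not" is unary, so it only needs Ax. */
uint16_t bit_not(uint16_t ax) {
    return (uint16_t)~ax;
}
```

Note that a bit-wise "not" differs from a logical not under the Ax=1/Ax=0 convention: `bit_not(1)` is 0xFFFE, not 0, so the language would need to decide which meaning "not" has.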

The parser code is complicated because the procedures use the CPU stack to communicate with each other, but once you realise this it gets a bit easier to read.
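Assuming this refers to the code the parser emits, the pattern looks like this: every parse procedure leaves its result in Ax, so before compiling the right-hand operand of a binary operator the left value is pushed onto the CPU stack, and afterwards it is popped into Bx. The mnemonics (`LDI`, `PUSH`, `POP`, `ADD`, `SUB`, `MUL`, `DIV`) and the convention Ax = Bx op Ax are hypothetical, and operands are single digits to keep the sketch short.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical code generator: each procedure leaves its value in Ax;
   binary operators save the left value on the stack meanwhile. */

static const char *src;   /* source cursor */
static char *out;         /* generated code */
static size_t cap;        /* capacity of out */

/* Appends one line of assembler to the output buffer. */
static void emit(const char *line) {
    size_t used = strlen(out);
    int n = snprintf(out + used, cap - used, "%s\n", line);
    assert(n >= 0 && (size_t)n < cap - used);   /* not truncated */
}

/* Factor -> single digit, loaded into Ax */
static void factor(void) {
    char line[16];
    assert(isdigit((unsigned char)*src));
    snprintf(line, sizeof line, "LDI AX,%c", *src++);
    emit(line);
}

/* Term -> Factor { (* | /) Factor } */
static void term(void) {
    factor();
    while (*src == '*' || *src == '/') {
        char op = *src++;
        emit("PUSH AX");                  /* save left operand on the stack */
        factor();                         /* right operand -> Ax */
        emit("POP BX");                   /* left operand -> Bx */
        emit(op == '*' ? "MUL" : "DIV");  /* Ax = Bx op Ax */
    }
}

/* Expression -> Term { (+ | -) Term } */
static void expression(void) {
    term();
    while (*src == '+' || *src == '-') {
        char op = *src++;
        emit("PUSH AX");
        term();                           /* may push and pop internally */
        emit("POP BX");
        emit(op == '+' ? "ADD" : "SUB");
    }
}

void compile(const char *text, char *buf, size_t size) {
    src = text;
    out = buf;
    cap = size;
    buf[0] = '\0';
    expression();
    assert(*src == '\0');
}
```

Compiling `1+2*3` shows why the stack is needed: while `2*3` is being computed, the saved `1` sits underneath the saved `2`, and each POP retrieves exactly the value its own operator pushed.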

Fixed the EOF bug in the "BeginState" procedure and did some more code cleanup to allow export of the tokeniser data. I'm pretty happy with the code as it stands, and I now understand it fully.

AlanX
