
Single instruction format is more than enough!

A project log for SIFP - Single Instruction Format Processor

A super-scalar, reduced instruction set processor where microcode and machine code are the same thing!

zpekic, 12/11/2023 at 03:36

Even simple 4-bit or 8-bit microprocessors and microcontrollers use many different instruction formats to encode a wide variety of instructions. How can a CPU work with just one format:

bits      15..12  11..9  8..6  5..3  2..0
register  P       A      X     Y     S

and still execute general-purpose programming language code?

Let's walk through an example: a short routine that outputs a zero-terminated character string to a UART (MC6850 or similar). The terminator is 0x0000; all data is 16-bit, and the low 8 bits of each word hold the ASCII code.

From uart.sif

UART_OutStr:    LDA, M[X];
                IF_AZ;
                .branchto @UART_Done - $;
                MARK2;
                BRANCH;
                .into @UART_OutChr - $;
                INX, BRANCH;
                .data @UART_OutStr - $;
UART_Done:      RTS;
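For readers who think in a higher-level language, the control flow of this routine can be sketched roughly as below. This is only an illustrative Python model, not part of the project: `uart_out_str`, `memory` and `out_chr` are hypothetical names, `out_chr` stands in for the UART_OutChr subroutine (whose body is not shown here), and passing only the low 8 bits to it is an assumption.

```python
def uart_out_str(memory, x, out_chr):
    """Model of UART_OutStr: send words starting at memory[x] until a zero word.

    memory  : list of 16-bit words (hypothetical stand-in for system memory)
    x       : starting index, playing the role of register X
    out_chr : callback standing in for UART_OutChr (assumed to preserve X)
    """
    while True:
        a = memory[x] & 0xFFFF      # LDA, M[X]
        if a == 0:                  # IF_AZ: branch to UART_Done
            return x                # RTS; X is left pointing at the terminator
        out_chr(a & 0xFF)           # MARK2 + BRANCH: call UART_OutChr
        x = (x + 1) & 0xFFFF        # INX, BRANCH: back to UART_OutStr


sent = []
uart_out_str([ord("H"), ord("i"), 0], 0, sent.append)
print(sent)
```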

While this looks like regular assembly code, it is actually microcode, so it can be compiled with the "mcc" microcode compiler. This produces the following VHDL code (which can be compiled directly into the system ROM):

-- L0046@00E8 0980.UART_OutStr:  LDA, M[X];
--  r_p = 0000, r_a = 100, r_x = 110, r_y = 000, r_s = 000;
232 => X"0" & O"4" & O"6" & O"0" & O"0",

-- L0047@00E9 9000.  IF_AZ;
--  r_p = 1001, r_a = 000, r_x = 000, r_y = 000, r_s = 000;
233 => X"9" & O"0" & O"0" & O"0" & O"0",

-- L0048@00EA 0006.  data16 =  @UART_Done - $;
--  data16 = 0000000000000110;
234 => X"0006",

-- L0049@00EB 6003.  r_p = STP2, r_s = M[PUSH];
--  r_p = 0110, r_a = 000, r_x = 000, r_y = 000, r_s = 011;
235 => X"6" & O"0" & O"0" & O"0" & O"3",

-- L0050@00EC 2000.  BRANCH;
--  r_p = 0010, r_a = 000, r_x = 000, r_y = 000, r_s = 000;
236 => X"2" & O"0" & O"0" & O"0" & O"0",

-- L0051@00ED FFEC.  data16 =  @UART_OutChr - $;
--  data16 = 1111111111101100;
237 => X"FFEC",

-- L0052@00EE 2080.  INX, BRANCH;
--  r_p = 0010, r_a = 000, r_x = 010, r_y = 000, r_s = 000;
238 => X"2" & O"0" & O"2" & O"0" & O"0",

-- L0053@00EF FFF9.  data16 =  @UART_OutStr - $;
--  data16 = 1111111111111001;
239 => X"FFF9",

-- L0054@00F0 4002.UART_Done:  r_p = LDP, r_s = M[POP];
--  r_p = 0100, r_a = 000, r_x = 000, r_y = 000, r_s = 010;
240 => X"4" & O"0" & O"0" & O"0" & O"2",

Each instruction engages 0 to 5 of the registers present in the CPU simultaneously. Register P (program counter) has 16 possible actions (a 4-bit field in the instruction), while registers A, X, Y and S have 8 each (3-bit fields).
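The field layout can be checked against the generated listing with a small packing sketch. This is an illustrative Python model, not the mcc compiler itself; `encode` and `decode` are hypothetical helper names, and the field codes used are the ones printed in the VHDL comments above.

```python
def encode(r_p, r_a=0, r_x=0, r_y=0, r_s=0):
    """Pack the five register fields into one 16-bit instruction word."""
    if not 0 <= r_p < 16:
        raise ValueError("r_p is a 4-bit field")
    for field in (r_a, r_x, r_y, r_s):
        if not 0 <= field < 8:
            raise ValueError("r_a, r_x, r_y, r_s are 3-bit fields")
    return (r_p << 12) | (r_a << 9) | (r_x << 6) | (r_y << 3) | r_s


def decode(word):
    """Split a 16-bit instruction word back into its five fields."""
    return {
        "r_p": (word >> 12) & 0xF,
        "r_a": (word >> 9) & 0x7,
        "r_x": (word >> 6) & 0x7,
        "r_y": (word >> 3) & 0x7,
        "r_s": word & 0x7,
    }


# LDA, M[X]: r_a = 100 (4), r_x = 110 (6), everything else 0
print(f"{encode(0, r_a=4, r_x=6):04X}")  # 0980, as in the listing
```

The same helper reproduces the other instruction words in the listing, for example `encode(6, r_s=3)` gives 0x6003 for MARK2.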

It is useful to look up the instruction field definitions in the sifp.mcc file, which is included when compiling the assembly code.

LDA, M[X];

Register X outputs its content to the address bus adder (code 6). Because no other register does this, the address bus value is 0 + X + 0 + 0 and it is valid (VMA = 1).

Register A loads data from the internal data bus. Because no other register outputs to this bus, it is in read mode (RnW = 1), and because there is a valid address (VMA = 1), a memory read cycle takes place. Loading A affects the AZ flag (1 if the value is 0x0000).

All other registers execute NOP and are unaffected.


IF_AZ;

During the EXECUTE phase, P (program counter) points to the address after the current instruction. P projects to the internal address bus, so the bus value is P + 0 + 0 + 0 and VMA = 1. No register is outputting a value, so the cycle is a read (RnW = 1). Based on the value of the flag (in this case A[ccumulator]Z[ero]), P is either incremented (no branch) or has the value read from memory added to it (relative branch).

All other registers are unaffected (NOP).
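The effect on P can be sketched as a tiny function. This is an illustrative Python model under one assumption: during EXECUTE, P holds the address of the offset word that follows the instruction, which is consistent with the listing (IF_AZ sits at 0x00E9, its offset word at 0x00EA). The names `next_p` and `memory` are hypothetical.

```python
def next_p(p, flag, memory):
    """Return the new P after a conditional relative branch.

    p      : value of P during EXECUTE (address of the offset word)
    flag   : condition being tested, e.g. AZ for IF_AZ, always True for BRANCH
    memory : mapping from address to 16-bit word
    """
    if flag:
        return (p + memory[p]) & 0xFFFF  # relative branch: add the word read
    return (p + 1) & 0xFFFF              # no branch: step over the offset word


memory = {0x00EA: 0x0006}
print(hex(next_p(0x00EA, True, memory)))   # 0xf0, the UART_Done label
print(hex(next_p(0x00EA, False, memory)))  # 0xeb, the MARK2 instruction
```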

.data16 =  @UART_Done - $;

While all instructions are a single word, the P register in some cases (such as a conditional branch) increments its value, which allows the next word to be data. In this case the relative offset is branch target - current location. The value here is 0x0006, as the UART_Done label is 6 words forward.
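The offset arithmetic can be reproduced from the addresses in the listing. This is a minimal Python sketch, assuming 16-bit two's complement wrap-around and that `$` is the address of the offset word itself; `rel_offset` and `to_signed` are hypothetical helper names.

```python
def rel_offset(target, here):
    """16-bit relative offset: branch target minus current location."""
    return (target - here) & 0xFFFF


def to_signed(word):
    """Interpret a 16-bit word as a signed two's complement number."""
    return word - 0x10000 if word & 0x8000 else word


# UART_Done is at 0x00F0, the offset word is at 0x00EA
print(f"{rel_offset(0x00F0, 0x00EA):04X}")  # 0006
# A backward offset from the same listing: 0xFFF9 read as a signed number
print(to_signed(0xFFF9))                    # -7
```

By the same arithmetic, the 0xFFEC offset stored at 0x00ED reads as -20, which would place UART_OutChr 20 words earlier in the ROM.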

r_p = STP2, r_s = M[PUSH];
The MARK2 alias resolves to this sequence. SIFP16 can execute only 1 memory access per instruction, so it can't both push the return address and load P (program counter) at once. This instruction first pushes the correct return address to the stack:

P(rogram counter): the value P + 2 is output to the internal data bus. This means that RnW = 0 (the write-to-memory signal is asserted).

S(tack): S projects the value S - 1 to the internal address bus, which then has the value 0 + 0 + 0 + (S - 1), with VMA = 1. The result is that the "ahead" value of P is pushed to the stack and S is decremented.

BRANCH;

In the second part of the subroutine call, a relative jump to the new location must be executed. BRANCH is effectively the same as IF_TRUE.

.data16 =  @UART_OutChr - $;
This is the same relative jump target calculation as above. Note that this call uses a relative target, which allows relocatable code if the target is in the same code block (absolute jumps are also supported; a fancy compiler could differentiate and abstract the difference to produce relocatable modules).

INX, BRANCH;
The SIFP CPU can execute up to 5 operations in one instruction, but most often it is 1 or 2 (the code I am running now achieves 1.25 operations per instruction, or 0.625 per clock cycle, or 15.6 MOPS at a 25 MHz clock). Here is a case where 2 independent operations are done simultaneously ("super scalar" :-)):

X: increments and updates the XZ and XC flags. Note that this operation does not project an address or data, so it asserts no VMA, and RnW remains at its default of 1.

P: projects its address, so the address bus is P + 0 + 0 + 0 and VMA = 1. There is a memory read cycle (P points to the relative offset word below), and the value read is added to P.

.data16 =  @UART_OutStr - $;
The calculated offset here is 0xFFF9, which means 7 steps "back", to the label UART_OutStr.

r_p = LDP, r_s = M[POP];
The RTS alias resolves to these two operations:

P: the program counter is loaded with what appears on the internal data bus, which is in read mode (RnW = 1) because no register projects a value onto it ("write").

S: the stack pointer projects to the internal address bus, which is 0 + 0 + 0 + S, with VMA = 1. This results in a read ("pop") memory cycle. At the end of the clock cycle, S is incremented (a classic stack that grows towards smaller memory addresses).
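The push done by MARK2 and the pop done by RTS can be modelled together as a round trip. This is an illustrative Python sketch, not the actual hardware behaviour in every detail: `mark2`, `rts` and `memory` are hypothetical names, the starting stack pointer 0x8000 is an arbitrary value chosen for the example, and P during EXECUTE is assumed to hold the address after the current instruction, as described earlier.

```python
def mark2(p, s, memory):
    """r_p = STP2, r_s = M[PUSH]: write P + 2 at address S - 1, return new S."""
    s = (s - 1) & 0xFFFF
    memory[s] = (p + 2) & 0xFFFF   # return address "ahead" of the call
    return s


def rts(s, memory):
    """r_p = LDP, r_s = M[POP]: read P from address S, return (new P, new S)."""
    p = memory[s]
    return p, (s + 1) & 0xFFFF


memory = {}
# MARK2 sits at 0x00EB, so P during EXECUTE is 0x00EC
s = mark2(0x00EC, 0x8000, memory)
print(hex(memory[s]))   # 0xee, the INX, BRANCH instruction after the call
p, s = rts(s, memory)
print(hex(p), hex(s))   # 0xee 0x8000
```

The pushed value 0x00EE skips both the BRANCH instruction at 0x00EC and its offset word at 0x00ED, which is why the constant is 2.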

Very simple combinatorial logic resolves the following rules:

With the rules above, it is also possible to have effective internal data transfer, for example:

INY, LDX, A;

This increments Y while loading X with the contents of A; there is no memory operation in this case.
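The bus behaviour described throughout the walkthrough can be summarised in a small model. This is a reading of the text above, not the project's actual logic: `bus_cycle` is a hypothetical name, and treating RnW as 0 whenever any register drives the data bus (even with VMA = 0) is an assumption.

```python
def bus_cycle(address_values, data_drivers):
    """Resolve the address bus, VMA, RnW and the resulting memory cycle.

    address_values : values projected by registers onto the address bus adder
    data_drivers   : names of registers outputting to the internal data bus
    """
    address = sum(address_values) & 0xFFFF   # e.g. 0 + X + 0 + 0
    vma = 1 if address_values else 0         # valid only if someone projects
    rnw = 0 if data_drivers else 1           # a driver turns the cycle into a write
    if not vma:
        cycle = "none"                       # internal transfer only
    elif rnw:
        cycle = "read"
    else:
        cycle = "write"
    return address, vma, rnw, cycle


print(bus_cycle([0x1234], []))     # LDA, M[X] with X = 0x1234: memory read
print(bus_cycle([0x7FFF], ["P"]))  # MARK2 with S = 0x8000: memory write
print(bus_cycle([], ["A"]))        # INY, LDX, A: no memory cycle
```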
