So, as promised, this new series of posts will be meatier than the stuff I previously wrote. Brace yourself, we’re in for a bumpy ride.
Today we will discuss the design and development of a custom, basic-to-intermediate Central Processing Unit (CPU). Our little CPU will operate in 4 stages (fetch, decode, execute and write back) and will have a fairly basic architecture. At its core will be an Arithmetic Logic Unit (ALU) capable of adding, subtracting, multiplying and dividing 8-bit integers, along with the usual logical ops (mainly comparisons).
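To make the ALU's job a bit more concrete, here is a behavioral Python sketch of those operations (not a hardware description). It assumes unsigned 8-bit operands, results that wrap modulo 256, a multiplication that keeps only the low byte, and integer division. The comparison result encoding is purely hypothetical, since the post doesn't define one yet.

```python
# Behavioral sketch of the 8-bit ALU (assumptions: unsigned operands,
# results wrap modulo 256, integer division, hypothetical CMP encoding).
MASK8 = 0xFF


def alu(op: str, a: int, b: int) -> int:
    """Return the 8-bit result of applying `op` to operands a and b."""
    a &= MASK8
    b &= MASK8
    if op == "ADD":
        return (a + b) & MASK8
    if op == "SUB":
        return (a - b) & MASK8  # wraps like two's-complement hardware
    if op == "MUL":
        return (a * b) & MASK8  # keeps only the low byte of the product
    if op == "DIV":
        if b == 0:
            raise ZeroDivisionError("ALU division by zero")
        return a // b
    if op == "CMP":
        # Hypothetical comparison code: 0 if equal, 1 if a < b, 2 if a > b
        return 0 if a == b else (1 if a < b else 2)
    raise ValueError(f"unsupported ALU op: {op}")


print(alu("ADD", 200, 100))  # 44: 300 wraps around modulo 256
print(alu("SUB", 5, 7))      # 254: -2 in two's complement
print(alu("MUL", 20, 20))    # 144: low byte of 400
print(alu("DIV", 16, 3))     # 5
```

Wrapping instead of raising on overflow mirrors what a fixed-width adder does when the carry out of bit 7 is simply dropped.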
For the moment, we will leave branching out altogether (and, with it, any need for branch prediction), so instructions of the “jump and call” type will not be supported. That is, our CPU won’t be able to run IF-ELSE constructs or loops. (Sorry guys, but we want to keep things simple for the time being.)
All right, the goal of this exercise is to develop a custom CPU for implementation in your favorite reconfigurable hardware device, such as an FPGA. (If you just want to keep it to simulation, that’s OK, but our design should still be fully synthesizable.)
Hopefully, at the end of this project we’ll be able to provide a program for our CPU to execute. Of course, the program will be written in our basic, custom machine code. For extra fun, we could even write a parser for an “assembly” language. How nice is that?
II Devising and setting up our CPU architecture
Our CPU will handle 8-bit wide operands. We will set the instruction width to 2 bytes: the upper byte will encode the actual instruction and the lower byte will hold the operand. The following operations will be available (presented in assembly format):
These are the basic ops supported by our CPU; more ops could be added fairly easily in the future.
Let’s code up our instructions. We will use the first (most significant) 6 bits of the upper byte for the opcode. Suppose we have 3 registers for internal computations; the last 2 bits of the upper byte (from 00 to 10) will then encode the destination register used by some of our instructions.
For instance, let’s encode the move operation. Say we need to move the number 4 to register two. The move opcode is 000001. We also need to encode where we are moving the operand to; we’re moving it to register two, which is 01 (register 1 = 00, register 2 = 01, register 3 = 10). Finally, the operand we’re moving is the number 4: 00000100.
The complete machine-code op is: 0000010100000100 — MOV R2, 00000100b0
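Packing the three fields into one word is just shifts and ORs. Here is a minimal Python sketch, assuming the layout described above (opcode in bits 15..10, register in bits 9..8, operand in bits 7..0); `encode` is our own illustrative helper, not part of any toolchain.

```python
# Pack [6-bit opcode][2-bit register][8-bit operand] into one 16-bit word.
def encode(opcode: int, reg: int, operand: int) -> int:
    """Return the 16-bit instruction word for the given fields."""
    if not 0 <= opcode < 64:
        raise ValueError("opcode must fit in 6 bits")
    if not 0 <= reg <= 2:
        raise ValueError("register must be 0 (R1), 1 (R2) or 2 (R3)")
    if not 0 <= operand < 256:
        raise ValueError("operand must fit in 8 bits")
    return (opcode << 10) | (reg << 8) | operand


MOV = 0b000001  # move opcode, as given in the text
R2 = 0b01       # register two

word = encode(MOV, R2, 4)
print(f"{word:016b}")  # 0000010100000100
```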
The supported ops will be encoded as follows:
Let’s also assume that the ALU always writes its result back to register 1 (00). Here are some op-encoding examples; in fact, this is our very first program, and the CPU should be able to run it without any problems:
0 : 0000010100000100; -- MOV R2, 00000100b0
1 : 0000010000000111; -- MOV R1, 00000111b0
2 : 0000100000000001; -- ADD R1, R2
3 : 0000011000000101; -- MOV R3, 00000101b0
4 : 0001001000000111; -- SUB R3, 00000111b0
5 : 0001010100000100; -- MUL R2, 00000100b0
6 : 0001100000000011; -- DIV R1, 00000011b0
7 : 0000000000000000; -- NOP
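As a sanity check on the field layout, this short Python sketch splits each word of the program back into opcode, register and operand. The `OPCODES` table is our own inference from the comments in the listing (NOP=000000, MOV=000001, ADD=000010, SUB=000100, MUL=000101, DIV=000110), not an official encoding table, so treat it as an assumption.

```python
# Decode each 16-bit word into [opcode:6][register:2][operand:8].
# OPCODES is inferred from the comments in the example program (assumption).
OPCODES = {0b000000: "NOP", 0b000001: "MOV", 0b000010: "ADD",
           0b000100: "SUB", 0b000101: "MUL", 0b000110: "DIV"}

PROGRAM = [
    "0000010100000100", "0000010000000111", "0000100000000001",
    "0000011000000101", "0001001000000111", "0001010100000100",
    "0001100000000011", "0000000000000000",
]


def decode(word: int) -> tuple[int, int, int]:
    """Return (opcode, register, operand) for one 16-bit instruction word."""
    opcode = (word >> 10) & 0b111111  # bits 15..10
    reg = (word >> 8) & 0b11          # bits 9..8
    operand = word & 0xFF             # bits 7..0
    return opcode, reg, operand


for addr, text in enumerate(PROGRAM):
    opcode, reg, operand = decode(int(text, 2))
    print(f"{addr}: {OPCODES[opcode]} reg={reg:02b} operand={operand:08b}")
```

Note that for `ADD R1, R2` the operand byte (00000001) carries a register index rather than an immediate value; the decoder has to know which interpretation each opcode uses.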
Nice stuff, eh?
We will build our CPU on a pseudo Von Neumann architecture, in which instructions and data would normally live on the same memory storage device. (For the moment, however, we will store instructions in a read-only memory and keep results in CPU registers.) We will break the CPU’s main functionality into individual blocks: ALU, Control Unit (Instruction Fetch and Decode), Registers and Memory.
This is the basic CPU block diagram:
Let’s go over what we have here. A ROM block will store the program. The fetch unit will poll the ROM block periodically (we will return to this later), extract the corresponding instruction and send it to the decoder for further processing. Inside the decoder, each instruction will be split into its fields, parsed and executed accordingly. The decoder unit controls the register bank (for data storage), the ALU (for computation) and a pair of multiplexers (for feeding operands to the ALU).
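The register bank and the operand multiplexers can be modeled in a few lines of Python. This is a hypothetical sketch, not the actual datapath: `RegisterBank` and `mux2` are illustrative names, and we assume each mux chooses between a register value (select = 1) and the instruction's immediate operand byte (select = 0), with the decoder driving the select lines.

```python
# Hypothetical model of the register bank and the two ALU operand muxes.
# Assumption: each mux forwards either the immediate operand byte
# (select=0) or a register value (select=1); the decoder drives the selects.


class RegisterBank:
    """Three 8-bit registers: index 0 = R1, 1 = R2, 2 = R3."""

    def __init__(self) -> None:
        self.regs = [0, 0, 0]

    def read(self, index: int) -> int:
        return self.regs[index]

    def write(self, index: int, value: int) -> None:
        self.regs[index] = value & 0xFF  # keep values 8 bits wide


def mux2(select: int, in0: int, in1: int) -> int:
    """2-to-1 multiplexer: forwards in0 when select is 0, in1 otherwise."""
    return in1 if select else in0


bank = RegisterBank()
bank.write(0, 7)  # R1 = 7
bank.write(1, 4)  # R2 = 4

# ADD R1, R2 (0000100000000001): register field 00, operand byte names R2.
reg_field, operand = 0b00, 0b00000001
alu_a = mux2(1, operand, bank.read(reg_field))      # mux A: register R1
alu_b = mux2(1, operand, bank.read(operand & 0b11))  # mux B: register R2
print(alu_a, alu_b)  # 7 4
```

For an immediate instruction such as `SUB R3, 00000111b0`, the decoder would instead drive mux B's select to 0, so the operand byte itself reaches the ALU.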
III CPU pseudo state machine progression
Simple CPUs (scalar, single-issue designs, at least) are sequential by nature. Superscalar architectures kinda cheat on this, but a very basic CPU always executes exactly one instruction per instruction cycle. Here is the CPU operation cycle:
1 – Fetch the instruction from ROM, then halt fetching (we don’t want instruction overlapping!).
2 – Send the instruction to the decode unit.
3 – Decode the instruction.
4 – Send the operands to the ALU/register bank.
5 – Write back the instruction result (in any case, a number must be written to a register).
6 – Instruction execution done; go to 1.
You gotta watch out for instruction overlapping: we must ensure that every instruction cycle starts with an instruction fetch and ends with a result write back. Otherwise, instructions could overlap and some write backs could be skipped.
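To see the whole cycle at work on our first program, here is a minimal Python simulator. It is a software sketch, not the hardware design, and it rests on our own assumptions: opcodes are inferred from the example listing (NOP=0, MOV=1, ADD=2, SUB=4, MUL=5, DIV=6), ADD takes a register operand while SUB/MUL/DIV take an immediate, `SUB R3, 7` means R3 - 7, and ALU results wrap to 8 bits and go to R1. Because the loop only fetches the next word after the current write back, instructions can never overlap.

```python
# Sketch of the fetch/decode/execute/write-back loop (assumptions:
# opcodes inferred from the example listing, ADD uses a register operand,
# SUB/MUL/DIV use an immediate, ALU results wrap to 8 bits and go to R1).
ROM = [
    0b0000010100000100,  # MOV R2, 4
    0b0000010000000111,  # MOV R1, 7
    0b0000100000000001,  # ADD R1, R2
    0b0000011000000101,  # MOV R3, 5
    0b0001001000000111,  # SUB R3, 7
    0b0001010100000100,  # MUL R2, 4
    0b0001100000000011,  # DIV R1, 3
    0b0000000000000000,  # NOP
]


def run(rom: list[int]) -> list[int]:
    regs = [0, 0, 0]  # R1, R2, R3
    pc = 0
    while pc < len(rom):
        # 1-2. Fetch one word; the next fetch waits until this one retires.
        word = rom[pc]
        # 3. Decode into opcode, register and operand fields.
        opcode, reg, operand = word >> 10, (word >> 8) & 0b11, word & 0xFF
        # 4. Execute: route operands to the register bank or the ALU.
        if opcode == 0:  # NOP: in this sketch it skips the write back
            dest, result = None, 0
        elif opcode == 1:  # MOV: immediate goes straight to the register
            dest, result = reg, operand
        else:
            a = regs[reg]
            b = regs[operand & 0b11] if opcode == 2 else operand
            if opcode == 2:
                result = a + b
            elif opcode == 4:
                result = a - b
            elif opcode == 5:
                result = a * b
            elif opcode == 6:
                result = a // b
            else:
                raise ValueError(f"unknown opcode {opcode:06b}")
            dest = 0  # the ALU always writes back to R1
        # 5. Write back, truncated to 8 bits.
        if dest is not None:
            regs[dest] = result & 0xFF
        # 6. Instruction done; go back to fetch.
        pc += 1
    return regs


print(run(ROM))  # [5, 4, 5]
```

Tracing it by hand: R1 goes 7, 11, 254 (5 - 7 wrapped), 16 (4 * 4) and finally 5 (16 / 3, integer division).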
That’s it for now. So far we have an idea of what we’re trying to accomplish; in the next part of this series we will deal with the ALU, the core of the CPU. We will try to implement actual arithmetic circuitry for additions, multiplications and divisions. Remember that subtractions can be performed by the same addition circuitry, simply by adding the two’s complement of the subtrahend!
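Here is a bit-level Python sketch of that trick, ahead of the real circuitry: one 8-bit ripple-carry adder built from full adders handles both operations. For subtraction, each bit of b passes through an XOR gate (inverting it) and the initial carry-in is set to 1, so the adder computes a + (~b + 1), which equals a - b modulo 256. `full_adder` and `add_sub` are illustrative names only.

```python
# One 8-bit ripple-carry adder for both ADD and SUB (two's complement).


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def add_sub(a: int, b: int, subtract: bool = False) -> tuple[int, int]:
    """Return (8-bit result, final carry) of a + b, or a - b if subtract."""
    carry = 1 if subtract else 0  # carry-in of 1 supplies the "+1"
    result = 0
    for i in range(8):  # ripple from bit 0 up to bit 7
        bit_a = (a >> i) & 1
        bit_b = ((b >> i) & 1) ^ int(subtract)  # XOR gate inverts b for SUB
        s, carry = full_adder(bit_a, bit_b, carry)
        result |= s << i
    return result, carry


print(add_sub(7, 4))                 # (11, 0)
print(add_sub(5, 7, subtract=True))  # (254, 0): -2 in two's complement
print(add_sub(7, 5, subtract=True))  # (2, 1)
```

In subtraction mode a final carry of 1 means no borrow occurred (a >= b), which is a handy flag for the comparison ops later on.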
More to follow soon.