Welcome

This is a book about computers. It takes a practical, step-by-step approach, illustrating how a small computer can be designed and implemented.

Starting with a simple building block that can store one bit, we continue, via registers and control logic and instruction decoding, towards a design that can run a small program.

We extend the instruction set, by adding instructions for controlling the program flow, and for interacting with the outside world (through a UART).

We stop when we have a computer that can run a program that has been compiled and linked using gcc.

We implement a subset of a real computer architecture - the OR1K architecture. In this way, we can convey the experience of building a real system, while at the same time making the task small enough to be completed without a large implementation effort.

Using an already available architecture also allows us to use available tools, such as gcc with newlib for OR1K.

The book is designed as a Layered Book. This means that there are common parts, covering the general aspects of computer design, but also specific parts, treating layer-specific material. Each layer represents a particular design language, such as VHDL or Verilog.

You can read the book one layer at a time, but you can also move from one layer to another.

The book has the following layers: Verilog, VHDL, and SystemC/TLM.

You are now reading the Verilog layer. The purpose of this layer is to show how Verilog can be used to construct a computer that implements a specific architecture.

Moving between layers is done by following links. Here is an example.

This is the Verilog layer. The other layers are: VHDL, SystemC/TLM.

You will see these links throughout the book, e.g. at the beginning or the end of a section. Following such a link will take you to another layer. You will arrive at the new layer at a position corresponding to the position from which you left off.

Acknowledgements

This book has been produced using pandoc and Python.

The HTML version of the book has been styled using a slightly modified version of a CSS file from the pandoc demo page.

Choosing a Language

We describe our computer using a design language. In this way, we get a textual representation of the computer, which we can use as input to software tools that help us simulate the behavior of our computer.

Verilog is a hardware description language. We use Verilog to describe our computer, and to simulate its functionality. It is also possible to use Verilog to actually synthesize a computer, in real hardware, for example in an FPGA.

Verilog is standardized by IEEE.

Information about Verilog can be found e.g. from Doulos.

Hello World

A simple example will get us started. We use a classical "Hello, world" example, which does nothing meaningful except printing a text string. The code for the example is shown in Figure 1.

The code for the example is from the Icarus Verilog User Guide. We will start using Icarus Verilog in Section Getting some Tools.

module main;

initial
  begin
    $display("Hello, world");
    $finish;
  end

endmodule

hello.v

Figure 1. A hello world example in Verilog.

The code in Figure 1 contains a module, which is named main.

An initial block is used, for the purpose of defining the behavior of the module. The block contains a statement for displaying a string, and a statement for finishing the simulation.

We remark that the code in Figure 1 generates an artificial, simulated behavior. It does not provide any code that can be used for synthesizing actual hardware.

You can read about Verilog in Wikipedia, and at other places, such as Doulos, which provides a Verilog Designer's Guide.

Getting some Tools

We need some tools, in the form of software. We search for software that can be obtained without cost.

We use a Linux computer with Ubuntu 16.04, and a Mac computer with OS X El Capitan.

We decide to use Icarus Verilog.

We can install Icarus Verilog in Ubuntu, by doing

sudo apt-get install iverilog

We can install Icarus Verilog on Mac, by downloading its source code from the Icarus Verilog FTP repository.

The file verilog-20150513.tar.gz can be downloaded from the pre-v10 directory. The source can then be extracted and built, using the commands

tar xvf verilog-20150513.tar.gz 
cd verilog-20150513
./configure
make
sudo make install

Make it Run

The code in Figure 1 can be compiled and run.

Assuming that the program is stored in a file hello.v, compiling can be done as

iverilog -o hello hello.v

The above command generates the file hello, which can be run, by doing

vvp hello 

which results in the printout

Hello, world

Building a Computer

We have chosen a language, to describe our computer. We have taken a first, tiny step, and we have seen how we can get hold of some tools.

Our goal is to create a computer that can run programs, consisting of instructions. We want the instructions to be generated, using a compiler.

A computer reads instructions from a memory. Each instruction is represented as a sequence of bits. The values of the bits determine the type of instruction, and sometimes also arguments that the instruction shall use. The allowed instructions, for a given computer, belong to the computer's instruction set.

Most computers have instructions for loading data from a memory, and storing data to a memory. Other common instructions are instructions for doing mathematical operations, such as addition and subtraction, and instructions for making decisions. The decisions can be based on evaluations of certain conditions, such as checking if a number is zero, or if a certain bit is set in a piece of data.

An instruction that has been read from memory is decoded, meaning that the computer interprets the bits of the instruction, and then, depending on the values of the bits, takes different actions.

The actions taken are determined by the instructions. As an example, an instruction for addition results in the actual addition of two numbers, and most often also the storing of the result of the addition.

Storing one Bit

We start with a small building block, that can store only one bit. We then extend the building block, so that we can store larger pieces of information. At a certain stage in our development, we are ready to implement our first instruction.

A bit can have the values 0 or 1. In a computer, these values are represented by a low and a high value of an electrical signal.

The value of a bit can be stored. This means that the value is remembered, as long as it is stored. While the value is stored, the value can be read, and used, for the purpose of performing different operations. As an example, a bit could be used in an addition operation, or it could be copied so that it is stored somewhere else, for example at another place in a memory.

A D Flip-flop

The value of a bit can be stored in a building block called D flip-flop.

A D flip-flop stores one bit of data. A new value can be stored when a clock signal changes value. A component, which can change its stored value only when a clock signal changes, is called a synchronous component.

A D flip-flop implementation in Verilog is shown in Figure 2.

module d_ff(clk, data_in, data_out);

   input clk;
   input data_in;
   output data_out;

   wire clk, data_in;

   reg  reg_value;

   always @(posedge clk)
     reg_value <= data_in;
    
   assign data_out = reg_value;

endmodule    

d_ff.v

Figure 2. A D flip-flop in Verilog.

The code in Figure 2 defines a module. The module has two inputs, called clk and data_in, and one output, called data_out.

The input variables clk and data_in are defined using the keyword input and the output variable data_out is defined using the keyword output.

The input variables clk and data_in are also defined using the keyword wire.

The module defines a register variable called reg_value. The variable reg_value is defined using the keyword reg.

The variable reg_value will contain the actual value stored in the D flip-flop.

The variable reg_value is called a state variable.

An always block is defined using the keyword always. Following the keyword always is an indication, using the word posedge, stating that the actions in the always block shall take place at every rising edge of the clock signal. We see that the only action taken is to assign the value of the input data_in to the state variable reg_value. This assignment ensures that the state variable reg_value is updated at every rising edge of the clock.

An assignment of the variable data_out is done, outside of the always block. This assignment ensures that the output data_out has the same value as the current value of the state variable reg_value.
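The same behavior can be written more compactly. The following is a minimal sketch, assuming a simulator that accepts Verilog-2001 ANSI-style port declarations (Icarus Verilog does). The module name d_ff_ansi is hypothetical and is not used elsewhere in the book. Here the output is declared as a reg, so it serves as the state variable itself, and no separate assign statement is needed.

```verilog
// Hypothetical alternative to d_ff in Figure 2, using Verilog-2001
// ANSI-style port declarations (directions and types in the header).
module d_ff_ansi(input  wire clk,
                 input  wire data_in,
                 output reg  data_out);

   // The output is the state variable, updated at every rising clock edge.
   always @(posedge clk)
     data_out <= data_in;

endmodule
```

The style in Figure 2 keeps the state variable and the output as separate names, which can make the explanation of the design easier to follow.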

A Testbench

The D flip-flop implementation in Figure 2 has inputs and outputs. An external module, referred to as a testbench, can be used for the purpose of generating input signals to the D flip-flop, and observing output signals from the D flip-flop.

A Verilog testbench is shown in Figure 3.

`timescale 1ns / 1ns

module d_ff_tb;

   reg clk = 1;
    
   reg d_ff_data_in = 1;
   wire d_ff_data_out;

   initial begin
      $monitor("At time %t, data_in=%b, data_out=%b", 
               $time, d_ff_data_in, d_ff_data_out);
      #16 $finish;
   end

   initial begin
      #1 d_ff_data_in = 0;
      #5 d_ff_data_in = 1;
      #3 d_ff_data_in = 0;
   end
           
   initial begin
      $dumpfile("d_ff_tb_wave.vcd");
      $dumpvars(0,d_ff_0);
   end

   always #2 clk = !clk;

   d_ff d_ff_0(clk, d_ff_data_in, d_ff_data_out);

endmodule

d_ff_tb.v

Figure 3. A D flip-flop testbench in Verilog.

The testbench in Figure 3 starts with a definition of the time scale, stating that nanoseconds (ns) will be used as the time unit. A register variable called clk is defined. This variable represents the clock signal. The actual shape of the clock signal is defined by the line

   always #2 clk = !clk;

The input signal to the D flip-flop is defined by the register variable d_ff_data_in. The values used for the input signal are defined in an initial block, as

   initial begin
      #1 d_ff_data_in = 0;
      #5 d_ff_data_in = 1;
      #3 d_ff_data_in = 0;
   end

The testbench in Figure 3 is a behavioral model. A behavioral model can be used in simulation, but cannot be synthesized into a working digital system, for use in e.g. an FPGA or an ASIC.

Build and Run

A system, containing the D flip-flop in Figure 2 and the testbench in Figure 3, can be analyzed and built using the command

iverilog -o d_ff_tb d_ff.v d_ff_tb.v

The simulation can be run by giving the command

vvp d_ff_tb

The resulting printout is shown in Figure 4.

VCD info: dumpfile d_ff_tb_wave.vcd opened for output.
At time                    0, data_in=1, data_out=1
At time                    1, data_in=0, data_out=1
At time                    4, data_in=0, data_out=0
At time                    6, data_in=1, data_out=0
At time                    8, data_in=1, data_out=1
At time                    9, data_in=0, data_out=1
At time                   12, data_in=0, data_out=0

Figure 4. Printout from running the testbench in Figure 3.

The printout in Figure 4 shows the values of data_in and data_out for a sequence of time instants. The time instants are determined by a $monitor statement inside an initial block in Figure 3, with the effect that a printout is done whenever one of the variables d_ff_data_in or d_ff_data_out changes value. A change of the time alone does not cause a printout, which is why only some time instants appear in Figure 4.
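The triggering of $monitor can be tried out in isolation. The following is a minimal sketch, where the module name monitor_demo and the variable a are hypothetical and not part of the book's design. It assigns the same value twice, to illustrate that a printout is made only when a monitored variable actually changes, and that the passing of time alone does not result in a printout.

```verilog
`timescale 1ns / 1ns

// Hypothetical demo module, not part of the book's design.
module monitor_demo;

   reg a = 0;

   initial begin
      // Prints once when the statement is started, and then whenever
      // the variable a changes value. The $time argument does not,
      // by itself, cause any printout.
      $monitor("At time %t, a=%b", $time, a);
      #5 a = 1;   // a changes value: a new line is printed
      #5 a = 1;   // same value is assigned again: nothing is printed
      #5 $finish;
   end

endmodule
```

Assuming the code is stored in a file named monitor_demo.v, it can be built and run with iverilog and vvp in the same way as the testbench in Figure 3.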

The changes for the variable d_ff_data_in are defined in an initial block in Figure 3.

The printout in Figure 4 also contains a printout of the file name d_ff_tb_wave.vcd. This is a file where waveform data are stored. The display of waveforms is treated in Section Making Waves.

Making Waves

The testbench in Figure 3 generates printouts as shown in Figure 4. The printouts show values of digital signals, each having the value one or zero. We can represent these signals as waveforms, with the level of the waveform being one or zero. Thinking of the value one as a high voltage level, and the value zero as a low voltage level, we can think of the waveforms as representing actual voltages, in an actual digital system.

A waveform can be visualized using the GTKWave program. We can download a GTKWave version for Mac, in the form of a zip-file that contains an executable GTKWave program. The GTKWave program can be started from a Mac Terminal, by giving the command open followed by the app file name of the program. As an example, I could start the program by doing

open /Users/oladahl/prog/gtkwave/gtkwave.app

A GTKWave version for Ubuntu can be installed in Ubuntu, by giving the command

sudo apt-get install gtkwave

The program can then be started by giving the command gtkwave.

A waveform can be generated from Verilog by insertion of a $dumpfile statement, followed by a $dumpvars statement indicating from which module the waveform data shall be recorded. An example is seen in the testbench code in Figure 3.

Waveforms, generated from the testbench in Figure 3, are shown in Figure 5.

Figure 5. Waveforms, obtained from running the testbench in Figure 3.

We see in Figure 5 how the waveforms correspond to the printouts shown in Figure 4.

Storing Data in Registers

When a computer executes instructions, it often needs intermediate storage places. Without such places, every operation would involve reading values from memory and writing results back to memory. As an example, when adding a series of numbers, we may want to write the result back to memory only when all numbers have been added. Registers can then be used to hold the intermediate sum while the calculation is ongoing. Another use of registers is for addressing. In this scenario, the value stored in the register is an address, addressing a part of the memory. One such register holds the address of the next instruction to be executed. This register is referred to as the program counter.

A Register

A D flip-flop can store one bit. We can imagine a register as a row of D flip-flops, each storing one bit, with the possibility to load new values into all D flip-flops simultaneously.

A register implementation in Verilog is shown in Figure 6.

module n_bit_register(clk, data_in, data_out);

   parameter N = 8;

   input clk;
   input[N-1:0] data_in;
   output[N-1:0] data_out;

   wire      clk;
   wire [N-1:0]  data_in;
   
   reg [N-1:0]   reg_value;

   always @(posedge clk)
     reg_value <= data_in;
    
   assign data_out = reg_value;

endmodule    

n_bit_register.v

Figure 6. A register in Verilog.

The code in Figure 6 defines a module. The module has two inputs, called clk and data_in, and one output, called data_out.

The module defines a register variable called reg_value, that will contain the actual value stored in the register.

An always block ensures that the state variable reg_value is updated at every rising edge of the clock.

An assignment of the variable data_out is done, outside of the always block. This assignment ensures that the output data_out has the same value as the current value of the state variable reg_value.
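The picture of a register as a row of D flip-flops can also be expressed directly in Verilog. The following is a minimal sketch, assuming the d_ff module from Figure 2 is compiled together with it, and assuming a simulator with support for Verilog-2001 generate blocks. The module name n_bit_register_from_d_ff and the block label bit_slice are hypothetical. The register in Figure 6 describes the same behavior in a more compact way.

```verilog
// Hypothetical structural variant of the register in Figure 6.
// Requires the d_ff module from Figure 2 (file d_ff.v).
module n_bit_register_from_d_ff(clk, data_in, data_out);

   parameter N = 8;

   input          clk;
   input  [N-1:0] data_in;
   output [N-1:0] data_out;

   // Create one D flip-flop per bit. All flip-flops share the same
   // clock, so all bits are loaded at the same rising clock edge.
   genvar i;
   generate
      for (i = 0; i < N; i = i + 1) begin : bit_slice
         d_ff d_ff_i(clk, data_in[i], data_out[i]);
      end
   endgenerate

endmodule
```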

A Testbench

An external module, referred to as a testbench, can be used for the purpose of generating input signals to, and observing output signals from, the register in Figure 6.

In the testbench module, we use a parameter, to specify the width of the register.

The parameter is defined as a Verilog parameter, as

   parameter N=4;

The clock signal is generated using a register variable named clk, defined as

   reg clk = 1;

The actual clock generation is done in an always block, as

   always #2 
     clk = !clk;

The generation of input signals to the register in Figure 6 is done using an always block, as

   always @(posedge clk)
     reg_data_in <= reg_data_in + 1;

The input signal and the output signal are defined as

   reg [N-1:0] reg_data_in = 1'b1;
   wire[N-1:0] reg_data_out;

The signals are used in the instantiation of the register, which is done as

   n_bit_register #(.N(N)) reg_0(clk, reg_data_in, reg_data_out);

The reporting of the results is done in an initial block, as

   initial begin
      $monitor("At time %t, data_in=%b, data_out=%b", 
               $time, reg_data_in, reg_data_out);
      #16 $finish;
   end
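The fragments above can be put together into one testbench file. The following is a sketch of how this could look. The module name n_bit_register_tb, the timescale, the order of the statements, and the block with $dumpfile and $dumpvars are assumptions, modelled on the testbench in Figure 3. The remaining lines are the fragments shown above.

```verilog
`timescale 1ns / 1ns

// Sketch of a complete testbench, assembled from the fragments above.
module n_bit_register_tb;

   parameter N=4;

   reg clk = 1;

   reg [N-1:0] reg_data_in = 1'b1;
   wire[N-1:0] reg_data_out;

   // Report the input and output values, and stop after 16 time units.
   initial begin
      $monitor("At time %t, data_in=%b, data_out=%b",
               $time, reg_data_in, reg_data_out);
      #16 $finish;
   end

   // Assumption: waveform recording, done as in Figure 3.
   initial begin
      $dumpfile("n_bit_register_tb_wave_verilog.vcd");
      $dumpvars(0, reg_0);
   end

   // Clock generation.
   always #2
     clk = !clk;

   // Input generation: a new value at every rising clock edge.
   always @(posedge clk)
     reg_data_in <= reg_data_in + 1;

   n_bit_register #(.N(N)) reg_0(clk, reg_data_in, reg_data_out);

endmodule
```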

Build and Run

The register in Figure 6 and a testbench, with code as shown in Section A Testbench, can be built and run.

A makefile can be created. The makefile can contain commands for building and running the register and the testbench.

A makefile is shown in Figure 7.

SOURCES := n_bit_register.v n_bit_register_tb.v

n_bit_register_tb: $(SOURCES)
	iverilog -o $@ $^

.PHONY: clean

clean: 
	rm n_bit_register_tb

Makefile

Figure 7. A makefile for building and running the register in Figure 6.

It can be seen, in the makefile in Figure 7, that the iverilog command is used, in the same way as described in Section Build and Run in Chapter Storing one Bit.

Assume the register is stored in a file named n_bit_register.v, and the testbench is stored in a file named n_bit_register_tb.v. Running the makefile, by giving the command make, results in printouts, as

$ make
iverilog -o n_bit_register_tb n_bit_register.v n_bit_register_tb.v

A script file can be created, and used for running the simulated register and the testbench. Using a script file named run.sh, with contents as

#!/bin/bash

vvp n_bit_register_tb

for running the simulation, gives the result as shown in Figure 8.

$ ./run.sh 
VCD info: dumpfile n_bit_register_tb_wave_verilog.vcd opened for output.
At time                    0, data_in=0001, data_out=0001
At time                    4, data_in=0010, data_out=0001
At time                    8, data_in=0011, data_out=0010
At time                   12, data_in=0100, data_out=0011
At time                   16, data_in=0101, data_out=0100

Figure 8. Printouts from a simulation of the register in Figure 6.

We can generate waveforms, in the same way as described in Section Making Waves. The resulting waveform, for the register with printouts as shown above, is displayed in Figure 9.

Figure 9. Waveforms from a simulation with printouts as shown in Figure 8.

Our First Instruction

A computer executes programs by following instructions. The instructions belong to an instruction set. As mentioned in Chapter Welcome, we will use a subset of the OR1K instruction set as the instruction set for our computer.

As a first step, we will try to build a computer with only one instruction. Although somewhat restricted, this computer will be able to

We will start with deciding on a program to run on our computer. The program will be stored in a memory, and its instructions will be read, one by one, and actions will be taken.

A Program

From the OpenRISC Architecture page we can find the OpenRISC 1000 Architecture Manual.

From the OpenRISC 1000 Architecture Manual, we find the instruction l.movhi rD, K on page 81.

We see that this instruction takes a 16-bit value K, shifts it left by 16 bits, and then places the resulting value in the register rD.

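We also see the instruction format for l.movhi rD, K, with its different fields. There are, starting from the most significant bit, a 6-bit opcode (000110), a 5-bit field D selecting the destination register, four reserved bits, one bit with the value 0, and the 16-bit immediate value K.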

We can write the instruction, with the fields as described above, as a 32-bit binary word. This gives

000110DDDDD----0KKKKKKKKKKKKKKKK

The binary instruction format for l.movhi rD, K can also be seen in Section 17 of the OpenRISC 1000 Architecture Manual.
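The instruction format can be illustrated with a few lines of Verilog that pick out the fields from a 32-bit instruction word. The following is a minimal sketch, not the decoder of our computer. The module name movhi_fields_demo and all signal names are hypothetical. The word 18200002 is used as an example; it encodes l.movhi r1, 2, with the reserved bits set to zero.

```verilog
`timescale 1ns / 1ns

// Hypothetical demo module, showing the fields of l.movhi rD, K.
module movhi_fields_demo;

   // Example instruction word: l.movhi r1, 2
   reg  [31:0] instruction = 32'h18200002;

   wire [5:0]  opcode = instruction[31:26]; // 000110 for l.movhi
   wire [4:0]  reg_d  = instruction[25:21]; // destination register number
   wire [15:0] k      = instruction[15:0];  // 16-bit immediate value

   // The value written to rD: K shifted left by 16 bits.
   // For the example word, this is 32'h00020000.
   wire [31:0] result = {k, 16'b0};

   initial begin
      // Wait one time unit, so that the wires above have been updated.
      #1 $display("opcode=%b rD=%0d K=%h result=%h",
                  opcode, reg_d, k, result);
      $finish;
   end

endmodule
```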

Suppose we want to make a program that starts with

  1. storing the value 1 in r0
  2. storing the value 2 in r1
  3. storing the value 3 in r2

The program should then store the value 0 in registers r0, r1, and r2.

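This program can be implemented by using the instruction l.movhi rD, K, and choosing different values for rD and K. Note that, since l.movhi shifts K left by 16 bits, the value K ends up in the upper 16 bits of the register rD.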

Using the instruction format as described above, we find that the resulting program can be written as

00011000000----00000000000000001
00011000001----00000000000000010
00011000010----00000000000000011
00011000000----00000000000000000
00011000001----00000000000000000
00011000010----00000000000000000

Grouping the binary digits in groups of four gives

0001 1000 000- ---0 0000 0000 0000 0001
0001 1000 001- ---0 0000 0000 0000 0010
0001 1000 010- ---0 0000 0000 0000 0011
0001 1000 000- ---0 0000 0000 0000 0000
0001 1000 001- ---0 0000 0000 0000 0000
0001 1000 010- ---0 0000 0000 0000 0000

We can convert the program to a representation where we use hexadecimal numbers, treating the reserved bits, marked with -, as zeros. This conversion results in the program shown in Figure 10.

18000001
18200002
18400003
18000000
18200000
18400000

Figure 10. A program using an l.movhi instruction for writing values into registers.

Addressing a Memory

We can store a program, like the program shown in Figure 10, in a memory.

The program in Figure 10 consists of instructions. Each instruction is represented by a 32-bit word.

As a first step towards executing the program, we can create a program counter that reads the 32-bit instructions, one by one, from the memory.

Reading an instruction is done by using the program counter value to address the memory. When we are done with reading an instruction, we might want to read the next instruction.

We could imagine a program counter that refers to a specific 32-bit word, stored in the memory. In a program with 32-bit instructions, like the program in Figure 10, this makes it possible to read the next instruction by incrementing the program counter by one.

Another alternative is to let the program counter represent an address expressed in bytes. In such a situation, we can read the next instruction by incrementing the program counter by four. This type of addressing is referred to as byte-addressing.
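The relation between the two alternatives can be sketched in Verilog. With 32-bit instructions, a byte address that is a multiple of four corresponds to a word index obtained by dividing the address by four, which is the same as dropping the two least significant bits. The module name byte_address_demo and the signal names below are hypothetical, and the sketch is an illustration only.

```verilog
`timescale 1ns / 1ns

// Hypothetical demo module, relating byte addresses to word indices.
module byte_address_demo;

   reg  [31:0] byte_address;

   // Dropping the two least significant bits divides the address by four.
   wire [29:0] word_index = byte_address[31:2];

   integer i;

   initial begin
      // Step through the byte addresses of four consecutive instructions.
      for (i = 0; i < 4; i = i + 1) begin
         byte_address = 4 * i;
         // Wait one time unit, so that word_index has been updated.
         #1 $display("byte address %0d corresponds to word %0d",
                     byte_address, word_index);
      end
      $finish;
   end

endmodule
```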

A memory implementation in Verilog is shown in Figure 11.

module memory(clk, write_enable, address, data_in, data_out);

   parameter address_width = 32;
   parameter data_width = 32;
   parameter size = 256;

   input clk;
   input write_enable;
   input [address_width-1:0] address;
   input [data_width-1:0] data_in;
   output[data_width-1:0] data_out;

   wire clk;
   wire write_enable;
   wire [address_width-1:0] address;
   wire [data_width-1:0] data_in;

   reg [data_width-1:0] memory [0:size-1];

   initial begin
     $readmemh("memory_contents.txt", memory);
   end

   always @(posedge clk) begin
     if (write_enable == 1) 
       memory[address] <= data_in;
   end 

   assign data_out = memory[address];

endmodule    

memory.v

Figure 11. A memory in Verilog.

We create a program counter that holds an address expressed in bytes, meaning that it is incremented by four for each instruction read. A program counter implementation in Verilog is shown in Figure 12.

module pc(clk, pc_out);

   parameter pc_width = 32;

   input clk;
   output[pc_width-1:0] pc_out;

   wire clk;
   
   reg [pc_width-1:0] pc_value = 'b0;

   always @(posedge clk)
     pc_value <= pc_value + 4;
    
   assign pc_out = pc_value;

endmodule    

Figure 12. A program counter in Verilog.

We connect the program counter and the memory into a design, so that when it runs, the program is read from the memory and printed.

We define signals, such as the program counter value

   wire [address_width-1:0] pc_value;

and data read from the memory

   wire [data_width-1:0] data_out;

and clock signal, as

   reg clk = 0;

The clock signal is generated as

   always #2 clk = ~clk;
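The signals above can be combined with instances of the program counter in Figure 12 and the memory in Figure 11. The following is a sketch of how such a design could look, not necessarily the design used in the book. The module name pc_memory_tb, the instance names, the signal word_address, the tied-off write inputs, and the printout are assumptions. Since the memory in Figure 11 is indexed by 32-bit words, while the program counter counts bytes, the sketch divides the program counter value by four before using it as an address. The file memory_contents.txt is assumed to contain the program in Figure 10.

```verilog
`timescale 1ns / 1ns

// Sketch of a design connecting pc (Figure 12) and memory (Figure 11).
module pc_memory_tb;

   parameter address_width = 32;
   parameter data_width = 32;

   reg clk = 0;

   wire [address_width-1:0] pc_value;
   wire [data_width-1:0] data_out;

   // Assumption: convert the byte address to a word index.
   wire [address_width-1:0] word_address = pc_value >> 2;

   pc pc_0(clk, pc_value);

   // The memory is only read here: write_enable is tied to 0,
   // and data_in is set to zero.
   memory memory_0(clk, 1'b0, word_address,
                   {data_width{1'b0}}, data_out);

   always #2 clk = ~clk;

   // Print each instruction as the program counter advances, and stop
   // after the six instructions of the program have been read.
   initial begin
      $monitor("pc=%h instruction=%h", pc_value, data_out);
      #21 $finish;
   end

endmodule
```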

Decoding the Instruction

Running the program

Hello Assembly World

The Program

l.andi r0, r0, 0
l.addi r0, r0, 0x9
l.slli r0, r0, 28

l.andi r1, r1, 0
l.addi r1, r1, 72
l.sw 0(r0), r1

Tools

Testing in QEMU

or1k-elf-as -o start.o start.s
or1k-elf-ld -T default.ld -o prog.elf start.o
/home/ola/prog/qemu/bin/qemu-system-or32 -nographic -kernel prog.elf

Extending our Computer

And with Immediate Half Word

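We see the instruction format for l.andi rD, rA, K, with its different fields. There are, starting from the most significant bit, a 6-bit opcode (101001), a 5-bit field D selecting the destination register, a 5-bit field A selecting the source register, and the 16-bit immediate value K.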

We can write the instruction, with the fields as described above, as a 32-bit binary word. This gives

101001DDDDDAAAAAKKKKKKKKKKKKKKKK

The binary instruction format for l.andi rD, rA, K can also be seen in Section 17 of the OpenRISC 1000 Architecture Manual.
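The fields of l.andi, and the operation it performs, can be illustrated with a few lines of Verilog. The following is a minimal sketch, not the implementation used in our computer. The module name andi_fields_demo, the signal names, and the assumed contents of the source register are hypothetical. The word A4200007 is used as an example; it encodes l.andi r1, r0, 7. The immediate value K is zero-extended to 32 bits before the bitwise AND is computed.

```verilog
`timescale 1ns / 1ns

// Hypothetical demo module, showing the fields of l.andi rD, rA, K.
module andi_fields_demo;

   // Example instruction word: l.andi r1, r0, 7
   reg  [31:0] instruction = 32'hA4200007;

   // Assumed contents of the source register rA, for illustration.
   reg  [31:0] reg_a_value = 32'd15;

   wire [5:0]  opcode = instruction[31:26]; // 101001 for l.andi
   wire [4:0]  reg_d  = instruction[25:21]; // destination register number
   wire [4:0]  reg_a  = instruction[20:16]; // source register number
   wire [15:0] k      = instruction[15:0];  // 16-bit immediate value

   // Zero-extend K to 32 bits, then AND with the contents of rA.
   // With the values above, 15 AND 7 gives 7.
   wire [31:0] result = reg_a_value & {16'b0, k};

   initial begin
      // Wait one time unit, so that the wires above have been updated.
      #1 $display("opcode=%b rD=%0d rA=%0d K=%h result=%h",
                  opcode, reg_d, reg_a, k, result);
      $finish;
   end

endmodule
```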

Suppose we want to make a program that uses the l.andi instruction.

In assembly code, this program would be

    l.movhi r0, 0
    l.ori r0, r0, 15
    l.andi r1, r0, 7
    l.andi r2, r1, 3
    l.andi r3, r2, 1

Using the instruction format as described above, we find that the corresponding machine code program becomes

000110 00000 00000 0000000000000000
101010 00000 00000 0000000000001111
101001 00001 00000 0000000000000111
101001 00010 00001 0000000000000011
101001 00011 00010 0000000000000001

Grouping the binary digits in groups of four gives

0001 1000 0000 0000 0000 0000 0000 0000
1010 1000 0000 0000 0000 0000 0000 1111
1010 0100 0010 0000 0000 0000 0000 0111
1010 0100 0100 0001 0000 0000 0000 0011
1010 0100 0110 0010 0000 0000 0000 0001

We can convert the program to a representation where we use hexadecimal numbers. This conversion results in the program shown in Figure 13.

18000000
A800000F
A4200007
A4410003
A4620001

Figure 13. A program using the instruction l.andi.

Store to memory

Running the Program

Hello C World

The Program

Tools

Testing in QEMU

Extending our Computer

Running the Program