Into Computers

Ola Dahl

February 6, 2020

Welcome

This is a book about computers. It describes how a small computer can be designed and implemented, using a step-by-step approach.

Starting with a simple building block that can store one bit, we continue, via registers and control logic and instruction decoding, towards a design that can run a small program.

We extend the instruction set, by adding instructions for controlling the program flow, and for interacting with the outside world, through a UART.

We stop when we have a computer that can run a program that has been compiled and linked using gcc.

We implement a subset of a real computer architecture - the RISC-V architecture. In this way, we can convey the experience of building a real system, while at the same time making the task small enough to be completed without a large implementation effort.

Using an already available architecture also allows us to use available tools, such as this RISC-V toolchain.

The book is designed as a Layered Book. This means that there are common parts, covering the general aspects of computer design, but also specific parts, treating layer-specific material. Each layer represents a particular design language, such as VHDL or Verilog.

You can read the book one layer at a time, but you can also move from one layer to another.

The book has the following layers: SystemC/TLM, VHDL, and Verilog.

You are now reading the SystemC/TLM layer. The purpose of this layer is to show how SystemC and TLM can be used to construct a computer that implements a specific architecture.

Moving between layers is done by following links. Here is an example.

This is the SystemC/TLM layer. The other layers are: VHDL, Verilog.

You will see these links throughout the book, e.g. at the beginning or the end of a section. Following such a link will take you to another layer. You will arrive at the new layer at a position corresponding to the position from which you left off.

Acknowledgements

This book has been produced using pandoc and Python.

The html-version of the book has been styled using a slightly modified version of this css file from this pandoc demo page.

Choosing a Language

We describe our computer using a design language. In this way, we have a textual representation of the computer, which we can use as input to software tools that help us simulate the behavior of our computer.

SystemC is a C++ library that makes it possible to create event-based simulations. TLM is an additional library that makes it possible to do transaction level simulation. We use SystemC and TLM to describe our computer, and to simulate its functionality.

SystemC and TLM are standardized by IEEE and Accellera.

Accellera provides information about SystemC and TLM.

Hello World

A simple example will get us started. We use a classical “Hello, world” example, which does nothing meaningful except print a text string. The code for the example is shown in Figure 1.

#include "systemc"
#include "tlm.h"

#include <iostream>

int sc_main(int argc, char* argv[])
{
    std::cout << "Hello, world" << std::endl; 
    sc_core::sc_start();
    return 0;
}

hello.cpp

Figure 1. A hello world example in SystemC.

The code in Figure 1 starts with three include directives. The first and the second include directive include functionality for SystemC and TLM, respectively. The third include directive includes the iostream header, which contains functionality for printing text.

The code in Figure 1 contains a function named sc_main. The function starts with a statement that prints a string. The simulation is then started, by calling the function sc_core::sc_start. In the next statement, the value zero is returned.

You can read about SystemC on Wikipedia, and in other places, such as Doulos, which provides information about SystemC and TLM.

Getting some Tools

We need some tools, in the form of software. We search for software that can be obtained without cost.

We use a Linux computer with Ubuntu 18.04, and a Mac computer with macOS Mojave.

We can download SystemC and TLM from Accellera. We download the SystemC 2.3.3 version by using the link named Core SystemC Language and Examples.

We unpack the file, using the command

tar zxvf systemc-2.3.3.tar.gz

Build and install can be done by first creating and visiting a directory objdir as

cd systemc-2.3.3
mkdir objdir
cd objdir/

Configure and build can be done as

../configure --prefix=/usr/local/systemc-2.3.3 CXXFLAGS="-Wall -std=gnu++11"
sudo mkdir /usr/local/systemc-2.3.3
make

We can test the build, by doing

make check

which takes some time to execute.

As a last step, we can install by doing

sudo make install

Make it Run

The program in Figure 1 can be compiled, linked, and run.

Assuming that the program is stored in a file hello.cpp, compiling can be done as

g++ -Wall -std=gnu++11 -c -I /usr/local/systemc-2.3.3/include hello.cpp

The above command generates the object file hello.o, which can be linked into an executable program named hello.

The linking is simplified by setting an environment variable L_SYSTEMC. On Linux, we set the environment variable as

L_SYSTEMC=/usr/local/systemc-2.3.3/lib-linux64

and on Mac we set it as

L_SYSTEMC=/usr/local/systemc-2.3.3/lib-macosx64

The linking is now done by the command

g++ -o hello hello.o -L $L_SYSTEMC -lsystemc

The program can be run on Linux by giving the command

LD_LIBRARY_PATH=$L_SYSTEMC ./hello

and on Mac by giving the command

./hello

The resulting printout is of the form

        SystemC 2.3.3-Accellera --- Jun 22 2019 19:36:41
        Copyright (c) 1996-2018 by all Contributors,
        ALL RIGHTS RESERVED 
Hello, world

where the date corresponds to the date when you built the SystemC library, as described in Section Getting some Tools.

Building a Computer

We have chosen a language, to describe our computer. We have taken a first, tiny step, and we have seen how we can get hold of some tools.

Our goal is to create a computer that can run programs, consisting of instructions. We want the instructions to be generated, using a compiler.

A computer reads instructions from a memory. Each instruction is represented as a sequence of bits. The values of the bits determine the type of instruction, and sometimes also arguments that the instruction shall use. The allowed instructions, for a given computer, belong to the computer’s instruction set.

Most computers have instructions for loading data from a memory, and storing data to a memory. Other common instructions are instructions for doing mathematical operations, such as addition and subtraction, and instructions for making decisions. The decisions can be based on evaluations of certain conditions, such as checking if a number is zero, or if a certain bit is set in a piece of data.

An instruction that has been read from memory is decoded, meaning that the computer interprets the bits of the instruction, and then, depending on the values of the bits, takes different actions.

The actions taken are determined by the instructions. As an example, an instruction for addition results in the actual addition of two numbers, and most often also the storing of the result of the addition.

Storing one Bit

We start with a small building block, that can store only one bit. We then extend the building block, so that we can store larger pieces of information. At a certain stage in our development, we are ready to implement our first instruction.

A bit can have the values 0 or 1. In a computer, these values are represented by a low value and a high value of an electrical signal.

The value of a bit can be stored. This means that the value is remembered, as long as it is stored. While the value is stored, the value can be read, and used, for the purpose of performing different operations. As an example, a bit could be used in an addition operation, or it could be copied so that it is stored somewhere else, for example at another place in a memory.

A D Flip-flop

The value of a bit can be stored in a building block called a D flip-flop.

A D flip-flop stores one bit of data. A new value can be stored when a clock signal changes value. A component, which can change its stored value only when a clock signal changes, is called a synchronous component.

A D flip-flop implementation in SystemC is shown in Figure 2.

#include "d_ff.h"

SC_HAS_PROCESS(d_ff);

d_ff::d_ff(sc_core::sc_module_name name):
    sc_module(name) 
{
    SC_METHOD(update);
    sensitive << clk.pos(); 
}

void d_ff::update()
{
    reg_value = data_in.read();
    data_out.write(reg_value);
}

d_ff.cpp

Figure 2. A D flip-flop in SystemC.

The code in Figure 2 starts with an include directive. The include directive refers to a header file called d_ff.h. The header file defines a class called d_ff, as

class d_ff : sc_core::sc_module
{
  public: 
    sc_in<bool> clk;
    sc_in<bool> data_in;
    sc_out<bool> data_out;
    d_ff(sc_core::sc_module_name name); 
  private:
    void update();
    bool reg_value;
}; 

The class defines two inputs, called clk and data_in, and one output, called data_out.

The class also defines a function called update, and a variable called reg_value. The variable reg_value is defined using the keyword bool.

The variable reg_value will contain the actual value stored in the D flip-flop.

The variable reg_value is called a state variable.

The class inherits from another class, called sc_core::sc_module. In this way, the class becomes a SystemC module.

The code in Figure 2 defines the function update to be a SystemC process. This is done using the macro SC_METHOD. In addition, it defines the process update to be sensitive to rising edges of the clock signal. The result of the sensitivity definition is that the function update will be called at every rising edge of the clock signal.

We see, from the contents of the function update, that the value of the input data_in is assigned to the state variable reg_value. This assignment ensures that the state variable reg_value is updated at every rising edge of the clock.

An assignment of the variable data_out is also done, inside the function update. This assignment ensures that the output data_out has the same value as the current value of the state variable reg_value.

A Testbench

The D flip-flop implementation in Figure 2 has inputs and outputs. An external module, referred to as a testbench, can be used for the purpose of generating input signals to the D flip-flop, and observing output signals from the D flip-flop.

A SystemC testbench is shown in Figure 3.

#include "d_ff_tb.h"

SC_HAS_PROCESS(d_ff_tb);

d_ff_tb::d_ff_tb(sc_core::sc_module_name name):
    sc_module(name),
    d_ff_0("d_ff_0"),
    clk("d_ff_clk", 4, SC_NS, 1.0)
{
    SC_THREAD(stim_gen);
    d_ff_0.clk(clk); 
    d_ff_0.data_in(d_ff_data_in); 
    d_ff_0.data_out(d_ff_data_out);
    SC_METHOD(reporter);
    sensitive << d_ff_0.clk.pos();
    sensitive << d_ff_data_in; 
}

void d_ff_tb::stim_gen()
{
    d_ff_data_in.write(true);
    wait(1, SC_NS);
    d_ff_data_in.write(false);
    wait(5, SC_NS);
    d_ff_data_in.write(true);
    wait(3, SC_NS);
    d_ff_data_in.write(false);
    wait(); 
}

void d_ff_tb::reporter()
{
    std::cout << "Time: " << sc_time_stamp(); 
    std::cout << ", data_in=" << d_ff_data_in.read(); 
    std::cout << ", data_out=" << d_ff_data_out.read()
              << std::endl;
}

d_ff_tb.cpp

Figure 3. A D flip-flop testbench in SystemC.

The code in Figure 3 starts with an include directive. The include directive refers to a header file called d_ff_tb.h. The header file defines a class called d_ff_tb, as

class d_ff_tb : sc_core::sc_module
{
  public: 
    d_ff_tb(sc_core::sc_module_name name); 
    d_ff d_ff_0; 
  private:
    sc_clock clk; 
    sc_signal<bool> d_ff_data_in; 
    sc_signal<bool> d_ff_data_out;
    void stim_gen();
    void reporter(); 
}; 

The code in Figure 3 defines the functions stim_gen and reporter to be SystemC processes. This is done using the macros SC_THREAD and SC_METHOD.

The process named reporter is responsible for printing the results. It is made sensitive to rising edges of the clock signal, and to changes in the input signal to the D flip-flop, as

    sensitive << d_ff_0.clk.pos();
    sensitive << d_ff_data_in; 

The clock signal is defined by a variable called clk, of the class sc_clock. The actual shape of the clock signal is defined by the arguments to the constructor of sc_clock, which give the name of the clock, its period (here 4 ns), and its duty cycle, as seen in the line

    clk("d_ff_clk", 4, SC_NS, 1.0)

The input signal to the D flip-flop is defined by the variable d_ff_data_in. The values used for the input signal are defined in the function stim_gen, as

void d_ff_tb::stim_gen()
{
    d_ff_data_in.write(true);
    wait(1, SC_NS);
    d_ff_data_in.write(false);
    wait(5, SC_NS);
    d_ff_data_in.write(true);
    wait(3, SC_NS);
    d_ff_data_in.write(false);
    wait(); 
}

Build and Run

The D flip-flop in Figure 2 and the testbench in Figure 3 can be compiled, using

g++ -Wall -std=gnu++11 -c -I /usr/local/systemc-2.3.3/include d_ff_tb.cpp
g++ -Wall -std=gnu++11 -c -I /usr/local/systemc-2.3.3/include d_ff.cpp
g++ -Wall -std=gnu++11 -c -I /usr/local/systemc-2.3.3/include d_ff_tb_main.cpp

The combined system, containing the D flip-flop and the testbench, can be created by linking the object files from the compilation into an executable program. The linking is simplified by setting an environment variable L_SYSTEMC, as described in Section Make it Run.

An executable program can be generated by giving the command

g++ -o d_ff_tb_main d_ff_tb_main.o d_ff_tb.o d_ff.o -L $L_SYSTEMC -lsystemc

The program can now be run, on Linux by giving the command

LD_LIBRARY_PATH=$L_SYSTEMC ./d_ff_tb_main

and on Mac by giving the command

./d_ff_tb_main 

The resulting printout is shown in Figure 4.

        SystemC 2.3.3-Accellera --- Jun 22 2019 19:36:41
        Copyright (c) 1996-2018 by all Contributors,
        ALL RIGHTS RESERVED
Time: 0 s, data_in=0, data_out=0
Time: 0 s, data_in=1, data_out=0

Info: (I702) default timescale unit used for tracing: 1 ps (d_ff_tb_wave.vcd)
Time: 1 ns, data_in=0, data_out=1
Time: 4 ns, data_in=0, data_out=1
Time: 6 ns, data_in=1, data_out=0
Time: 8 ns, data_in=1, data_out=0
Time: 9 ns, data_in=0, data_out=1
Time: 12 ns, data_in=0, data_out=1
Time: 16 ns, data_in=0, data_out=0

Figure 4. Printout from running the testbench in Figure 3.

The printout in Figure 4 shows the values of data_in and data_out for a sequence of time instants. The time instants are defined by the sensitivity statements for the SystemC method named reporter in Figure 3, with the effect that the SystemC method reporter is executed whenever the clock signal has a rising edge, or the variable d_ff_data_in changes value. The changes for the variable d_ff_data_in are defined in the function d_ff_tb::stim_gen in Figure 3.

The printout in Figure 4 also contains a printout of the file name d_ff_tb_wave.vcd. This is a file where waveform data are stored. The display of waveforms is treated in Section Making Waves.

Making Waves

The testbench in Figure 3 generates printouts as shown in Figure 4. The printouts show values of digital signals, each having the value one or zero. We can represent these signals as waveforms, with the level of the waveform being one or zero. Thinking of the value one as a high voltage level, and the value zero as a low voltage level, we can think of the waveforms as representing actual voltages, in an actual digital system.

A waveform can be visualized using the GTKWave program. We can download a GTKWave version for Mac, in the form of a zip file that contains an app folder with an executable GTKWave program. The first time, I had to start the program via the GUI, by right-clicking on the app folder and selecting Open, to confirm that I really wanted to open it. After that, the program can be started from a Mac Terminal, by giving the command open followed by the app file name of the program. As an example, I could start the program by doing

open /Users/ola/prog/gtkwave/gtkwave.app

A GTKWave version for Ubuntu can be installed in Ubuntu, by giving the command

sudo apt-get install gtkwave

The program can then be started by giving the command gtkwave.

A waveform can be generated from SystemC by calling the function sc_create_vcd_trace_file, and then calling the function sc_trace for each signal that shall be recorded.

For the D flip-flop example, with build and run instructions as described in Section Build and Run, we put the waveform-related code in the file d_ff_tb_main.cpp. This file is referred to in the compilation commands, shown in Section Build and Run, and its contents are shown here, as

#include "systemc.h"
#include "d_ff_tb.h"

int sc_main(int argc, char* argv[])
{
    d_ff_tb d_ff_tb_0("d_ff_tb_0");
    
    sc_trace_file *d_ff_tb_wave =
    sc_create_vcd_trace_file("d_ff_tb_wave");

    sc_trace(d_ff_tb_wave, d_ff_tb_0.d_ff_0.clk, "clk");
    sc_trace(d_ff_tb_wave, d_ff_tb_0.d_ff_0.data_in, "data_in");
    sc_trace(d_ff_tb_wave, d_ff_tb_0.d_ff_0.data_out, "data_out");
    
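    sc_start(17, SC_NS); 

    // Close the trace file, so that all waveform data are written to disk.
    sc_close_vcd_trace_file(d_ff_tb_wave);

    return 0;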
}

Waveforms, generated from the testbench in Figure 3, are shown in Figure 5.

Figure 5. Waveforms, obtained from running the testbench in Figure 3.

We see in Figure 5 how the waveforms correspond to the printouts shown in Figure 4.

Storing Data in Registers

When a computer executes instructions, it often needs intermediate storage places. As an example, consider an addition of two data items, both stored in memory. In this situation, it might be convenient to read the data items from memory and store them in an intermediate storage place, from where the inputs to the addition operation can be taken. The result of the addition could also be stored in the intermediate storage area, before it is transferred to memory.

An intermediate storage place can consist of a register, or a set of registers. A register typically allows faster accesses, for reading and writing data, than a memory.

A set of registers could be used when performing an addition. Two items of data could be read from memory, and stored in two registers. A third register, or one of the two already used, could be used to store the result of the addition, before it is written back to memory.

Registers can also be used to hold other types of values. As an example, a register is often used for holding the current value of the program counter.

We could also use registers for holding status bits, that provide information about the result of a computation. One example of such a register is a status register. A status register can hold information indicating, for example, if an addition resulted in overflow, or if an operation resulted in a zero value.

A set of registers, organized together, so that it is possible to refer to each of the individual registers, for example using an address, can be called a register file.

A Register

A D flip-flop can store one bit. We can imagine a register as a row of D flip-flops, each storing one bit, with the possibility to load new values into all D flip-flops simultaneously.

A register implementation in SystemC is shown in Figure 6.

#include "n_bit_register.h"

SC_HAS_PROCESS(n_bit_register);

n_bit_register::n_bit_register(sc_core::sc_module_name name, int N):
    sc_module(name),
    reg_value_min(-(1 << N)),
    reg_value_max( (1 << N) - 1 )
{
    SC_METHOD(update);
    sensitive << clk.pos(); 
}

void n_bit_register::update()
{
    reg_value = data_in.read();
    if (reg_value < reg_value_min || reg_value > reg_value_max)
    {
        std::cerr << "reg_value " << reg_value << " is out of "
                  << "range [" << reg_value_min << ", "
                  << reg_value_max << "]" << "\n";
    }
    data_out.write(reg_value);
}

n_bit_register.cpp

Figure 6. A register in SystemC.

The code in Figure 6 starts with an include directive. The include directive refers to a header file n_bit_register.h, which defines a class called n_bit_register, as

class n_bit_register : sc_core::sc_module
{
  public: 
    sc_in<bool> clk;
    sc_in<int> data_in;
    sc_out<int> data_out;
    n_bit_register(sc_core::sc_module_name name, int N); 
  private:
    void update();
    int reg_value;
    const int reg_value_min;
    const int reg_value_max;
}; 

The class defines two inputs, called clk and data_in, and one output, called data_out.

The class also defines a function called update, and a variable called reg_value.

The variable reg_value will contain the actual value stored in the register.

The code in Figure 6 defines, using the macro SC_METHOD, the function update to be a SystemC process. In addition, it defines the process update to be sensitive to rising edges of the clock signal. The result of the sensitivity definition is that the function update will be called at every rising edge of the clock signal.

We see, from the contents of the function update, that it ensures that the state variable reg_value is updated at every rising edge of the clock.

The function update also checks if the value of reg_value is within the allowed range. If this is not the case, an error message is printed.

The range for reg_value is set in the constructor for the class n_bit_register.

An assignment of the variable data_out is also done, inside the function update. This assignment ensures that the output data_out has the same value as the current value of the state variable reg_value.

A Testbench

An external module, referred to as a testbench, can be used for the purpose of generating input signals to, and observing output signals from, the register in Figure 6.

In the testbench module, we use a parameter, to specify the width of the register.

The parameter is defined using a C++ define directive, as

#define N 4

The clock signal is generated using a variable of the class sc_clock, defined as

    sc_clock clk; 

The actual clock generation is done using parameters specified in the instantiation of the clk variable, which takes place in the constructor. The constructor is implemented as

n_bit_register_tb::n_bit_register_tb(sc_core::sc_module_name name):
    sc_module(name),
    n_bit_register_0("n_bit_register_0", N),
    clk("n_bit_register_clk", 4, SC_NS, 1.0), 
    data_in_value(1)
{
    n_bit_register_0.clk(clk); 
    n_bit_register_0.data_in(n_bit_register_data_in); 
    n_bit_register_0.data_out(n_bit_register_data_out);
    SC_METHOD(stim_gen);
    sensitive << n_bit_register_0.clk.pos();
    SC_METHOD(reporter);
    sensitive << n_bit_register_0.clk.pos();
}

and the instantiation of the clk variable is done in the constructor initializer list, as

    clk("n_bit_register_clk", 4, SC_NS, 1.0), 

The generation of input signals to the register in Figure 6 is done in a SystemC process, implemented by the function

void n_bit_register_tb::stim_gen()
{
    n_bit_register_data_in.write(data_in_value++);
}

and made into a process by the SC_METHOD macro, which is used in the constructor, as shown above.

The input signal and the output signal are defined as

    sc_signal<int> n_bit_register_data_in; 
    sc_signal<int> n_bit_register_data_out;

The signals are used in the instantiation of the register, which is done in the constructor, shown above.

The reporting of the results is done in a process, implemented by the function

void n_bit_register_tb::reporter()
{
    std::cout << "Time: " << sc_time_stamp(); 
    std::cout << ", data_in=" << std::bitset<N> (n_bit_register_data_in.read());
    std::cout << ", data_out=" << std::bitset<N> (n_bit_register_data_out.read())
          << std::endl;
}

and made into a process by the SC_METHOD macro, which is used in the constructor, as shown above.

Build and Run

The register in Figure 6 and a testbench, with code as shown in Section A Testbench, can be built and run.

A makefile can be created. The makefile can contain commands for building and running the register and the testbench.

A makefile is shown in Figure 7.

UNITS := n_bit_register n_bit_register_tb
HEADER_ONLY_UNITS := 
MAIN_UNIT := n_bit_register_tb_main

OBJS := $(addsuffix .o, $(UNITS) $(MAIN_UNIT))
HEADERS := $(addsuffix .h, $(UNITS) $(HEADER_ONLY_UNITS))

SYSTEMC := /usr/local/systemc-2.3.3
INCLUDE_DIR := $(SYSTEMC)/include
UNAME := $(shell uname)
ifeq ($(UNAME),Darwin)
  LIB_DIR_NAME := lib-macosx64
else
  LIB_DIR_NAME := lib-linux64
endif
LIB_DIR := $(SYSTEMC)/$(LIB_DIR_NAME)/
LIB_NAME := systemc

$(MAIN_UNIT): $(OBJS)
    g++ -o $@ $^ -L $(LIB_DIR) -l $(LIB_NAME)

%.o: %.cpp $(HEADERS)
    g++ -Wall -std=gnu++11 -c -g -I $(INCLUDE_DIR) $<

.PHONY: clean

clean: 
    rm $(MAIN_UNIT) $(OBJS)

Makefile

Figure 7. A makefile for building and running the register in Figure 6.

It can be seen, in the makefile in Figure 7, that the g++ command is used, in the same way as described in Section Build and Run in Chapter Storing one Bit.

Assume the register is stored in files named n_bit_register.h and n_bit_register.cpp, and the testbench is stored in files named n_bit_register_tb.h and n_bit_register_tb.cpp. Assume also that a main program is stored in a file named n_bit_register_tb_main.cpp.

Running the makefile, by giving the command make, results in printouts of the form

$ make
g++ -Wall -std=gnu++11 -c -g -I /usr/local/systemc-2.3.3/include n_bit_register.cpp
g++ -Wall -std=gnu++11 -c -g -I /usr/local/systemc-2.3.3/include n_bit_register_tb.cpp
g++ -Wall -std=gnu++11 -c -g -I /usr/local/systemc-2.3.3/include n_bit_register_tb_main.cpp
g++ -o n_bit_register_tb_main n_bit_register.o n_bit_register_tb.o n_bit_register_tb_main.o -L /usr/local/systemc-2.3.3/lib-macosx64/ -l systemc

A script file can be created, and used for running the simulated register and the testbench. Using a script file named run.sh, with contents as

#!/bin/bash

SYSTEMC=/usr/local/systemc-2.3.3

UNAME=$(uname)
if ! [ "$UNAME" = "Darwin" ]; then
  LIB_DIR_NAME=lib-linux64
else
  LIB_DIR_NAME=
fi    
echo $LIB_DIR_NAME

LIB_DIR=$SYSTEMC/$LIB_DIR_NAME

LD_LIBRARY_PATH=$LIB_DIR ./n_bit_register_tb_main

for running the simulation, gives the result as shown in Figure 8.

        SystemC 2.3.3-Accellera --- Jun 22 2019 19:36:41
        Copyright (c) 1996-2018 by all Contributors,
        ALL RIGHTS RESERVED
Time: 0 s, data_in=0000, data_out=0000
Time: 0 s, data_in=0001, data_out=0000

Info: (I702) default timescale unit used for tracing: 1 ps (n_bit_register_tb_wave_systemc_tlm.vcd)
Time: 4 ns, data_in=0010, data_out=0001
Time: 8 ns, data_in=0011, data_out=0010
Time: 12 ns, data_in=0100, data_out=0011
Time: 16 ns, data_in=0101, data_out=0100

Figure 8. Printouts from a simulation of the register in Figure 6.

We can generate waveforms, in the same way as described in Section Making Waves. The resulting waveform, for the register with printouts as shown above, is displayed in Figure 9.

Figure 9. Waveforms from a simulation with printouts as shown in Figure 8.

Our First Instruction

A computer executes programs by following instructions. The instructions belong to an instruction set. As mentioned in Chapter Welcome, we will use a subset of the RISC-V architecture as the instruction set for our computer.

As a first step, we will try to build a computer with only one instruction. Although somewhat restricted, this computer will be able to

We will start with deciding on a program to run on our computer. The program will be stored in a memory, and its instructions will be read, one by one, and actions will be taken.

A Program

From the RISC-V architecture page, we can download the RISC-V Instruction Set Manual ISA.

We look for an instruction that can load a value into a register. Using such an instruction, we can create a small program that loads specified values into some of the registers.

We choose to use the RV32I Base Integer Instruction Set, which is described in Chapter 2 of ISA.

The instructions in this instruction set are 32 bits wide.

The bits in an instruction are numbered, with 31 for the leftmost bit, down to 0 for the rightmost bit.

We use the notation b1:b2 to describe a range of bits, such as 31:0 for describing all 32 bits, or e.g. 7:0 for describing the rightmost byte.

In Section 2.3 of ISA we can see how 32-bit instructions that handle immediate data are encoded.

One instruction format is called U-type. In a U-type instruction, bits 31:12 hold a 20-bit immediate value, bits 11:7 hold the destination register rd, and bits 6:0 hold the opcode.

In Section 2.4 of ISA, we find a description of the instruction LUI, which stands for load upper immediate, and which is used to “build 32-bit constants”.

We also see that the LUI instruction “places the U-immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros”.

We conclude that

In Chapter 25 in ISA, in Table 25.1, we see that the opcode for LUI is 0110111.

We can write the LUI instruction, with the fields as described above, as a 32-bit binary word. The resulting format is illustrated in Figure 10, as

imm (31:12), rd (11:7), LUI opcode(0110111)

Figure 10. Instruction format for the LUI instruction, adapted from Table 25.1 in ISA.

In order to create a program that stores values into registers, we should select which registers to use.

In Section 2.1 in ISA, we see that there are 32 registers, each 32-bits wide, referred to as registers x0 to x31.

We also see that in register x0, all bits are hardwired to the value zero.

For the other registers, we see, in Chapter 26 in ISA, in Table 26.1, that the registers have different roles. In these roles, the registers have alternative names, indicating their roles.

For example, register x1 (named ra) is used as return address and register x2 (named sp) is used as stack pointer.

There are also registers that are used for storage of temporary values, such as x5 (named t0, and also serving as alternate link register), and x6 and x7 (named t1 and t2, respectively).

Using the alternative names, which are referred to as ABI names in Table 26.1 in ISA, and which can also be referred to as assembler mnemonics due to their usage in assembly programs (ref URL), we can create a program that performs actions, as

  1. write three different values to registers t0, t1, and t2.
  2. write the value zero to registers t0, t1, and t2

In assembly language, we could write a program, using lowercase for the instruction name, as

lui t0, 1
lui t1, 2
lui t2, 3
lui t0, 0
lui t1, 0
lui t2, 0

Figure 11. An assembly program, using a LUI instruction to write values to registers.

Here we should note, again, that the LUI instruction writes the immediate value, which is 1 for the first instruction in our program, to the top 20 bits of the destination register, which for this instruction is t0, while at the same time filling in the lowest 12 bits with zeros.

For the first instruction in Figure 11, which is

lui t0, 1

this means that the number being stored in t0 is 1 followed by 12 zeros. In binary form, this becomes

1000000000000

Counting the bits from right to left, with the rightmost bit having number zero, we know, from the properties of binary numbers (ref) that the n^th bit has the weight 2^n.

In this number, all bits are zero except for bit number 12. This gives the corresponding decimal number as

2^12 = 4096

We can also write this number in hexadecimal form. One way of arriving at the hexadecimal representation is to start with the binary representation, in this case

1000000000000

and then group the bits, in groups of four bits in each group. This gives

1 0000 0000 0000

We then let each group of four bits be represented by one hexadecimal digit. Using the prefix 0x, which is commonly used to indicate that a number is hexadecimal, we get

1 0000 0000 0000 = 0x1000
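
The grouping into four-bit groups can also be sketched in C++. The function name hex_digit is hypothetical and not part of the code in this book. It picks out group number k, counted from the right with the rightmost group having number zero, and returns the corresponding hexadecimal digit. The sketch assumes that k is between 0 and 7.

```cpp
#include <cstdint>

// Hypothetical helper: returns the hexadecimal digit for four-bit group
// number k of value, where group 0 holds bits 3:0. Assumes k <= 7.
constexpr char hex_digit(std::uint32_t value, unsigned k)
{
    return "0123456789ABCDEF"[(value >> (4 * k)) & 0xF];
}
```

For the value 4096, group 3 gives the digit 1 and groups 2, 1, and 0 give the digit 0, in agreement with 0x1000.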

In a similar way, we can calculate the value that will be stored in register t1, by the instruction

lui t1, 2

as

10 0000 0000 0000 = 0x2000

which, when converted to decimal form, becomes

0x2000 = 8192

For the third instruction in Figure 11,

lui t2, 3

the corresponding calculation yields

0x3000 = 12288
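
The three calculations follow the same pattern: the immediate value is moved 12 bit positions to the left. A minimal sketch of this pattern in C++ is given here. The function name lui_value is an assumption made for illustration and does not appear in the code of this book.

```cpp
#include <cstdint>

// Value written to the destination register by LUI: the 20-bit immediate
// is placed in bits 31:12 and bits 11:0 are filled with zeros.
constexpr std::uint32_t lui_value(std::uint32_t imm20)
{
    return (imm20 & 0xFFFFF) << 12;
}
```

With the immediate values 1, 2, and 3, the function gives 4096, 8192, and 12288, which are the values calculated above.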

In order to run the program in Figure 11 on our computer, which will be built in the sections that follow, we need to write the program using binary code.

We saw, in Chapter 25 in ISA, in Table 25.1, that the opcode for LUI is 0110111, and we have seen how the top 20 bits of the value to be stored in the destination register are represented in the instruction.

We see, in Chapter 26 in ISA, in Table 26.1, how registers t0, t1, and t2 are ABI names for the registers x5, x6, and x7.

Using the numeric values 5, 6, and 7 for these registers, we can now write the program in Figure 11 in binary code, as

00000000000000000001 00101 0110111
00000000000000000010 00110 0110111
00000000000000000011 00111 0110111
00000000000000000000 00101 0110111
00000000000000000000 00110 0110111
00000000000000000000 00111 0110111

Grouping the binary digits in groups of four gives

0000 0000 0000 0000 0001 0010 1011 0111
0000 0000 0000 0000 0010 0011 0011 0111
0000 0000 0000 0000 0011 0011 1011 0111
0000 0000 0000 0000 0000 0010 1011 0111
0000 0000 0000 0000 0000 0011 0011 0111
0000 0000 0000 0000 0000 0011 1011 0111

We can convert the program to a representation where we use hexadecimal numbers. This conversion results in the program shown in Figure 12.

0x000012B7
0x00002337
0x000033B7
0x000002B7
0x00000337
0x000003B7

Figure 12. A binary program, using a LUI instruction to write values to registers.
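
The translation from assembly to binary code can be checked with a small C++ sketch. The names LUI_OPCODE and encode_lui are hypothetical helpers, introduced here only for illustration. The sketch combines the three fields of the instruction: the 20-bit immediate value, the 5-bit register number, and the 7-bit opcode.

```cpp
#include <cstdint>

// Opcode for LUI, as given in the text (0110111 in binary).
constexpr std::uint32_t LUI_OPCODE = 0x37;

// Hypothetical helper: builds a LUI instruction word from a register
// number (0 to 31) and a 20-bit immediate value.
constexpr std::uint32_t encode_lui(std::uint32_t rd, std::uint32_t imm20)
{
    return ((imm20 & 0xFFFFF) << 12) | ((rd & 0x1F) << 7) | LUI_OPCODE;
}
```

As an example, encode_lui(5, 1) corresponds to lui t0, 1, and gives the first word in Figure 12, 0x000012B7.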

Addressing a Memory

This is the SystemC/TLM layer The other layers are: VHDL Verilog

We can store a program, like the program shown in Figure 12, in a memory.

The program in Figure 12 consists of instructions. Each instruction is represented by a 32-bit word.

As a first step towards executing the program, we can create a program counter that reads the 32-bit instructions, one by one, from a memory.

Reading an instruction is done by using the program counter value to address the memory. When we are done with reading an instruction, we might want to read the next instruction.

We could imagine a program counter that refers to a specific 32-bit word, stored in the memory. In a program with 32-bit instructions, like the program in Figure 12, this makes it possible to read the next instruction by adding one to the program counter.

Another alternative is to let the program counter represent an address expressed in bytes. In such a situation, we can read the next instruction by incrementing the program counter by four. This type of addressing is referred to as byte-addressing.
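
The relation between the two ways of addressing can be sketched in C++. The function names next_pc and word_index are assumptions made for illustration. With byte-addressing, the program counter advances by four, and the position of the 32-bit word in a word-organized memory is found by dividing the program counter by four, here done with a shift.

```cpp
#include <cstdint>

// Byte addressing: consecutive 32-bit instructions are four bytes apart.
constexpr std::uint32_t next_pc(std::uint32_t pc)
{
    return pc + 4;
}

// Index of the 32-bit word that a byte address refers to.
constexpr std::uint32_t word_index(std::uint32_t pc)
{
    return pc >> 2;
}
```

Starting from zero, the program counter takes the values 0, 4, 8, and so on, which refer to the words with index 0, 1, 2, and so on.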

We can implement a memory in SystemC by using TLM. We use TLM as a means to model communication over a memory-mapped bus.

A header file for a memory implementation in SystemC/TLM is shown in Figure 13.

#ifndef MEMORY_H
#define MEMORY_H

#include "tlm.h"
#include "tlm_utils/simple_target_socket.h"

class Memory : sc_core::sc_module
{
public: 
    tlm_utils::simple_target_socket<Memory> socket; 
    Memory(sc_core::sc_module_name name, int size); 
private: 
    void read_contents_from_file(const char *file_name, int *mem, int max_values);
    void b_transport(tlm::tlm_generic_payload& trans, sc_core::sc_time& delay); 
    unsigned int size;
    int *mem;
}; 

#endif

memory.h

Figure 13. A header file for a memory implementation in SystemC/TLM.

The header file in Figure 13 defines a C++ class called Memory.

An array named mem, which contains elements of type int, is used to represent the actual storage. The array is defined, using a C++ pointer, on line 16 in Figure 13.

The memory contents are initialized in the constructor (URL). The constructor, which is declared on line 11 in Figure 13, is implemented in a file named memory.cpp, as

Memory::Memory(sc_core::sc_module_name name, int size): 
    sc_module(name),
    socket("socket"),
    size(size)
{
    mem = new int[size];
    read_contents_from_file("memory_contents.txt", mem, size);
    socket.register_b_transport(this, &Memory::b_transport);
}

The constructor calls a function named read_contents_from_file, which reads data from a file named memory_contents.txt, and stores the data into the array mem.
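
The implementation of read_contents_from_file is not shown here. As an illustration only, a function that fills an int array from text, with one hexadecimal word per line, could look like the sketch that follows. The function name parse_contents, and the choice of parsing from a string instead of from a file, are assumptions made to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical sketch: parses at most max_values hexadecimal words, one per
// line, from text into mem, and returns the number of words stored.
int parse_contents(const std::string& text, int *mem, int max_values)
{
    assert(mem != nullptr);
    std::istringstream in(text);
    std::uint32_t word = 0;
    int count = 0;
    while (count < max_values && (in >> std::hex >> word))
    {
        mem[count++] = static_cast<int>(word);
    }
    return count;
}
```

Elements of mem beyond the returned count are left untouched by this sketch.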

The function b_transport, declared at line 14 in Figure 13 and implemented in the file memory.cpp, as

void Memory::b_transport(tlm::tlm_generic_payload& trans, sc_core::sc_time& delay)
{
    tlm::tlm_command cmd = trans.get_command();
    unsigned char *data_ptr = trans.get_data_ptr(); 
    unsigned int data_length = trans.get_data_length();
    sc_dt::uint64 address = trans.get_address(); 
    if (address > size - 1)
    {
        std::cout << "ERROR : address " << address << " out of range!" << std::endl;
        trans.set_response_status(tlm::TLM_ADDRESS_ERROR_RESPONSE);
        return;
    }
    if (cmd == tlm::TLM_READ_COMMAND)
    {
        memcpy(data_ptr, &mem[address], data_length);
    }
    else if (cmd == tlm::TLM_WRITE_COMMAND)
    {
        memcpy(&mem[address], data_ptr, data_length);
    }
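    else
    {
        std::cout << "Error: received neither write nor read command" << std::endl; 
        // Report the unsupported command to the initiator.
        trans.set_response_status(tlm::TLM_COMMAND_ERROR_RESPONSE);
        return;
    }
    // Report success; the status would otherwise stay at its initial
    // value, tlm::TLM_INCOMPLETE_RESPONSE.
    trans.set_response_status(tlm::TLM_OK_RESPONSE);
}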

handles reading of data from, and writing of data to, memory.

Data is written, when the command in the TLM transaction (URL) is set to tlm::TLM_WRITE_COMMAND, as

    else if (cmd == tlm::TLM_WRITE_COMMAND)
    {
        memcpy(&mem[address], data_ptr, data_length);
    }

by copying data_length bytes from the TLM transaction, as indicated by the pointer data_ptr, to one of the elements in the memory array. The element to be written is defined by the variable address, which is read from the transaction, together with other variables used in the memory access, at the beginning of the b_transport function, as

    tlm::tlm_command cmd = trans.get_command();
    unsigned char *data_ptr = trans.get_data_ptr(); 
    unsigned int data_length = trans.get_data_length();
    sc_dt::uint64 address = trans.get_address(); 

Data is read from the memory, when the command in the TLM transaction is set to tlm::TLM_READ_COMMAND, as

    if (cmd == tlm::TLM_READ_COMMAND)
    {
        memcpy(data_ptr, &mem[address], data_length);
    }

by copying data_length bytes from the memory array, at a position defined by the variable address, to the TLM transaction, as indicated by the pointer data_ptr.
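
The read and write handling can be tried out without SystemC, using a plain C++ class. The class WordMemory and its member functions are hypothetical stand-ins, not part of the code in this book. The sketch uses the same memcpy-based copying, with the address interpreted as an index into an array of int. As an addition that is not present in b_transport above, the range check here also takes the number of bytes into account.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the Memory module: word-addressed storage
// with a range check and memcpy-based access, but without SystemC.
class WordMemory
{
public:
    explicit WordMemory(std::size_t size) : mem(size, 0) {}

    // Copies length bytes, starting at the word with index address, to data.
    // Returns false if the access falls outside the storage.
    bool read(std::uint64_t address, unsigned char *data, unsigned int length) const
    {
        assert(data != nullptr);
        if (!in_range(address, length)) return false;
        std::memcpy(data, &mem[address], length);
        return true;
    }

    // Copies length bytes from data, starting at the word with index address.
    bool write(std::uint64_t address, const unsigned char *data, unsigned int length)
    {
        assert(data != nullptr);
        if (!in_range(address, length)) return false;
        std::memcpy(&mem[address], data, length);
        return true;
    }

private:
    bool in_range(std::uint64_t address, unsigned int length) const
    {
        return address < mem.size()
            && length <= (mem.size() - address) * sizeof(int);
    }

    std::vector<int> mem;
};
```

A word written with write can be read back with read, using the same address and a length of four bytes, while an address equal to the memory size is rejected.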

A program counter implementation in SystemC/TLM is shown in Figure 14.

#ifndef PC_H
#define PC_H

#include "tlm.h"
#include "tlm_utils/simple_initiator_socket.h"

class Pc : sc_core::sc_module
{
public: 
    tlm_utils::simple_initiator_socket<Pc> socket; 
    Pc(sc_core::sc_module_name name); 
private:
    void process(); 
    int pc_value; 
    int data_read; 
}; 

#endif

pc.h

Figure 14. A program counter in SystemC/TLM.

The current value of the program counter is represented by a variable named pc_value, which is defined on line 14 in Figure 14.

The variable pc_value is updated in a SystemC process (URL) named process, declared on line 13 in Figure 14, and implemented in a file named pc.cpp.

The update is done by adding the value 4 to pc_value, as

    pc_value += 4;

We can connect the memory in Figure 13 with the program counter in Figure 14. By doing so, we can use the program counter to address a memory, where a program is stored. We can then read instructions, one by one, by incrementing the program counter.

We cannot yet decode the instructions.

The connection of the memory in Figure 13 with the program counter in Figure 14 can be done in a testbench, stored in a file named addressing_tb.cpp.

We define variables, for the program counter, as

    Pc *pc; 

and for the memory, as

    Memory *memory; 

We instantiate the variables, as

        pc = new Pc("cpu"); 
        memory = new Memory("memory", size); 

We connect the program counter and the memory, using a TLM initiator socket (URL), defined on line 10 in Figure 14, and a TLM target socket (URL), defined on line 10 in Figure 13, as

        pc->socket.bind(memory->socket); 

The actual reading of data from the memory is done in the SystemC process named process, in the file pc.cpp, by first computing the memory address from the program counter, as

        trans->set_address(pc_value >> 2); 

A b_transport (URL) call, using the initiator socket in the program counter, is then done, as

        socket->b_transport(*trans, delay);

As a preparation for the next read, the program counter is updated, as

    pc_value += 4;

We prepare the memory contents in a file, with contents corresponding to the program shown in Figure 12, as

000012B7
00002337
000033B7
000002B7
00000337
000003B7

When we run the simulation, we get


        SystemC 2.3.3-Accellera --- Jun 22 2019 19:36:41
        Copyright (c) 1996-2018 by all Contributors,
        ALL RIGHTS RESERVED
pc_value=0000000000000000, data_read=00000000000000000001001010110111
pc_value=0000000000000100, data_read=00000000000000000010001100110111
pc_value=0000000000001000, data_read=00000000000000000011001110110111
pc_value=0000000000001100, data_read=00000000000000000000001010110111
pc_value=0000000000010000, data_read=00000000000000000000001100110111
pc_value=0000000000010100, data_read=00000000000000000000001110110111
pc_value=0000000000011000, data_read=00110010010111110111010001101110

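We see that the memory contents, written in binary form when we run the simulation, correspond to the program in Figure 12. The seventh read, at the program counter value 11000, addresses a memory word beyond the six instructions in the file, so the data read there is not part of the program.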
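
The code that produces the printout is not shown for this testbench. One way of producing lines in the same style, using std::bitset from the C++ standard library, is sketched here. The function name format_read is hypothetical, and the sketch assumes that the program counter value fits in 16 bits.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: formats one line in the style of the printout,
// with a 16-bit program counter and a 32-bit instruction word.
std::string format_read(std::uint32_t pc_value, std::uint32_t data_read)
{
    assert(pc_value <= 0xFFFF);
    return "pc_value=" + std::bitset<16>(pc_value).to_string()
         + ", data_read=" + std::bitset<32>(data_read).to_string();
}
```

Calling format_read(4, 0x00002337) gives a string with the same contents as the second line of the printout.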

Decoding the Instruction

This is the SystemC/TLM layer The other layers are: VHDL Verilog

From Section A Program, we know that the LUI instruction, which we use in the program in Figure 11, has a format that consists of three parts, as illustrated in Figure 10.

We can create a simple instruction decoder that, given an input in the form of an instruction having the same format as the LUI instruction, generates output data, in the form of

  1. a 32-bit immediate value, with bits 31:12 given by the corresponding bits in the instruction, and with bits 11:0 set to zero.

  2. a 5-bit register id, as given by bits 11:7 in the instruction.

In SystemC/TLM, we choose to implement the instruction decoding as a member function named idecode in a SystemC module named Cpu.

The function, which decodes instructions with instruction format as specified in Figure 10, is shown in Figure 15.

void Cpu::idecode(uint32_t instruction, uint32_t *reg_id_d, uint32_t *imm_value)
{
    *reg_id_d = (instruction & 0xF80) >> 7;
    *imm_value = instruction & 0xFFFFF000;
}

cpu.cpp

Figure 15. An instruction decoding function for instructions having the same format as the LUI instruction.
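
The masks used in Figure 15 can be checked against the program in Figure 12. The sketch here restates the two expressions as free functions, with the hypothetical names decode_rd and decode_imm, so that they can be evaluated outside the Cpu module.

```cpp
#include <cstdint>

// Register id: bits 11:7 of the instruction.
constexpr std::uint32_t decode_rd(std::uint32_t instruction)
{
    return (instruction & 0xF80) >> 7;
}

// Immediate value: bits 31:12 of the instruction, with bits 11:0 set to zero.
constexpr std::uint32_t decode_imm(std::uint32_t instruction)
{
    return instruction & 0xFFFFF000;
}
```

For the first instruction in Figure 12, 0x000012B7, the register id becomes 5, which is the register t0, and the immediate value becomes 0x1000.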

The instruction decoding function in Figure 15 is called from a SystemC process in the SystemC module Cpu.

The SystemC process reads instructions from a memory, in a for loop that begins with the line

    for (int i = 0; i < n_reads; i++)

The for loop defines two variables, for the register id and for the immediate value encoded in the LUI instruction, as

        uint32_t reg_id_d;
        uint32_t imm_value; 

These variables are assigned, when the function idecode is called, inside the for loop, as

        idecode(data_read, &reg_id_d, &imm_value);

Registers, as described in Section A Register, can be combined into a register file.

We define the register file as a C++ array, in the header file for the SystemC module Cpu, as

    static const int n_registers = 3;
    uint32_t registers[n_registers];

cpu.h

Figure 16. A register file with three registers.

A computer capable of running the program in Figure 11 can now be constructed, by connecting the memory in Figure 13, the program counter in Figure 14, the instruction decoder in Figure 15, and the register file in Figure 16.

We can do these connections in a testbench, stored in a file named one_instruction_tb.cpp.

In the testbench, we define a SystemC module

class One_Instruction_Tb : sc_core::sc_module

with instance variables for the CPU and the memory, as

    Cpu *cpu; 
    Memory *memory; 

The instance variables are instantiated, in the module constructor, as

        cpu = new Cpu("cpu"); 
        memory = new Memory("memory", size); 

The module constructor also binds an initiator socket, defined in the file cpu.h, as

    tlm_utils::simple_initiator_socket<Cpu> socket; 

to a target socket, defined in the file memory.h, as

    tlm_utils::simple_target_socket<Memory> socket; 

by calling the SystemC function bind on the initiator socket, as

        cpu->socket.bind(memory->socket); 

The program counter, which is defined in the file cpu.h as

    uint32_t pc_value; 

is advanced to the next instruction, in the file cpu.cpp, as

        pc_value += 4;

The program counter is used to compute the memory address from which each instruction is read, before the instruction is decoded and executed, in a for-loop implemented in the SystemC process

void Cpu::process()

in the file cpu.cpp.

The for-loop uses a TLM transaction (URL?), defined and instantiated as

    tlm::tlm_generic_payload* trans = new tlm::tlm_generic_payload;

The transaction is initialized as

    cmd = tlm::TLM_READ_COMMAND; 
    trans->set_command(cmd);
    unsigned char *data_ptr = reinterpret_cast<unsigned char*>(&data_read); 
    trans->set_data_ptr(data_ptr); 
    trans->set_data_length(4); 

and used in the for-loop, with the address defined by the program counter, as

        trans->set_address(pc_value >> 2); 

to read the instruction from the memory. This is done by calling b_transport (URL?) on the initiator socket, as

        socket->b_transport(*trans, delay);

The complete processing of an instruction, which consists of reading the instruction, followed by decoding the instruction and writing to the register specified in the instruction, becomes

        trans->set_address(pc_value >> 2); 
        socket->b_transport(*trans, delay);
        idecode(data_read, &reg_id_d, &imm_value);
        reg_write(reg_id_d, imm_value);
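
The steps can be tried out in a plain C++ model, without SystemC and TLM. The struct TinyCpu is a hypothetical stand-in for the Cpu module. The implementation of reg_write is not shown in this section, so the sketch assumes that the register ids 5, 6, and 7 are mapped to the three elements of the register file in Figure 16. The memory is represented by a std::vector, which replaces the TLM transaction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain C++ model of the read, decode, and write steps.
struct TinyCpu
{
    static const int n_registers = 3;
    std::uint32_t registers[n_registers] = {0, 0, 0};
    std::uint32_t pc_value = 0;

    // Assumption: register ids 5, 6, 7 (t0, t1, t2) map to array
    // elements 0, 1, 2. Other ids are ignored in this sketch.
    void reg_write(std::uint32_t reg_id, std::uint32_t value)
    {
        if (reg_id >= 5 && reg_id < 5 + n_registers)
        {
            registers[reg_id - 5] = value;
        }
    }

    // Processes one instruction from a word-organized program memory.
    void step(const std::vector<std::uint32_t>& memory)
    {
        assert((pc_value >> 2) < memory.size());
        std::uint32_t instruction = memory[pc_value >> 2];   // read
        std::uint32_t reg_id_d = (instruction & 0xF80) >> 7; // decode
        std::uint32_t imm_value = instruction & 0xFFFFF000;
        reg_write(reg_id_d, imm_value);                      // write
        pc_value += 4;                                       // next instruction
    }
};
```

Calling step three times, with the program in Figure 12 as memory contents, leaves the values 0x1000, 0x2000, and 0x3000 in the three registers. Three more calls set the registers to zero again.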

A block diagram of the design is shown in Figure 17.

fig_dia_one_instruction

Figure 17. A block diagram of our first computer, capable of running programs with LUI instructions.

The block diagram in Figure 17 shows the program counter, in a block labelled PC. The program counter addresses the memory, which results in an instruction being read. The instruction is used as input to the instruction decoder, in a block labelled Idecode.

The instruction decoder decodes the instruction, which in this case results in the fields imm and reg id, shown also in Figure 10, being extracted from the instruction, and used as input to the register file, here represented by a block labelled Registers.

Running the Program

This is the SystemC/TLM layer The other layers are: VHDL Verilog

We store the memory contents, corresponding to the program in Figure 12, in a file memory_contents.txt.

This file will be read, during startup, and stored in the memory shown in Figure 13.

We use reporting statements, in the Cpu SystemC module in cpu.cpp, to illustrate the execution of the program, as

        std::bitset<16> pc_value_binary(pc_value);
        std::bitset<32> data_read_binary(data_read);
        std::bitset<5> reg_id_d_binary(reg_id_d);
        std::bitset<32> imm_value_binary(imm_value); 
        std::cout << "pc_value=" << pc_value_binary << "\n";
        std::cout << "data_out=" << data_read_binary << "\n";
        std::cout << "reg_id_d=" << reg_id_d_binary << "\n";
        std::cout << "imm_value=" << imm_value_binary << "\n"; 
        for (int reg_number = 0; reg_number < 3; reg_number++)
        {
            std::bitset<32> reg_value_binary(registers[reg_number]);  
            std::cout << "reg_" << reg_number << "_value=" << reg_value_binary << "\n"; 
        }

Running the program with these statements gives, for the first three instructions, a printout as

pc_value=0000000000000000
data_out=00000000000000000001001010110111
reg_id_d=00101
imm_value=00000000000000000001000000000000
reg_0_value=00000000000000000001000000000000
reg_1_value=00000000000000000000000000000000
reg_2_value=00000000000000000000000000000000
pc_value=0000000000000100
data_out=00000000000000000010001100110111
reg_id_d=00110
imm_value=00000000000000000010000000000000
reg_0_value=00000000000000000001000000000000
reg_1_value=00000000000000000010000000000000
reg_2_value=00000000000000000000000000000000
pc_value=0000000000001000
data_out=00000000000000000011001110110111
reg_id_d=00111
imm_value=00000000000000000011000000000000
reg_0_value=00000000000000000001000000000000
reg_1_value=00000000000000000010000000000000
reg_2_value=00000000000000000011000000000000

We see, for the program counter values expressed in binary as 0, 100, and 1000 (corresponding to decimal values 0, 4, and 8), that the three registers have values as expected from the first three instructions in the program in Figure 11.

For the last three instructions, we get

pc_value=0000000000001100
data_out=00000000000000000000001010110111
reg_id_d=00101
imm_value=00000000000000000000000000000000
reg_0_value=00000000000000000000000000000000
reg_1_value=00000000000000000010000000000000
reg_2_value=00000000000000000011000000000000
pc_value=0000000000010000
data_out=00000000000000000000001100110111
reg_id_d=00110
imm_value=00000000000000000000000000000000
reg_0_value=00000000000000000000000000000000
reg_1_value=00000000000000000000000000000000
reg_2_value=00000000000000000011000000000000
pc_value=0000000000010100
data_out=00000000000000000000001110110111
reg_id_d=00111
imm_value=00000000000000000000000000000000
reg_0_value=00000000000000000000000000000000
reg_1_value=00000000000000000000000000000000
reg_2_value=00000000000000000000000000000000

We see, for the program counter values expressed in binary as 1100, 10000, and 10100 (corresponding to decimal values 12, 16, and 20), that the three registers have values as expected from the last three instructions in the program in Figure 11.

Hello Assembly World

FROM HERE ON THE BOOK IS IN A MORE WORK-IN-PROGRESS STATE

WORK IS ONGOING TO COMPLETE THE BOOK, AND RELEASE IT

The Program

l.andi r0, r0, 0
l.addi r0, r0, 0x9
l.slli r0, r0, 28

l.andi r1, r1, 0
l.addi r1, r1, 72
l.sw 0(r0), r1

Tools

Testing in QEMU

or1k-elf-as -o start.o start.s
or1k-elf-ld -T default.ld -o prog.elf start.o
/home/ola/prog/qemu/bin/qemu-system-or32 -nographic -kernel prog.elf

Extending our Computer

And with Immediate Half Word

We see the instruction format for l.andi rD, rA, K, with its different fields. There are

We can write the instruction, with the fields as described above, as a 32-bit binary word. This gives

101001DDDDDAAAAAKKKKKKKKKKKKKKKK

The binary instruction format for l.andi rD, rA, K can also be seen in Section 17 of the OpenRISC 1000 Architecture Manual.

Suppose we want to make a program that uses the andi

In assembly code, this program would be

    l.movhi r0, 0
    l.ori r0, r0, 15
    l.andi r1, r0, 7
    l.andi r2, r1, 3
    l.andi r3, r2, 1

Using the instruction format as described above, we find that the corresponding machine code program becomes

000110 00000 00000 0000000000000000
101010 00000 00000 0000000000001111
101001 00001 00000 0000000000000111
101001 00010 00001 0000000000000011
101001 00011 00010 0000000000000001

Grouping the binary digits in groups of four gives

0001 1000 0000 0000 0000 0000 0000 0000
1010 1000 0000 0000 0000 0000 0000 1111
1010 0100 0010 0000 0000 0000 0000 0111
1010 0100 0100 0001 0000 0000 0000 0011
1010 0100 0110 0010 0000 0000 0000 0001

We can convert the program to a representation where we use hexadecimal numbers. This conversion results in the program shown in Figure 13.

18000000
A800000F
A4200007
A4410003
A4620001

Figure 13. A program using the instruction l.andi.
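
The three l.andi words in the program can be checked with a small C++ sketch, which follows the instruction format given above: a 6-bit opcode, two 5-bit register fields, and a 16-bit immediate value. The names ANDI_OPCODE and encode_andi are hypothetical and introduced only for illustration.

```cpp
#include <cstdint>

// Opcode for l.andi, as given in the format above (101001 in binary).
constexpr std::uint32_t ANDI_OPCODE = 0x29;

// Hypothetical helper: builds an l.andi instruction word from the
// register numbers rd and ra (0 to 31) and the 16-bit immediate k.
constexpr std::uint32_t encode_andi(std::uint32_t rd, std::uint32_t ra, std::uint32_t k)
{
    return (ANDI_OPCODE << 26) | ((rd & 0x1F) << 21) | ((ra & 0x1F) << 16) | (k & 0xFFFF);
}
```

As an example, encode_andi(1, 0, 7) corresponds to l.andi r1, r0, 7, and gives the word 0xA4200007.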

Store to memory

Running the Program

Hello C World

The Program

Tools

Testing in QEMU

Extending our Computer

Running the Program

References

[ISA], The RISC-V Instruction Set Manual Volume I: Unprivileged ISA, available at this RISC-V architecture page