Into Computers

Ola Dahl

July 2, 2022

1 Welcome

This is the SystemC/TLM layer The other layers are: VHDL Verilog

This is a book about computers. It describes how a small computer can be designed and implemented, using a step-by-step approach.

Starting with a simple building block that can store one bit, we continue, via registers and control logic and instruction decoding, towards a design that can run a small program.

We extend the instruction set, by adding instructions for controlling the program flow, and for interacting with the outside world, through a UART.

We stop when we have a computer that can run a program that has been compiled and linked using gcc.

We implement a subset of a real computer architecture - the RISC-V architecture. In this way, we can convey the experience of building a real system, while at the same time making the task small enough to be completed without a large implementation effort.

Using an already available architecture also allows us to use available tools, such as this RISC-V toolchain.

The book is designed as a Layered Book. This means that there are common parts, covering the general aspects of our computer design, but also specific parts, treating layer-specific material. Each layer represents a particular design language, such as VHDL or Verilog.

You can read the book one layer at a time, but you can also move from one layer to another.

The book has the following layers.

You are now reading the SystemC/TLM layer. The purpose of this layer is to show how SystemC and TLM can be used to construct a computer that implements a specific architecture.

Moving between layers is done by following links. Here is an example.

This is the SystemC/TLM layer The other layers are: VHDL Verilog

You will see these links throughout the book, e.g. at the beginning or the end of a section. Following such a link will take you to another layer. You will arrive at the new layer at a position corresponding to the position you left.

1.1 Software

The software accompanying the book is available in a Git repo on GitHub.

Examples in the book contain links to files in the repo.

1.2 Acknowledgements

This book has been produced using pandoc and Python.

The HTML version of the book has been styled using a slightly modified version of this CSS file from this pandoc demo page.

1.3 Choosing a Language

We describe our computer using a design language. In this way, we can have a textual representation of the computer, and we can use the textual representation as input to software tools, that will help us to simulate the behavior of our computer.

SystemC is a C++ library that makes it possible to create event-driven simulations. TLM is an additional library that makes it possible to do transaction level simulation. We use SystemC and TLM to describe our computer, and to simulate its functionality.

SystemC and TLM are standardized by IEEE and Accellera.

Accellera provides information about SystemC and TLM.

1.4 Hello World

A simple example will get us started. We use a hello, world example, which does nothing meaningful except print a text string.

The code for the example is shown in Figure 1.

#include "systemc"
#include "tlm.h"

#include <iostream>

int sc_main(int argc, char* argv[])
{
    std::cout << "Hello, world" << std::endl; 
    sc_core::sc_start();
    return 0;
}

hello.cpp

Figure 1. A hello world example in SystemC.

The code in Figure 1 starts with three include directives. The first and the second include directive include functionality for SystemC and TLM, respectively. The third include directive includes the iostream header, which contains functionality for printing text.

The code in Figure 1 contains a function named sc_main. The function starts with a statement that prints a string. The simulation is then started, by calling the function sc_core::sc_start. In the next statement, the value zero is returned.

You can read about SystemC on Wikipedia, and in other places, such as Doulos, which provides information about SystemC and TLM.

1.5 Getting some Tools

We need some tools, in the form of software. We search for software that can be obtained without cost.

We use a Linux computer with Ubuntu, and a Mac computer with macOS Big Sur.

We decide to use SystemC and TLM from Accellera.

Installation instructions for the chosen tools can be found in the book software repo, on this tools page.

1.6 Make it Run

The program in Figure 1 can be compiled, linked, and run.

Instructions for doing this can be found in the book software repo, for Ubuntu and Mac.

1.7 Building a Computer

We have chosen a language, to describe our computer. We have taken a first, tiny step, by installing some tools, and building and running a hello, world example.

Our goal is to create a computer that can run programs, consisting of instructions. We want the instructions to be generated, using a compiler.

A computer reads instructions from a memory. Each instruction is represented as a sequence of bits. The values of the bits determine the type of instruction, and sometimes also arguments that the instruction shall use. The allowed instructions, for a given computer, belong to the computer’s instruction set.

Most computers have instructions for loading data from a memory, and for storing data to a memory. Other common instructions are instructions for doing mathematical operations, such as addition and subtraction, and instructions for making decisions. The decisions can be based on evaluations of certain conditions, such as checking if a number is zero, or if a certain bit is set in a piece of data.

An instruction that has been read from memory is decoded, meaning that the computer interprets the bits of the instruction, and then, depending on the values of the bits, takes different actions.

The actions taken are determined by the instructions. As an example, an instruction for addition results in the actual addition of two numbers, and most often also the storing of the result of the addition.

2 Storing one Bit

We start with a small building block, that can store only one bit. We then extend the building block, so that we can store larger pieces of information. At a certain stage in our development, we are ready to implement our first instruction.

A bit can have the values 0 or 1. In a computer, these values are represented by a low value and a high value of an electrical signal.

The value of a bit can be stored. This means that the value is remembered, as long as it is stored. While the value is stored, the value can be read, and used, for the purpose of performing different operations. As an example, the value of a bit could be used in an addition operation, or it could be copied so that it is stored somewhere else, for example at another place in a memory.

2.1 A D Flip-flop

The value of a bit can be stored in a building block called D flip-flop.

A D flip-flop stores one bit of data. A new value can be stored when a clock signal changes value. A component, which can change its stored value only when a clock signal changes, is called a synchronous component.

A D flip-flop implementation in SystemC is shown in Figure 2.

class d_ff : sc_core::sc_module
{
    SC_HAS_PROCESS(d_ff);

public:
    sc_core::sc_in<bool> clk;
    sc_core::sc_in<bool> data_in;
    sc_core::sc_out<bool> data_out;

    bool reg_value;

private:
    void update()
    {
        reg_value = data_in.read();
        data_out.write(reg_value);
    }

public:
    d_ff(sc_core::sc_module_name name):
        sc_module(name) 
    {
        SC_METHOD(update);
        sensitive << clk.pos(); 
    }
}; 

d_ff.h

Figure 2. A D flip-flop in SystemC.

The code in Figure 2 defines a class called d_ff.

The class d_ff inherits from another class, called sc_core::sc_module. In this way, the class becomes a SystemC module.

The class defines two inputs, called clk and data_in, and one output, called data_out.

It also defines a variable reg_value.

This variable will contain the actual value stored in the D flip-flop.

The variable reg_value is called a state variable.

The class defines a function called update.

The function update is defined, using the macro SC_METHOD on line 23, to be a SystemC method process.

It can also be seen, on line 24, that the process update is defined to be sensitive to rising edges of the clock signal. The result of the sensitivity definition is that the function update will be called at every positive edge of the clock signal.

We see, from the contents of the function update, that the value of the input variable data_in is assigned to the state variable reg_value. This assignment ensures that the state variable reg_value is updated at every rising edge of the clock.

An assignment of the variable data_out is also done, inside the function update. This assignment ensures that the output data_out has the same value as the current value of the state variable reg_value.
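
The edge-triggered behavior described above can also be illustrated without SystemC. The following is a minimal plain C++ sketch, not the implementation used in the book; the names d_ff_model and tick, and the two example functions, are hypothetical. The stored value changes only when the clock goes from low to high.

```cpp
// Plain C++ sketch of an edge-triggered D flip-flop (hypothetical names,
// not part of the book repo). No SystemC is needed to compile it.
struct d_ff_model
{
    bool prev_clk = false;   // clock level seen at the previous call
    bool reg_value = false;  // state variable: the stored bit

    // Present new input levels; returns the value of data_out afterwards.
    constexpr bool tick(bool clk, bool data_in)
    {
        if (clk && !prev_clk)     // rising edge: low -> high
        {
            reg_value = data_in;  // capture the input
        }
        prev_clk = clk;
        return reg_value;         // the output follows the state variable
    }
};

// data_in is high while the clock stays low: nothing is stored.
constexpr bool example_output_before_edge()
{
    d_ff_model ff{};
    return ff.tick(false, true);
}

// The clock rises while data_in is high: the value is stored.
constexpr bool example_output_after_edge()
{
    d_ff_model ff{};
    ff.tick(false, true);
    return ff.tick(true, true);
}

static_assert(!example_output_before_edge(), "no rising edge, old value kept");
static_assert(example_output_after_edge(), "rising edge stores data_in");
```

In the SystemC version, the simulation kernel detects the rising edge and calls update. In this sketch, the edge detection is done by hand, by comparing the clock level with its previous level.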

2.2 A Testbench

The D flip-flop implementation in Figure 2 has inputs and outputs. An external module, referred to as a testbench, can be used for the purpose of generating input signals to the D flip-flop, and observing output signals from the D flip-flop.

A SystemC module implementing a testbench is shown in Figure 3.

class d_ff_tb : sc_core::sc_module
{
    SC_HAS_PROCESS(d_ff_tb);

    d_ff d_ff_0; 
    sc_core::sc_clock clk; 
    sc_core::sc_signal<bool> d_ff_data_in; 
    sc_core::sc_signal<bool> d_ff_data_out;

    void stim_gen()
    {
        d_ff_data_in.write(false);
        wait(5, sc_core::SC_NS);
        d_ff_data_in.write(true);
        wait(4, sc_core::SC_NS);
        d_ff_data_in.write(false);
        wait(); 
    }

    void reporter()
    {
        std::cout << "Time: " << sc_core::sc_time_stamp(); 
        std::cout << ", clk=" << clk.read(); 
        std::cout << ", data_in=" << d_ff_data_in.read(); 
        std::cout << ", data_out=" << d_ff_data_out.read()
                  << std::endl;
    }

public: 
    d_ff_tb(sc_core::sc_module_name name):
        sc_core::sc_module(name),
        d_ff_0("d_ff_0"),
        clk("d_ff_clk", 4, sc_core::SC_NS, 0.5, 2, sc_core::SC_NS, true)
    {
        SC_THREAD(stim_gen);
        d_ff_0.clk(clk); 
        d_ff_0.data_in(d_ff_data_in); 
        d_ff_0.data_out(d_ff_data_out);
        SC_METHOD(reporter);
        sensitive << d_ff_0.clk.pos();
        sensitive << d_ff_data_in; 
    }

    void init_sc_trace()
    {
        sc_core::sc_trace_file *d_ff_tb_wave =
            sc_core::sc_create_vcd_trace_file("d_ff_tb_systemc_tlm_wave");

        sc_core::sc_trace(d_ff_tb_wave, d_ff_0.clk, "clk");
        sc_core::sc_trace(d_ff_tb_wave, d_ff_0.data_in, "data_in");
        sc_core::sc_trace(d_ff_tb_wave, d_ff_0.data_out, "data_out");
        sc_core::sc_trace(d_ff_tb_wave, d_ff_0.reg_value, "reg_value");
    } 
}; 

d_ff_tb.h

Figure 3. A D flip-flop testbench in SystemC.

The code in Figure 3 implements a SystemC module, in the form of a C++ class called d_ff_tb.

The class in Figure 3 has member functions stim_gen and reporter, which are defined to be SystemC processes, using the macros SC_THREAD and SC_METHOD.

The process named reporter is responsible for printing results. It is made sensitive to rising edges of the clock signal, and to changes in the input signal to the D flip-flop, as

        sensitive << d_ff_0.clk.pos();
        sensitive << d_ff_data_in; 

The clock signal is defined by a variable called clk. The shape of the clock signal is defined by the arguments to the constructor of the class sc_core::sc_clock, as seen in the line

        clk("d_ff_clk", 4, sc_core::SC_NS, 0.5, 2, sc_core::SC_NS, true)
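
Reading the arguments in the order used by the sc_clock constructor in the SystemC standard, the clock has a period of 4 ns, a duty cycle of 0.5, a start time of 2 ns, and a rising edge as its first edge. The following plain C++ sketch computes the edge times under that reading. The constant and function names are hypothetical. The rising edges at 2, 6, 10 and 14 ns agree with the printout in Figure 4.

```cpp
// Sketch: edge times, in ns, for a clock with the parameters used above.
// The names are hypothetical; this is plain C++, not SystemC.
constexpr double period_ns = 4.0;
constexpr double duty_cycle = 0.5;
constexpr double start_ns = 2.0;

// Time of rising edge number k (k = 0, 1, 2, ...), first edge positive.
constexpr double rising_edge_ns(int k)
{
    return start_ns + k * period_ns;
}

// Time of the falling edge that follows rising edge number k.
constexpr double falling_edge_ns(int k)
{
    return rising_edge_ns(k) + duty_cycle * period_ns;
}

static_assert(rising_edge_ns(0) == 2.0, "first rising edge at 2 ns");
static_assert(rising_edge_ns(3) == 14.0, "fourth rising edge at 14 ns");
static_assert(falling_edge_ns(0) == 4.0, "first falling edge at 4 ns");
```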

The input signal to the D flip-flop is defined by the variable d_ff_data_in. The values used for the input signal are defined in the function stim_gen, as

    void stim_gen()
    {
        d_ff_data_in.write(false);
        wait(5, sc_core::SC_NS);
        d_ff_data_in.write(true);
        wait(4, sc_core::SC_NS);
        d_ff_data_in.write(false);
        wait(); 
    }

2.3 Build and Run

The D flip-flop in Figure 2 and the testbench in Figure 3 can be built, and the testbench can be run, using the makefile and the run script in the flip_flop/systemc_tlm directory in the book repo.

The resulting printout from running the testbench is shown in Figure 4.

        SystemC 2.3.3-Accellera --- Sep 21 2021 06:25:05
        Copyright (c) 1996-2018 by all Contributors,
        ALL RIGHTS RESERVED
Time: 0 s, clk=0, data_in=0, data_out=0

Info: (I702) default timescale unit used for tracing: 1 ps (d_ff_tb_systemc_tlm_wave.vcd)
Time: 2 ns, clk=1, data_in=0, data_out=0
Time: 5 ns, clk=0, data_in=1, data_out=0
Time: 6 ns, clk=1, data_in=1, data_out=0
Time: 9 ns, clk=0, data_in=0, data_out=1
Time: 10 ns, clk=1, data_in=0, data_out=1
Time: 14 ns, clk=1, data_in=0, data_out=0

Figure 4. Printout from running the testbench in Figure 3.

The printout in Figure 4 shows the values of data_in and data_out for a sequence of time instants. The time instants are defined by the sensitivity statements for the SystemC method named reporter in Figure 3, with the effect that the SystemC method reporter is executed whenever the clock signal has a rising edge, or the variable d_ff_data_in changes value. The changes for the variable d_ff_data_in are defined in the function d_ff_tb::stim_gen in Figure 3.

The printout in Figure 4 also contains the file name d_ff_tb_systemc_tlm_wave.vcd. This is a file where waveform data are stored. The display of waveforms is treated in Section Making Waves.

2.4 Making Waves

The testbench in Figure 3 generates printouts as shown in Figure 4. The printouts show values of digital signals, each having the value one or zero. We can represent these signals as waveforms, with the level of the waveform being one or zero. Thinking of the value one as a high voltage level, and the value zero as a low voltage level, we can think of the waveforms as representing actual voltages, in an actual digital system.

A waveform can be visualized using the GTKWave program.

Instructions for installing and running GTKWave are found in the book software repo.

A waveform can be generated from SystemC by calling the function sc_create_vcd_trace_file, and then calling the function sc_trace for each signal or variable that shall be recorded.

For the D flip-flop example, with build and run instructions as described in Section Build and Run, we put the waveform-related code in the file d_ff_tb_main.cpp. This file is referred to in the compilation commands, shown in Section Build and Run, and its contents are shown here, as

#include "systemc"
#include "d_ff_tb.h"

int sc_main(int argc, char* argv[])
{
    d_ff_tb d_ff_tb_0("d_ff_tb_0");

    d_ff_tb_0.init_sc_trace();
    
    sc_core::sc_start(17, sc_core::SC_NS); 

    return 0;
}

Waveforms, generated from the testbench in Figure 3, are shown in Figure 5.

Figure 5. Waveforms, obtained from running the testbench in Figure 3.

We see in Figure 5 how the waveforms correspond to the printouts shown in Figure 4.

3 Storing Data in Registers

When a computer executes instructions, it often needs intermediate storage places. As an example, consider an addition of two data items, both stored in memory. In this situation, it might be convenient to read the data items from memory and store them in an intermediate storage place, from where the inputs to the addition operation can be taken. The result of the addition could also be stored in the intermediate storage area, before it is transferred to memory.

An intermediate storage place can consist of a register, or a set of registers. A register typically allows faster accesses, for reading and writing data, than a memory.

A set of registers could be used when performing an addition. Two items of data could be read from memory, and stored in two registers. A third register, or one of the two already used, could be used to store the result of the addition, before it is written back to memory.

Registers can also be used to hold other types of values. As an example, a register is often used for holding the current value of the program counter.

We could also use registers for holding status bits, that provide information about the result of a computation. One example of such a register is a status register. A status register can hold information indicating, for example, if an addition resulted in overflow, or if an operation resulted in a zero value.

A set of registers, organized together, so that it is possible to refer to each of the individual registers, for example using an address, can be called a register file.

3.1 A Register

A D flip-flop can store one bit. We can imagine a register as a row of D flip-flops, each storing one bit, with the possibility to load new values into all D flip-flops simultaneously.
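
The picture of a register as a row of one-bit storage cells can be sketched in plain C++. This is an illustration only, with hypothetical names (register_model, load, read, load_then_read), and it leaves out the clock: all cells take their new value in one load operation.

```cpp
// Sketch of a register as a row of N one-bit storage cells (hypothetical
// names). Plain C++, not the SystemC implementation used in the book.
template <int N>
struct register_model
{
    bool bits[N] = {};  // bits[0] is the rightmost (least significant) bit

    // Load all N cells at once from the N lowest bits of value.
    constexpr void load(unsigned value)
    {
        for (int i = 0; i < N; ++i)
        {
            bits[i] = ((value >> i) & 1u) != 0;
        }
    }

    // Read the stored bits back as an unsigned number.
    constexpr unsigned read() const
    {
        unsigned value = 0;
        for (int i = 0; i < N; ++i)
        {
            if (bits[i])
            {
                value |= (1u << i);
            }
        }
        return value;
    }
};

// Helper for the checks below: load a 4-bit register, then read it back.
constexpr unsigned load_then_read(unsigned value)
{
    register_model<4> reg{};
    reg.load(value);
    return reg.read();
}

static_assert(load_then_read(0b0101) == 0b0101, "four bits are stored");
static_assert(load_then_read(0b10101) == 0b0101, "bit 4 does not fit");
```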

A register implementation in SystemC is shown in Figure 6.

class n_bit_register : sc_core::sc_module
{
public: 
    sc_in<bool> clk;
    sc_in<int> data_in;
    sc_out<int> data_out;

    const int reg_value_min;
    const int reg_value_max;

    int reg_value;
    
    void update()
    {
        reg_value = data_in.read();
        if (reg_value < reg_value_min || reg_value > reg_value_max)
        {
            std::cerr << "reg_value " << reg_value << " is out of "
                      << "range [" << reg_value_min << ", "
                      << reg_value_max << "]" << "\n";
        }
        data_out.write(reg_value);
    }

    SC_HAS_PROCESS(n_bit_register);

    n_bit_register(sc_core::sc_module_name name, int N):
        sc_module(name),
        reg_value_min(-(1 << N)),
        reg_value_max( (1 << N) - 1 )
    {
        SC_METHOD(update);
        sensitive << clk.pos(); 
    }
}; 

n_bit_register.h

Figure 6. A register in SystemC.

The code in Figure 6 defines a SystemC module, in the form of a class named n_bit_register.

The class defines two inputs, called clk and data_in, and one output, called data_out.

The class also defines a variable called reg_value.

The variable reg_value will contain the actual value stored in the register.

A function update, which updates the variable reg_value, is also defined.

The code in Figure 6 defines, using the macro SC_METHOD, the function update to be a SystemC process. In addition, it defines the process update to be sensitive to rising edges of the clock signal. The result of the sensitivity definition is that the function update will be called at every positive edge of the clock signal.

As a result, the variable reg_value is updated at every rising edge of the clock.

The function update also checks if the value of reg_value is within the allowed range. If this is not the case, an error message is printed.

The range for reg_value is set in the constructor for the class n_bit_register.

An assignment of the variable data_out is done, inside the function update. This assignment ensures that the output data_out has the same value as the current value of the state variable reg_value.

3.2 A Testbench

An external module, referred to as a testbench, can be used for the purpose of generating input signals to, and observing output signals from, the register in Figure 6.

A testbench for the register in Figure 6 is implemented in the file n_bit_register_tb.h.

In the testbench, we use a parameter, to specify the width of the register.

The parameter is defined as a const C++ variable, as

    static const int N = 4;

The clock signal is generated using a variable of the class sc_clock, defined as

    sc_clock clk; 

The actual clock generation is determined by the parameters given when the clk variable is initialized. The initialization is done in the constructor, which is implemented as

    n_bit_register_tb(sc_core::sc_module_name name):
        sc_module(name),
        clk("n_bit_register_clk", 4, SC_NS, 1.0), 
        data_in_value(1),
        n_bit_register_0("n_bit_register_0", N)
    {
        n_bit_register_0.clk(clk); 
        n_bit_register_0.data_in(n_bit_register_data_in); 
        n_bit_register_0.data_out(n_bit_register_data_out);
        SC_HAS_PROCESS(n_bit_register_tb);
        SC_METHOD(stim_gen);
        sensitive << n_bit_register_0.clk.pos();
        SC_METHOD(reporter);
        sensitive << n_bit_register_0.clk.pos();
    }

and the initialization of the clk variable is done in the member initializer list of the constructor, as

        clk("n_bit_register_clk", 4, SC_NS, 1.0), 

The generation of input signals to the register in Figure 6 is done using a SystemC process, defined as a function as

    void stim_gen()
    {
        n_bit_register_data_in.write(data_in_value);
        data_in_value++;
    }

and made into a process by the SC_METHOD directive. The SC_METHOD directive is used in the constructor, as shown above.

The input signal and the output signal are defined as

    sc_signal<int> n_bit_register_data_in; 
    sc_signal<int> n_bit_register_data_out;

The signals are used in the instantiation of the register, which is done in the constructor, shown above.

The reporting of the results is done in a process, defined as a function as

    void reporter()
    {
        std::cout << "Time: " << sc_time_stamp(); 
        std::cout << ", data_in=" << std::bitset<N> (n_bit_register_data_in.read());
        std::cout << ", data_out=" << std::bitset<N> (n_bit_register_data_out.read())
                  << std::endl;
    }

and made into a process by the SC_METHOD directive. The SC_METHOD directive is used in the constructor, as shown above.

3.3 Build and Run

The register in Figure 6 and the testbench described in Section A Testbench can be built, and the testbench can be run, using the makefile and the run script in the register/systemc_tlm directory in the book repo.

The resulting printout from running the testbench is shown in Figure 7.

        SystemC 2.3.3-Accellera --- Sep 21 2021 06:25:05
        Copyright (c) 1996-2018 by all Contributors,
        ALL RIGHTS RESERVED
Time: 0 s, data_in=0000, data_out=0000
Time: 0 s, data_in=0001, data_out=0000

Info: (I702) default timescale unit used for tracing: 1 ps (n_bit_register_tb_systemc_tlm_wave.vcd)
Time: 4 ns, data_in=0010, data_out=0001
Time: 8 ns, data_in=0011, data_out=0010
Time: 12 ns, data_in=0100, data_out=0011
Time: 16 ns, data_in=0101, data_out=0100

Figure 7. Printout from running the testbench described in Section A Testbench.

We can generate waveforms, in the same way as described in Section Making Waves. The resulting waveform, for the register with printouts as shown above, is displayed in Figure 8.

Figure 8. Waveforms, obtained from running the testbench described in Section A Testbench.

4 Our First Instruction

A computer executes programs by following instructions. The instructions belong to an instruction set. As mentioned in Chapter Welcome, we will use a subset of the RISC-V architecture as the instruction set for our computer.

As a first step, we will try to build a computer with only one instruction. Although somewhat restricted, this computer will be able to

We will start with deciding on a program to run on our computer. The program will be stored in a memory, and its instructions will be read, one by one, and actions will be taken.

4.1 A Program

From the RISC-V architecture page, we can download the RISC-V Instruction Set Manual, referred to below as ISA.

We look for an instruction that can load a value into a register. Using such an instruction, we can create a small program that loads specified values into some of the registers.

We choose to use the RV32I Base Integer Instruction Set, which is described in Chapter 2 of ISA.

The instructions in this instruction set are 32 bits wide.

The bits in an instruction are numbered, with 31 for the leftmost bit, down to 0 for the rightmost bit.

We use the notation b1:b2 to describe a range of bits, such as 31:0 for describing all 32 bits, or e.g. 7:0 for describing the rightmost byte.
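
The range notation maps directly to a shift followed by a mask. The following is a small sketch in plain C++. The function name bits is hypothetical, and the function assumes that 31 >= hi >= lo.

```cpp
#include <cstdint>

// Sketch: extract the bits hi:lo (both included) from a 32-bit word, with
// bit 31 as the leftmost bit and bit 0 as the rightmost bit.
// The name bits is hypothetical. Assumes 31 >= hi >= lo.
constexpr std::uint32_t bits(std::uint32_t word, unsigned hi, unsigned lo)
{
    // The mask is built in 64 bits, so that the full range 31:0 also works.
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(word) >> lo) &
        ((std::uint64_t{1} << (hi - lo + 1)) - 1));
}

static_assert(bits(0x12345678u, 31, 0) == 0x12345678u, "all 32 bits");
static_assert(bits(0x12345678u, 7, 0) == 0x78u, "the rightmost byte");
```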

In Section 2.3 of ISA we can see how 32-bit instructions that handle immediate data are encoded.

One instruction format is called U-type. The bits in U-type instructions are described as

In Section 2.4 of ISA, we find a description of the instruction LUI, which stands for load upper immediate, and which is used to “build 32-bit constants”.

We also see that the LUI instruction “places the U-immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros”.

We conclude that

In Chapter 24 in ISA, in Table 24.2, we see that the opcode for LUI is 0110111.

We can write the LUI instruction, with the fields as described above, as a 32-bit binary word. The resulting format is illustrated in Figure 9, as

imm [31:12], rd [4:0], opcode[6:0] = 0110111

Figure 9. Instruction format for the LUI instruction, adapted from Table 24.2 in ISA.

The instruction in Figure 9 uses bit numbers that refer to the values represented in the different fields. This means that

Alternatively, we can represent the instruction using the bit numbers for the instruction itself.

This representation can be useful when implementing features in our computer, where different bits in an instruction need to be picked out.

In this case, the bit number within the instruction might be more useful than the bit number for the value stored in a certain field.

We can represent the LUI instruction, using instruction bit numbers, as shown in Figure 10, as

imm (31:12), rd (11:7), opcode(6:0) = 0110111

Figure 10. Instruction format for the LUI instruction, using bit numbers from the instruction.

We will use both representations, with bit numbers for the fields, as illustrated in Figure 9 and using brackets to separate the fields, and with bit numbers from the instruction, as illustrated in Figure 10 and using parentheses to separate the fields.
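
Using the instruction bit numbers from Figure 10, the three fields can be picked out of a 32-bit instruction word with shifts and masks. The sketch below is plain C++ with hypothetical names (u_type_fields, decode_u_type, lui_opcode). The sample word 0x000012B7 is chosen so that the immediate is 1 and the destination register number is 5.

```cpp
#include <cstdint>

// Sketch: the fields of a U-type instruction, picked out using the bit
// numbers of the instruction itself. All names are hypothetical.
struct u_type_fields
{
    std::uint32_t imm;     // instruction bits 31:12 (20 bits)
    std::uint32_t rd;      // instruction bits 11:7  (5 bits)
    std::uint32_t opcode;  // instruction bits 6:0   (7 bits)
};

constexpr u_type_fields decode_u_type(std::uint32_t instr)
{
    return u_type_fields{instr >> 12, (instr >> 7) & 0x1Fu, instr & 0x7Fu};
}

constexpr std::uint32_t lui_opcode = 0x37u;  // 0110111 in binary

static_assert(decode_u_type(0x000012B7u).imm == 1u, "immediate field");
static_assert(decode_u_type(0x000012B7u).rd == 5u, "destination register");
static_assert(decode_u_type(0x000012B7u).opcode == lui_opcode, "LUI opcode");
```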

In Section 2.1 in ISA, we see that there are 32 registers, each 32-bits wide, referred to as registers x0 to x31.

We also see that in register x0, all bits are hardwired to the value zero.
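
A register file with these properties can be sketched in plain C++. This is one possible way to model the hardwired zero in x0, by ignoring writes to it, and not necessarily the approach taken later in the book. The names register_file, write, read and write_then_read are hypothetical, and out-of-range register numbers are silently ignored to keep the sketch short.

```cpp
#include <cstdint>

// Sketch of a register file with 32 registers, each 32 bits wide.
// All names are hypothetical.
struct register_file
{
    std::uint32_t regs[32] = {};  // x0 to x31, all zero from the start

    // Writes to x0 are ignored, so x0 always reads as zero.
    constexpr void write(unsigned index, std::uint32_t value)
    {
        if (index != 0 && index < 32)
        {
            regs[index] = value;
        }
    }

    constexpr std::uint32_t read(unsigned index) const
    {
        return (index < 32) ? regs[index] : 0u;
    }
};

// Helper for the checks below: write one register, then read it back.
constexpr std::uint32_t write_then_read(unsigned index, std::uint32_t value)
{
    register_file rf{};
    rf.write(index, value);
    return rf.read(index);
}

static_assert(write_then_read(5, 0x1000u) == 0x1000u, "x5 keeps its value");
static_assert(write_then_read(0, 0x1000u) == 0u, "x0 stays zero");
```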

When considering a certain calling convention, registers are often given dedicated roles.

For the registers x0 to x31 in RISC-V, a list of such roles, together with a role-specific, alternative name for each register, is given in Table 25.1 in Chapter 25 in ISA.

For example, register x1 (named ra) is used as return address and register x2 (named sp) is used as stack pointer.

There are also registers that are used for storage of temporary values, such as x5 (named t0, and also serving as alternate link register), and x6 and x7 (named t1 and t2, respectively).

The alternative register names are referred to as ABI names in Table 25.1 in ISA.

ABI names for registers are used in assembly programs.

Using the LUI instruction and the register ABI names, we can create a program that performs the following actions:

  1. Write three different values to registers t0, t1, and t2.
  2. Write the value zero to registers t0, t1, and t2.

In assembly language, we could write a program, using lowercase for the instruction name, as

lui t0, 1
lui t1, 2
lui t2, 3
lui t0, 0
lui t1, 0
lui t2, 0

Figure 11. An assembly program, using a LUI instruction to write values to registers.

We recall that the LUI instruction writes the immediate value, which is 1 for the first instruction in our program, to the top 20 bits of the destination register, which for this instruction is t0, while at the same time filling in the lowest 12 bits with zeros.

For the first instruction in Figure 11, which is

lui t0, 1

this means that the number being stored in t0 is 1 followed by 12 zeros. In binary form, this becomes

1000000000000

Counting the bits from right to left, with the rightmost bit having number zero, we know, from the properties of binary numbers, that bit number n has the weight 2^n.

In this number, all bits are zero except for bit number 12. This gives the corresponding decimal number as

2^12 = 4096

We can write this number also in hexadecimal form. One way of arriving at the hexadecimal representation is to start with the binary representation, in this case

1000000000000

and then group the bits, in groups of four bits in each group. This gives

1 0000 0000 0000

We then let each group of four bits be represented by one hexadecimal digit. Using the prefix 0x, which is commonly used to indicate that a number is hexadecimal, we get

1 0000 0000 0000 = 0x1000

In a similar way, we can calculate the value that will be stored in register t1, by the instruction

lui t1, 2

as

10 0000 0000 0000 = 0x2000

which, when converted to decimal form, becomes

0x2000 = 8192

For the third instruction in Figure 11,

lui t2, 3

the corresponding calculation yields

0x3000 = 12288
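
The three calculations above follow the same pattern: the immediate value is shifted 12 bits to the left. A minimal plain C++ sketch, with the hypothetical function name lui_result, is

```cpp
#include <cstdint>

// Sketch: the value that LUI places in the destination register, given the
// 20-bit immediate value. The function name is hypothetical.
constexpr std::uint32_t lui_result(std::uint32_t imm20)
{
    // The immediate ends up in the top 20 bits; the lowest 12 bits are zero.
    return (imm20 & 0xFFFFFu) << 12;
}

static_assert(lui_result(1) == 0x1000u && lui_result(1) == 4096u, "lui t0, 1");
static_assert(lui_result(2) == 0x2000u && lui_result(2) == 8192u, "lui t1, 2");
static_assert(lui_result(3) == 0x3000u && lui_result(3) == 12288u, "lui t2, 3");
```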

In order to run the program in Figure 11 on our computer, which will be built in the sections that follow, we need to write the program using binary code.

We saw, in Chapter 24 in ISA, in Table 24.2, that the opcode for LUI is 0110111.

We have also seen, in Figure 9 and Figure 10, how the top 20 bits of the value to be stored in the destination register are represented in the instruction.

We see, in Chapter 25 in ISA, in Table 25.1, how registers t0, t1, and t2 are ABI names for the registers x5, x6, and x7.

Using the numeric values 5, 6, and 7 for these registers, we can now write the program in Figure 11 in binary code, as

00000000000000000001 00101 0110111
00000000000000000010 00110 0110111
00000000000000000011 00111 0110111
00000000000000000000 00101 0110111
00000000000000000000 00110 0110111
00000000000000000000 00111 0110111

Grouping the binary digits in groups of four gives

0000 0000 0000 0000 0001 0010 1011 0111
0000 0000 0000 0000 0010 0011 0011 0111
0000 0000 0000 0000 0011 0011 1011 0111
0000 0000 0000 0000 0000 0010 1011 0111
0000 0000 0000 0000 0000 0011 0011 0111
0000 0000 0000 0000 0000 0011 1011 0111

We can convert the program to a representation where we use hexadecimal numbers. This conversion results in the program shown in Figure 12.

0x000012B7
0x00002337
0x000033B7
0x000002B7
0x00000337
0x000003B7

Figure 12. A binary program, using a LUI instruction to write values to registers.
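
The hand conversion above can be cross-checked with a few lines of plain C++. The sketch below assembles a LUI instruction from its fields, and compares the result with the words in Figure 12. The function name encode_lui and the constants t0, t1 and t2 are hypothetical.

```cpp
#include <cstdint>

// Sketch: build the 32-bit word for a LUI instruction from its fields.
// imm20 goes to bits 31:12, rd to bits 11:7, and the opcode to bits 6:0.
constexpr std::uint32_t encode_lui(std::uint32_t imm20, std::uint32_t rd)
{
    return ((imm20 & 0xFFFFFu) << 12) | ((rd & 0x1Fu) << 7) | 0x37u;
}

// Register numbers for the ABI names t0, t1 and t2 (x5, x6 and x7).
constexpr std::uint32_t t0 = 5;
constexpr std::uint32_t t1 = 6;
constexpr std::uint32_t t2 = 7;

static_assert(encode_lui(1, t0) == 0x000012B7u, "lui t0, 1");
static_assert(encode_lui(2, t1) == 0x00002337u, "lui t1, 2");
static_assert(encode_lui(3, t2) == 0x000033B7u, "lui t2, 3");
static_assert(encode_lui(0, t0) == 0x000002B7u, "lui t0, 0");
static_assert(encode_lui(0, t1) == 0x00000337u, "lui t1, 0");
static_assert(encode_lui(0, t2) == 0x000003B7u, "lui t2, 0");
```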

4.2 Addressing a Memory

We can store a program, like the program shown in Figure 12, in a memory.

The program in Figure 12 consists of instructions. Each instruction is represented by a 32-bit word.

As a first step towards executing the program, we can create a program counter that reads the 32-bit instructions, one by one, from a memory.

Reading an instruction is done by using the program counter value to address the memory. When we are done with reading an instruction, we might want to read the next instruction.

We could imagine a program counter that refers to a specific 32-bit word, stored in the memory. In a program with 32-bit instructions, like the program in Figure 12, this makes it possible to read the next instruction by adding one to the program counter.

Another alternative is to let the program counter represent an address expressed in bytes. In such a situation, we can read the next instruction by incrementing the program counter by four. This type of addressing is referred to as byte-addressing.
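
The relation between a byte address and the position of a 32-bit word in the memory can be sketched in plain C++. The names bytes_per_instruction, next_pc and word_index are hypothetical.

```cpp
#include <cstdint>

// Sketch of byte addressing with 32-bit instructions (hypothetical names).
constexpr std::uint32_t bytes_per_instruction = 4;

// The next instruction is found four bytes further on.
constexpr std::uint32_t next_pc(std::uint32_t pc)
{
    return pc + bytes_per_instruction;
}

// Position of the 32-bit word that a byte address refers to, assuming
// that the address is a multiple of four.
constexpr std::uint32_t word_index(std::uint32_t byte_address)
{
    return byte_address / bytes_per_instruction;
}

static_assert(next_pc(0) == 4, "the second instruction is at byte address 4");
static_assert(word_index(8) == 2, "byte address 8 is the third word");
```

With the program in Figure 12 stored from byte address 0, the six instructions are found at byte addresses 0, 4, 8, 12, 16 and 20.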


We can implement a memory in SystemC by using TLM. We use TLM as a means to model communication over a memory-mapped bus.

A memory implementation in SystemC/TLM is shown in Figure 13.

class memory : sc_core::sc_module
{
    std::vector<uint32_t> mem;

    void read_from_file(std::string file_name, size_t max_values)
    {
        std::ifstream memory_file(file_name);
        if (!memory_file.is_open())
        {
            std::cerr << "ERROR: could not open file " << file_name << std::endl; 
            exit(1); 
        }
        size_t lines_read = 0;
        std::string line; 
        while (std::getline(memory_file, line) && lines_read < max_values)
        {
            std::istringstream(line) >> std::hex >> mem[lines_read];
            lines_read++;
        }
        memory_file.close();
    }
    
    void b_transport(tlm::tlm_generic_payload& trans, sc_core::sc_time& delay)
    {
        size_t index = static_cast<size_t>(trans.get_address() / 4);
        if (index >= mem.size())
        {
            std::cerr << "Error : address " << trans.get_address()
                      << " out of range!" << std::endl;
            trans.set_response_status(tlm::TLM_ADDRESS_ERROR_RESPONSE);
            return;
        }

        uint32_t *mem_ptr = reinterpret_cast<uint32_t *>(trans.get_data_ptr());
        tlm::tlm_command cmd = trans.get_command();
        if (cmd == tlm::TLM_READ_COMMAND)
        {
            *mem_ptr = mem[index];
        }
        else if (cmd == tlm::TLM_WRITE_COMMAND)
        {
            mem[index] = *mem_ptr;
        }
        else
        {
            std::cout << "Error: illegal command - not read or write" << std::endl; 
        }
    }

public: 
    tlm_utils::simple_target_socket<memory> socket; 

    memory(sc_core::sc_module_name name, size_t size): 
        sc_module(name),
        mem(size, 0),
        socket("socket")
    {
        read_from_file("../memory_contents.txt", size);
        socket.register_b_transport(this, &memory::b_transport);
    }
}; 

memory.h

Figure 13. A memory implementation in SystemC/TLM.

The memory implementation in Figure 13 defines a C++ class called memory.

A variable named mem is used to represent the actual storage. The variable mem is defined as a C++ vector, as

    std::vector<uint32_t> mem;

The memory contents are initialized in the constructor as

        mem(size, 0),

The constructor calls a function named read_from_file, as

        read_from_file("../memory_contents.txt", size);

The function read_from_file reads data from a file, here named memory_contents.txt, and stores the data into the vector mem.

The constructor also registers the b_transport function, as

        socket.register_b_transport(this, &memory::b_transport);

The b_transport function implements the actual access to the memory.

A read access is implemented by copying a 32-bit word from the vector mem to the TLM transaction, as

        if (cmd == tlm::TLM_READ_COMMAND)
        {
            *mem_ptr = mem[index];
        }

where a pointer mem_ptr, defined and assigned as

        uint32_t *mem_ptr = reinterpret_cast<uint32_t *>(trans.get_data_ptr());

is used to enable the actual copying.

A write access is implemented as

        else if (cmd == tlm::TLM_WRITE_COMMAND)
        {
            mem[index] = *mem_ptr;
        }

by copying from the TLM transaction to the vector mem.

A program counter implementation in SystemC/TLM is shown in Figure 14.

class pc : sc_core::sc_module
{
    uint32_t pc_value;

    void process()
    {
        tlm::tlm_generic_payload trans;
        trans.set_command(tlm::TLM_READ_COMMAND);

        uint32_t data_read; 
        unsigned char *data_ptr = reinterpret_cast<unsigned char*>(&data_read); 
        trans.set_data_ptr(data_ptr); 

        trans.set_data_length(4); 

        for (int i = 0; i < 7; i++)
        {
            trans.set_address(pc_value); 
            sc_core::sc_time delay(sc_core::SC_ZERO_TIME);
            socket->b_transport(trans, delay);
            std::cout << "pc_value=" << std::setw(4) << std::setfill('0')
                      << std::hex << pc_value
                      << ", data_read=" << std::setw(8) << data_read << std::endl;
            pc_value += 4;
        }
    }

public: 
    tlm_utils::simple_initiator_socket<pc> socket; 

    pc(sc_core::sc_module_name name):
        sc_module(name),
        pc_value(0),
        socket("socket")
    {
        SC_HAS_PROCESS(pc);
        SC_THREAD(process); 
    }

}; 

pc.h

Figure 14. A program counter in SystemC/TLM.

The current value of the program counter is represented by a variable named pc_value, defined as

    uint32_t pc_value;

The variable pc_value is updated in a SystemC thread named process, by adding the value 4 to pc_value, as

            pc_value += 4;

We can connect the memory in Figure 13 with the program counter in Figure 14. By doing so, we can use the program counter to address a memory, where a program is stored. We can then read instructions, one by one, by incrementing the program counter.


The connection of the memory in Figure 13 with the program counter in Figure 14 can be done in a testbench, stored in a file named addressing_tb.cpp.

We define variables, for the program counter, as

    pc pc_0; 

and for the memory, as

    memory memory_0;

We instantiate the variables, in the addressing_tb constructor, as

        pc_0("pc_0"),
        memory_0("memory_0", 8)

Using the TLM initiator socket, defined in Figure 14 as

    tlm_utils::simple_initiator_socket<pc> socket; 

and the TLM target socket, defined in Figure 13 as

    tlm_utils::simple_target_socket<memory> socket; 

we connect the program counter and the memory, as

        pc_0.socket.bind(memory_0.socket); 

The actual reading of data from the memory is done in the SystemC process named process, in the file pc.h, by a b_transport call on the TLM initiator socket, as

            socket->b_transport(trans, delay);

As a preparation for the next read, the program counter is updated, as

            pc_value += 4;

We prepare the memory contents in a file, with contents corresponding to the program shown in Figure 12, as

000012B7
00002337
000033B7
000002B7
00000337
000003B7

When we run the simulation, we get


        SystemC 2.3.3-Accellera --- Sep 21 2021 06:25:05
        Copyright (c) 1996-2018 by all Contributors,
        ALL RIGHTS RESERVED
pc_value=0000, data_read=000012b7
pc_value=0004, data_read=00002337
pc_value=0008, data_read=000033b7
pc_value=000c, data_read=000002b7
pc_value=0010, data_read=00000337
pc_value=0014, data_read=000003b7
pc_value=0018, data_read=00000000

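We see that the memory contents, printed in hexadecimal form when we run the simulation, correspond to the program in Figure 12. The seventh value read is zero, since the memory holds eight words, all initialized to zero, and the file contains only six values.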

4.3 Decoding the Instruction


From Section A Program, we know that the LUI instruction, which we use in the program in Figure 11, has a format that consists of three parts, as illustrated in Figure 10.

We can create a simple instruction decoder that, given an input in the form of an instruction having the same format as the LUI instruction, generates output data, in the form of

  1. A 32-bit immediate value, with bits 31:12 given by the corresponding bits in the instruction, and with bits 11:0 set to zero.

  2. A 5-bit register id, as given by bits 11:7 in the instruction.

In SystemC/TLM, we choose to implement the instruction decoding as a member function named idecode in a SystemC module named cpu, defined in the file cpu.h.

The function, which decodes instructions with instruction format as specified in Figure 10, is shown in Figure 15.

    void idecode(uint32_t instruction, uint32_t *rd, uint32_t *imm_value)
    {
        *rd = (instruction & 0xF80) >> 7;
        *imm_value = instruction & 0xFFFFF000;
    }

cpu.h

Figure 15. An instruction decoding function for instructions having the same format as the LUI instruction.
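
As a check of the decoding, we can apply the same masks to the first instruction of the program in Figure 12. The following is a sketch in plain C++, without SystemC. The struct lui_fields and the function decode_lui are assumptions, introduced here only to make the example self-contained.

```cpp
#include <cstdint>

// Hypothetical result type, used only in this sketch.
struct lui_fields
{
    uint32_t rd;         // 5-bit register id, from bits 11:7
    uint32_t imm_value;  // bits 31:12 of the instruction, bits 11:0 zero
};

// Uses the same masks as the function idecode in Figure 15.
constexpr lui_fields decode_lui(uint32_t instruction)
{
    return lui_fields{ (instruction & 0xF80u) >> 7,
                       instruction & 0xFFFFF000u };
}

// The first instruction in Figure 12 writes 0x00001000 to register 5.
static_assert(decode_lui(0x000012B7u).rd == 5, "register id");
static_assert(decode_lui(0x000012B7u).imm_value == 0x00001000u, "immediate");
```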

The instruction decoding function in Figure 15 is called from a SystemC process in the SystemC module cpu.

The SystemC process reads instructions from a memory, in a for loop which has a first line as

        for (int i = 0; i < 6; i++)

The for loop defines two variables, for the register id and for the immediate value encoded in the LUI instruction, as

            uint32_t rd;
            uint32_t imm_value; 

These variables are assigned, when the function idecode is called, inside the for loop, as

            idecode(data_out, &rd, &imm_value);

Registers, as described in Section A Register, can be combined into a register file.

We define the register file as a C++ vector, in the SystemC module cpu, as

    std::vector<uint32_t> registers;

cpu.h

Figure 16. A register file, implemented as a C++ vector.

A computer capable of running the program in Figure 11 can now be constructed, by connecting the memory in Figure 13, the program counter in Figure 14, the instruction decoder in Figure 15, and the register file in Figure 16.

We can do these connections in a testbench, stored in a file named one_instruction_tb.cpp.

In the testbench, we define a SystemC module

class one_instruction_tb : sc_core::sc_module

with instance variables for the CPU and the memory, as

    cpu cpu_0;
    memory memory_0;

The instance variables are instantiated, in the module constructor, as

        cpu_0("cpu_0"),
        memory_0("memory_0", 8)

The module constructor also binds an initiator socket, defined in the file cpu.h, as

    tlm_utils::simple_initiator_socket<cpu> socket; 

to a target socket, defined in the file memory.h, as

    tlm_utils::simple_target_socket<memory> socket; 

by calling the SystemC function bind on the initiator socket, as

        cpu_0.socket.bind(memory_0.socket); 

The instructions stored in the memory are read and decoded, in a for-loop implemented in the SystemC process named process, in the file cpu.h.

The for-loop uses a TLM transaction, defined as

        tlm::tlm_generic_payload trans;

The program counter, defined as

    uint32_t pc_value; 

is assigned to the transaction, as

            trans.set_address(pc_value); 

An instruction is read from the memory, by calling b_transport on the initiator socket, as

            socket->b_transport(trans, delay);

The complete processing of an instruction, which consists of reading the instruction, followed by decoding the instruction and writing to the register specified in the instruction, becomes

            idecode(data_out, &rd, &imm_value);
            reg_write(rd, imm_value);

A block diagram of the design is shown in Figure 17.

Figure 17. A block diagram of our first computer, capable of running programs with LUI instructions.

The block diagram in Figure 17 shows the program counter, in a block labelled PC. The program counter addresses the memory, which results in an instruction being read. The instruction is used as input to the instruction decoder, in a block labelled Idecode.

The instruction decoder decodes the instruction, which in this case results in the fields imm and reg id, shown also in Figure 10, being extracted from the instruction, and used as input to the register file, here represented by a block labelled Registers.

4.4 Running the program


We store the memory contents, corresponding to the program in Figure 12, in a file memory_contents.txt.

This file will be read, during startup, and stored in the memory shown in Figure 13.

We use reporting statements, in the cpu SystemC module in cpu.h, to illustrate the execution of the program, as

    void report(uint32_t data_out, uint32_t rd, uint32_t imm_value)
    {
        std::cout << "pc_value=" << std::setw(4) << std::setfill('0')
                  << std::hex << pc_value
                  << ", data_out=" << std::setw(8) << data_out << std::endl;
        std::cout << "rd=" << std::setw(2) << std::setfill('0')
                  << std::hex << rd
                  << ", imm_value=" << std::setw(8) << imm_value << std::endl;

        for (size_t reg_number = 0; reg_number < 3; reg_number++)
        {
            if (reg_number > 0)
            {
                std::cout << ", ";
            }
            std::cout << "r" << reg_number << "_value="
                      << std::setw(8) << std::setfill('0')
                      << std::hex << registers[reg_number];
        }
        std::cout << std::endl << std::endl;
    }

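These statements give a printout of the program counter value, the instruction read from the memory, the decoded register id and immediate value, and the values of three registers.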

Running the program gives, for the first three instructions, a printout as

pc_value=0000, data_out=000012b7
rd=05, imm_value=00001000
r0_value=00001000, r1_value=00000000, r2_value=00000000

pc_value=0004, data_out=00002337
rd=06, imm_value=00002000
r0_value=00001000, r1_value=00002000, r2_value=00000000

pc_value=0008, data_out=000033b7
rd=07, imm_value=00003000
r0_value=00001000, r1_value=00002000, r2_value=00003000

We see, for the program counter values 0, 4, and 8, that the three registers have values as expected from the first three instructions in the program in Figure 11.

For the last three instructions, we get

pc_value=000c, data_out=000002b7
rd=05, imm_value=00000000
r0_value=00000000, r1_value=00002000, r2_value=00003000

pc_value=0010, data_out=00000337
rd=06, imm_value=00000000
r0_value=00000000, r1_value=00000000, r2_value=00003000

pc_value=0014, data_out=000003b7
rd=07, imm_value=00000000
r0_value=00000000, r1_value=00000000, r2_value=00000000

We see, for the program counter values expressed as hexadecimal numbers c, 10 and 14 (corresponding to decimal values 12, 16, and 20), that the three registers have values as expected from the last three instructions in the program in Figure 11.

5 Hello Assembly World

The program in Figure 11 has only one type of instruction. We can create a larger program, for example a hello world program, and then use that program to determine which instructions to add to our computer.

5.1 The Program

We create a program, that we aim to run on our computer.

To begin with, we run the program on a computer simulated in QEMU.

We can download and build a version of QEMU that simulates RISC-V, by following the instructions in the book software repo, for Ubuntu and Mac.

A program, printing Hello, is listed in Figure 18.

.global _start

_start:

    # the value loaded in t0 is the upper 20 bits of the base for
    # SIFIVE_U_DEV_UART0 in sifive_u_memmap struct in
    # https://git.qemu.org/?p=qemu.git;a=blob;f=hw/riscv/sifive_u.c
    lui t0, 0x10010

    andi t1, t1, 0
    addi t1, t1, 72
    sw t1, 0(t0)

    andi t1, t1, 0
    addi t1, t1, 101
    sw t1, 0(t0)

    andi t1, t1, 0
    addi t1, t1, 108
    sw t1, 0(t0)

    andi t1, t1, 0
    addi t1, t1, 108
    sw t1, 0(t0)

    andi t1, t1, 0
    addi t1, t1, 111
    sw t1, 0(t0)

    andi t1, t1, 0
    addi t1, t1, 10
    sw t1, 0(t0)

finish:
    beq t1, t1, finish

hello.s

Figure 18. A hello world program in RISC-V assembly.

The program in Figure 18 uses the lui instruction. A constant value, here chosen as 0x10010, is used as operand. This value gives the upper 20 bits of the address 0x10010000, which is the base address for one of the UARTs, in the hardware that we are simulating with this configuration of QEMU.

We use the sw instruction to write a character to the UART. We store each character in a register, in this case t1, and we write the characters to the UART, using the sw instruction repeatedly, as can be seen in Figure 18.

We store a character in t1 by first storing the value zero in t1, using the andi instruction. When this is done, we use the addi instruction to store the ASCII code of the character we want to write, in t1. This can be seen in Figure 18, where the first character, an ‘H’ with ASCII code 72, is stored using andi and addi.
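
The effect of the andi and addi pair can be sketched in C++. The function name load_char is an assumption, used only for illustration. The sketch shows that the result does not depend on the value that t1 had before the two instructions were executed.

```cpp
#include <cstdint>

// Models the instruction pair
//     andi t1, t1, 0
//     addi t1, t1, ascii_code
// and returns the resulting value of t1.
constexpr uint32_t load_char(uint32_t t1_before, uint32_t ascii_code)
{
    return (t1_before & 0u) + ascii_code;
}

// Whatever t1 contained before, it holds the ASCII code 72 afterwards.
static_assert(load_char(0xDEADBEEFu, 72) == 72, "t1 holds 72");
static_assert(load_char(0u, 72) == 72, "t1 holds 72");
```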

The last instruction in the program in Figure 18 is beq, which does a branch if its operands are equal. We use the beq instruction to create an infinite loop, by branching to the address of the beq instruction. The purpose is to prevent the computer from incrementing the program counter to a value that points to an address outside of our program. In this way, we prevent the computer from executing possibly illegal instructions.

5.2 Tools

We use a GNU toolchain, for assembling and linking our program.

We can download and install the toolchain, using instructions in the book software repo, for Ubuntu and Mac.

5.3 Testing in QEMU

We can build and run the program in Figure 18, using files available in the book repo.

We do the build by navigating, from the base of the repo, to the directory of these files, as

cd hello_asm/asm

and then issue the make command, as

make

The program can be run, using the script run_interactive.sh, as

./run_interactive.sh 

which should lead to QEMU being started, and the string Hello being printed.

We note that the string Hello is printed twice. The reason for this is that the computer we simulate in QEMU has two processors, and the program is executed on both of these processors.

QEMU can be closed down, using the key combination C-a x.

We can also run the program using expect, where a string for closing QEMU is automatically sent to the program.

We run the program using expect, via the script run.sh, as

./run.sh

This should result in printouts, as

spawn qemu-system-riscv32 -machine sifive_u -nographic -bios none -kernel hello -echr 69
Hello
Hello

5.4 And Immediate

5.4.1 Instruction Format

In Section 2.4 of ISA, we find a description of the instruction format for Integer Register-Immediate Instructions.

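The bits in this instruction format are, starting from the most significant bit, a 12-bit immediate value imm (bits 31:20), a source register id rs1 (bits 19:15), a 3-bit field funct3 (bits 14:12), a destination register id rd (bits 11:7), and an opcode (bits 6:0).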

We also see that the and immediate instruction ANDI has this format.

The instruction performs a bitwise and operation, between an immediate value and a value stored in a register. The result of the operation is stored in a destination register.

More specifically, as stated in Section 2.4 of ISA, we note that ANDI is a logical operation that performs a bitwise and on “register rs1 and the sign-extended 12-bit immediate” and places the result in rd.

We conclude that

5.4.2 Instruction Code

In Chapter 24 in ISA, in Table 24.2, we see that the opcode for ANDI is 0010011.

We also see that the value of the funct3 field, for ANDI, is 111.

We can write the ANDI instruction, with the fields described above, as a 32-bit binary word. The resulting format is illustrated in Figure 19, as

imm[11:0], rs1[4:0], funct3[2:0] = 111, rd[4:0], opcode[6:0] = 0010011

Figure 19. Instruction format for the ANDI instruction, adapted from Table 24.2 in ISA.

Alternatively, we can represent the ANDI instruction, using the bit numbers in the instruction, as shown in Figure 20, as

imm (31:20), rs1(19:15), funct3(14:12) = 111, rd (11:7), opcode(6:0) = 0010011

Figure 20. Instruction format for the ANDI instruction, using bit numbers from the instruction.

5.4.3 A Program

As an example program that uses the ANDI instruction, we can use the first two instructions in the program in Figure 18.

This results in a program, as

    lui t0, 0x10010
    andi t1, t1, 0

Translating the first instruction to binary, with the hex value 0x10010 expressed in binary, as

0001 0000 0000 0001 0000

together with the instruction format for the LUI instruction, as shown in Figure 10, and the register number for t0, which according to Table 25.1 in ISA is 5, gives, for the LUI instruction

0001 0000 0000 0001 0000 00101 0110111

which, when grouped into eight groups of four binary digits each, becomes

0001 0000 0000 0001 0000 0010 1011 0111

which we can write in hexadecimal notation, as

100102B7

For the ANDI instruction, using the instruction format in Figure 20 and the register number for t1, which according to Table 25.1 in ISA is 6, we get

000000000000 00110 111 00110 0010011

which, when grouped into eight groups of four binary digits each, becomes

0000 0000 0000 0011 0111 0011 0001 0011

which we can write in hexadecimal notation, as

00037313

We can write the program, in hexadecimal format, as shown in Figure 21.

100102B7
00037313

Figure 21. Program code, in hexadecimal format, for the first two instructions in the program in Figure 18.
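
The manual translation of the ANDI instruction can be cross-checked with a small encoding function. This is a sketch under the assumption that the fields are placed as in Figure 20. The function name encode_op_imm is hypothetical, and is not taken from the book repo.

```cpp
#include <cstdint>

// Hypothetical helper: builds an OP-IMM instruction word from its fields,
// using the bit positions shown in Figure 20.
constexpr uint32_t encode_op_imm(uint32_t imm12, uint32_t rs1,
                                 uint32_t funct3, uint32_t rd)
{
    return ((imm12 & 0xFFFu) << 20)  // imm in bits 31:20
         | ((rs1 & 0x1Fu) << 15)     // rs1 in bits 19:15
         | ((funct3 & 0x7u) << 12)   // funct3 in bits 14:12
         | ((rd & 0x1Fu) << 7)       // rd in bits 11:7
         | 0x13u;                    // OP-IMM opcode 0010011 in bits 6:0
}

// andi t1, t1, 0 with t1 as register 6 and funct3 = 111 for ANDI.
static_assert(encode_op_imm(0, 6, 0x7, 6) == 0x00037313u, "andi t1, t1, 0");
```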

5.4.4 Extending our Computer

In Section 2.4 of ISA, instructions of type OP-IMM are described.

Among these, we find the ANDI instruction, with an instruction format as illustrated in Figure 19 and Figure 20.


The instruction decoder in Figure 15 is extended, so that also instructions of type OP-IMM are decoded.

The extended instruction decoder is shown in Figure 22.

class idecode_t
{
    const uint32_t LUI = 0x37;
    const uint32_t OP_IMM = 0x13;

    uint32_t imm_lui(uint32_t instr)
    {
        uint32_t imm_mask = 0xfffff000;
        uint32_t imm_value = instr & imm_mask;
        return imm_value;
    }

    uint32_t imm_op_imm(uint32_t instr)
    {
        // 12 bits, at the left-most position
        uint32_t imm_mask = ((1u << 12) - 1) << 20;

        uint32_t imm_value = (instr & imm_mask) >> 20;
        // sign-extend if bit 11, the sign bit of the 12-bit immediate, is set
        if (imm_value & (1u << 11))
        {
            uint32_t sign_bits = ((1u << 20) - 1) << 12;
            imm_value = imm_value | sign_bits;
        }
        return imm_value;
    }

public:
    void decode(uint32_t instr,
                uint32_t &imm_value, uint32_t &rs1, uint32_t &rd, uint32_t &opcode)
    {
        uint32_t opcode_mask = 0x7f;
        opcode = instr & opcode_mask;

        if (opcode == LUI)
        {
            imm_value = imm_lui(instr);
            rs1 = 0;
        }
        else if (opcode == OP_IMM)
        {
            imm_value = imm_op_imm(instr);
            uint32_t rs1_mask = ((1 << 5) - 1) << 15;
            rs1 = (instr & rs1_mask) >> 15;
        }
        else
        {
            imm_value = 0;
            rs1 = 0;
        }
        uint32_t rd_mask = ((1 << 5) - 1) << 7;
        rd = (instr & rd_mask) >> 7;
    }
};

idecode.h

Figure 22. An instruction decoder for LUI and OP-IMM instructions (of which ANDI is one).

We can see, in Figure 22, how the opcode value for OP-IMM instructions is defined, as

    const uint32_t OP_IMM = 0x13;

The opcode is extracted from the instruction, as

        uint32_t opcode_mask = 0x7f;
        opcode = instr & opcode_mask;

and used in the assignment of the immediate value, and the source register id rs1, as

        if (opcode == LUI)
        {
            imm_value = imm_lui(instr);
            rs1 = 0;
        }
        else if (opcode == OP_IMM)
        {
            imm_value = imm_op_imm(instr);
            uint32_t rs1_mask = ((1 << 5) - 1) << 15;
            rs1 = (instr & rs1_mask) >> 15;
        }
        else
        {
            imm_value = 0;
            rs1 = 0;
        }

where, for the instruction type OP-IMM, we see how the 32-bit immediate value is assigned, by calling a function imm_op_imm, implemented as

    uint32_t imm_op_imm(uint32_t instr)
    {
        // 12 bits, at the left-most position
        uint32_t imm_mask = ((1u << 12) - 1) << 20;

        uint32_t imm_value = (instr & imm_mask) >> 20;
        // sign-extend if bit 11, the sign bit of the 12-bit immediate, is set
        if (imm_value & (1u << 11))
        {
            uint32_t sign_bits = ((1u << 20) - 1) << 12;
            imm_value = imm_value | sign_bits;
        }
        return imm_value;
    }

where the 32-bit immediate value is computed, by sign-extending the 12-bit imm field in Figure 19.
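
Sign extension can be illustrated with a few example values. The sign bit of a 12-bit value is bit 11. When this bit is set, bits 31:12 of the result are set to one. Otherwise they are zero. The sketch below uses a hypothetical function name, sign_extend_12, and operates on an immediate value that has already been shifted down to bits 11:0.

```cpp
#include <cstdint>

// Sign-extends a 12-bit value, given in bits 11:0, to 32 bits.
constexpr uint32_t sign_extend_12(uint32_t imm12)
{
    return (imm12 & 0x800u)             // is bit 11, the sign bit, set?
         ? (imm12 | 0xFFFFF000u)        // negative: fill bits 31:12 with ones
         : (imm12 & 0x00000FFFu);       // non-negative: bits 31:12 are zero
}

static_assert(sign_extend_12(0x048u) == 0x00000048u, "72 stays 72");
static_assert(sign_extend_12(0x7FFu) == 0x000007FFu, "largest positive value");
static_assert(sign_extend_12(0x800u) == 0xFFFFF800u, "most negative value");
static_assert(sign_extend_12(0xFFFu) == 0xFFFFFFFFu, "minus one");
```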

We also see, in Figure 22, how the register identity rd for the destination register, is assigned, as

        uint32_t rd_mask = ((1 << 5) - 1) << 7;
        rd = (instr & rd_mask) >> 7;

Figure 22 also shows how the LUI instruction, with format according to Figure 10 is decoded, in the same way as it is decoded in Figure 15.

The ANDI instruction shall perform an and operation.

We introduce an ALU, for performing logical and arithmetic operations.

An ALU, capable of performing an and operation, is shown in Figure 23.

            const uint32_t LUI = 0x37;
            const uint32_t OP_IMM = 0x13;

            uint32_t rd_value;
            if (opcode == LUI)
            {
                rd_value = imm_value;    
            }
            else if (opcode == OP_IMM)
            {
                rd_value = imm_value & reg_read(rs1);
            }
            else
            {
                rd_value = 0;
            }

cpu.h

Figure 23. An ALU with an and operation.

We see, in Figure 23, how the and operation is performed when the opcode represents an OP-IMM instruction.

We also see, in Figure 23, that for the LUI instruction, no operation is performed. The immediate value is used, unchanged, as the result.

The computer executing the program in Figure 11 has registers defined as in Figure 16.

For the purpose of executing the ANDI instruction, we add a read operation to the registers.

The definition of the registers, with operations for write and read, is shown in Figure 24.

    static const int n_registers = 32;
    std::vector<uint32_t> registers;

    void reg_write(uint32_t rd, uint32_t value)
    {
        registers[rd] = value; 
    }

    uint32_t reg_read(uint32_t rs)
    {
        return registers[rs];
    }

cpu.h

Figure 24. A register bank with 32 registers, with operations for write and read.

We can see, in Figure 24, how the register indicated by rd is updated, in the write operation, as

    void reg_write(uint32_t rd, uint32_t value)
    {
        registers[rd] = value; 
    }

To complete our computer, for this step of our development, we need a memory and a program counter.

We choose to re-use the memory, as shown in Figure 13, and the program counter, as shown in Figure 14.

A block diagram of the design is shown in Figure 25.

Figure 25. A block diagram of our computer, capable of running programs with LUI and ANDI instructions.

A computer, designed according to the block diagram in Figure 25 and capable of running a program consisting of the first two instructions in the program in Figure 18, with hexadecimal representation according to Figure 21, can now be implemented.

We do the implementation in a testbench, stored in a file named andi_tb.cpp.

The testbench in andi_tb.cpp instantiates and connects

5.4.5 Running the program


The testbench in andi_tb.cpp instantiates a cpu, defined in cpu.h, which contains report statements, as

    void report(uint32_t data_out, uint32_t rs1, uint32_t rd,
                uint32_t imm_value, uint32_t rd_value)
    {
        std::cout << "pc_value=" << std::setw(8) << std::setfill('0')
                  << std::hex << pc_value
                  << ", data_out=" << std::setw(8) << data_out
                  << ", rs1=" << std::setw(2) << rs1 << std::endl;
        std::cout << "rd=" << std::setw(2) << std::setfill('0')
                  << std::hex << rd
                  << ", imm_value=" << std::setw(8) << imm_value
                  << ", rd_value=" << std::setw(8) << rd_value << std::endl;
        std::cout << std::endl;
    }

Running our program, which consists of the first two instructions in the program in Figure 18, with hexadecimal representation according to Figure 21, gives a printout as

pc_value=00000000, data_out=100102b7, rs1=00
rd=05, imm_value=10010000, rd_value=10010000

pc_value=00000004, data_out=00037313, rs1=06
rd=06, imm_value=00000000, rd_value=00000000

In the printout, the name rd_value refers to the result output signal from the ALU in Figure 25. This signal is connected to the rd_value input signal in the register file, which is implemented according to Figure 24.

We see, at pc_value 0, how the value 10010000 is written to register 5.

Using the fact, according to Chapter 25 in ISA, in Table 25.1, that register 5 has the ABI name t0, we can conclude that the printout at pc_value 0 seems consistent with the first instruction in Figure 18.

In a similar way, by observing the rd_value 00000000 at pc_value 4 in the printout, and noting that register 6 has the ABI name t1, we can conclude that the printout at pc_value 4 seems consistent with the second instruction in Figure 18.

5.5 Add Immediate

5.5.1 Instruction Format

In Section 2.4 of ISA, in the description of the instruction format for Integer Register-Immediate Instructions, we see that the add immediate instruction ADDI has this instruction format.

The bits in this instruction format are described in Section And Immediate.

The ADDI instruction performs an addition, between an immediate value and a value stored in a register. The result of the operation is stored in a destination register.

More specifically, as stated in Section 2.4 of ISA, we note that “ADDI adds the sign-extended 12-bit immediate to register rs1”.

5.5.2 Instruction Code

In Chapter 24 in ISA, in Table 24.2, we see that the opcode for ADDI is 0010011.

We also see that the value of the funct3 field, for ADDI, is 000.

We can write the ADDI instruction, with the fields as described above, as a 32-bit binary word. The resulting format is illustrated in Figure 26, as

imm[11:0], rs1[4:0], funct3[2:0] = 000, rd[4:0], opcode[6:0] = 0010011

Figure 26. Instruction format for the ADDI instruction, adapted from Table 24.2 in ISA.

Alternatively, we can represent the ADDI instruction, using the bit numbers in the instruction, as shown in Figure 27, as

imm (31:20), rs1(19:15), funct3(14:12) = 000, rd (11:7), opcode(6:0) = 0010011

Figure 27. Instruction format for the ADDI instruction, using bit numbers from the instruction.

5.5.3 A Program

As an example program that uses the ADDI instruction, we can use the first three instructions in the program in Figure 18.

This results in a program, as

    lui t0, 0x10010
    andi t1, t1, 0
    addi t1, t1, 72

The first two instructions in this program are translated to binary in Section And Immediate and shown in hexadecimal format in Figure 21.

A translation to binary of the third instruction

    addi t1, t1, 72

can be done, using the instruction format in Figure 27 and the register number for t1, which according to Table 25.1 in ISA is 6.

The result, using the binary value for 72, which is 1001000, is

000001001000 00110 000 00110 0010011

Grouping the binary digits into groups of four binary digits each, gives

0000 0100 1000 0011 0000 0011 0001 0011

which can be written in hexadecimal format, as

04830313

The complete program, in hexadecimal format, using the hexadecimal format of the first two instructions from Figure 21 is shown in Figure 28.

100102B7
00037313
04830313

Figure 28. Program code, in hexadecimal format, for the first three instructions in the program in Figure 18.

5.5.4 Extending our Computer


In Chapter 24 in ISA, in Table 24.2, and in Figure 20 and Figure 27, we see that the opcode for both ANDI and ADDI is 0010011.

We also see that ANDI and ADDI have different values for the funct3 field.

We update the instruction decoder, shown in Figure 22, so that funct3 is supported.

We add funct3 as an output, as

    void decode(uint32_t instr,
                uint32_t &imm_value, uint32_t &rs1, uint32_t &rd,
                uint32_t &opcode, uint32_t &funct3)

and assign a value to funct3, by extracting, from the instruction, the bits corresponding to the funct3 field, as

        uint32_t funct3_mask = 0x7000;
        funct3 = (instr & funct3_mask) >> 12;
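
We can check the extraction of funct3 against the two OP-IMM instructions that we have translated so far. The sketch below wraps the mask and shift in a function. The name funct3_of is an assumption, chosen for this illustration.

```cpp
#include <cstdint>

// Extracts the funct3 field, bits 14:12, from an instruction.
constexpr uint32_t funct3_of(uint32_t instr)
{
    return (instr & 0x7000u) >> 12;
}

static_assert(funct3_of(0x00037313u) == 0x7, "andi t1, t1, 0 has funct3 111");
static_assert(funct3_of(0x04830313u) == 0x0, "addi t1, t1, 72 has funct3 000");
```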

The updated instruction decoder can be found in the book repo.

The ALU, shown in Figure 23, is updated so that funct3 is respected, for the opcode OP_IMM.

We add funct3 as a variable, as

            uint32_t funct3;

and assign it, when calling the decode function of the instruction decoder, as

            idc.decode(data_out, imm_value, rs1, rd, opcode, funct3);

We use funct3, when selecting and performing the ALU operation, as

            if (opcode == LUI)
            {
                rd_value = imm_value;    
            }
            else if (opcode == OP_IMM && funct3 == 0x7)
            {
                rd_value = imm_value & reg_read(rs1);
            }
            else if (opcode == OP_IMM && funct3 == 0x0)
            {
                rd_value = imm_value + reg_read(rs1);
            }
            else
            {
                rd_value = 0;
            }

The updated ALU can be found in the book repo, in the file cpu.h.
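
The selection logic can also be tried in isolation. The sketch below rewrites the if/else chain as a free function, so that it can be checked without the rest of the CPU. The function name alu and the definitions of the constants LUI and OP_IMM are assumptions made for this example (the opcode values are the ones listed in ISA); the code in cpu.h may be organized differently.

```cpp
#include <cstdint>

// Opcode values as listed in ISA. The names follow the code above, but
// these definitions are assumptions made for this sketch.
constexpr uint32_t LUI = 0x37;      // 0110111
constexpr uint32_t OP_IMM = 0x13;   // 0010011

// Hypothetical free function mirroring the if/else chain in the ALU.
constexpr uint32_t alu(uint32_t opcode, uint32_t funct3,
                       uint32_t imm_value, uint32_t rs1_value)
{
    return (opcode == LUI) ? imm_value
         : (opcode == OP_IMM && funct3 == 0x7) ? (imm_value & rs1_value)  // ANDI
         : (opcode == OP_IMM && funct3 == 0x0) ? (imm_value + rs1_value)  // ADDI
         : 0;                                                             // not supported
}

// andi with immediate 0 clears the register, whatever its previous value
static_assert(alu(OP_IMM, 0x7, 0, 0xFFFFFFFFu) == 0u, "ANDI with 0 should give 0");
// addi adds the immediate 72 to the register value 0
static_assert(alu(OP_IMM, 0x0, 72, 0) == 72u, "ADDI should add the immediate");
```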

To complete our computer, for this step of our development, we re-use the memory, the program counter, and the register file from Section And Immediate.

A block diagram of the design is shown in Figure 29.

Figure 29. A block diagram of our computer, capable of running programs with LUI, ANDI, and ADDI instructions.

A computer, designed according to the block diagram in Figure 29 and capable of running a program consisting of the first three instructions in the program in Figure 18, with hexadecimal representation according to Figure 28, can now be implemented.

We do the implementation in a testbench, stored in a file named addi_tb.cpp.

The testbench in addi_tb.cpp instantiates and connects the components shown in Figure 29.

5.5.5 Running the program

Running our program, which consists of the first three instructions in the program in Figure 18, with hexadecimal representation according to Figure 28, gives a printout, from the report statements in addi_tb.cpp, as

pc_value=00000000, data_out=100102b7, rs1=00
rd=05, imm_value=10010000, rd_value=10010000

pc_value=00000004, data_out=00037313, rs1=06
rd=06, imm_value=00000000, rd_value=00000000

pc_value=00000008, data_out=04830313, rs1=06
rd=06, imm_value=00000048, rd_value=00000048

We see, at pc_value 0, how the value 10010000 is written to register 5.

Using the fact that register 5 has the ABI name t0, we can conclude that the printout at pc_value 0 seems consistent with the first instruction in Figure 18.

In a similar way, by observing, at pc_value 4, how the value 00000000 is written to register 6 and noting that register 6 has the ABI name t1, we can conclude that the printout at pc_value 4 seems consistent with the second instruction in Figure 18.

For the third instruction, at pc_value 8, we see that the hexadecimal value 48 is written to register t1.

Noting that the hexadecimal value 48 corresponds to the decimal value 72, and that the value in register t1 was 0 after the second instruction, we can conclude that the printout at pc_value 8 seems consistent with the addition of 72 to t1, as done in the third instruction in Figure 18.

5.6 Store to Memory

FROM HERE ON THE BOOK IS IN A MORE WORK-IN-PROGRESS STATE

WORK IS ONGOING TO COMPLETE THE BOOK, AND RELEASE IT

5.6.1 Instruction Format

In Section 2.6 of ISA, we find a description of the instruction formats for Load and Store Instructions.

The instruction format for store instructions has the fields imm[11:5], rs2, rs1, funct3, imm[4:0], and opcode.

Store instructions store bits from the rs2 register to memory.

In addition, we note that the imm[11:5] field and the imm[4:0] field together represent a 12-bit offset.

The rs1 field is used, together with the offset, to calculate the address where data shall be stored. The address is calculated as the sum of the value in the register selected by the rs1 field and the sign-extended offset.
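
The address calculation can be sketched in C++ as follows. The function names are chosen for this example and are not taken from the book repo. The bit positions, with imm[11:5] in bits 31:25 and imm[4:0] in bits 11:7, are those of the store instruction format in ISA.

```cpp
#include <cstdint>

// Interpret a 12-bit value as a signed number (bit 11 is the sign bit).
constexpr int32_t sign_extend_12(uint32_t imm)
{
    return (imm & 0x800u) ? static_cast<int32_t>(imm) - 4096
                          : static_cast<int32_t>(imm);
}

// Reassemble the offset from imm[11:5] (bits 31:25) and imm[4:0] (bits 11:7).
constexpr int32_t store_offset(uint32_t instr)
{
    return sign_extend_12((((instr >> 25) & 0x7Fu) << 5) | ((instr >> 7) & 0x1Fu));
}

// Address = value in the register selected by rs1 + sign-extended offset.
constexpr uint32_t store_address(uint32_t rs1_value, uint32_t instr)
{
    return rs1_value + static_cast<uint32_t>(store_offset(instr));
}

// A store with offset 0 uses the register value itself as the address.
static_assert(store_address(0x10013000u, 0x0062A023u) == 0x10013000u,
              "offset 0 should leave the base address unchanged");
```

A negative offset works in the same way: the instruction word FE62AE23, which has all offset bits except the two lowest set, gives the offset -4.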

5.6.2 Instruction Code

In Chapter 24 in ISA, in Table 24.2, we see that the opcode for SW is 0100011.

We also see that the value of the funct3 field, for SW, is 010.

We can write the SW instruction, with the fields as described above, as a 32-bit binary word. The resulting format is illustrated in Figure 30, as

imm[11:5] (31:25), rs2(24:20), rs1(19:15), funct3(14:12) = 010, imm[4:0] (11:7), opcode(6:0) = 0100011

Figure 30. Instruction format for the SW instruction, adapted from Table 24.2 in ISA.

5.6.3 A Program

As an example program that uses the SW instruction, we can use the first four instructions in the program in Figure 18.

This results in a program, as

    lui t0, 0x10013
    andi t1, t1, 0
    addi t1, t1, 72
    sw t1, 0(t0)

The first three instructions in this program are translated to binary in Section TBD, and shown in hexadecimal format in Figure 28.

A translation to binary of the fourth instruction

    sw t1, 0(t0)

can be done, using the instruction format in Figure 30 and the register numbers for t0, which according to Table 25.1 in ISA is 5, and t1, which according to Table 25.1 in ISA is 6.

The result is written in the format of Figure 30 as

0000000 00110 00101 010 00000 0100011

Grouping the binary digits into groups of four binary digits each, gives

0000 0000 0110 0010 1010 0000 0010 0011

which can be written in hexadecimal format, as

0062A023
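
The translation can be checked with a short C++ sketch that packs the fields according to Figure 30. The function name encode_s_type is an assumption made for this example; it is not taken from the book repo.

```cpp
#include <cstdint>

// Hypothetical helper: packs the fields of a store instruction into a
// 32-bit instruction word, splitting the offset as in Figure 30.
constexpr uint32_t encode_s_type(int32_t offset, uint32_t rs2, uint32_t rs1,
                                 uint32_t funct3, uint32_t opcode)
{
    return (((static_cast<uint32_t>(offset) >> 5) & 0x7Fu) << 25)  // imm[11:5] in bits 31:25
         | ((rs2 & 0x1Fu) << 20)                                   // rs2 in bits 24:20
         | ((rs1 & 0x1Fu) << 15)                                   // rs1 in bits 19:15
         | ((funct3 & 0x7u) << 12)                                 // funct3 in bits 14:12
         | ((static_cast<uint32_t>(offset) & 0x1Fu) << 7)          // imm[4:0] in bits 11:7
         | (opcode & 0x7Fu);                                       // opcode in bits 6:0
}

// sw t1, 0(t0): offset = 0, rs2 = 6 (t1), rs1 = 5 (t0), funct3 = 010, opcode = 0100011
static_assert(encode_s_type(0, 6, 5, 0x2, 0x23) == 0x0062A023u,
              "sw t1, 0(t0) should encode to 0062A023");
```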

The complete program, in hexadecimal format, using the hexadecimal format of the first three instructions from Figure 28, is shown in Figure 31.

100132B7
00037313
04830313
0062A023

Figure 31. Program code, in hexadecimal format, for the first four instructions in the program in Figure 18.

5.6.4 Extending our Computer

5.6.5 Running the program

We create a testbench.

We put prints in the testbench, so we can follow the execution of the program.

We put some extra nops (all zeros) after the four instructions.

Here you can see the result.

5.7 Branch if Equal

5.7.1 Instruction Format

In Section 2.5 of ISA, we find a description of the instruction formats for Control Transfer Instructions.

The instruction format for conditional branch instructions has the fields imm[12], imm[10:5], rs2, rs1, funct3, imm[4:1], imm[11], and opcode.

Branch instructions compare registers rs1 and rs2. The BEQ instruction, which is used in the program in Figure 18, branches if the registers rs1 and rs2 are equal.

The 12-bit immediate value imm[12:1] encodes the branch offset, as a signed multiple of 2 bytes.
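
According to ISA, the sign-extended offset is added to the address of the branch instruction to give the branch target. The effect of BEQ on the program counter can therefore be sketched as below. The function name next_pc_beq is chosen for this example, and the sketch assumes that an instruction that does not branch is followed by the instruction four bytes further on.

```cpp
#include <cstdint>

// Hypothetical helper: next program counter value after a BEQ instruction
// located at address pc, with the given branch offset in bytes.
constexpr uint32_t next_pc_beq(uint32_t pc, uint32_t rs1_value,
                               uint32_t rs2_value, int32_t offset)
{
    return (rs1_value == rs2_value)
               ? pc + static_cast<uint32_t>(offset)   // branch taken
               : pc + 4;                              // continue with next instruction
}

// Equal register values and offset 0: the branch jumps to itself.
static_assert(next_pc_beq(0x10, 72, 72, 0) == 0x10u, "taken branch with offset 0");
// Different register values: execution continues at the next instruction.
static_assert(next_pc_beq(0x10, 1, 2, 0) == 0x14u, "branch not taken");
```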

5.7.2 Instruction Code

In Chapter 24 in ISA, in Table 24.2, we see that the opcode for BEQ is 1100011.

We also see that the value of the funct3 field, for BEQ, is 000.

We can write the BEQ instruction, with the fields as described above, as a 32-bit binary word. The resulting format is illustrated in Figure 32, as

imm[12] (31), imm[10:5] (30:25), rs2(24:20), rs1(19:15), funct3(14:12)=000, imm[4:1] (11:8), imm[11] (7), opcode(6:0) = 1100011

Figure 32. Instruction format for the BEQ instruction, adapted from Table 24.2 in ISA.
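
Since the offset bits are scattered over the instruction word, it can help to see how they are put back together. The following sketch follows the bit positions in Figure 32; the function names are assumptions made for this example. Bit 0 of the offset is not stored in the instruction, and is always zero.

```cpp
#include <cstdint>

// Interpret a 13-bit value as a signed number (bit 12 is the sign bit).
constexpr int32_t sign_extend_13(uint32_t imm)
{
    return (imm & 0x1000u) ? static_cast<int32_t>(imm) - 8192
                           : static_cast<int32_t>(imm);
}

// Reassemble the branch offset, in bytes, from a BEQ instruction word.
constexpr int32_t branch_offset(uint32_t instr)
{
    return sign_extend_13((((instr >> 31) & 0x1u) << 12)    // imm[12] from bit 31
                        | (((instr >> 7) & 0x1u) << 11)     // imm[11] from bit 7
                        | (((instr >> 25) & 0x3Fu) << 5)    // imm[10:5] from bits 30:25
                        | (((instr >> 8) & 0xFu) << 1));    // imm[4:1] from bits 11:8
}

// An instruction word with all offset bits cleared gives the offset 0.
static_assert(branch_offset(0x00630063u) == 0, "all offset bits zero");
// An instruction word with imm[12:3] set gives the offset -8.
static_assert(branch_offset(0xFE630CE3u) == -8, "backward branch by 8 bytes");
```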

5.7.3 A Program

As an example program that uses the BEQ instruction, we can use the first four instructions, and the last instruction, in the program in Figure 18.

This results in a program, as

    lui t0, 0x10013
    andi t1, t1, 0
    addi t1, t1, 72
    sw t1, 0(t0)
finish:
    beq t1, t1, finish

The first four instructions in this program are translated to binary in Section TBD, and shown in hexadecimal format in Figure 31.

A translation to binary of the fifth instruction

    beq t1, t1, finish

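can be done, using the instruction format in Figure 32 and the number for t1, which according to Table 25.1 in ISA is 6.

Since the label finish refers to the beq instruction itself, the branch offset is 0, and all immediate bits are zero. The result is written in the format of Figure 32 as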

0 000000 00110 00110 000 0000 0 1100011

Grouping the binary digits into groups of four binary digits each, gives

0000 0000 0110 0011 0000 0000 0110 0011

which can be written in hexadecimal format, as

00630063
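
As for the earlier instructions, the translation can be checked with a short C++ sketch. The function name encode_b_type is an assumption made for this example; the bit positions follow Figure 32. The offset is 0, since the label finish refers to the beq instruction itself.

```cpp
#include <cstdint>

// Hypothetical helper: packs the fields of a conditional branch instruction
// into a 32-bit instruction word. The offset is given in bytes.
constexpr uint32_t encode_b_type(int32_t offset, uint32_t rs2, uint32_t rs1,
                                 uint32_t funct3, uint32_t opcode)
{
    return (((static_cast<uint32_t>(offset) >> 12) & 0x1u) << 31)   // imm[12] in bit 31
         | (((static_cast<uint32_t>(offset) >> 5) & 0x3Fu) << 25)   // imm[10:5] in bits 30:25
         | ((rs2 & 0x1Fu) << 20)                                    // rs2 in bits 24:20
         | ((rs1 & 0x1Fu) << 15)                                    // rs1 in bits 19:15
         | ((funct3 & 0x7u) << 12)                                  // funct3 in bits 14:12
         | (((static_cast<uint32_t>(offset) >> 1) & 0xFu) << 8)     // imm[4:1] in bits 11:8
         | (((static_cast<uint32_t>(offset) >> 11) & 0x1u) << 7)    // imm[11] in bit 7
         | (opcode & 0x7Fu);                                        // opcode in bits 6:0
}

// beq t1, t1, finish: offset = 0, rs1 = rs2 = 6 (t1), funct3 = 000, opcode = 1100011
static_assert(encode_b_type(0, 6, 6, 0x0, 0x63) == 0x00630063u,
              "beq t1, t1, finish should encode to 00630063");
```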

The complete program, in hexadecimal format, using the hexadecimal format of the first four instructions from Figure 31, is shown in Figure 33.

100132B7
00037313
04830313
0062A023
00630063

Figure 33. Program code, in hexadecimal format, for the first four instructions, and the last instruction, in the program in Figure 18.

5.7.4 Extending our Computer

5.7.5 Running the program

We create a testbench.

We put prints in the testbench, so we can follow the execution of the program.

We put some extra nops (all zeros) after the five instructions.

Here you can see the result.

5.8 Running the complete program

5.8.1 A hand-written version

5.8.2 Using the RISC-V toolchain

6 Hello C World

6.1 The Program

6.2 Tools

6.3 Testing in QEMU

6.4 Extending our Computer

6.5 Running the Program

7 References

[ISA] The RISC-V Instruction Set Manual, Volume I: Unprivileged ISA, available at the RISC-V ISA Specification page.