Into Computers

Ola Dahl

July 2, 2022

1 Welcome

This is the VHDL layer. The other layers are: Verilog, SystemC/TLM.

This is a book about computers. It describes how a small computer can be designed and implemented, using a step-by-step approach.

Starting with a simple building block that can store one bit, we continue, via registers and control logic and instruction decoding, towards a design that can run a small program.

We extend the instruction set, by adding instructions for controlling the program flow, and for interacting with the outside world, through a UART.

We stop when we have a computer that can run a program that has been compiled and linked using gcc.

We implement a subset of a real computer architecture - the RISC-V architecture. In this way, we can convey the experience of building a real system, while at the same time making the task small enough to be completed without a large implementation effort.

Using an already available architecture also allows us to use available tools, such as an existing RISC-V toolchain.

The book is designed as a Layered Book. This means that there are common parts, covering the general aspects of our computer design, but also specific parts, treating layer-specific material. Each layer represents a particular design language, such as VHDL or Verilog.

You can read the book one layer at a time, but you can also move from one layer to another.

The book has the following layers: VHDL, Verilog, and SystemC/TLM.

You are now reading the VHDL layer. The purpose of this layer is to show how VHDL can be used to construct a computer that implements a specific architecture.

Moving between layers is done by following links. Here is an example.

This is the VHDL layer. The other layers are: Verilog, SystemC/TLM.

You will see these links throughout the book, e.g. at the beginning or the end of a section. Following such a link will take you to another layer. You will arrive at the new layer at a position corresponding to the position from which you left off.

1.1 Software

The software accompanying the book is available in a Git repo on GitHub.

Examples in the book contain links to files in the repo.

1.2 Acknowledgements

This book has been produced using pandoc and Python.

The HTML version of the book has been styled using a slightly modified version of a CSS file from a pandoc demo page.

1.3 Choosing a Language

We describe our computer using a design language. In this way, we get a textual representation of the computer, which we can use as input to software tools that help us simulate the behavior of our computer.

VHDL is a hardware description language. We use VHDL to describe our computer, and to simulate its functionality. It is also possible to use VHDL to actually synthesize a computer, in real hardware, for example in an FPGA.

VHDL is standardized by IEEE.

Information about VHDL can be found e.g. from Doulos.

1.4 Hello World

A simple example will get us started. We use a hello, world example, which will do nothing meaningful except printing a text string.

The code for the example is shown in Figure 1.

The code for the example is from the GHDL guide. We will start using GHDL in Section Getting some Tools.

use std.textio.all;

entity hello_world is
end hello_world;

architecture behavior of hello_world is
begin
  process
    variable the_line: line;
  begin
    write(the_line, String'("Hello, world"));
    writeline(output, the_line);
    wait;
  end process;
end behavior;

hello.vhdl

Figure 1. A hello world example in VHDL.

The code in Figure 1 starts with a use clause. The purpose of this clause is to make the declarations in a VHDL package visible. In this case we use the package textio, from the library std, which provides functionality for printing text.

The code in Figure 1 contains an entity. The entity is empty, since we do not have any input ports or output ports.

Then comes the architecture part, where a process is defined.

The process writes a string to a variable called the_line. The contents of the variable are then printed, and a wait statement without a condition suspends the process indefinitely, so that the string is printed only once.

We remark that the code in Figure 1 generates an artificial, simulated behavior. It does not provide any code that can be used for synthesizing actual hardware.

You can read about VHDL in Wikipedia, and at other places. A book called Free Range VHDL is available for free download. You may also want to look at this VHDL Guide from Doulos.

1.5 Getting some Tools

We need some tools, in the form of software. We search for software that can be obtained without cost.

We use a Linux computer with Ubuntu, and a Mac computer with macOS Big Sur.

We decide to use GHDL.

Installation instructions for the chosen tools can be found in the book software repo, on this tools page.

1.6 Make it Run

The code in Figure 1 can be compiled and run.

Instructions for doing this can be found in the book software repo, for Ubuntu and Mac.

1.7 Building a Computer

We have chosen a language, to describe our computer. We have taken a first, tiny step, by installing some tools, and building and running a hello, world example.

Our goal is to create a computer that can run programs, consisting of instructions. We want the instructions to be generated, using a compiler.

A computer reads instructions from a memory. Each instruction is represented as a sequence of bits. The values of the bits determine the type of instruction, and sometimes also arguments that the instruction shall use. The allowed instructions, for a given computer, belong to the computer’s instruction set.

Most computers have instructions for loading data from a memory, and for storing data to a memory. Other common instructions are instructions for doing mathematical operations, such as addition and subtraction, and instructions for making decisions. The decisions can be based on evaluations of certain conditions, such as checking if a number is zero, or if a certain bit is set in a piece of data.

An instruction that has been read from memory is decoded, meaning that the computer interprets the bits of the instruction, and then, depending on the values of the bits, takes different actions.

The actions taken are determined by the instructions. As an example, an instruction for addition results in the actual addition of two numbers, and most often also the storing of the result of the addition.

2 Storing one Bit

We start with a small building block, that can store only one bit. We then extend the building block, so that we can store larger pieces of information. At a certain stage in our development, we are ready to implement our first instruction.

A bit can have the values 0 or 1. In a computer, these values are represented by a low value and a high value of an electrical signal.

The value of a bit can be stored. This means that the value is remembered, as long as it is stored. While the value is stored, the value can be read, and used, for the purpose of performing different operations. As an example, the value of a bit could be used in an addition operation, or it could be copied so that it is stored somewhere else, for example at another place in a memory.

2.1 A D Flip-flop

The value of a bit can be stored in a building block called D flip-flop.

A D flip-flop stores one bit of data. A new value can be stored when a clock signal changes value. A component, which can change its stored value only when a clock signal changes, is called a synchronous component.

A D flip-flop implementation in VHDL is shown in Figure 2.

library ieee;
use ieee.std_logic_1164.all; 

entity d_ff is
  port(
    clk: in std_logic;
    data_in: in std_logic;
    data_out: out std_logic);
  end d_ff;

architecture rtl of d_ff is

  signal reg_value: std_logic;

begin

  update: process(clk)
  begin
    if rising_edge(clk) then
      reg_value <= data_in;
    end if;
  end process; 

  data_out <= reg_value;

end rtl; 

d_ff.vhdl

Figure 2. A D flip-flop in VHDL.

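The code in Figure 2 starts with a reference to a library. We use the library to get access to a data type called std_logic. Objects of this data type represent logic values. Besides the binary values '0' and '1', the type contains additional values, such as 'U' for an uninitialized value.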

An entity is then defined. The entity has a port where inputs and outputs are defined. We have two inputs, called clk and data_in, and we have one output, called data_out.

The architecture block, which is called rtl, for register-transfer level, defines a variable called reg_value. The variable reg_value is defined using the keyword signal.

The variable reg_value will contain the actual value stored in the D flip-flop.

The variable reg_value is called a state variable.

A VHDL process called update defines actions to be taken at every rising edge of the clock signal. We see that the only action taken is to assign the value of the input data_in to the state variable reg_value. This assignment ensures that the state variable reg_value is updated at every rising edge of the clock.

An assignment to the output data_out is done outside of the process update. This assignment ensures that the output data_out has the same value as the current value of the state variable reg_value.

2.2 A Testbench

The D flip-flop implementation in Figure 2 has inputs and outputs. An external module, referred to as a testbench, can be used for the purpose of generating input signals to the D flip-flop, and observing output signals from the D flip-flop.

A VHDL testbench is shown in Figure 3.

library ieee; 
use ieee.std_logic_1164.all;

entity d_ff_tb is
end d_ff_tb;

architecture behavior of d_ff_tb is

  component d_ff
    port(
      clk: in std_logic;
      data_in: in std_logic;
      data_out: out std_logic);
  end component;

  signal clk: std_logic := '0';

  constant clk_half_period: time := 2 ns; 
  constant n_clk_cycles: integer := 4; 

  signal d_ff_data_in: std_logic := '0'; 
  signal d_ff_data_out: std_logic;

begin

  d_ff_0: d_ff
    port map(
      clk => clk,
      data_in => d_ff_data_in,
      data_out => d_ff_data_out);

  clk_gen: process is
  begin
    for i in 1 to n_clk_cycles loop
      wait for clk_half_period;
      clk <= '1';
      wait for clk_half_period; 
      clk <= '0';
    end loop;
    wait; 
  end process;

  stim_gen: process is
  begin
    wait for 1 ns;
    d_ff_data_in <= '0';
    wait for 4 ns;
    d_ff_data_in <= '1';
    wait for 4 ns;
    d_ff_data_in <= '0';
    wait;
  end process; 

  reporter: process(clk, d_ff_data_in) is
  begin
    if (rising_edge(clk) or falling_edge(clk) or d_ff_data_in'event) then
       report "data_in=" & std_logic'image(d_ff_data_in) & 
              ", data_out=" & std_logic'image(d_ff_data_out);
    end if; 
  end process; 

end behavior; 

d_ff_tb.vhdl

Figure 3. A D flip-flop testbench in VHDL.

The testbench in Figure 3 starts with a library reference, followed by a definition of an empty entity.

The architecture section defines a component named d_ff. This component represents the D flip-flop, implemented according to Figure 2, that will be tested by the testbench.

The architecture section defines a signal variable called clk on line 16. This variable represents the clock signal. The actual shape of the clock signal is defined in the process named clk_gen, by the lines

      wait for clk_half_period;
      clk <= '1';
      wait for clk_half_period; 
      clk <= '0';

The input signal to the D flip-flop is defined by the signal variable d_ff_data_in on line 21. The values used for the input signal are defined in the stim_gen process, as

  stim_gen: process is
  begin
    wait for 1 ns;
    d_ff_data_in <= '0';
    wait for 4 ns;
    d_ff_data_in <= '1';
    wait for 4 ns;
    d_ff_data_in <= '0';
    wait;
  end process; 

The architecture section of the testbench in Figure 3 is named behavior, to indicate that the testbench is a behavioral model. A behavioral model can be used in simulation, but cannot be synthesized into a working digital system, for use in e.g. an FPGA or an ASIC.

2.3 Build and Run

The D flip-flop in Figure 2 and the testbench in Figure 3 can be built, and the testbench can be run, using the makefile and the run script in the flip_flop/vhdl directory in the book repo.

The resulting printout from running the testbench is shown in Figure 4.

d_ff_tb.vhdl:57:8:@2ns:(report note): data_in='0', data_out='U'
d_ff_tb.vhdl:57:8:@4ns:(report note): data_in='0', data_out='0'
d_ff_tb.vhdl:57:8:@5ns:(report note): data_in='1', data_out='0'
d_ff_tb.vhdl:57:8:@6ns:(report note): data_in='1', data_out='0'
d_ff_tb.vhdl:57:8:@8ns:(report note): data_in='1', data_out='1'
d_ff_tb.vhdl:57:8:@9ns:(report note): data_in='0', data_out='1'
d_ff_tb.vhdl:57:8:@10ns:(report note): data_in='0', data_out='1'
d_ff_tb.vhdl:57:8:@12ns:(report note): data_in='0', data_out='0'
d_ff_tb.vhdl:57:8:@14ns:(report note): data_in='0', data_out='0'
d_ff_tb.vhdl:57:8:@16ns:(report note): data_in='0', data_out='0'

Figure 4. Printout from running the testbench in Figure 3.

The printout in Figure 4 shows the values of data_in and data_out for a sequence of time instants. The VHDL code that does the printout is located inside an if-statement in the reporter process in Figure 3, and shown here as

  reporter: process(clk, d_ff_data_in) is
  begin
    if (rising_edge(clk) or falling_edge(clk) or d_ff_data_in'event) then
       report "data_in=" & std_logic'image(d_ff_data_in) & 
              ", data_out=" & std_logic'image(d_ff_data_out);
    end if; 
  end process; 

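The effect is that a printout is done whenever the clock signal has a rising or a falling edge, or the variable d_ff_data_in changes value. The changes for the variable d_ff_data_in are defined in the stim_gen process in Figure 3. A printout done at a rising edge shows the value of data_out before it is updated as a result of that edge, which is why data_out is still 'U' at 2 ns in Figure 4.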
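
Instead of reading printouts, the expected values can also be checked by the testbench itself, using assert statements. The following is a minimal sketch of such a self-checking testbench, assuming that the D flip-flop in Figure 2 has been compiled into the same library. The entity name d_ff_check_tb and the signal names d and q are hypothetical, and are not taken from the book repo.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical self-checking variant of the testbench in Figure 3.
entity d_ff_check_tb is
end d_ff_check_tb;

architecture behavior of d_ff_check_tb is

  component d_ff
    port(
      clk: in std_logic;
      data_in: in std_logic;
      data_out: out std_logic);
  end component;

  signal clk: std_logic := '0';
  signal d: std_logic := '0';
  signal q: std_logic;

begin

  d_ff_0: d_ff
    port map(
      clk => clk,
      data_in => d,
      data_out => q);

  check: process
  begin
    -- Set the input to one, then generate a rising edge.
    d <= '1';
    wait for 1 ns;
    clk <= '1';
    wait for 1 ns;
    assert q = '1'
      report "expected data_out = '1'" severity failure;

    -- A falling edge shall not change the stored value.
    d <= '0';
    clk <= '0';
    wait for 1 ns;
    assert q = '1'
      report "stored value changed on falling edge" severity failure;

    -- The next rising edge stores the zero.
    clk <= '1';
    wait for 1 ns;
    assert q = '0'
      report "expected data_out = '0'" severity failure;

    report "d_ff checks passed";
    wait;
  end process;

end behavior;
```

A failing assert with severity failure stops the simulation, so a run that reaches the final report statement indicates that all checks passed.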

2.4 Making Waves

The testbench in Figure 3 generates printouts as shown in Figure 4. The printouts show values of digital signals, each having the value one or zero. We can represent these signals as waveforms, with the level of the waveform being one or zero. Thinking of the value one as a high voltage level, and the value zero as a low voltage level, we can think of the waveforms as representing actual voltages, in an actual digital system.

A waveform can be visualized using the GTKWave program.

Instructions for installing and running GTKWave are found in the book software repo.

A waveform file can be generated from VHDL by running the ghdl program with an added command-line option named --vcd, which specifies the name of the waveform file.

This is done in the run script which runs the testbench in Figure 3, in the file flip_flop/vhdl/run.sh in the book repo.

Using the run script, we can generate waveforms, as shown in Figure 5.

vhdl_d_ff_tb_wave

Figure 5. Waveforms, obtained from running the testbench in Figure 3.

We see in Figure 5 how the waveforms correspond to the printouts shown in Figure 4.

3 Storing Data in Registers

When a computer executes instructions, it often needs intermediate storage places. As an example, consider an addition of two data items, both stored in memory. In this situation, it might be convenient to read the data items from memory and store them in an intermediate storage place, from where the inputs to the addition operation can be taken. The result of the addition could also be stored in the intermediate storage area, before it is transferred to memory.

An intermediate storage place can consist of a register, or a set of registers. A register typically allows faster accesses, for reading and writing data, than a memory.

A set of registers could be used when performing an addition. Two items of data could be read from memory, and stored in two registers. A third register, or one of the two already used, could be used to store the result of the addition, before it is written back to memory.

Registers can also be used to hold other types of values. As an example, a register is often used for holding the current value of the program counter.

We could also use registers for holding status bits, that provide information about the result of a computation. One example of such a register is a status register. A status register can hold information indicating, for example, if an addition resulted in overflow, or if an operation resulted in a zero value.

A set of registers, organized together, so that it is possible to refer to each of the individual registers, for example using an address, can be called a register file.

3.1 A Register

A D flip-flop can store one bit. We can imagine a register as a row of D flip-flops, each storing one bit, with the possibility to load new values into all D flip-flops simultaneously.
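
The picture of a register as a row of D flip-flops can be written down directly in VHDL. The following is a sketch only, assuming that the D flip-flop in Figure 2 is available in the same library. The entity name d_ff_row is hypothetical, and the register implementation used in the book takes a different, more compact approach.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: an N-bit register built from N copies of d_ff.
entity d_ff_row is
  generic (N: integer := 8);
  port(
    clk: in std_logic;
    data_in: in std_logic_vector(N-1 downto 0);
    data_out: out std_logic_vector(N-1 downto 0));
end d_ff_row;

architecture structural of d_ff_row is

  component d_ff
    port(
      clk: in std_logic;
      data_in: in std_logic;
      data_out: out std_logic);
  end component;

begin

  -- One flip-flop per bit. All flip-flops share the same clock,
  -- so all bits are loaded simultaneously.
  bits: for i in 0 to N-1 generate
    d_ff_i: d_ff
      port map(
        clk => clk,
        data_in => data_in(i),
        data_out => data_out(i));
  end generate;

end structural;
```

The generate statement creates one instance of d_ff for each bit position, which mirrors the row of flip-flops described above.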

A register implementation in VHDL is shown in Figure 6.

library ieee;
use ieee.std_logic_1164.all; 

entity n_bit_register is
  generic (N: integer := 8); 
  port(
    clk: in std_logic;
    data_in: in std_logic_vector(N-1 downto 0);
    data_out: out std_logic_vector(N-1 downto 0));
  end n_bit_register;

architecture rtl of n_bit_register is

  signal reg_value: std_logic_vector(N-1 downto 0);

begin

  update: process(clk)
  begin
    if rising_edge(clk) then
      reg_value <= data_in;
    end if;
  end process; 

  data_out <= reg_value;

end rtl; 

n_bit_register.vhdl

Figure 6. A register in VHDL.

The code in Figure 6 defines an entity called n_bit_register. The entity has a port where inputs and outputs are defined. We have two inputs, called clk and data_in, and we have one output, called data_out.

The architecture block defines a variable called reg_value. The variable reg_value will contain the actual value stored in the register.

A VHDL process called update ensures that the state variable reg_value is updated at every rising edge of the clock.

An assignment to the output data_out is done outside of the process update. This assignment ensures that the output data_out has the same value as the current value of the state variable reg_value.

3.2 A Testbench

An external module, referred to as a testbench, can be used for the purpose of generating input signals to, and observing output signals from, the register in Figure 6.

A testbench for the register in Figure 6 is implemented in the file n_bit_register_tb.vhdl.

In the testbench, we use a parameter, to specify the width of the register.

The parameter is defined as a VHDL constant, as

  constant N: integer := 4;

The clock signal is generated using a variable named clk, together with two constants, as

  signal clk: std_logic := '0';

  constant clk_half_period: time := 2 ns; 
  constant n_clk_cycles: integer := 5; 

The actual clock generation is done in a VHDL process, as

  clk_gen: process is
  begin
    for i in 1 to n_clk_cycles loop
      clk <= '1';
      wait for clk_half_period;
      clk <= '0';
      wait for clk_half_period; 
    end loop;
    wait; 
  end process;

The generation of input signals to the register in Figure 6 is done using a VHDL process, as

  stim_gen: process(clk) is
  begin
    if (rising_edge(clk)) then
      reg_data_in <= std_logic_vector(
        unsigned(reg_data_in) + 1);
    end if; 
  end process; 

The input signal and the output signal are defined as

  signal reg_data_in: std_logic_vector(N-1 downto 0) :=
    (0 => '1', others => '0'); 
  signal reg_data_out: std_logic_vector(N-1 downto 0);

The signals are used in the instantiation of the register, which is done as

  n_bit_register_0: n_bit_register
    port map(
      clk => clk,
      data_in => reg_data_in,
      data_out => reg_data_out);

The reporting of the results is done in a process, as

  reporter: process(clk) is
  begin
    if (rising_edge(clk)) then
       report "data_in=" &
         reverse_string(std_logic_vector_to_string(reg_data_in)) & 
         ", data_out=" &
         reverse_string(std_logic_vector_to_string(reg_data_out));
    end if; 
  end process; 

3.3 Build and Run

The register in Figure 6 and the testbench described in Section A Testbench can be built, and the testbench can be run, using the makefile and the run script in the register/vhdl directory in the book repo.

The resulting printout from running the testbench is shown in Figure 7.

n_bit_register_tb.vhdl:97:8:@0ms:(report note): data_in=0001, data_out=UUUU
n_bit_register_tb.vhdl:97:8:@4ns:(report note): data_in=0010, data_out=0001
n_bit_register_tb.vhdl:97:8:@8ns:(report note): data_in=0011, data_out=0010
n_bit_register_tb.vhdl:97:8:@12ns:(report note): data_in=0100, data_out=0011
n_bit_register_tb.vhdl:97:8:@16ns:(report note): data_in=0101, data_out=0100

Figure 7. Printout from running the testbench described in Section A Testbench.

We can generate waveforms, in the same way as described in Section Making Waves. The resulting waveform, for the register with printouts as shown above, is displayed in Figure 8.

vhdl_register_tb_wave

Figure 8. Waveforms, obtained from running the testbench described in Section A Testbench.

4 Our First Instruction

A computer executes programs by following instructions. The instructions belong to an instruction set. As mentioned in Chapter Welcome, we will use a subset of the RISC-V architecture as the instruction set for our computer.

As a first step, we will try to build a computer with only one instruction. Although somewhat restricted, this computer will be able to

We will start with deciding on a program to run on our computer. The program will be stored in a memory, and its instructions will be read, one by one, and actions will be taken.

4.1 A Program

From the RISC-V architecture page, we can download the RISC-V Instruction Set Manual, referred to here as ISA.

We look for an instruction that can load a value into a register. Using such an instruction, we can create a small program that loads specified values into some of the registers.

We choose to use the RV32I Base Integer Instruction Set, which is described in Chapter 2 of ISA.

The instructions in this instruction set are 32 bits wide.

The bits in an instruction are numbered, with 31 for the leftmost bit, down to 0 for the rightmost bit.

We use the notation b1:b2 to describe a range of bits, such as 31:0 for describing all 32 bits, or e.g. 7:0 for describing the rightmost byte.

In Section 2.3 of ISA we can see how 32-bit instructions that handle immediate data are encoded.

One instruction format is called U-type. In a U-type instruction, bits 31:12 hold an immediate value, bits 11:7 hold the destination register rd, and bits 6:0 hold the opcode.

In Section 2.4 of ISA, we find a description of the instruction LUI, which stands for load upper immediate, and which is used to “build 32-bit constants”.

We also see that the LUI instruction “places the U-immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros”.

We conclude that a LUI instruction contains a 20-bit immediate value, a destination register, and an opcode.

In Chapter 24 in ISA, in Table 24.2, we see that the opcode for LUI is 0110111.

We can write the LUI instruction, with the fields as described above, as a 32-bit binary word. The resulting format is illustrated in Figure 9, as

imm [31:12], rd [4:0], opcode[6:0] = 0110111

Figure 9. Instruction format for the LUI instruction, adapted from Table 24.2 in ISA.

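The instruction in Figure 9 uses bit numbers that refer to the values represented in the different fields. This means that imm [31:12] denotes bits 31 to 12 of the value written to the destination register, rd [4:0] denotes the five bits of the destination register number, and opcode [6:0] denotes the seven bits of the opcode.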

Alternatively, we can represent the instruction using the bit numbers for the instruction itself.

This representation can be useful when implementing features in our computer, where different bits in an instruction need to be picked out.

In this case, the bit number within the instruction might be more useful than the bit number for the value stored in a certain field.

We can represent the LUI instruction, using instruction bit numbers, as shown in Figure 10, as

imm (31:12), rd (11:7), opcode(6:0) = 0110111

Figure 10. Instruction format for the LUI instruction, using bit numbers from the instruction.

We will use both representations: with bit numbers for the fields, as illustrated in Figure 9, using brackets to separate the fields, and with bit numbers from the instruction, as illustrated in Figure 10, using parentheses to separate the fields.
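
The representation with instruction bit numbers maps directly to slices of a 32-bit vector in VHDL. The following sketch picks out the three fields of a LUI instruction. The entity name lui_fields and its port names are hypothetical, chosen for illustration only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only.
entity lui_fields is
  port(
    instruction: in std_logic_vector(31 downto 0);
    imm: out std_logic_vector(19 downto 0);
    rd: out std_logic_vector(4 downto 0);
    opcode: out std_logic_vector(6 downto 0));
end lui_fields;

architecture rtl of lui_fields is
begin

  -- Slices use the bit numbers of the instruction itself.
  imm <= instruction(31 downto 12);
  rd <= instruction(11 downto 7);
  opcode <= instruction(6 downto 0);

end rtl;
```

The output imm is indexed from 19 down to 0 here, although its bits represent bits 31 to 12 of the value that is written to the destination register.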

In Section 2.1 in ISA, we see that there are 32 registers, each 32 bits wide, referred to as registers x0 to x31.

We also see that in register x0, all bits are hardwired to the value zero.

When considering a certain calling convention, registers are often given dedicated roles.

For the registers x0 to x31 in RISC-V, a list of such roles, together with a role-specific, alternative name for each register, is given in Table 25.1 in Chapter 25 in ISA.

For example, register x1 (named ra) is used as return address and register x2 (named sp) is used as stack pointer.

There are also registers that are used for storage of temporary values, such as x5 (named t0, and also serving as alternate link register), and x6 and x7 (named t1 and t2, respectively).

The alternative register names are referred to as ABI names in Table 25.1 in ISA.

ABI names for registers are used in assembly programs.

Using the LUI instruction and the register ABI names, we can create a program that performs actions, as

  1. write three different values to registers t0, t1, and t2.
  2. write the value zero to registers t0, t1, and t2

In assembly language, we could write a program, using lowercase for the instruction name, as

lui t0, 1
lui t1, 2
lui t2, 3
lui t0, 0
lui t1, 0
lui t2, 0

Figure 11. An assembly program, using a LUI instruction to write values to registers.

We recall that the LUI instruction writes the immediate value, which is 1 for the first instruction in our program, to the top 20 bits of the destination register, which for this instruction is t0, while at the same time filling in the lowest 12 bits with zeros.

For the first instruction in Figure 11, which is

lui t0, 1

this means that the number being stored in t0 is 1 followed by 12 zeros. In binary form, this becomes

1000000000000

Counting the bits from right to left, with the rightmost bit having number zero, we know, from the properties of binary numbers, that bit number n has the weight 2^n.

In this number, all bits are zero except for bit number 12. This gives the corresponding decimal number as

2^12 = 4096

We can write this number also in hexadecimal form. One way of arriving at the hexadecimal representation is to start with the binary representation, in this case

1000000000000

and then group the bits, in groups of four bits in each group. This gives

1 0000 0000 0000

We then let each group of four bits be represented by one hexadecimal digit. Using the prefix 0x, which is commonly used to indicate that a number is hexadecimal, we get

1 0000 0000 0000 = 0x1000

In a similar way, we can calculate the value that will be stored in register t1, by the instruction

lui t1, 2

as

10 0000 0000 0000 = 0x2000

which, when converted to decimal form, becomes

0x2000 = 8192

For the third instruction in Figure 11,

lui t2, 3

the corresponding calculation yields

0x3000 = 12288
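
The three calculations can be checked in a simulation. The following sketch models the value written by LUI as the immediate value shifted 12 bits to the left, and compares the result with the decimal numbers calculated above. The entity name lui_value_check and the function name lui_value are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical check, for illustration only.
entity lui_value_check is
end lui_value_check;

architecture behavior of lui_value_check is

  -- The value written by LUI: the immediate value shifted 12 bits
  -- to the left, so that the lowest 12 bits are zero.
  function lui_value(imm: natural) return unsigned is
  begin
    return shift_left(to_unsigned(imm, 32), 12);
  end function;

begin

  process
  begin
    assert to_integer(lui_value(1)) = 4096
      report "unexpected value for lui t0, 1" severity failure;
    assert to_integer(lui_value(2)) = 8192
      report "unexpected value for lui t1, 2" severity failure;
    assert to_integer(lui_value(3)) = 12288
      report "unexpected value for lui t2, 3" severity failure;
    report "LUI values match the calculations";
    wait;
  end process;

end behavior;
```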

In order to run the program in Figure 11 on our computer, which will be built in the sections that follow, we need to write the program using binary code.

We saw, in Chapter 24 in ISA, in Table 24.2, that the opcode for LUI is 0110111.

We have also seen, in Figure 9 and Figure 10, how the top 20 bits of the value to be stored in the destination register are represented in the instruction.

We see, in Chapter 25 in ISA, in Table 25.1, how registers t0, t1, and t2 are ABI names for the registers x5, x6, and x7.

Using the numeric values 5, 6, and 7 for these registers, we can now write the program in Figure 11 in binary code, as

00000000000000000001 00101 0110111
00000000000000000010 00110 0110111
00000000000000000011 00111 0110111
00000000000000000000 00101 0110111
00000000000000000000 00110 0110111
00000000000000000000 00111 0110111

Grouping the binary digits in groups of four gives

0000 0000 0000 0000 0001 0010 1011 0111
0000 0000 0000 0000 0010 0011 0011 0111
0000 0000 0000 0000 0011 0011 1011 0111
0000 0000 0000 0000 0000 0010 1011 0111
0000 0000 0000 0000 0000 0011 0011 0111
0000 0000 0000 0000 0000 0011 1011 0111

We can convert the program to a representation where we use hexadecimal numbers. This conversion results in the program shown in Figure 12.

0x000012B7
0x00002337
0x000033B7
0x000002B7
0x00000337
0x000003B7

Figure 12. A binary program, using a LUI instruction to write values to registers.
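
The manual encoding can be cross-checked in a simulation. The following sketch assembles a LUI instruction from an immediate value and a register number, using the bit positions from Figure 10, and compares the result with some of the words in Figure 12. The entity name lui_encode_check and the function name encode_lui are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical check, for illustration only.
entity lui_encode_check is
end lui_encode_check;

architecture behavior of lui_encode_check is

  -- Assemble a LUI instruction from its fields.
  function encode_lui(imm: natural; rd: natural)
    return std_logic_vector is
    variable word: std_logic_vector(31 downto 0);
  begin
    word(31 downto 12) := std_logic_vector(to_unsigned(imm, 20));
    word(11 downto 7) := std_logic_vector(to_unsigned(rd, 5));
    word(6 downto 0) := "0110111";  -- opcode for LUI
    return word;
  end function;

begin

  process
  begin
    -- Registers t0, t1, and t2 are x5, x6, and x7.
    assert encode_lui(1, 5) = x"000012B7"
      report "unexpected encoding of lui t0, 1" severity failure;
    assert encode_lui(2, 6) = x"00002337"
      report "unexpected encoding of lui t1, 2" severity failure;
    assert encode_lui(3, 7) = x"000033B7"
      report "unexpected encoding of lui t2, 3" severity failure;
    assert encode_lui(0, 5) = x"000002B7"
      report "unexpected encoding of lui t0, 0" severity failure;
    report "encodings match Figure 12";
    wait;
  end process;

end behavior;
```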

4.2 Addressing a Memory

We can store a program, like the program shown in Figure 12, in a memory.

The program in Figure 12 consists of instructions. Each instruction is represented by a 32-bit word.

As a first step towards executing the program, we can create a program counter that reads the 32-bit instructions, one by one, from a memory.

Reading an instruction is done by using the program counter value to address the memory. When we are done with reading an instruction, we might want to read the next instruction.

We could imagine a program counter that refers to a specific 32-bit word, stored in the memory. In a program with 32-bit instructions, like the program in Figure 12, this makes it possible to read the next instruction by adding one to the program counter.

Another alternative is to let the program counter represent an address expressed in bytes. In such a situation, we can read the next instruction by incrementing the program counter by four. This type of addressing is referred to as byte-addressing.
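
The relation between the two alternatives can be illustrated in VHDL. With 32-bit words, a byte address is converted to a word index by dropping its two lowest bits, which corresponds to a division by four. The following is a sketch under that assumption. The names byte_to_word_check and word_index are hypothetical, and the sketch does not describe how the memory in the book repo handles addresses.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical check, for illustration only.
entity byte_to_word_check is
end byte_to_word_check;

architecture behavior of byte_to_word_check is

  -- Word index of a byte address: drop the two lowest address bits,
  -- which is the same as dividing the address by four.
  function word_index(byte_address: std_logic_vector(31 downto 0))
    return natural is
  begin
    return to_integer(unsigned(byte_address(31 downto 2)));
  end function;

begin

  process
  begin
    -- Consecutive instructions are four bytes apart.
    assert word_index(x"00000000") = 0
      report "unexpected index for address 0" severity failure;
    assert word_index(x"00000004") = 1
      report "unexpected index for address 4" severity failure;
    assert word_index(x"00000008") = 2
      report "unexpected index for address 8" severity failure;
    report "byte addresses map to word indices";
    wait;
  end process;

end behavior;
```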

A memory implementation in VHDL is found in the file instruction/memory/vhdl/memory.vhdl in the book repo.

The architecture part of this file is shown in Figure 13.

architecture rtl of memory is

  type memory_type is array(0 to size-1) of
    std_logic_vector(data_width-1  downto 0); 

  impure function init_memory return memory_type is
    file in_file: text is in "../memory_contents.txt";
    variable in_line: line;
    variable s: std_logic_vector(data_width-1 downto 0);
    variable m: memory_type; 
  begin
    for i in 1 to integer(m'length) loop
      if not endfile(in_file) then
        readline(in_file, in_line);
        hread(in_line, s);
        m(i-1) := s; 
      else
        m(i-1) := (others => 'X');
      end if; 
    end loop;
    return m;
  end function;
  
  signal mem: memory_type := init_memory;

begin

  update: process(clk)
  begin
    if rising_edge(clk) then 
      if write_enable = '1' then
        mem(to_integer(unsigned(address))) <= data_in;
      end if;
    end if;
  end process;
  
  data_out <= mem(to_integer(unsigned(address))); 

end rtl; 

memory.vhdl

Figure 13. A memory in VHDL.

The memory implementation in Figure 13 defines a data type, called memory_type, that represents the memory contents.

The data type memory_type is defined as an array, with elements of the type std_logic_vector.

The memory contents are initialized using a VHDL function called init_memory.

The function init_memory reads data from a file named memory_contents.txt, and stores the data into a variable m.

The function init_memory is called, as

  signal mem: memory_type := init_memory;

The result of the call is that the actual memory contents, as represented by the signal mem, are initialized.

A VHDL process called update handles writing of data to the memory.

The process update uses variables defined in the interface part of the VHDL implementation of the memory, which is defined as a VHDL entity as

entity memory is
  generic (address_width: integer := 32;
           data_width: integer := 32;
           size: integer := 256);
  port(
    clk: in std_logic;
    write_enable: in std_logic; 
    address: in std_logic_vector(address_width-1 downto 0); 
    data_in: in std_logic_vector(data_width-1 downto 0);
    data_out: out std_logic_vector(data_width-1 downto 0));
  end memory;

In the process update, implemented as

  update: process(clk)
  begin
    if rising_edge(clk) then 
      if write_enable = '1' then
        mem(to_integer(unsigned(address))) <= data_in;
      end if;
    end if;
  end process;

data is written at the rising edge of the clock, if the write_enable signal is one, by assigning the value of the input data_in to one of the elements in the memory array. The element to be written is defined by the input address.

Data is read from the memory by assigning a value from the memory array, at a position defined by the input address, to the output data_out. This assignment is done outside of the update process, as

  data_out <= mem(to_integer(unsigned(address))); 
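The difference between the clocked write and the unclocked read can be checked in a small simulation. The following is a minimal sketch, not a file from the book repo. The entity name memory_access_sketch, the four-word array, and the stimulus values are assumptions made for illustration, and no file is read to initialize the contents.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical, self-contained sketch; not a file from the book repo.
entity memory_access_sketch is
end memory_access_sketch;

architecture sim of memory_access_sketch is

  -- A four-word array stands in for the memory; no file is read here.
  type memory_type is array(0 to 3) of std_logic_vector(31 downto 0);
  signal mem: memory_type := (others => (others => '0'));

  signal clk: std_logic := '0';
  signal write_enable: std_logic := '0';
  signal address: std_logic_vector(1 downto 0) := "00";
  signal data_in: std_logic_vector(31 downto 0) := (others => '0');
  signal data_out: std_logic_vector(31 downto 0);

begin

  -- Clocked write, in the same way as in Figure 13.
  update: process(clk)
  begin
    if rising_edge(clk) then
      if write_enable = '1' then
        mem(to_integer(unsigned(address))) <= data_in;
      end if;
    end if;
  end process;

  -- Read without a clock, in the same way as in Figure 13.
  data_out <= mem(to_integer(unsigned(address)));

  stimulus: process is
  begin
    address <= "01";
    data_in <= x"000012B7";
    write_enable <= '1';
    wait for 1 ns;
    -- No rising edge has occurred, so the old contents are still read.
    assert data_out = x"00000000"
      report "data changed before the clock edge" severity failure;
    clk <= '1';
    wait for 1 ns;
    -- After the rising edge, the written word is visible on data_out.
    assert data_out = x"000012B7"
      report "written data not visible after the clock edge"
      severity failure;
    report "write on rising edge, read without clock: ok";
    wait;
  end process;

end sim;
```

The sketch keeps the write and the read statements as they appear in Figure 13, so that only the surrounding stimulus is new.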

A program counter implementation in VHDL is shown in Figure 14.

library ieee;
use ieee.std_logic_1164.all; 
use ieee.numeric_std.all;

entity pc is
  generic (pc_width: integer := 32); 
  port(
    clk: in std_logic;
    pc_out: out std_logic_vector(pc_width-1 downto 0) := (others => '0'));
  end pc;

architecture rtl of pc is

  signal pc_value: std_logic_vector(pc_width-1 downto 0) := (others => '0');

begin

  update: process(clk)
  begin
    if rising_edge(clk) then
      pc_value <= std_logic_vector(unsigned(pc_value) + 4);
    end if;
  end process; 

  pc_out <= pc_value;

end rtl; 

pc.vhdl

Figure 14. A program counter in VHDL.

The current value of the program counter is represented by a signal named pc_value, defined as

  signal pc_value: std_logic_vector(pc_width-1 downto 0) := (others => '0');

The signal pc_value is updated on the rising edge of the clock, by adding the value 4 to pc_value, as

      pc_value <= std_logic_vector(unsigned(pc_value) + 4);

An output pc_out is assigned the value of the program counter, as

  pc_out <= pc_value;

We can connect the memory in Figure 13 with the program counter in Figure 14. By doing so, we can use the program counter to address a memory, where a program is stored. We can then read instructions, one by one, by incrementing the program counter.

The connection of the memory in Figure 13 with the program counter in Figure 14 can be done in a testbench, stored in a file named addressing_tb.vhdl.

In the testbench, we define the clock signal as

  signal clk: std_logic := '0';

  constant clk_half_period: time := 2 ns; 
  constant n_clk_cycles: integer := 7; 

We set the write enable signal and the data being written to the memory to zero, as

  signal write_enable: std_logic := '0'; 
  signal data_in: std_logic_vector(data_width-1 downto 0) :=
    (others => '0'); 

We define signals, for the program counter, as

  signal pc_value: std_logic_vector(address_width-1 downto 0);

and for the data read from the memory, as

  signal data_out: std_logic_vector(data_width-1 downto 0);

The memory address is computed from the program counter, as

  pc_address <= "00" & pc_value(address_width-1 downto 2);

We use these signals in connections, for the memory as

  memory_0: memory
    port map(
      clk => clk,
      write_enable => write_enable,
      address => pc_address,
      data_in => data_in, 
      data_out => data_out);

and for the program counter, as

  pc_0: pc
    port map(
      clk => clk,
      pc_out => pc_value);
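The computation of pc_address drops the two lowest bits of the program counter, which corresponds to a division by four. In this way, a program counter that counts bytes is converted to an index that counts 32-bit words. The following is a minimal sketch that checks this for the first seven program counter values. The entity name word_address_sketch and the use of variables instead of signals are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical, self-contained sketch; not a file from the book repo.
entity word_address_sketch is
end word_address_sketch;

architecture sim of word_address_sketch is
begin

  check: process is
    variable pc_value: std_logic_vector(31 downto 0);
    variable pc_address: std_logic_vector(31 downto 0);
  begin
    for word_index in 0 to 6 loop
      -- The program counter advances by 4 for each instruction.
      pc_value := std_logic_vector(to_unsigned(4 * word_index, 32));
      -- Same expression as in the testbench: drop the two lowest bits.
      pc_address := "00" & pc_value(31 downto 2);
      assert to_integer(unsigned(pc_address)) = word_index
        report "unexpected word address" severity failure;
    end loop;
    report "program counter values 0, 4, ..., 24 map to words 0, ..., 6";
    wait;
  end process;

end sim;
```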

The clock signal is generated as before, e.g. as in Section A Testbench, as

  clk_gen: process is
  begin
    for i in 1 to n_clk_cycles loop
      clk <= '0';
      wait for clk_half_period;
      clk <= '1';
      wait for clk_half_period; 
    end loop;
    wait; 
  end process;

We prepare the memory contents in a file, with contents corresponding to the program shown in Figure 12, as

000012B7
00002337
000033B7
000002B7
00000337
000003B7

When we run the simulation, we get

addressing_tb.vhdl:82:8:@2ns:(report note): 
pc_value=0000, data_out=000012b7
addressing_tb.vhdl:82:8:@6ns:(report note): 
pc_value=0004, data_out=00002337
addressing_tb.vhdl:82:8:@10ns:(report note): 
pc_value=0008, data_out=000033b7
addressing_tb.vhdl:82:8:@14ns:(report note): 
pc_value=000c, data_out=000002b7
addressing_tb.vhdl:82:8:@18ns:(report note): 
pc_value=0010, data_out=00000337
addressing_tb.vhdl:82:8:@22ns:(report note): 
pc_value=0014, data_out=000003b7
addressing_tb.vhdl:82:8:@26ns:(report note): 
pc_value=0018, data_out=xxxxxxxx

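We see that the memory contents, printed in hexadecimal form when we run the simulation, correspond to the program in Figure 12. The last printout, at 26 ns, shows the value xxxxxxxx, since the address is then beyond the six words read from the file, where the function init_memory has stored the value 'X'.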

4.3 Decoding the Instruction

From Section A Program, we know that the LUI instruction, which we use in the program in Figure 11, has a format that consists of three parts, as illustrated in Figure 10.

We can create a simple instruction decoder that, given an input in the form of an instruction having the same format as the LUI instruction, generates output data, in the form of

  1. A 32-bit immediate value, with bits 31:12 given by the corresponding bits in the instruction, and with bits 11:0 set to zero.

  2. A 5-bit register id, as given by bits 11:7 in the instruction.

An instruction decoder for instructions with instruction format as specified in Figure 10 is shown in Figure 15.

library ieee;
use ieee.std_logic_1164.all; 

entity idecode is
  port(
    instr: in std_logic_vector(31 downto 0);
    rd: out std_logic_vector(4 downto 0);
    imm_value: out std_logic_vector(31 downto 0));
  end idecode;

architecture rtl of idecode is

  constant zero_12: std_logic_vector(11 downto 0) := (others => '0');
  
begin

  rd <= instr(11 downto 7); 
  imm_value <= instr(31 downto 12) & zero_12;

end rtl; 

idecode.vhdl

Figure 15. An instruction decoder for instructions having the same format as the LUI instruction.

The instruction decoder in Figure 15 has an output named rd. This output, which represents the register id, is assigned, using bits 11:7 from the instruction, as

  rd <= instr(11 downto 7); 

There is also an output named imm_value, which holds the immediate value given by the instruction. This output is assigned, using the uppermost 20 bits of the instruction in combination with 12 zero bits, as

  imm_value <= instr(31 downto 12) & zero_12;
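The two assignments can be tried on a concrete instruction word. The following is a minimal sketch that applies them to the word 000012B7, which is the first word in the memory contents used earlier, and checks that the register id is 5 and that the immediate value is 00001000. The entity name lui_fields_sketch is an assumption made for illustration, and the decoder itself is not instantiated.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical, self-contained sketch; not a file from the book repo.
entity lui_fields_sketch is
end lui_fields_sketch;

architecture sim of lui_fields_sketch is

  -- First instruction word of the memory contents file.
  constant instr: std_logic_vector(31 downto 0) := x"000012B7";
  constant zero_12: std_logic_vector(11 downto 0) := (others => '0');

begin

  check: process is
    variable rd: std_logic_vector(4 downto 0);
    variable imm_value: std_logic_vector(31 downto 0);
  begin
    -- Same slices as in Figure 15.
    rd := instr(11 downto 7);
    imm_value := instr(31 downto 12) & zero_12;
    assert rd = "00101"
      report "unexpected register id" severity failure;
    assert imm_value = x"00001000"
      report "unexpected immediate value" severity failure;
    report "rd is 5 and imm_value is 00001000";
    wait;
  end process;

end sim;
```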

Registers, as described in Section A Register, can be combined into a register file.

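A register file implementation, with three registers, is shown in Figure 16. The three registers correspond to the registers with numbers 5, 6, and 7, which are the destination registers in our program. This is why the value 5 is subtracted from rd, when the register to be written is selected.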

library ieee;
use ieee.std_logic_1164.all; 
use ieee.numeric_std.all;

entity registers is
  port(
    clk: in std_logic;
    write_enable: in std_logic;
    rd: in std_logic_vector(4 downto 0);
    rd_value: in std_logic_vector(31 downto 0);
    r0_value: out std_logic_vector(31 downto 0);
    r1_value: out std_logic_vector(31 downto 0);
    r2_value: out std_logic_vector(31 downto 0));
  end registers;

architecture rtl of registers is

  signal reg_0: std_logic_vector(31 downto 0);
  signal reg_1: std_logic_vector(31 downto 0);
  signal reg_2: std_logic_vector(31 downto 0);

begin

  update: process(clk)
  begin
    if rising_edge(clk) then
      if write_enable = '1' then
        case to_integer(unsigned(rd) - 5) is
          when 0 =>
            reg_0 <= rd_value;
          when 1 =>
            reg_1 <= rd_value;
          when 2 =>
            reg_2 <= rd_value;
          when others =>
        end case; 
      end if;
    end if;
  end process; 

  r0_value <= reg_0;
  r1_value <= reg_1;
  r2_value <= reg_2;
  
end rtl; 

registers.vhdl

Figure 16. A register file with three registers.

A computer capable of running the program in Figure 11 can now be constructed, by connecting the memory in Figure 13, the program counter in Figure 14, the instruction decoder in Figure 15, and the register file in Figure 16.

We can do these connections in a testbench, stored in a file named one_instruction_tb.vhdl.

In the testbench, we define the clock signal as

  signal clk: std_logic := '0';

which is used in the clock generation, as

  clk_gen: process is
  begin
    for i in 1 to n_clk_cycles loop
      clk <= '0';
      wait for clk_half_period;
      clk <= '1';
      wait for clk_half_period; 
    end loop;
    wait; 
  end process;

We set the write enable signal and the data being written to the memory to zero, as

  signal mem_write_enable: std_logic := '0'; 
  signal data_in: std_logic_vector(31 downto 0) :=
    (others => '0'); 

We define signals, for the program counter, and for the data read from the memory, as

  signal pc_value: std_logic_vector(31 downto 0);
  signal data_out: std_logic_vector(31 downto 0);

The memory address, which is defined as

  signal pc_address: std_logic_vector(31 downto 0) := (others => '0');

is computed from the program counter, as

  pc_address <= "00" & pc_value(31 downto 2);

We instantiate the program counter as

  pc_0: pc
    port map(
      clk => clk,
      pc_out => pc_value);

We instantiate the memory, with its connection to the program counter, via the signal pc_address, as

  memory_0: memory
    port map(
      clk => clk,
      write_enable => mem_write_enable,
      address => pc_address,
      data_in => data_in, 
      data_out => data_out);

We instantiate the instruction decoder, with its connection to the memory, via the signal data_out, as

  idecode_0: idecode
    port map(
      instr => data_out,
      rd => rd,
      imm_value => imm_value);

The register file is instantiated, with its connections to the instruction decoder, via the signals rd and imm_value, as

  registers_0: registers
    port map(
      clk => clk,
      write_enable => reg_write_enable,
      rd => rd,
      rd_value => imm_value,
      r0_value => r0_value, 
      r1_value => r1_value, 
      r2_value => r2_value);

A block diagram of the design is shown in Figure 17.

Figure 17. A block diagram of our first computer, capable of running programs with LUI instructions.

The block diagram in Figure 17 shows the program counter, in a block labelled PC. The program counter addresses the memory, which results in an instruction being read. The instruction is used as input to the instruction decoder, in a block labelled Idecode.

The instruction decoder decodes the instruction, which in this case results in the fields imm and reg id, shown also in Figure 10, being extracted from the instruction, and used as input to the register file, here represented by a block labelled Registers.

4.4 Running the program

We store the memory contents, corresponding to the program in Figure 12, in a file memory_contents.txt.

This file will be read, during startup, and stored in the memory shown in Figure 13.

We use reporting statements, in the testbench in one_instruction_tb.vhdl, to illustrate the execution of the program, as

  reporter: process(clk) is
  begin
    if (rising_edge(clk)) then
       report LF & "pc_value=" &
         bin_to_hex(reverse_string(std_logic_vector_to_string(pc_value))) & 
         ", data_out=" &
         bin_to_hex(reverse_string(std_logic_vector_to_string(data_out))) & 
         LF & "rd=" &
         bin_to_hex(reverse_string(std_logic_vector_to_string(rd))) & 
         ", imm_value=" &
         bin_to_hex(reverse_string(std_logic_vector_to_string(imm_value))) & 
         LF & "r0_value=" &
         bin_to_hex(reverse_string(std_logic_vector_to_string(r0_value))) & 
         ", r1_value=" &
         bin_to_hex(reverse_string(std_logic_vector_to_string(r1_value))) & 
         ", r2_value=" &
         bin_to_hex(reverse_string(std_logic_vector_to_string(r2_value))) &
         LF;
    end if; 
  end process; 

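These statements give a printout of the signals pc_value, data_out, rd, imm_value, r0_value, r1_value, and r2_value, at each rising edge of the clock.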

Running the program gives, for the first three instructions, a printout as

one_instruction_tb.vhdl:122:8:@2ns:(report note): 
pc_value=00000000, data_out=000012b7
rd=05, imm_value=00001000
r0_value=xxxxxxxx, r1_value=xxxxxxxx, r2_value=xxxxxxxx

one_instruction_tb.vhdl:122:8:@6ns:(report note): 
pc_value=00000004, data_out=00002337
rd=06, imm_value=00002000
r0_value=00001000, r1_value=xxxxxxxx, r2_value=xxxxxxxx

one_instruction_tb.vhdl:122:8:@10ns:(report note): 
pc_value=00000008, data_out=000033b7
rd=07, imm_value=00003000
r0_value=00001000, r1_value=00002000, r2_value=xxxxxxxx

one_instruction_tb.vhdl:122:8:@14ns:(report note): 
pc_value=0000000c, data_out=000002b7
rd=05, imm_value=00000000
r0_value=00001000, r1_value=00002000, r2_value=00003000

We see, at times 6 ns, 10 ns, and 14 ns, that the three registers have values as expected from the first three instructions in the program in Figure 11.

For the last three instructions, we get

one_instruction_tb.vhdl:122:8:@18ns:(report note): 
pc_value=00000010, data_out=00000337
rd=06, imm_value=00000000
r0_value=00000000, r1_value=00002000, r2_value=00003000

one_instruction_tb.vhdl:122:8:@22ns:(report note): 
pc_value=00000014, data_out=000003b7
rd=07, imm_value=00000000
r0_value=00000000, r1_value=00000000, r2_value=00003000

one_instruction_tb.vhdl:122:8:@26ns:(report note): 
pc_value=00000018, data_out=00000000
rd=00, imm_value=00000000
r0_value=00000000, r1_value=00000000, r2_value=00000000

We see, at times 18 ns, 22 ns, and 26 ns, that the three registers have values as expected from the last three instructions in the program in Figure 11.

5 Hello Assembly World

The program in Figure 11 has only one type of instruction. We can create a larger program, for example a hello world program, and then use that program to determine which instructions to add to our computer.

5.1 The Program

We create a program, that we aim to run on our computer.

To begin with, we run the program on a computer simulated in QEMU.

We can download and build a version of QEMU that simulates RISC-V, by following the instructions in the book software repo, for Ubuntu and Mac.

A program, printing Hello, is listed in Figure 18.

.global _start

_start:

    # the value loaded in t0 is the upper 20 bits of the base for
    # SIFIVE_U_DEV_UART0 in sifive_u_memmap struct in
    # https://git.qemu.org/?p=qemu.git;a=blob;f=hw/riscv/sifive_u.c
    lui t0, 0x10010

    andi t1, t1, 0
    addi t1, t1, 72
    sw t1, 0(t0)

    andi t1, t1, 0
    addi t1, t1, 101
    sw t1, 0(t0)

    andi t1, t1, 0
    addi t1, t1, 108
    sw t1, 0(t0)

    andi t1, t1, 0
    addi t1, t1, 108
    sw t1, 0(t0)

    andi t1, t1, 0
    addi t1, t1, 111
    sw t1, 0(t0)

    andi t1, t1, 0
    addi t1, t1, 10
    sw t1, 0(t0)

finish:
    beq t1, t1, finish

hello.s

Figure 18. A hello world program in RISC-V assembly.

The program in Figure 18 uses the lui instruction. A constant value, here chosen as 0x10010, is used as operand. The value 0x10010 represents the upper 20 bits of the base address 0x10010000 for one of the UARTs, in the hardware that we are simulating with this configuration of QEMU.

We use the sw instruction to write a character to the UART. We store each character in a register, in this case t1, and we write the characters to the UART, using the sw instruction repeatedly, as can be seen in Figure 18.

We store a character in t1 by first storing the value zero in t1, using the andi instruction. When this is done, we use the addi instruction to store the ASCII code of the character we want to write, in t1. This can be seen in Figure 18, where the first character, an ‘H’ with ASCII code 72, is stored using andi and addi.

The last instruction in the program in Figure 18 is beq, which does a branch if its operands are equal. We use the beq instruction to create an infinite loop, by branching to the address of the beq instruction. The purpose is to prevent the computer from incrementing the program counter to a value that points to an address outside of our program. In this way, we prevent the computer from executing possibly illegal instructions.

5.2 Tools

We use a GNU toolchain, for assembling and linking our program.

We can download and install the toolchain, using instructions in the book software repo, for Ubuntu and Mac.

5.3 Testing in QEMU

We can build and run the program in Figure 18, using files available in the book repo.

We do the build by navigating, from the base of the repo, to the directory of these files, as

cd hello_asm/asm

and then issue the make command, as

make

The program can be run, using the script run_interactive.sh, as

./run_interactive.sh 

which should lead to QEMU being started, and the string Hello being printed.

We note that the string Hello is printed twice. The reason for this is that the computer we simulate in QEMU has two processors, and the program is executed on both of these processors.

QEMU can be closed down, using the key combination C-a x.

We can also run the program using expect, where a string for closing QEMU is automatically sent to the program.

We run the program using expect, via the script run.sh, as

./run.sh

This should result in printouts, as

spawn qemu-system-riscv32 -machine sifive_u -nographic -bios none -kernel hello -echr 69
Hello
Hello

5.4 And Immediate

5.4.1 Instruction Format

In Section 2.4 of ISA, we find a description of the instruction format for Integer Register-Immediate Instructions.

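The bits in this instruction format are, starting from the most significant bit, a 12-bit immediate field imm in bits 31:20, a source register field rs1 in bits 19:15, a funct3 field in bits 14:12, a destination register field rd in bits 11:7, and an opcode field in bits 6:0.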

We also see that the and immediate instruction ANDI has this format.

The instruction performs a bitwise and operation, between an immediate value and a value stored in a register. The result of the operation is stored in a destination register.

More specifically, as stated in Section 2.4 of ISA, we note that ANDI is a logical operation that performs a bitwise and on “register rs1 and the sign-extended 12-bit immediate” and places the result in rd.

We conclude that

5.4.2 Instruction Code

In Chapter 24 in ISA, in Table 24.2, we see that the opcode for ANDI is 0010011.

We also see that the value of the funct3 field, for ANDI, is 111.

We can write the ANDI instruction, with the fields described above, as a 32-bit binary word. The resulting format is illustrated in Figure 19, as

imm[11:0], rs1[4:0], funct3[2:0] = 111, rd[4:0], opcode[6:0] = 0010011

Figure 19. Instruction format for the ANDI instruction, adapted from Table 24.2 in ISA.

Alternatively, we can represent the ANDI instruction, using the bit numbers in the instruction, as shown in Figure 20, as

imm(31:20), rs1(19:15), funct3(14:12) = 111, rd(11:7), opcode(6:0) = 0010011

Figure 20. Instruction format for the ANDI instruction, using bit numbers from the instruction.

5.4.3 A Program

As an example program that uses the ANDI instruction, we can use the first two instructions in the program in Figure 18.

This results in a program, as

    lui t0, 0x10010
    andi t1, t1, 0

Translating the first instruction to binary, with the hex value 0x10010 expressed in binary, as

0001 0000 0000 0001 0000

together with the instruction format for the LUI instruction, as shown in Figure 10, and the register number for t0, which according to Table 25.1 in ISA is 5, gives, for the LUI instruction

0001 0000 0000 0001 0000 00101 0110111

which, when grouped into eight groups of four binary digits each, becomes

0001 0000 0000 0001 0000 0010 1011 0111

which we can write in hexadecimal notation, as

100102B7

For the ANDI instruction, using the instruction format in Figure 20 and the register number for t1, which according to Table 25.1 in ISA is 6, we get

000000000000 00110 111 00110 0010011

which, when grouped into eight groups of four binary digits each, becomes

0000 0000 0000 0011 0111 0011 0001 0011

which we can write in hexadecimal notation, as

00037313

We can write the program, in hexadecimal format, as shown in Figure 21.

100102B7
00037313

Figure 21. Program code, in hexadecimal format, for the first two instructions in the program in Figure 18.
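The translation by hand can be cross-checked by letting a simulator concatenate the instruction fields. The following is a minimal sketch, which builds the two instruction words from their fields and compares them with the values in Figure 21. The entity name encode_sketch and the constant names are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical, self-contained sketch; not a file from the book repo.
entity encode_sketch is
end encode_sketch;

architecture sim of encode_sketch is

  constant LUI: std_logic_vector(6 downto 0) := b"0110111";
  constant OP_IMM: std_logic_vector(6 downto 0) := b"0010011";
  constant FUNCT3_ANDI: std_logic_vector(2 downto 0) := b"111";

  -- Register numbers for t0 and t1.
  constant t0: std_logic_vector(4 downto 0) := b"00101";
  constant t1: std_logic_vector(4 downto 0) := b"00110";

  -- Immediate operands of the two instructions.
  constant imm_20: std_logic_vector(19 downto 0) := x"10010";
  constant imm_12: std_logic_vector(11 downto 0) := x"000";

begin

  check: process is
    variable lui_word: std_logic_vector(31 downto 0);
    variable andi_word: std_logic_vector(31 downto 0);
  begin
    -- lui t0, 0x10010: imm, rd, opcode
    lui_word := imm_20 & t0 & LUI;
    -- andi t1, t1, 0: imm, rs1, funct3, rd, opcode
    andi_word := imm_12 & t1 & FUNCT3_ANDI & t1 & OP_IMM;
    assert lui_word = x"100102B7"
      report "unexpected LUI encoding" severity failure;
    assert andi_word = x"00037313"
      report "unexpected ANDI encoding" severity failure;
    report "encodings match Figure 21";
    wait;
  end process;

end sim;
```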

5.4.4 Extending our Computer

In Section 2.4 of ISA, instructions of type OP-IMM are described.

Among these, we find the ANDI instruction, with an instruction format as illustrated in Figure 19 and Figure 20.

The instruction decoder in Figure 15 is extended, so that also instructions of type OP-IMM are decoded.

The extended instruction decoder is shown in Figure 22.

entity idecode is
  port(
    instr: in std_logic_vector(31 downto 0);
    imm_value: out std_logic_vector(31 downto 0);
    rs1: out std_logic_vector(4 downto 0);
    rd: out std_logic_vector(4 downto 0);
    opcode: out std_logic_vector(6 downto 0));
  end idecode;

architecture rtl of idecode is

  constant LUI: std_logic_vector(6 downto 0) := b"0110111";
  constant OP_IMM: std_logic_vector(6 downto 0) := b"0010011";

  signal op_s: std_logic_vector(6 downto 0);

  constant zero_12: std_logic_vector(11 downto 0) := (others => '0');
  constant zero_32: std_logic_vector(31 downto 0) := (others => '0');
  
begin

  opcode <= instr(6 downto 0); 

  op_s <= instr(6 downto 0);
  
  imm_value <= instr(31 downto 12) & zero_12 when op_s = LUI else
               (31 downto 11 => instr(31)) & instr(30 downto 20) when op_s = OP_IMM
               else zero_32;
               
  rs1 <= instr(19 downto 15); 

  rd <= instr(11 downto 7); 

end rtl; 

idecode.vhdl

Figure 22. An instruction decoder for LUI and OP-IMM instructions (of which ANDI is one).

We can see, in Figure 22, how the opcode value for OP-IMM instructions is defined, as

  constant OP_IMM: std_logic_vector(6 downto 0) := b"0010011";

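The opcode is extracted from the instruction, and assigned both to the output opcode and to an internal signal op_s, as

  opcode <= instr(6 downto 0); 

  op_s <= instr(6 downto 0);

The internal signal op_s is needed since an output port cannot be read inside the architecture, in VHDL versions before VHDL-2008. It is used in the assignment of the immediate value, as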

  imm_value <= instr(31 downto 12) & zero_12 when op_s = LUI else
               (31 downto 11 => instr(31)) & instr(30 downto 20) when op_s = OP_IMM
               else zero_32;

where, for the instruction type OP-IMM, we see how the 32-bit immediate value is computed, by sign-extending the 12-bit imm field in Figure 19.
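The sign extension can be checked for a zero immediate and for a negative immediate. The following is a minimal sketch, which compares an extension written in the same spirit as in Figure 22 with an alternative formulation that uses the function resize from the package numeric_std. The entity name, the function names, and the instruction word FFF37313, which encodes an ANDI with the immediate value -1, are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical, self-contained sketch; not a file from the book repo.
entity sign_extend_sketch is
end sign_extend_sketch;

architecture sim of sign_extend_sketch is

  -- Replicate bit 31 into bits 31:11, then append bits 30:20.
  function extend_as_in_decoder(instr: std_logic_vector(31 downto 0))
    return std_logic_vector is
    variable upper: std_logic_vector(31 downto 11);
  begin
    upper := (others => instr(31));
    return upper & instr(30 downto 20);
  end function;

  -- Alternative: interpret bits 31:20 as signed and resize to 32 bits.
  function extend_with_resize(instr: std_logic_vector(31 downto 0))
    return std_logic_vector is
  begin
    return std_logic_vector(resize(signed(instr(31 downto 20)), 32));
  end function;

begin

  check: process is
    -- andi t1, t1, 0
    constant andi_zero: std_logic_vector(31 downto 0) := x"00037313";
    -- andi t1, t1, -1 (assumed example, not part of the program)
    constant andi_minus_one: std_logic_vector(31 downto 0) := x"FFF37313";
  begin
    assert extend_as_in_decoder(andi_zero) = x"00000000"
      report "zero immediate not extended to zero" severity failure;
    assert extend_as_in_decoder(andi_minus_one) = x"FFFFFFFF"
      report "negative immediate not sign-extended" severity failure;
    assert extend_with_resize(andi_zero) = extend_as_in_decoder(andi_zero)
      report "formulations differ for zero" severity failure;
    assert extend_with_resize(andi_minus_one) =
      extend_as_in_decoder(andi_minus_one)
      report "formulations differ for -1" severity failure;
    report "both formulations give the same sign extension";
    wait;
  end process;

end sim;
```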

We also see, in Figure 22, how the register identity rs1 for the source register, and the register identity rd for the destination register, are assigned, as

  rs1 <= instr(19 downto 15); 

  rd <= instr(11 downto 7); 

Figure 22 also shows how the LUI instruction, with format according to Figure 10, is decoded in the same way as it is decoded in Figure 15.

The ANDI instruction shall perform an and operation.

We introduce an ALU, for performing logical and arithmetic operations.

An ALU, capable of performing an and operation, is shown in Figure 23.

entity alu is
  port(
    a: in std_logic_vector(31 downto 0);
    b: in std_logic_vector(31 downto 0);
    opcode: in std_logic_vector(6 downto 0);
    result: out std_logic_vector(31 downto 0));
  end alu;

architecture rtl of alu is

  constant LUI: std_logic_vector(6 downto 0) := b"0110111";
  constant OP_IMM: std_logic_vector(6 downto 0) := b"0010011";

  constant zero_32: std_logic_vector(31 downto 0) := (others => '0');
  
begin

  result <= a when opcode = LUI else
            a and b when opcode = OP_IMM
            else zero_32;

end rtl; 

alu.vhdl

Figure 23. An ALU with an and operation.

We see, in Figure 23, how the and operation is performed, when the opcode represents an OP-IMM instruction.

We also see, in Figure 23, that for the LUI instruction, no operation is performed, and the input a is passed unchanged to the output result.

The computer executing the program in Figure 11 has registers defined as in Figure 16.

We extend the registers, to a bank of 32 registers, as shown in Figure 24.

entity registers is
  port(
    clk: in std_logic;

    rs1: in std_logic_vector(4 downto 0);
    rs2: in std_logic_vector(4 downto 0);
    rd: in std_logic_vector(4 downto 0);

    rd_value: in std_logic_vector(31 downto 0);

    write_enable: in std_logic;

    rs1_value: out std_logic_vector(31 downto 0);
    rs2_value: out std_logic_vector(31 downto 0));
  end registers;

architecture rtl of registers is

  type reg_file_array is array(0 to 31) of std_logic_vector(31 downto 0);
  signal reg_file: reg_file_array;

begin

  update: process(clk)
  begin
    if rising_edge(clk) then
      if write_enable = '1' then
        reg_file(to_integer(unsigned(rd))) <= rd_value;
      end if;
    end if;
  end process; 

  rs1_value <= reg_file(to_integer(unsigned(rs1)));
  rs2_value <= reg_file(to_integer(unsigned(rs2)));
  
end rtl; 

registers.vhdl

Figure 24. A register bank with 32 registers.

We can see, in Figure 24, how the register indicated by rd is updated, if the write enable signal write_enable is one, in a process called update, as

  update: process(clk)
  begin
    if rising_edge(clk) then
      if write_enable = '1' then
        reg_file(to_integer(unsigned(rd))) <= rd_value;
      end if;
    end if;
  end process; 

To complete our computer, for this step of our development, we need a memory and a program counter.

We choose to re-use the memory, as shown in Figure 13, and the program counter, as shown in Figure 14.

A block diagram of the design is shown in Figure 25.

Figure 25. A block diagram of our computer, capable of running programs with LUI and ANDI instructions.

A computer, designed according to the block diagram in Figure 25 and capable of running a program consisting of the first two instructions in the program in Figure 18, with hexadecimal representation according to Figure 21, can now be implemented.

We do the implementation in a testbench, stored in a file named andi_tb.vhdl.

The testbench in andi_tb.vhdl instantiates the program counter, the memory, the instruction decoder, the ALU, and the register file.

It can be seen, in andi_tb.vhdl, how the instantiated components are connected, in order to implement the design shown in Figure 25.

5.4.5 Running the program

The testbench in andi_tb.vhdl contains report statements, as

  reporter: process(clk) is
  begin
    if (rising_edge(clk)) then
       report LF & "pc_value=" &
         bin_to_hex(reverse_string(std_logic_vector_to_string(pc_value))) & 
         ", data_out=" &
         bin_to_hex(reverse_string(std_logic_vector_to_string(data_out))) & 
         ", rs1=" &
         bin_to_hex(reverse_string(std_logic_vector_to_string(rs1))) & 
         LF & "rd=" &
         bin_to_hex(reverse_string(std_logic_vector_to_string(rd))) & 
         ", imm_value=" &
         bin_to_hex(reverse_string(std_logic_vector_to_string(imm_value))) & 
         ", rd_value=" &
         bin_to_hex(reverse_string(std_logic_vector_to_string(rd_value))) &
         LF;
    end if; 
  end process; 

Running our program, which consists of the first two instructions in the program in Figure 18, with hexadecimal representation according to Figure 21, gives a printout as

andi_tb.vhdl:151:8:@2ns:(report note): 
pc_value=00000000, data_out=100102b7, rs1=02
rd=05, imm_value=10010000, rd_value=10010000

andi_tb.vhdl:151:8:@6ns:(report note): 
pc_value=00000004, data_out=00037313, rs1=06
rd=06, imm_value=00000000, rd_value=00000000

In the printout, the name rd_value refers to the result output signal from the ALU in Figure 25. This signal is connected to the rd_value input signal in the register file, with implementation according to Figure 24.

We see, at time 2 ns, how the value 10010000 is written to register 5.

Using the fact, according to Chapter 25 in ISA, in Table 25.1, that register 5 has the ABI name t0, we can conclude that the printout at 2 ns seems consistent with the first instruction in Figure 18.

In a similar way, by observing the rd_value 00000000 at time 6 ns in the printout, and noting that register 6 has the ABI name t1, we can conclude that the printout at 6 ns seems consistent with the second instruction in Figure 18.

5.5 Add Immediate

5.5.1 Instruction Format

In Section 2.4 of ISA, in the description of the instruction format for Integer Register-Immediate Instructions, we see that the add immediate instruction ADDI has this instruction format.

The bits in this instruction format are described in Section And Immediate.

The ADDI instruction performs an addition, between an immediate value and a value stored in a register. The result of the operation is stored in a destination register.

More specifically, as stated in Section 2.4 of ISA, we note that “ADDI adds the sign-extended 12-bit immediate to register rs1”.

5.5.2 Instruction Code

In Chapter 24 in ISA, in Table 24.2, we see that the opcode for ADDI is 0010011.

We also see that the value of the funct3 field, for ADDI, is 000.

We can write the ADDI instruction, with the fields as described above, as a 32-bit binary word. The resulting format is illustrated in Figure 26, as

imm[11:0], rs1[4:0], funct3[2:0] = 000, rd[4:0], opcode[6:0] = 0010011

Figure 26. Instruction format for the ADDI instruction, adapted from Table 24.2 in ISA.

Alternatively, we can represent the ADDI instruction, using the bit numbers in the instruction, as shown in Figure 27, as

imm(31:20), rs1(19:15), funct3(14:12) = 000, rd(11:7), opcode(6:0) = 0010011

Figure 27. Instruction format for the ADDI instruction, using bit numbers from the instruction.

5.5.3 A Program

As an example program that uses the ADDI instruction, we can use the first three instructions in the program in Figure 18.

This results in a program, as

    lui t0, 0x10010
    andi t1, t1, 0
    addi t1, t1, 72

The first two instructions in this program are translated to binary in Section And Immediate and shown in hexadecimal format in Figure 21.

A translation to binary of the third instruction

    addi t1, t1, 72

can be done, using the instruction format in Figure 27 and the register number for t1, which according to Table 25.1 in ISA is 6.

The result, using the binary value for 72, which is 1001000, is

000001001000 00110 000 00110 0010011

Grouping the binary digits into groups of four binary digits each, gives

0000 0100 1000 0011 0000 0011 0001 0011

which can be written in hexadecimal format, as

04830313

The complete program, in hexadecimal format, using the hexadecimal format of the first two instructions from Figure 21, is shown in Figure 28.

100102B7
00037313
04830313

Figure 28. Program code, in hexadecimal format, for the first three instructions in the program in Figure 18.
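The encoding of instructions of type OP-IMM can be written as a function of the instruction fields. The following is a minimal sketch, which encodes the ADDI and ANDI instructions of the program and compares the results with the hexadecimal values derived above. The entity name encode_op_imm_sketch and the function encode_op_imm are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical, self-contained sketch; not a file from the book repo.
entity encode_op_imm_sketch is
end encode_op_imm_sketch;

architecture sim of encode_op_imm_sketch is

  constant OP_IMM: std_logic_vector(6 downto 0) := b"0010011";

  -- Concatenate the fields imm, rs1, funct3, rd, and opcode.
  function encode_op_imm(imm: integer;
                         rs1: natural;
                         funct3: std_logic_vector(2 downto 0);
                         rd: natural)
    return std_logic_vector is
  begin
    return std_logic_vector(to_signed(imm, 12)) &
           std_logic_vector(to_unsigned(rs1, 5)) &
           funct3 &
           std_logic_vector(to_unsigned(rd, 5)) &
           OP_IMM;
  end function;

begin

  check: process is
  begin
    -- addi t1, t1, 72, where t1 is register 6 and funct3 is 000
    assert encode_op_imm(72, 6, "000", 6) = x"04830313"
      report "unexpected ADDI encoding" severity failure;
    -- andi t1, t1, 0, where funct3 is 111
    assert encode_op_imm(0, 6, "111", 6) = x"00037313"
      report "unexpected ANDI encoding" severity failure;
    report "ADDI and ANDI encodings match the hand translation";
    wait;
  end process;

end sim;
```

The immediate value is converted using to_signed, so that a negative immediate value would also be represented in the 12-bit field.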

5.5.4 Extending our Computer

In Chapter 24 in ISA, in Table 24.2, and in Figure 20 and Figure 27, we see that the opcode for both ANDI and ADDI is 0010011.

We also see that ANDI and ADDI have different values for the funct3 field.

We update the instruction decoder, shown in Figure 22, so that funct3 is supported.

We define funct3 as an output, as

    funct3: out std_logic_vector(2 downto 0));

and assign a value to funct3, by extracting, from the instruction, the bits corresponding to the funct3 field, as

  funct3 <= instr(14 downto 12); 

The updated instruction decoder can be found in the book repo.

The ALU, shown in Figure 23, is updated so that funct3 is respected, for the opcode OP_IMM.

We add funct3 as input, as

    funct3: in std_logic_vector(2 downto 0);

and use it, when selecting and performing the ALU operation, as

  result <= a when opcode = LUI else
            a and b when opcode = OP_IMM and funct3 = b"111" else
            std_logic_vector(unsigned(a) + unsigned(b)) when
              opcode = OP_IMM and funct3 = b"000"
            else zero_32;

The updated ALU can be found in the book repo.
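The selection between the operations can be tried in a small simulation. The following is a minimal sketch, which copies the assignment of result shown above into a testbench, and applies one set of operands for each operation. The entity name alu_select_sketch, the operand values, and the choice of which operand is placed on a and which on b are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical, self-contained sketch; not a file from the book repo.
entity alu_select_sketch is
end alu_select_sketch;

architecture sim of alu_select_sketch is

  constant LUI: std_logic_vector(6 downto 0) := b"0110111";
  constant OP_IMM: std_logic_vector(6 downto 0) := b"0010011";
  constant zero_32: std_logic_vector(31 downto 0) := (others => '0');

  signal a: std_logic_vector(31 downto 0) := (others => '0');
  signal b: std_logic_vector(31 downto 0) := (others => '0');
  signal opcode: std_logic_vector(6 downto 0) := (others => '0');
  signal funct3: std_logic_vector(2 downto 0) := (others => '0');
  signal result: std_logic_vector(31 downto 0);

begin

  -- Same selection as in the updated ALU.
  result <= a when opcode = LUI else
            a and b when opcode = OP_IMM and funct3 = b"111" else
            std_logic_vector(unsigned(a) + unsigned(b)) when
              opcode = OP_IMM and funct3 = b"000"
            else zero_32;

  stimulus: process is
  begin
    -- ADDI: 0 + 72 gives hexadecimal 48.
    a <= x"00000000";
    b <= x"00000048";
    opcode <= OP_IMM;
    funct3 <= b"000";
    wait for 1 ns;
    assert result = x"00000048"
      report "unexpected ADDI result" severity failure;

    -- ANDI: 4F and F8 gives 48.
    a <= x"0000004F";
    b <= x"000000F8";
    funct3 <= b"111";
    wait for 1 ns;
    assert result = x"00000048"
      report "unexpected ANDI result" severity failure;

    -- LUI: the input a is passed through.
    a <= x"10010000";
    opcode <= LUI;
    wait for 1 ns;
    assert result = x"10010000"
      report "unexpected LUI result" severity failure;

    -- Any other opcode gives zero.
    opcode <= b"0000000";
    wait for 1 ns;
    assert result = zero_32
      report "unexpected default result" severity failure;

    report "ALU selection behaves as described";
    wait;
  end process;

end sim;
```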

To complete our computer, for this step of our development, we re-use the memory, the program counter, and the register file from Section And Immediate.

A block diagram of the design is shown in Figure 29.

Figure 29. A block diagram of our computer, capable of running programs with LUI, ANDI, and ADDI instructions.

A computer, designed according to the block diagram in Figure 29 and capable of running a program consisting of the first three instructions in the program in Figure 18, with hexadecimal representation according to Figure 28, can now be implemented.

We do the implementation in a testbench, stored in a file named addi_tb.vhdl.

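The testbench in addi_tb.vhdl instantiates the components of the design: the memory, the program counter, the register file, the updated instruction decoder, and the updated ALU.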

It can be seen, in addi_tb.vhdl, how the instantiated components are connected, in order to implement the design shown in Figure 29.

5.5.5 Running the program

Running our program, which consists of the first three instructions in the program in Figure 18, with hexadecimal representation according to Figure 28, gives a printout, from the report statements in addi_tb.vhdl, as

addi_tb.vhdl:156:8:@2ns:(report note): 
pc_value=00000000, data_out=100102b7, rs1=02
rd=05, imm_value=10010000, rd_value=10010000

addi_tb.vhdl:156:8:@6ns:(report note): 
pc_value=00000004, data_out=00037313, rs1=06
rd=06, imm_value=00000000, rd_value=00000000

addi_tb.vhdl:156:8:@10ns:(report note): 
pc_value=00000008, data_out=04830313, rs1=06
rd=06, imm_value=00000048, rd_value=00000048

We see, at time 2 ns, how the value 10010000 is written to register 5.

Using the fact that register 5 has the ABI name t0, we can conclude that the printout at 2 ns seems consistent with the first instruction in Figure 18.

In a similar way, by observing, at time 6 ns, how the value 00000000 is written to register 6 and noting that register 6 has the ABI name t1, we can conclude that the printout at 6 ns seems consistent with the second instruction in Figure 18.

For the third instruction, at time 10 ns, we see that the hexadecimal value 48 is written to register t1.

Noting that the hexadecimal value 48 corresponds to the decimal value 72, and that the value in register t1 was 0 after the second instruction, we can conclude that the printout at time 10 ns seems consistent with the addition of 72 to t1, as done in the third instruction in Figure 18.

5.6 Store to Memory

FROM HERE ON THE BOOK IS IN A MORE WORK-IN-PROGRESS STATE

WORK IS ONGOING TO COMPLETE THE BOOK, AND RELEASE IT

5.6.1 Instruction Format

In Section 2.6 of ISA, we find a description of the instruction formats for Load and Store Instructions.

The instruction format for store instructions has the fields imm[11:5], rs2, rs1, funct3, imm[4:0], and opcode.

Store instructions store bits from the rs2 register to memory.

In addition, we note that the imm[11:5] and imm[4:0] fields together represent a 12-bit offset.

The rs1 field is used, together with the offset, to calculate the address where data shall be stored. The address is calculated as the sum of the value in the register designated by the rs1 field and the sign-extended offset.
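The address calculation can be sketched in VHDL, as a small self-checking testbench. The names store_address_tb and store_address are hypothetical, and the second test case, with the offset -4, is hand-encoded here as an illustration and is not taken from the program in Figure 18. The address is computed from the value held in the register selected by rs1, which is assumed to be 0x10013000.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench name, used only for this illustration
entity store_address_tb is
end store_address_tb;

architecture sim of store_address_tb is
  -- Hypothetical helper: the store address is the value in the register
  -- selected by rs1, plus the sign-extended 12-bit offset
  function store_address(
    instr: std_logic_vector(31 downto 0);
    rs1_value: std_logic_vector(31 downto 0))
    return std_logic_vector is
    variable offset: std_logic_vector(11 downto 0);
  begin
    -- imm[11:5] is found in bits 31:25, and imm[4:0] in bits 11:7
    offset := instr(31 downto 25) & instr(11 downto 7);
    return std_logic_vector(
      signed(rs1_value) + resize(signed(offset), 32));
  end store_address;
begin
  process
  begin
    -- sw t1, 0(t0), assuming that t0 holds 0x10013000
    assert store_address(x"0062A023", x"10013000") = x"10013000"
      report "unexpected address for offset 0" severity failure;

    -- sw t1, -4(t0), where the sign extension makes a difference
    assert store_address(x"FE62AE23", x"10013000") = x"10012FFC"
      report "unexpected address for offset -4" severity failure;

    report "store address check done";
    wait;
  end process;
end sim;
```

The resize function, applied to a signed value, performs the sign extension from 12 to 32 bits.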

5.6.2 Instruction Code

In Chapter 24 in ISA, in Table 24.2, we see that the opcode for SW is 0100011.

We also see that the value of the funct3 field, for SW, is 010.

We can write the SW instruction, with the fields as described above, as a 32-bit binary word. The resulting format is illustrated in Figure 30, as

imm[11:5] (31:25), rs2(24:20), rs1(19:15), funct3(14:12) = 010, imm[4:0] (11:7), opcode(6:0) = 0100011

Figure 30. Instruction format for the SW instruction, adapted from Table 24.2 in ISA.

5.6.3 A Program

As an example program that uses the SW instruction, we can use the first four instructions in the program in Figure 18.

This results in a program, as

    lui t0, 0x10013
    andi t1, t1, 0
    addi t1, t1, 72
    sw t1, 0(t0)

The first three instructions in this program are translated to binary in Section TBD, and shown in hexadecimal format in Figure 28.

A translation to binary of the fourth instruction

    sw t1, 0(t0)

can be done, using the instruction format in Figure 30 and the numbers for t0, which according to Table 25.1 in (ref) is 5, and t1, which according to Table 25.1 in (ref) is 6. Here, t1 is the rs2 register, t0 is the rs1 register, and the offset is 0.

The result is written in the format of Figure 30 as

0000000 00110 00101 010 00000 0100011

Grouping the binary digits into groups of four binary digits each, gives

0000 0000 0110 0010 1010 0000 0010 0011

which can be written in hexadecimal format, as

0062A023
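The translation by hand can be cross-checked by a small VHDL sketch, which assembles the instruction word field by field, following the format in Figure 30. The names sw_encode_tb and encode_sw are hypothetical, and the function is only an illustration of the encoding, not a part of our computer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench name, used only for this illustration
entity sw_encode_tb is
end sw_encode_tb;

architecture sim of sw_encode_tb is
  -- Hypothetical helper: assemble an SW word from the register numbers
  -- and the offset
  function encode_sw(rs2, rs1: natural; offset: integer)
    return std_logic_vector is
    variable imm: std_logic_vector(11 downto 0);
    variable instr: std_logic_vector(31 downto 0);
  begin
    imm := std_logic_vector(to_signed(offset, 12));
    instr(31 downto 25) := imm(11 downto 5);
    instr(24 downto 20) := std_logic_vector(to_unsigned(rs2, 5));
    instr(19 downto 15) := std_logic_vector(to_unsigned(rs1, 5));
    instr(14 downto 12) := b"010";      -- funct3 for SW
    instr(11 downto 7) := imm(4 downto 0);
    instr(6 downto 0) := b"0100011";    -- opcode for SW
    return instr;
  end encode_sw;
begin
  process
  begin
    -- sw t1, 0(t0), with t1 as register 6 and t0 as register 5
    assert encode_sw(6, 5, 0) = x"0062A023"
      report "unexpected encoding of sw t1, 0(t0)" severity failure;

    report "sw encoding check done";
    wait;
  end process;
end sim;
```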

The complete program, in hexadecimal format, using the hexadecimal format of the first three instructions from Figure 28, is shown in Figure 31.

100132B7
00037313
04830313
0062A023

Figure 31. Program code, in hexadecimal format, for the first four instructions in the program in Figure 18.

5.6.4 Extending our Computer

5.6.5 Running the program

We create a testbench.

We put prints in the testbench, so we can follow the execution of the program.

We put some extra nops (all zeros) after the four instructions.

Here you can see the result.

5.7 Branch if Equal

5.7.1 Instruction Format

In Section 2.5 of ISA, we find a description of the instruction formats for Control Transfer Instructions.

The instruction format for conditional branch instructions has the fields imm[12], imm[10:5], rs2, rs1, funct3, imm[4:1], imm[11], and opcode.

Branch instructions compare registers rs1 and rs2. The BEQ instruction, which is used in the program in Figure 18, branches if the registers rs1 and rs2 are equal.

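The 12-bit immediate value imm[12:1] encodes the branch offset, as a signed multiple of 2 bytes. The offset is sign-extended and added to the address of the branch instruction, to give the target address.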
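As an illustration of how the branch offset is spread out over the instruction word, the following sketch collects the immediate bits from an instruction and converts them to a signed integer. The names branch_offset_tb and branch_offset are hypothetical. The bit positions are those of the conditional branch format in the ISA, and the instruction word with the offset -8 is hand-encoded for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench name, used only for this illustration
entity branch_offset_tb is
end branch_offset_tb;

architecture sim of branch_offset_tb is
  -- Hypothetical helper: rebuild the signed branch offset, in bytes
  function branch_offset(instr: std_logic_vector(31 downto 0))
    return integer is
    variable imm: std_logic_vector(12 downto 0);
  begin
    imm(12) := instr(31);
    imm(11) := instr(7);
    imm(10 downto 5) := instr(30 downto 25);
    imm(4 downto 1) := instr(11 downto 8);
    imm(0) := '0';  -- offsets are multiples of 2 bytes
    return to_integer(signed(imm));
  end branch_offset;
begin
  process
  begin
    -- A BEQ instruction with all immediate bits zero
    assert branch_offset(x"00630063") = 0
      report "unexpected offset, expected 0" severity failure;

    -- A BEQ instruction with the offset -8 (two instructions back)
    assert branch_offset(x"FE630CE3") = -8
      report "unexpected offset, expected -8" severity failure;

    report "branch offset check done";
    wait;
  end process;
end sim;
```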

5.7.2 Instruction Code

In Chapter 24 in ISA, in Table 24.2, we see that the opcode for BEQ is 1100011.

We also see that the value of the funct3 field, for BEQ, is 000.

We can write the BEQ instruction, with the fields as described above, as a 32-bit binary word. The resulting format is illustrated in Figure 32, as

imm[12] (31), imm[10:5] (30:25), rs2(24:20), rs1(19:15), funct3(14:12)=000, imm[4:1] (11:8), imm[11] (7), opcode(6:0) = 1100011

Figure 32. Instruction format for the BEQ instruction, adapted from Table 24.2 in ISA.

5.7.3 A Program

As an example program that uses the BEQ instruction, we can use the first four instructions, and the last instruction, in the program in Figure 18.

This results in a program, as

    lui t0, 0x10013
    andi t1, t1, 0
    addi t1, t1, 72
    sw t1, 0(t0)
finish:
    beq t1, t1, finish

The first four instructions in this program are translated to binary in Section TBD, and shown in hexadecimal format in Figure 31.

A translation to binary of the fifth instruction

    beq t1, t1, finish

can be done, using the instruction format in Figure 32 and the number for t1, which according to Table 25.1 in (ref) is 6. Since the label finish refers to the branch instruction itself, the branch offset is 0.

The result is written in the format of Figure 32 as

0 000000 00110 00110 000 0000 0 1100011

Grouping the binary digits into groups of four binary digits each, gives

0000 0000 0110 0011 0000 0000 0110 0011

which can be written in hexadecimal format, as

00630063
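The translation by hand can be cross-checked by a small VHDL sketch, which assembles the instruction word field by field, following the format in Figure 32. The names beq_encode_tb and encode_beq are hypothetical, and the function is only an illustration of the encoding, not a part of our computer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench name, used only for this illustration
entity beq_encode_tb is
end beq_encode_tb;

architecture sim of beq_encode_tb is
  -- Hypothetical helper: assemble a BEQ word from the register numbers
  -- and the branch offset, given in bytes
  function encode_beq(rs1, rs2: natural; offset: integer)
    return std_logic_vector is
    variable imm: std_logic_vector(12 downto 0);
    variable instr: std_logic_vector(31 downto 0);
  begin
    imm := std_logic_vector(to_signed(offset, 13));
    instr(31) := imm(12);
    instr(30 downto 25) := imm(10 downto 5);
    instr(24 downto 20) := std_logic_vector(to_unsigned(rs2, 5));
    instr(19 downto 15) := std_logic_vector(to_unsigned(rs1, 5));
    instr(14 downto 12) := b"000";      -- funct3 for BEQ
    instr(11 downto 8) := imm(4 downto 1);
    instr(7) := imm(11);
    instr(6 downto 0) := b"1100011";    -- opcode for BEQ
    return instr;                       -- imm(0) is not stored
  end encode_beq;
begin
  process
  begin
    -- beq t1, t1, finish, with t1 as register 6 and the offset 0
    assert encode_beq(6, 6, 0) = x"00630063"
      report "unexpected encoding of beq t1, t1, finish"
      severity failure;

    report "beq encoding check done";
    wait;
  end process;
end sim;
```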

The complete program, in hexadecimal format, using the hexadecimal format of the first four instructions from Figure 31, is shown in Figure 33.

100132B7
00037313
04830313
0062A023
00630063

Figure 33. Program code, in hexadecimal format, for the first four instructions, and the last instruction, in the program in Figure 18.

5.7.4 Extending our Computer

5.7.5 Running the program

We create a testbench.

We put prints in the testbench, so we can follow the execution of the program.

We put some extra nops (all zeros) after the five instructions.

Here you can see the result.

5.8 Running the complete program

5.8.1 A hand-written version

5.8.2 Using the RISC-V toolchain

6 Hello C World

6.1 The Program

6.2 Tools

6.3 Testing in QEMU

6.4 Extending our Computer

6.5 Running the Program

7 References

[ISA] The RISC-V Instruction Set Manual, Volume I: Unprivileged ISA, available at the RISC-V ISA Specification page.