Lab 4: TinyRV1 Processor
Part B: TinyRV1 Processor
Lab 4 will give you experience designing, implementing, testing, and prototyping a single-cycle processor microarchitecture and a specialized accelerator. The processor will implement the TinyRV1 instruction set. The instruction set manual is located here:
The lab will continue to leverage concepts from Topic 2: Combinational Logic, Topic 3: Boolean Algebra, Topic 4: Combinational Building Blocks, Topic 6: Sequential Logic, Topic 7: Finite-State Machines, and Topic 8: Sequential Building Blocks. The lab will also leverage concepts from Topic 9: Instruction Set Architecture and Topic 10: Single-Cycle Processors. The lab will continue to provide opportunities to leverage the three key abstraction principles: modularity, hierarchy, and regularity.
The lab includes seven parts:
- Part A: Processor Components
  - Due 11/6 @ 11:59pm via GitHub
  - Students should work on Part A before, during, and after their assigned lab section during the week of 11/3
  - Pre-lab survey on Canvas is (roughly) due by end of lab section during the week of 11/3
- Part B: TinyRV1 Processor
  - Due 11/13 @ 11:59pm via GitHub
  - Students should work on Part B before, during, and after their assigned lab section during the week of 11/10
- Part C: Accumulator Accelerator
  - Due 11/24 @ 11:59pm via GitHub
  - Students should plan to submit Part C before they leave for Thanksgiving Break
- Part D: FPGA Prototype v1
  - Due week of 11/17 during assigned lab section
  - This part will focus on prototyping the code developed in Part A+B
  - Even though completed with a partner, every student must turn in their own paper check-off sheet in their lab section!
- Part E: FPGA Prototype v2
  - Due week of 12/1 during assigned lab section
  - This part will focus on prototyping the code developed in Part A+B+C
  - Even though completed with a partner, every student must turn in their own paper check-off sheet in their lab section!
- Part F: TinyRV1 Assembly
  - Due 12/4 @ 11:59pm via GitHub
  - This part will include all of the assembly developed during Part D+E
- Part G: Report
  - Due on 12/8 at 11:59pm for all groups!
  - Post-lab survey on Canvas is due at the same time as the report
All parts of Lab 4 must be done with a partner. You can confirm your partner on Canvas (Click on People, then Groups, then search for your name to find your lab group).
Both students must contribute to all parts!
It is not acceptable for one student to exclusively work on the code while the other student exclusively works on the report. It is not acceptable for one student to exclusively work on hardware design while the other student exclusively works on testing. Both students must contribute to all parts. Student understanding of Verilog design and testing will be assessed on the prelim exams, final exam, and Verilog coding exam. The instructors will also survey the Git commit log on GitHub to confirm that both students are contributing equally. If you are using pair programming, then both students must take turns using their own account so both students have representative Git commits. Students should create commits after finishing each step of the lab, so their contribution is clear in the Git commit log. A student whose contribution is limited, as represented by the Git commit log, will receive a significant deduction to their lab score.
This handout assumes that you have read and understood the course tutorials and that you have attended the discussion sections. This handout also assumes you have successfully completed Part A. You should have already cloned your individual remote repository, so use git pull to ensure you have any recent updates before working on your lab assignment.
where XX should be replaced with your group number.
The following table shows all of the hardware modules you will be developing in Lab 4.

1. Interface and Implementation Specification
A complete TinyRV1 embedded system includes the following key components: (1) the TinyRV1 processor itself; (2) 512 bytes of physical memory for storing instructions and data; (3) an SPI unit to enable a host workstation to load assembly programs into physical memory; (4) various external I/O devices (e.g., switches, seven-segment displays, distance sensor, multi-note player); and (5) a memory bus which interconnects the processor, physical memory, SPI, and external I/O devices.

We will provide you with the physical memory module, and you have already implemented various external I/O devices in previous labs. In Part B, you will use the datapath components from Part A along with components from previous labs to implement the TinyRV1 processor, memory bus, and SPI.
1.1. TinyRV1 Single-Cycle Processor
The single-cycle TinyRV1 processor has the following interface:
module ProcScycle
(
(* keep=1 *) input logic clk,
(* keep=1 *) input logic rst,
// Memory Interfaces
(* keep=1 *) output logic imem_val,
(* keep=1 *) input logic imem_wait,
(* keep=1 *) output logic [31:0] imem_addr,
(* keep=1 *) input logic [31:0] imem_rdata,
(* keep=1 *) output logic dmem_val,
(* keep=1 *) input logic dmem_wait,
(* keep=1 *) output logic dmem_type,
(* keep=1 *) output logic [31:0] dmem_addr,
(* keep=1 *) output logic [31:0] dmem_wdata,
(* keep=1 *) input logic [31:0] dmem_rdata,
// Trace Interface
(* keep=1 *) output logic trace_val,
(* keep=1 *) output logic [31:0] trace_addr,
(* keep=1 *) output logic trace_wen,
(* keep=1 *) output logic [4:0] trace_wreg,
(* keep=1 *) output logic [31:0] trace_wdata
);
Memory Interfaces
The instruction memory interface is used to read instructions, similar
to how we read notes in Lab 3. To read an instruction, set the imem_val
output port high and the imem_addr output port to the desired
instruction address; the instruction will be returned via the
imem_rdata input port combinationally (i.e., in the same cycle). The
data memory interface enables load and store instructions to read and
write memory. It is similar to the instruction memory interface except
that it has an additional dmem_type output port, which specifies whether
we want to read memory (i.e., dmem_type is zero) or write memory (i.e.,
dmem_type is one). We also need the dmem_wdata output port for the
write data. Finally, both the instruction and data memory interfaces
include a wait input port (imem_wait and dmem_wait) which enables the
memory to tell the processor to wait and try the same memory request
again on the next cycle.
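The request protocol can be illustrated with a small behavioral memory that answers both interfaces. This is only a sketch under our own assumptions, not the physical memory module we provide: the module name MemSketch is hypothetical, the 512 bytes of physical memory are modeled as 128 words indexed by address bits [8:2], reads return data combinationally in the same cycle, writes take effect at the next rising clock edge, and both wait outputs are tied to zero so this memory never stalls the processor.

```systemverilog
// Hypothetical behavioral memory illustrating the imem/dmem request
// protocol; this is NOT the physical memory module provided with the lab.
module MemSketch
(
  input  logic        clk,

  // Instruction memory interface (read only)
  input  logic        imem_val,
  output logic        imem_wait,
  input  logic [31:0] imem_addr,
  output logic [31:0] imem_rdata,

  // Data memory interface (dmem_type: 0 = read, 1 = write)
  input  logic        dmem_val,
  output logic        dmem_wait,
  input  logic        dmem_type,
  input  logic [31:0] dmem_addr,
  input  logic [31:0] dmem_wdata,
  output logic [31:0] dmem_rdata
);

  // 512 bytes = 128 32-bit words, indexed by address bits [8:2]
  logic [31:0] mem [0:127];

  // This sketch never asks the processor to wait
  assign imem_wait = 1'b0;
  assign dmem_wait = 1'b0;

  // Reads are combinational: data comes back in the same cycle
  assign imem_rdata = imem_val ? mem[imem_addr[8:2]] : 'x;
  assign dmem_rdata = ( dmem_val && !dmem_type ) ? mem[dmem_addr[8:2]] : 'x;

  // Writes take effect at the next rising clock edge
  always_ff @( posedge clk ) begin
    if ( dmem_val && dmem_type )
      mem[dmem_addr[8:2]] <= dmem_wdata;
  end

endmodule
```

A memory that sometimes asserted imem_wait or dmem_wait would instead be telling the processor to hold its state and present the same request again on the next cycle.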
Trace Interface
The trace interface is used for verification and should produce a "trace" of all instructions executed by the processor. This interface was not discussed in lecture, but is an important practical aspect of implementing a real processor. We need a simple way to check that each instruction has executed correctly.
Whenever the processor executes an instruction, it should set the
trace_val output high. It should also set the trace_addr output port
to the address of the executed instruction. If the instruction writes the
register file, it should set trace_wen to one, trace_wreg to the
corresponding write destination register, and trace_wdata to the data
written to the register file. If the instruction writes x0, then
trace_wen should be one, trace_wreg should be zero, and trace_wdata
is undefined. We will be able to write test cases that check the trace
interface to verify the processor is executing a given assembly program
correctly.
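To see how the trace interface can be consumed, here is a minimal monitor sketch that prints one line per executed instruction and counts how many instructions have executed. The module name TraceMonitorSketch and its num_insts counter are hypothetical; only the trace port names come from the ProcScycle interface, and the test infrastructure we provide is separate from anything like this module.

```systemverilog
// Hypothetical trace monitor: prints one line per executed instruction
// and counts executed instructions. Only the trace port names come from
// the ProcScycle interface.
module TraceMonitorSketch
(
  input  logic        clk,
  input  logic        rst,
  input  logic        trace_val,
  input  logic [31:0] trace_addr,
  input  logic        trace_wen,
  input  logic [4:0]  trace_wreg,
  input  logic [31:0] trace_wdata,
  output logic [31:0] num_insts
);

  always_ff @( posedge clk ) begin
    if ( rst )
      num_insts <= 32'd0;
    else if ( trace_val ) begin
      num_insts <= num_insts + 32'd1;
      if ( trace_wen )
        $display( "%h: x%0d <= %h", trace_addr, trace_wreg, trace_wdata );
      else
        $display( "%h: no register write", trace_addr );
    end
  end

endmodule
```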
Initial Implementation
The TinyRV1 single-cycle processor implementation will be decomposed into
a datapath and a control unit. The datapath must be implemented
structurally, meaning we should only instantiate other components without
using any always blocks or combinational logic. The control unit will be
implemented using a single casez in an always_comb block along with
assign statements. We give you an initial processor implementation that
supports the ADDI and ADD instructions but does not support the memory
wait signals. The datapath for this initial processor is shown below.

The control signal table for this initial processor is shown below.
// Localparams for op2_sel control signal
localparam rf = 1'd0;
localparam imm = 1'd1;
// Task for setting control signals
task automatic cs
(
input logic op2_sel_
);
op2_sel = op2_sel_;
endtask
// Control signal table
always_comb begin
casez ( inst )
// op2
// sel
`TINYRV1_INST_ADDI: cs( imm );
`TINYRV1_INST_ADD: cs( rf );
default: cs( 'x );
endcase
end
A control signal table has a column for every control signal and a row
for every instruction. Since the initial implementation only supports two
instructions, the initial control signal table only has two rows. The
first row has all of the control signals required to execute the ADDI
instruction, and the second row has all of the control signals required
to execute the ADD instruction. We are using a casez because we want to
support don't cares in the instruction decoding (e.g., see the
TINYRV1_INST_ADDI macro in tinyrv1.v). In this case, we only need one
column since we currently only have a single control signal: op2_sel,
which controls the mux for operand 2 of the ALU. Notice how we are using
a SystemVerilog task to set the control signals. This will make it easy
to add more columns in a clean way. Also notice how we have created
localparams (i.e., rf, imm) for the two values of the mux select.
While not required, these kinds of localparams can make your control
signal table more readable.
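The don't cares that casez supports can be seen in a stand-alone sketch of the initial one-column table. Rather than using the TINYRV1_INST_ADDI and TINYRV1_INST_ADD macros from tinyrv1.v, whose exact text is not reproduced here, this sketch writes the patterns inline, assuming TinyRV1 uses the standard RISC-V encodings. The module name DecodeSketch is hypothetical, and it assigns op2_sel directly instead of calling the cs task to keep the example short.

```systemverilog
// Hypothetical stand-alone decoder illustrating casez don't cares.
// Patterns assume the standard RISC-V encodings for ADDI and ADD; the
// real control unit uses the TINYRV1_INST_* macros from tinyrv1.v.
module DecodeSketch
(
  input  logic [31:0] inst,
  output logic        op2_sel
);

  // Localparams for op2_sel control signal
  localparam rf  = 1'd0;
  localparam imm = 1'd1;

  always_comb begin
    casez ( inst )
      // Field boundaries: funct7 rs2 rs1 funct3 rd opcode
      // (for ADDI the top 12 bits hold the immediate instead)

      // ADDI: only funct3 and opcode are fixed; imm, rs1, rd are don't cares
      32'b???????_?????_?????_000_?????_0010011: op2_sel = imm;
      // ADD: funct7, funct3, and opcode are fixed
      32'b0000000_?????_?????_000_?????_0110011: op2_sel = rf;
      default:                                   op2_sel = 'x;
    endcase
  end

endmodule
```

For example, addi x1, x0, 2 encodes as 32'h0020_0093 and matches the first pattern, while add x3, x1, x2 encodes as 32'h0020_81b3 and matches the second.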
The following code will connect the initial control unit up to the initial datapath.
// Control Signals (Control Unit -> Datapath)
logic op2_sel;
// Status Signals (Datapath -> Control Unit)
logic [31:0] inst;
// Instantiate/Connect Datapath
ProcSimpleDpath dpath
(
.clk (clk),
.rst (rst),
.imem_addr (imem_addr),
.imem_rdata (imem_rdata),
.dmem_addr (dmem_addr),
.dmem_wdata (dmem_wdata),
.dmem_rdata (dmem_rdata),
.trace_addr (trace_addr),
.trace_wen (trace_wen),
.trace_wreg (trace_wreg),
.trace_wdata (trace_wdata),
.op2_sel (op2_sel),
.inst (inst)
);
// Instantiate/Connect Control Unit
ProcSimpleCtrl ctrl
(
.rst (rst),
.imem_val (imem_val),
.imem_wait (imem_wait),
.dmem_val (dmem_val),
.dmem_wait (dmem_wait),
.dmem_type (dmem_type),
.trace_val (trace_val),
.op2_sel (op2_sel),
.inst (inst)
);
There are two things to notice: (1) connecting all of these ports is
quite tedious; and (2) every port connects to a signal with the exact
same name. Take a quick look at the code in ProcScycle which actually
connects the datapath to the control unit.
// Control Signals (Control Unit -> Datapath)
logic op2_sel;
// Status Signals (Datapath -> Control Unit)
logic [31:0] inst;
// Instantiate/Connect Datapath and Control Unit
ProcSimpleDpath dpath
(
.*
);
ProcSimpleCtrl ctrl
(
.*
);
Notice how much more compact this is! We are using the SystemVerilog .*
implicit port connection, which connects every port to a signal with the
exact same name. This way, whenever we want to add a new control or status
signal, all we need to do is declare a signal with the right name and we
are done.
This initial implementation does not handle the memory wait signals and indeed we recommend waiting until all eight instructions are fully working without memory waits before incorporating this support into your processor.
Final Implementation
The final datapath is shown below.

This is the same as in lecture except with the trace interface.
The final control signal table will have eight columns and eight rows just like in lecture. This is an example of what the first two rows might look like, but keep in mind that your implementation might look different depending on how you implement your processor.
task automatic cs
(
input logic [1:0] pc_sel_,
input logic [1:0] imm_type_,
input logic op2_sel_,
input logic alu_func_,
input logic [1:0] wb_sel_,
input logic rf_wen_pre_,
input logic dmem_val_pre_,
input logic dmem_type_
);
pc_sel = pc_sel_;
imm_type = imm_type_;
op2_sel = op2_sel_;
alu_func = alu_func_;
wb_sel = wb_sel_;
rf_wen_pre = rf_wen_pre_;
dmem_val_pre = dmem_val_pre_;
dmem_type = dmem_type_;
endtask
// Control signal table
always_comb begin
casez ( inst )
// pc imm op2 alu wb rf dmem dmem
// sel type sel func sel wen val type
`TINYRV1_INST_ADDI: cs( pc_plus4, I, imm, add, alu, 1, 0, 'x );
`TINYRV1_INST_ADD: cs( pc_plus4, 'x, rf, add, alu, 1, 0, 'x );
...
default: cs( 'x, 'x, 'x, 'x, 'x, 'x, 'x, 'x );
endcase
end
Again, we are using a SystemVerilog task to enable cleanly setting eight
signals in one line of the casez, and we are using localparams for the
various entries in the control signal table (e.g., I, imm, rf,
add, alu, etc). While not required, these localparams can improve
the readability of your control signal table.
Integrating the Memory Wait Signals
Once you have all eight instructions thoroughly tested without any memory
waits, you can try to incorporate the imem_wait and dmem_wait
signals. The first step is to add a new pc_en control signal that is
connected to the enable input of the PC. Then make sure the PC is not
enabled if we are waiting on memory.
We also need to make sure we do not write garbage data to the register
file if we are waiting on either the instruction memory or the data
memory. This is why the control signal table writes a "preliminary"
version of the rf_wen control signal; we use the _pre suffix to
indicate it is a preliminary version. We will need to add some additional
logic (in an assign statement after the always_comb) to incorporate
the memory wait signals. If either imem_wait or dmem_wait is one, we
need to ensure that rf_wen is zero so we do not accidentally write
garbage to a register in the register file; if both imem_wait and
dmem_wait are zero, then rf_wen is the same as the preliminary
version. The additional combinational logic should also factor in
reset, so rf_wen is one only if: we are not in reset and we are not
waiting on the instruction memory and we are not waiting on the data
memory and the current instruction really does want to write the
register file.
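Expressed in SystemVerilog, this condition is a single assign statement. The sketch below wraps it in a hypothetical stand-alone module, RfWenSketch, so it can be simulated on its own; in your processor the assign would live in the control unit after the always_comb block.

```systemverilog
// Hypothetical stand-alone version of the rf_wen logic described above.
module RfWenSketch
(
  input  logic rst,
  input  logic imem_wait,
  input  logic dmem_wait,
  input  logic rf_wen_pre,
  output logic rf_wen
);

  // Write the register file only if we are not in reset, not waiting on
  // either memory, and the current instruction really writes a register
  assign rf_wen = !rst && !imem_wait && !dmem_wait && rf_wen_pre;

endmodule
```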
Finally, we need to make sure we do not write garbage data to memory if
we are waiting on the instruction memory. Again, this is why the control
signal table writes a "preliminary" version of dmem_val. We can then
combine the imem_wait signal and dmem_val_pre to determine
dmem_val. Do not make dmem_val depend on dmem_wait since this
could cause a combinational loop!
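The dmem_val logic, together with the pc_en signal introduced at the start of this subsection, can be sketched the same way. The module name MemWaitSketch is hypothetical, and we assume pc_en should be one exactly when neither memory is waiting; as the text requires, dmem_val ignores dmem_wait.

```systemverilog
// Hypothetical stand-alone version of the dmem_val and pc_en logic.
module MemWaitSketch
(
  input  logic imem_wait,
  input  logic dmem_wait,
  input  logic dmem_val_pre,
  output logic dmem_val,
  output logic pc_en
);

  // Issue a data memory request only if we are not waiting on the
  // instruction memory. dmem_val deliberately does not depend on
  // dmem_wait, since a memory that derives dmem_wait from dmem_val
  // would then form a combinational loop.
  assign dmem_val = !imem_wait && dmem_val_pre;

  // Hold the PC while waiting on either memory
  assign pc_en = !imem_wait && !dmem_wait;

endmodule
```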
Use an incremental design approach!
Implementing the processor is the culmination of many months of work on the lab assignments. It is a complex hardware module so we strongly discourage using a big bang design approach. Do not just start writing code, try to implement the entire datapath and control unit in one shot, and then see if you can get it to work. This approach can easily take hours and hours.
Your goal instead should be to incrementally modify the control unit and datapath to add support for each of the remaining six instructions (i.e., MUL, LW, SW, JAL, JR, BNE) one at a time. Adding each instruction should involve the following seven steps:
- Step 1: Modify the datapath
- Step 2: Add ports to the dpath and ctrl for new control or status signals
- Step 3: Add new signals in ProcScycle to connect the dpath and ctrl
- Step 4: Add a column in the control signal table for each new control signal
- Step 5: Add a new row in the control signal table for the new instruction
- Step 6: Fill in all empty entries in the control signal table
- Step 7: Thoroughly test the instruction
Use this seven step process. Implement one instruction, thoroughly test that instruction, then move on to the next instruction. Once you have finished implementing all eight instructions and thoroughly tested each one, then you can work on adding support for the memory wait signals.
Let's illustrate this process by adding the MUL instruction. First, we need to modify the datapath to include a multiplier and a new write-back multiplexor which will choose whether the ALU or the multiplier writes the register file (Step 1). Here is the modified datapath.

Notice this means we have a new control signal wb_sel. So we need to
add a new one-bit output port to ProcScycleCtrl named wb_sel and a
new one-bit input port to ProcScycleDpath named wb_sel (Step 2). We
also need to add a wb_sel signal to ProcScycle to connect these two
ports together (Step 3) like this:
// Control Signals (Control Unit -> Datapath)
logic op2_sel;
logic wb_sel;
// Status Signals (Datapath -> Control Unit)
logic [31:0] inst;
// Instantiate/Connect Datapath and Control Unit
ProcSimpleDpath dpath
(
.*
);
ProcSimpleCtrl ctrl
(
.*
);
The .* makes these top-level connections very simple! In the control
unit, we need to add a new column to the control signal table (Step 4)
and a new row for the MUL instruction (Step 5) like this:
// Localparams for op2_sel control signal
localparam rf = 1'd0;
localparam imm = 1'd1;
// Localparams for wb_sel control signal
localparam alu = 1'd0;
localparam mul = 1'd1;
// Task for setting control signals
task automatic cs
(
input logic op2_sel_,
input logic wb_sel_
);
op2_sel = op2_sel_;
wb_sel = wb_sel_;
endtask
// Control signal table
always_comb begin
casez ( inst )
// op2 wb
// sel sel
`TINYRV1_INST_ADDI: cs( imm, ??? );
`TINYRV1_INST_ADD: cs( rf, ??? );
`TINYRV1_INST_MUL: cs( ???, ??? );
default: cs( 'x, 'x );
endcase
end
Now we need to fill in the empty entries marked ??? with the
correct values (Step 6). We have created two new localparams
(alu,mul) to make it easier to understand the values in the
control signal table. Finally, we must thoroughly test this
instruction before moving on to the next instruction (Step 7).
1.3. Memory Bus
To be posted soon!
1.4. SPI
To be posted soon!
2. Testing Strategy
As always, we need a compelling strategy to ensure the correctness of each component: the TinyRV1 processor, the memory bus, and the SPI.
2.1. Testing the Processor
Testing the processor is more complex than testing individual hardware blocks. We have provided you some testing infrastructure to simplify the process, but students should still expect to dedicate significant time to verifying that their processor correctly implements the TinyRV1 ISA.
We have provided you a functional-level (FL) processor model (also called
an instruction set simulator) located in test/ProcFL.v. The FL
processor model executes the instruction semantics behaviorally using
high-level Verilog; it is not meant to model hardware. The FL processor
model can be used to make sure your tests are correct before you run
those tests on your single-cycle processor.
The test cases for the processor are located in these test files:
- test/Proc-addi-test-cases.v
- test/Proc-add-test-cases.v
- test/Proc-mul-test-cases.v
- test/Proc-lw-test-cases.v
- test/Proc-sw-test-cases.v
- test/Proc-jal-test-cases.v
- test/Proc-jr-test-cases.v
- test/Proc-bne-test-cases.v
- test/Proc-wait-test-cases.v
Each file should test only a single instruction, except for the final
wait test cases, which test whether the processor can correctly handle
memory waits. Processor test cases look like this:
task test_case_1_basic();
t.test_case_begin( "test_case_1_basic" );
// Write assembly program into memory
asm( 'h000, "addi x1, x0, 2" );
asm( 'h004, "addi x2, x1, 2" );
// Check each executed instruction
// addr en reg data
check_trace( 'h000, 1, 5'd1, 32'h0000_0002 ); // addi x1, x0, 2
check_trace( 'h004, 1, 5'd2, 32'h0000_0004 ); // addi x2, x1, 2
t.test_case_end();
endtask
Every processor test case includes two parts.
- asm tasks are used to write instructions into the memory. The asm task takes two arguments: the address for the instruction and an assembly instruction represented as a string. The asm task will take care of converting the assembly instruction into a machine instruction. The asm tasks represent the static instruction sequence (i.e., what instructions are stored in memory before the processor starts executing).
- check_trace tasks are like the check tasks you have seen elsewhere, but check_trace tasks will wait for the trace_val signal to be high before checking to see if the trace_addr, trace_wen, trace_wreg, and trace_wdata outputs from the processor match the desired values. The check_trace tasks are used to check the dynamic instruction sequence (i.e., what instructions the processor actually executes at runtime).
The above basic test case for the ADDI instruction uses the trace to make sure the first ADDI instruction writes the value 2 to the register file and the second ADDI instruction writes the value 4 to the register file. When writing register X0, the trace data is undefined. We do not want to enforce that the register write data is zero when writing X0 since this would require special hardware to handle this case. Here is how you might test reading and writing register X0.
task test_case_2_regX0();
t.test_case_begin( "test_case_2_regX0" );
// Write assembly program into memory
asm( 'h000, "addi x1, x0, 0" );
asm( 'h004, "addi x0, x1, 0" );
// Check each executed instruction
// addr en reg data
check_trace( 'h000, 1, 5'd1, 32'h0000_0000 ); // addi x1, x0, 0
check_trace( 'h004, 0, 5'd0, 32'hxxxx_xxxx ); // addi x0, x1, 0
t.test_case_end();
endtask
Processor test cases for memory can include an additional part:
task test_case_1_basic();
t.test_case_begin( "test_case_1_basic" );
// Write assembly program into memory
asm( 'h000, "addi x1, x0, 0x100" );
asm( 'h004, "lw x2, 0(x1)" );
// Write data into memory
data( 'h100, 32'hdead_beef );
// Check each executed instruction
// addr en reg data
check_trace( 'h000, 1, 5'd1, 32'h0000_0100 ); // addi x1, x0, 0x100
check_trace( 'h004, 1, 5'd2, 32'hdead_beef ); // lw x2, 0(x1)
t.test_case_end();
endtask
In addition to the asm tasks and check_trace tasks, we can also use a
data task to write data into the memory. The above basic test case for
the LW instruction first uses an ADDI instruction to get the memory
address 0x100 into register x1. The test case then performs a LW
instruction to load the data from address 0x100 into register x2. The
check_trace tasks verify that the ADDI instruction correctly writes the
address to the register file, and that the LW instruction correctly loads
the value 0xdeadbeef from memory address 0x100.
The check_trace tasks become particularly important when testing
control flow instructions. The following test case is for the JAL
instruction:
task test_case_1_basic();
t.test_case_begin( "test_case_1_basic" );
// Write assembly program into memory
asm( 'h000, "addi x1, x0, 1" );
asm( 'h004, "jal x2, 0x00c" );
asm( 'h008, "addi x1, x0, 2" );
asm( 'h00c, "addi x1, x0, 3" );
// Check each executed instruction
// addr en reg data
check_trace( 'h000, 1, 5'd1, 32'h0000_0001 ); // addi x1, x0, 1
check_trace( 'h004, 1, 5'd2, 32'h0000_0008 ); // jal x2, 0x00c
check_trace( 'h00c, 1, 5'd1, 32'h0000_0003 ); // addi x1, x0, 3
t.test_case_end();
endtask
Here we can see the static instruction sequence includes four instructions, but the dynamic instruction sequence only includes three instructions because the JAL instruction jumps over the instruction at address 0x008. Note that in the assembly format used for testing our processor, the literal in a JAL or BNE instruction is the absolute address of the target, not the actual immediate. The assembler will take care of creating the appropriate PC-relative immediate.
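Converting an absolute target into a PC-relative immediate is a subtraction followed by scattering the offset bits into the instruction format. The package below sketches that step for JAL only, assuming TinyRV1 uses the standard RISC-V J-type encoding; the package name JalEncodeSketch and the function encode_jal are hypothetical and are not part of the provided assembler. BNE would need the analogous B-type layout.

```systemverilog
// Hypothetical helper showing how an assembler could turn the absolute
// JAL target used in our test programs into a PC-relative J-type
// immediate, following the standard RISC-V JAL encoding.
package JalEncodeSketch;

  function automatic logic [31:0] encode_jal
  (
    input logic [31:0] pc,     // address of the JAL instruction
    input logic [4:0]  rd,     // destination register
    input logic [31:0] target  // absolute jump target
  );
    logic [31:0] imm;
    imm = target - pc;  // PC-relative byte offset (bit 0 is always zero)
    // J-type layout: imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode
    return { imm[20], imm[10:1], imm[11], imm[19:12], rd, 7'b1101111 };
  endfunction

endpackage
```

For the JAL test case above, the instruction at address 0x004 targets 0x00c, so the immediate is 0x00c - 0x004 = 8, and encode_jal( 32'h004, 5'd2, 32'h00c ) produces 32'h0080_016f.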
You can run the test cases for the ADDI and ADD instructions on the FL processor model like this:
% cd ${HOME}/ece2300/groupXX/build
% make ProcFL-addi-test && ./ProcFL-addi-test
% make ProcFL-add-test && ./ProcFL-add-test
You will need to add more test cases to the appropriate -test-cases.v
file. Do not simply have a single directed test case (i.e., a single
task); you must have many directed test cases (i.e., many tasks). Each
directed test case should focus on testing a different aspect of the
corresponding instruction. Remember to always make sure your tests pass
on the FL processor model before attempting to run those tests on your
single-cycle processor model!
2.2. Testing the Memory Bus and SPI
We provide tests for the MemoryBusAddrDecoder, MemoryBusDataMux, and
the MemoryBus itself. We also provide tests for the SPI. While you are
free to add more tests for these components, it is not required.
3. Lab Code Submission
To submit your code, you simply push your code to GitHub. You can push
your code as many times as you like before the deadline. Students are
responsible for going to the GitHub website for their repository, browsing
the source code, and confirming that the code on GitHub is the code they
want to submit. Be sure to verify your code is passing your tests
both on ecelinux and on GitHub Actions. Your design code will be
assessed in terms of code quality, verification quality, and
functionality.
3.1. Code Quality
Your code quality score will be based on how well you follow the course coding conventions posted here:
3.2. Verification Quality
Verification quality is based on how well your testing enables making a compelling case for correctness. You will need to write compelling directed test cases, use reasonable random testing, and include a simple X-propagation test case. Use comments appropriately to describe your test cases.
3.3. Functionality
Your functionality score will be determined by running your code against a series of tests developed by the instructors to test its correctness. Note that we will be using the automated build system to test your final code submission as shown below.