Session 5 - RTL Design & Verification

Write Verilog for your project's core module (aim for 10-30 lines to start)
Integrate with any provided library modules (e.g., debounce, UART, PWM) and create a top-level wrapper
Simulate with a testbench and examine waveforms in GTKWave (I used Surfer)
Run linter (verilator --lint-only) and fix any warnings

Rewiring and Mental Mapping

I started off spending quite a bit of time trying to figure out where Verilog fits among all the new concepts we had learned in the previous session. I had only just started to make sense of SPICE and how the netlist relates to the standard cells.

After a while, I realized I wasn’t really making progress with the mapping, so I decided to move on and experiment with some Verilog examples instead.

I ended up making this table, which helped me build an overview.

HDL	Domain	Simulation Tool	Abstraction Level	Typical Path to Silicon
Verilog (RTL)	Digital	Icarus Verilog / Verilator	Register-transfer level	RTL → Synthesis → Standard Cells → Place & Route → GDS → Silicon
Verilog-A	Analog	ngspice (via OpenVAF)	Continuous-time analog	Verilog-A → SPICE model → Circuit design → Layout → Silicon
Verilog-AMS	Mixed-signal	Mixed-signal simulation	Combined digital + analog	Digital path + Analog path merged → Layout → Silicon

Verilog examples and recap

To start, I found this example of a D flip-flop and a testbench for it.

As I understand it, when working with Verilog you usually need two files:

One file is the module itself (.v)
The other is the testbench (_tb.v) where you write the instructions on how to test the module

You then run both files through Icarus Verilog (iverilog), which only compiles them into a .vvp file. That file is not the waveform itself; it is the compiled simulation.

To actually run the simulation you use vvp, which stands for Verilog Virtual Processor. It is the runtime engine that executes the compiled .vvp file.

From there, if the testbench calls $dumpfile and $dumpvars (as the one below does), running it with vvp also writes a .vcd file that you can open in a waveform viewer.

Here is the code I tried.


// how the module is connected
module D_flipflop (
  input clk, rst_n,
  input d,
  output reg q
  );
 // how should the module behave 
  always@(posedge clk or negedge rst_n) begin
    if(!rst_n) q <= 0;
    else       q <= d;
  end
endmodule

// how to connect it in the testbench
module tb;
  reg clk, rst_n;
  reg d;
  wire q;
  D_flipflop dff(clk, rst_n, d, q);
  always #2 clk = ~clk;
  initial begin
    clk = 0; rst_n = 0;
    d = 0;
    // what should be done (the stimulus)
    #3 rst_n = 1;
    repeat(6) begin
      d = $urandom_range(0, 1);
      #3;
    end
    rst_n = 0; #3;
    rst_n = 1;
    repeat(6) begin
      d = $urandom_range(0, 1);
      #3;
    end
    $finish;
  end

  initial begin
    $dumpfile("dump.vcd"); // instructions to make the waveform file
    $dumpvars(0, tb);      // dump everything under tb
  end
endmodule

Behavioral style

This code is written in behavioral style.

There are other ways to describe hardware in Verilog as well:

Behavioral - like this one, using always
Dataflow - using assign
Structural - instantiating gates or other modules

In this case I am not describing transistors or gates, I am just describing how the flip-flop behaves on a clock edge.
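To make the three styles concrete, here is a sketch of one small circuit written all three ways. It uses a 2:1 multiplexer (y = sel ? b : a) rather than the flip-flop, because a gate-level flip-flop would be much longer; the module names mux_behav, mux_dataflow, mux_struct and tb_mux_styles are hypothetical. The testbench drives all eight input combinations and checks that the three versions agree.

```verilog
// Behavioral: describe what the output should be in an always block.
module mux_behav (input a, b, sel, output reg y);
  always @(*) begin
    if (sel) y = b;
    else     y = a;
  end
endmodule

// Dataflow: one continuous assignment with an expression.
module mux_dataflow (input a, b, sel, output y);
  assign y = sel ? b : a;
endmodule

// Structural: wire up Verilog's built-in gate primitives by hand.
module mux_struct (input a, b, sel, output y);
  wire sel_n, a_path, b_path;
  not g0 (sel_n, sel);
  and g1 (a_path, a, sel_n);   // a passes when sel = 0
  and g2 (b_path, b, sel);     // b passes when sel = 1
  or  g3 (y, a_path, b_path);
endmodule

// Drive all 8 input combinations and compare the three versions.
module tb_mux_styles;
  reg a, b, sel;
  wire y_behav, y_flow, y_struct;
  integer i, errors;

  mux_behav    m0 (.a(a), .b(b), .sel(sel), .y(y_behav));
  mux_dataflow m1 (.a(a), .b(b), .sel(sel), .y(y_flow));
  mux_struct   m2 (.a(a), .b(b), .sel(sel), .y(y_struct));

  initial begin
    errors = 0;
    for (i = 0; i < 8; i = i + 1) begin
      {sel, b, a} = i[2:0];
      #1;  // let the combinational outputs settle
      if (y_behav !== (sel ? b : a) || y_flow !== y_behav || y_struct !== y_behav) begin
        $display("mismatch: sel=%b b=%b a=%b -> %b %b %b", sel, b, a, y_behav, y_flow, y_struct);
        errors = errors + 1;
      end
    end
    if (errors == 0) $display("PASS: all three styles agree");
    $finish;
  end
endmodule
```

In simulation the three behave identically; the style only changes how the hardware is described.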

Looking at the waveform

When I looked at the waveform I realized something important: it was only simulating the logical behavior, zeros and ones. No matter how far you zoom in, there is no rise time, no fall time and no real propagation delay like we saw when using Ngspice. The transitions are ideal; they just happen.

[Figure: zoomed-in waveform with ideal transitions]

That made things clearer for me. I started to really see the difference between Verilog and SPICE.

In Verilog, we prove that the logic works.
In SPICE, we prove that the electrical behavior works.
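The ideal transitions stay ideal even when a delay is written into the Verilog. In this sketch (hypothetical module delay_demo), a continuous assignment is given a 3 ns delay with #3. The output still switches from 0 to 1 in a single step, 3 ns after the input changes, with nothing in between; the delay only moves the event later in time.

```verilog
`timescale 1ns / 1ps

// Hypothetical demo: a delayed AND gate in a digital (event-driven) simulator.
module delay_demo;
  reg  a, b;
  wire y;
  integer errors;

  // y follows (a & b), but only 3 ns later
  assign #3 y = a & b;

  initial begin
    $monitor("t=%0d ns  a=%b b=%b y=%b", $time, a, b, y);
    errors = 0;
    a = 1'b0; b = 1'b1;
    #10 a = 1'b1;                              // input rises at 10 ns
    #2  if (y !== 1'b0) errors = errors + 1;   // 12 ns: output has not moved yet
    #2  if (y !== 1'b1) errors = errors + 1;   // 14 ns: output is already a full 1
    if (errors == 0) $display("PASS: y jumped from 0 to 1 at 13 ns, with no ramp");
    $finish;
  end
endmodule
```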

It makes sense now that with Verilog-A, the model has to be compiled (for example with OpenVAF) and used in a SPICE netlist if you want to see analog signals. That is not needed in pure digital Verilog, because it only works with logical values.

[Figure: the Surfer waveform viewer]

I also discovered this great waveform viewer called Surfer. It is available both as a local install and directly in the browser.

Verilog and SPICE: the difference

So this is how I currently understand it:

SPICE proves the electrical behavior with real transistor-level models. It is slower, but closer to reality.
Verilog (digital) proves the logic or counter function quickly. It is fast and clean, but ideal.

SPICE solves continuous electrical equations over time. Verilog reacts to events like clock edges.

That difference is becoming much clearer now.

I did a few more examples and exercises from Chipverify, vlsiverify and asic-world, which helped me get more comfortable with the syntax.

Writing Verilog

Writing a Test Bench

I had found some code earlier that was missing a test bench, and that was a good push to start doing something on my own. Since I was starting to understand what a test bench is for, I decided to take it on and write one from a blank canvas. I had the flip-flop code from before as a reference, so I began coding it line by line.

I decided to challenge myself and do this with no internet, just the blank canvas, the reference code and Beethoven.

[Figure: blank canvas]

Here is the test bench I came up with:


// Note: this code has some flaws and probably does not follow best practices,
// but it works and was written as a learning exercise.
module tb;
  reg clk, rstn;
  reg up_down;
  wire out;

  ctr dff(up_down, clk, rstn, out);

  always #2 clk = ~clk;
  initial begin
    clk =0; rstn = 0;
    up_down=0;
    #3 rstn = 1;

    repeat (15) begin
      up_down = 1;
      #3;
    end
    rstn = 0; #3;
    rstn = 1;
    repeat (7) begin
      up_down = 0;
      #3;
    end
    $finish;
  end

initial begin
  $dumpfile("count.vcd");
  $dumpvars(0,tb);
end
endmodule

module ctr (
  input up_down,    clk, rstn,
 output reg [2:0]   out);

    always @ (posedge clk)
        if (!rstn)
            out <= 0;
        else begin
            if (up_down)
                out <= out + 1;
            else
                out <= out - 1;
        end
endmodule

After I was done I ran it through iverilog, which ended with the message "I give up." I appreciate comical moments like that, and it turned out I had just forgotten a couple of semicolons (;). Nothing new, and Verilog actually feels a lot like C and Lua.

[Figure: iverilog error output]

After fixing that I ran it again. This time it didn’t give up, but it printed a couple of warnings, so things hadn’t gone completely smoothly:

up_dw_tb.v:6: warning: Port 4 (out) of module ctr expects 3 bit(s), given 1.
up_dw_tb.v:6:        : Padding 2 high bits of the port.

I did what anyone would do: I ignored it and moved on. Next I ran vvp sim.vvp.

VCD info: dumpfile count.vcd opened for output.
up_dw_tb.v:25: $finish called at 72 (1s)

That all looked normal and I now had a .vcd waveform file. I opened that in the Surfer waveform viewer. It didn’t behave exactly like I had intended in the code, but I could clearly see what each section of the test bench was doing in the waveform viewer.

Below the waveform image, I go over some of the things I learned after reconnecting to the internet.

[Figure: counter waveform in Surfer]

What I learned - tb

After looking at the waveform and reading the warnings (and eventually looking them up), it turned out the simulator had already told me what was wrong.

up_dw_tb.v:6: warning: Port 4 (out) of module ctr expects 3 bit(s), given 1.
up_dw_tb.v:6:        : Padding 2 high bits of the port.

The counter module defines the output as a 3-bit signal, but I intended it to be a 4-bit signal:

out[2:0]

In my test bench I had declared it as:

wire out;

which is only 1 bit wide. The simulator therefore padded the connection on its own, so only the lowest bit of the counter reached the test bench's out wire. That explains the warning and also why the waveform behaved slightly differently than I had expected.

Since I originally thought this was a 4-bit counter, it should have been declared as:

wire [3:0] out;

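But that would also require changing the width in the ctr module itself (output reg [3:0] out). To match the module as it is written, the test bench wire would have had to be wire [2:0] out;.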

Another thing I noticed while looking at the waveform was that the counter never went above 7. At first I thought something in the repeat blocks might be wrong. I intended it to count up to 15 (F) and down again, but it turned out to be much simpler.

A 3-bit counter can only represent numbers from 0 to 7:

000 – 111

So when the counter reaches 7 it simply wraps back to 0. That is why the test bench never showed values higher than that even though the repeat block was running more times.
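Both problems go away if the width is written down once and shared. This sketch is a hypothetical variant of the counter (ctr_param and tb_ctr_param are made-up names) with a WIDTH parameter used by both the module port and the test bench wire. With WIDTH = 4 it counts up to 15 and then wraps, in both directions (15 to 0 when counting up, 0 to 15 when counting down).

```verilog
// Hypothetical counter variant: the width is a parameter, so the module
// port and the test bench wire can share one number.
module ctr_param #(parameter WIDTH = 4) (
  input                  up_down, clk, rstn,
  output reg [WIDTH-1:0] out);

  always @(posedge clk)
    if (!rstn)        out <= 0;           // synchronous reset, like the original
    else if (up_down) out <= out + 1'b1;  // all ones + 1 wraps to 0
    else              out <= out - 1'b1;  // 0 - 1 wraps to all ones
endmodule

module tb_ctr_param;
  localparam WIDTH = 4;
  reg clk, rstn, up_down;
  wire [WIDTH-1:0] out;                   // same width as the port: no padding
  integer errors;

  ctr_param #(.WIDTH(WIDTH)) dut (.up_down(up_down), .clk(clk), .rstn(rstn), .out(out));

  always #2 clk = ~clk;

  initial begin
    clk = 0; rstn = 0; up_down = 1; errors = 0;
    #3 rstn = 1;                          // the rising edge at t=2 already reset out to 0
    repeat (15) @(posedge clk);           // 15 increments: 0 -> 15
    #1 if (out !== 4'd15) errors = errors + 1;
    @(posedge clk);                       // one more step wraps 15 -> 0
    #1 if (out !== 4'd0) errors = errors + 1;
    up_down = 0;
    @(posedge clk);                       // counting down from 0 wraps to 15
    #1 if (out !== 4'd15) errors = errors + 1;
    if (errors == 0) $display("PASS: counter reaches 15 and wraps in both directions");
    $finish;
  end
endmodule
```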

Another thing I learned while writing the test bench was how the simulation timing works. The # symbol is used to create a delay in simulation time. For example:

always #2 clk = ~clk;

This means that every 2 time units the clock signal is inverted, so one full clock period is 4 time units. The ~ operator flips the value, so 0 becomes 1 and 1 becomes 0. This produces a simple square-wave clock that drives the counter during the simulation.
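What a "time unit" is depends on the `timescale directive. The test bench above has none, so the simulator's default applies; the "(1s)" in the vvp output ($finish called at 72 (1s)) appears to show Icarus using one second per unit. This sketch pins the unit to 1 ns and derives the toggle delay from a clock period; the module name tb_clock and the 10 ns CLK_PERIOD are hypothetical choices.

```verilog
`timescale 1ns / 1ps   // #1 now means 1 ns, with delays resolved to 1 ps

module tb_clock;
  parameter real CLK_PERIOD = 10.0;   // hypothetical: 10 ns period (100 MHz)
  reg clk;
  integer edges;

  initial begin
    clk   = 1'b0;
    edges = 0;
  end

  // Invert every half period: one full square-wave cycle per CLK_PERIOD.
  always #(CLK_PERIOD / 2.0) clk = ~clk;

  // Count rising edges so the period can be checked.
  always @(posedge clk) edges = edges + 1;

  initial begin
    #100;                             // run for 100 ns
    $display("%0d rising edges in %0d ns", edges, $time);  // expect 10 edges
    $finish;
  end
endmodule
```

Writing the half period as CLK_PERIOD / 2.0 keeps the clock correct if the period is changed later.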

Even though the test bench had a few flaws, it was a good exercise to write it from scratch. Being able to see the behavior clearly in the waveform viewer made it much easier to understand what the code was actually doing.

Writing Verilog and Test Bench

Now that I had gotten a better understanding of the Test Bench and had seen more Verilog code, I was ready to write some code myself. I decided to change the up/down counter into two modules, one that counts up and the other that counts down, and then make a third module that would be a basic ALU module.

I was not sure whether it was good practice to split things up into modules or try to keep everything inside one module. Not knowing better, I decided to just make three modules while I was getting familiar with Verilog and how modules work.

I typed up the code fairly quickly, and it definitely helped that I know some Lua and C. However, I had to remind myself that this was not embedded or software programming, where statements execute one after another. Here things happen in parallel, which was mentioned a couple of times in the lecture.

I decided to just try this approach so I could move forward, even though the ALU ended up as a chain of if/else statements (no more than four, at least). I then made a test bench that simply runs through a few scenarios that could happen.
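The "things happen in parallel" point is easiest to see with two always blocks on the same clock edge. In this sketch (hypothetical module swap_demo, with arbitrary load values 3 and 9), each block updates one register from the other. Because both use nonblocking assignments (<=), every right-hand side is read before either register changes, so a and b swap cleanly on each edge. With blocking assignments (=) in two separate always blocks, the result would instead depend on which block the simulator happens to run first.

```verilog
// Two always blocks triggered by the same clock edge run concurrently.
module swap_demo;
  reg clk, load;
  reg [3:0] a, b;
  integer errors;

  always #5 clk = ~clk;

  // Nonblocking: both blocks read the OLD a and b, then both update.
  always @(posedge clk) a <= load ? 4'd3 : b;   // hypothetical load value 3
  always @(posedge clk) b <= load ? 4'd9 : a;   // hypothetical load value 9

  initial begin
    clk = 0; load = 1; errors = 0;
    #10 load = 0;                        // the edge at t=5 loaded a=3, b=9
    if (a !== 4'd3 || b !== 4'd9) errors = errors + 1;
    @(posedge clk) #1;                   // t=16: each block used the other's old value
    if (a !== 4'd9 || b !== 4'd3) errors = errors + 1;
    @(posedge clk) #1;                   // t=26: swapped back
    if (a !== 4'd3 || b !== 4'd9) errors = errors + 1;
    if (errors == 0) $display("PASS: a and b swap on every clock edge");
    $finish;
  end
endmodule
```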

Here is the code I wrote:


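// up counter
module ctr_up (
  input up_en,    clk, rstn,
 output reg [3:0]   out_u);

  always @ (posedge clk) begin
        if (!rstn)
            out_u <= 0;
        else begin
            if (up_en)
                out_u <= out_u + 1;
        end
  end
endmodule
// down counter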
module ctr_down (
  input down_en,    clk, rstn,
 output reg [3:0]   out_d);

  always @ (posedge clk) begin
        if (!rstn)
            out_d <= 15;
        else begin
            if (down_en)
                out_d <= out_d - 1;
        end
  end
endmodule
// ALU that can add, multiply and AND
module alu (
  input  [3:0] out_u, out_d,
  input  [1:0] opcode,
 // output reg [2:0] status, // status option for later
  output reg [7:0] result);

  always @(*) begin
    if (opcode == 0)
      result = out_d + out_u;
    else if (opcode == 1)
      result = out_d * out_u;
    else if (opcode == 2)
      result = out_d & out_u;
    else
      result = 0;
  end
endmodule

module tb;
reg clk, rstn,up_en,down_en;
reg [1:0] opcode;
wire [3:0] out_u;
wire [3:0] out_d;
wire [7:0] result; 
  ctr_up upc(up_en, clk, rstn, out_u);
  ctr_down dwc(down_en, clk, rstn, out_d);
  alu alu(out_u, out_d, opcode, result);

  always #5 clk = ~clk;
  initial begin
    clk =0; rstn = 0;
    up_en =0; down_en =0;
    opcode =0;

    #10 rstn = 1;
    up_en = 1;
    down_en = 1;

    repeat (20) begin //sum test
      @(posedge clk);      
    end
    rstn = 0; #10;
    rstn = 1;
    opcode =1;
    repeat (20) begin //multip test
      @(posedge clk);     
    end
    rstn = 0; #10;
    rstn = 1;
    opcode = 2;
    repeat (20) begin // AND test
      @(posedge clk);       
    end
    rstn = 0; #10;
    rstn = 1;
    opcode = 2;
    repeat (10) begin //opcode 2 test result 0
      @(posedge clk); 
    end
    opcode = 1;
    down_en = 0;
    repeat (15) begin // multiply by 5, 15 times
      @(posedge clk); 
    end
    $finish;
  end
initial begin
  $dumpfile("alu.vcd");
  $dumpvars(0,tb);
end
endmodule

The design consists of three modules. The first module ctr_up is a simple 4-bit counter that increments on every clock edge when up_en is active. The second module ctr_down works similarly but decrements the counter when down_en is active and starts from 15 after reset. Both counters operate synchronously with the clock.

The third module is a small combinational ALU that takes the outputs of the two counters as inputs. Depending on the value of the opcode, the ALU performs different operations on the two values. In this version it can add, multiply, or perform a bitwise AND. The ALU is implemented as combinational logic using always @(*), meaning the result updates immediately when the inputs change.

Basic ALU test

Next I compiled the design and test bench with iverilog -g2012 -o sim.vvp up_dw_alu.v up_dw_alu_tb.v. The compiler returned straight to the prompt with no messages, which is a good sign. Then I ran the runtime engine with vvp sim.vvp, and this time the output looked good:

VCD info: dumpfile alu.vcd opened for output.
up_dw_alu_tb.v:53: $finish called at 855 (1s)

I then started Surfer to inspect the waveform and it looked pretty close to the image I had in mind and what I wanted to see.

[Figure: ALU waveform in Surfer]

I immediately switched the view to binary. Who can calculate in hex anyway? It was fun to compare the results of the Sum and AND operations. In the Sum-op the result is always 0000.1111: I started the up counter at 0 and the down counter at F, and both move one step per clock, so their values mirror each other. In the AND-op the result is always 0000.0000 for the same reason.

[Figure: Sum-op and bitwise AND-op waveforms]
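The mirror effect can be checked for every possible value. For a 4-bit x, 15 is 4'b1111, so 15 - x never needs a borrow and simply flips every bit of x. That makes x + (15 - x) always 1111 and x & (15 - x) always 0000. This exhaustive testbench sketch (hypothetical name tb_mirror) checks all 16 counter positions.

```verilog
// For a 4-bit x: 15 - x equals ~x, so
//   x + (15 - x) = 4'b1111   and   x & (15 - x) = 4'b0000
module tb_mirror;
  reg  [3:0] up, down;
  reg  [7:0] sum, and_res;
  integer i, errors;

  initial begin
    errors = 0;
    for (i = 0; i < 16; i = i + 1) begin
      up      = i;                // up counter: i steps after reset (started at 0)
      down    = 4'd15 - up;       // down counter: started at 15, also i steps
      sum     = {4'b0000, up} + {4'b0000, down};
      and_res = {4'b0000, up & down};
      if (down !== ~up || sum !== 8'd15 || and_res !== 8'd0)
        errors = errors + 1;
    end
    if (errors == 0) $display("PASS: sum is always 0000_1111, AND is always 0000_0000");
    $finish;
  end
endmodule
```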

The multiplication part also looked nice, and I could see the products repeat in mirror order once the two counter values crossed (7 × 8 gives the same result as 8 × 7).

[Figure: Multiplication-op waveform]

After that I started noticing things I hadn’t intended. I meant to test operation 3 (11 bin) but accidentally set it back to 2 (10 bin), which explains something I initially didn’t understand: toward the end the value suddenly went back to 0 and the test seemed to restart.

I had also intended the last phase to stop the down counter at 5 (0101 bin) and then run the multiply operation, but it stopped at 9 (1001 bin) instead. Looking into it, I understood that I had disabled the counter after a fixed number of clock cycles rather than at a specific value. Later I realized that if I actually wanted the counter to stop at a specific value, I could use the wait statement in the testbench. Instead of disabling the counter at an arbitrary moment, I could simply wait until the counter reached the value I wanted and then disable it.

For example, if I wanted the down counter to stop at 5, it could be done like this:

wait (out_d == 4'd5);
down_en = 0;

This way the testbench waits until the counter reaches 5, and only then disables down_en, freezing the counter at that value. I didn’t implement this in the current test, but it was a useful thing to learn while working through the simulation. It took a while to find a modern-looking cheat sheet for Verilog, but I eventually found one: Verilog Cheatsheet, prepared by Garima jangid.
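Here is a sketch of the wait idea as a complete, self-checking test bench. It repeats a module with the same behavior as ctr_down so it compiles on its own; the name tb_wait and the three-edge hold check are hypothetical choices. The wait returns right after the clock edge that produced 5, so down_en is already 0 by the next rising edge.

```verilog
// Same behavior as ctr_down above, repeated so this sketch is self-contained.
module ctr_down (
  input down_en, clk, rstn,
  output reg [3:0] out_d);

  always @(posedge clk)
    if (!rstn)        out_d <= 15;
    else if (down_en) out_d <= out_d - 1;
endmodule

module tb_wait;
  reg clk, rstn, down_en;
  wire [3:0] out_d;

  ctr_down dwc (.down_en(down_en), .clk(clk), .rstn(rstn), .out_d(out_d));

  always #5 clk = ~clk;

  initial begin
    clk = 0; rstn = 0; down_en = 0;
    #10 rstn = 1; down_en = 1;
    wait (out_d == 4'd5);        // block until the counter shows 5
    down_en = 0;                 // takes effect before the next rising edge
    repeat (3) @(posedge clk);   // give it a few edges to move, if it were going to
    #1;
    if (out_d === 4'd5) $display("PASS: counter frozen at 5");
    else                $display("FAIL: counter moved on to %0d", out_d);
    $finish;
  end
endmodule
```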

Top-level wrapper

It took me a while to wrap my head around what a top-level wrapper is and what its job is. Once I spent some time reading and looking at examples I understood that it mainly connects internal modules and exposes the system inputs and outputs, and the concept started to make sense.

[Figure: top-level wrapper]

With my new ALU design (which might still take some time before I can call it a computer), the next question was which library module I should use and how to integrate it. Adding a button and a debounce module does not really make sense for an ALU, but then I realized it might be useful to be able to transmit some data. The only data I have at the moment is the result.

UART

I decided to use the UART library module, which I found in /foss/designs/examples/lib/uart_tx.v. I took a look at the code; it was very clean and nice to read through, with lots of information. I then looked back at my code and at some other examples of UART usage, and found this example on EDA Playground.

In the fortune_teller.v example code I saw this in line 318:

    uart_tx #(
        .CLK_FREQ(CLK_FREQ),
        .BAUD(BAUD)
    ) uart_inst (
        .clk(clk),
        .rst_n(rst_n),
        .data(current_char),   // Character to send
        .valid(send_valid),    // Start sending when HIGH
        .ready(uart_ready),    // UART tells us when it's ready
        .tx(tx)                // Serial output
    );

I then updated my code. It was getting longer now, but I basically just added this new module on top and hoped for the best:

module alu_top (
  input        clk, rstn,up_en, down_en,
  input  [1:0] opcode,
  input        tx_valid, 
  output       tx_ready,   
  output       tx);

  wire [3:0] out_u;
  wire [3:0] out_d;
  wire [7:0] result;

  // modules
  ctr_up   u_up   (.up_en(up_en),     .clk(clk), .rstn(rstn), .out_u(out_u));
  ctr_down u_down (.down_en(down_en), .clk(clk), .rstn(rstn), .out_d(out_d));
  alu      u_alu  (.out_u(out_u), .out_d(out_d), .opcode(opcode), .result(result));

  //  library module
  uart_tx #(
    .CLK_FREQ(1_000_000), 
    .BAUD(100_000)      
  ) u_uart (
    .clk   (clk),
    .rst_n (rstn),    
    .data  (result),
    .valid (tx_valid),
    .ready (tx_ready),
    .tx    (tx)
  );
endmodule

This makes alu_top a top-level wrapper with the ability to transmit some data. Next I made a new test bench and changed the setup a little. This time I compiled the code together with the library:

iverilog -g2012 -o sim.vvp \
  /foss/designs/examples/lib/uart_tx.v \
  up_dw_alu_uart.v up_dw_alu_uart_tb.v

Simulation

I got no errors. Could this really work? I ran vvp sim.vvp and got the same nice message from VVP, and I had a waveform file which was exciting. I opened the waveform viewer and now I could see tx_ready, tx_valid, and tx.

[Figure: tx_valid, tx_ready and tx in Surfer]

When tx_valid pulses high, the UART starts transmitting the value on tx. Note that each byte is sent LSB first, so on the waveform the bits appear in reverse order compared to how the value is normally written. I could see it sending three bytes: 0000.1111, 0000.1110, and then 0000.0000. They matched the value of result at the time, so things were working.
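Instead of reading the bits off the waveform by hand, a test bench can decode the serial line itself. This sketch assumes standard 8N1 framing (line idles high, one start bit at 0, eight data bits LSB first, one stop bit at 1) and a bit time of CLK_FREQ / BAUD = 1_000_000 / 100_000 = 10 clock cycles, which is 10 us if the test bench clock really is 1 MHz; both assumptions should be checked against uart_tx.v. To stay self-contained, a hypothetical send_byte task stands in for uart_tx, and recv_byte samples the line in the middle of each bit.

```verilog
`timescale 1ns / 1ps
// Sketch only: assumed 8N1 framing, not taken from the library source.
module tb_uart_frame;
  localparam BIT_TIME = 10_000;   // 10 us per bit (in ns) at 100_000 baud

  reg       line;                 // the serial tx line
  reg [7:0] rx_byte;
  integer   errors;

  // Stand-in transmitter: drive one frame onto the line.
  task send_byte(input [7:0] data);
    integer i;
    begin
      line = 1'b0; #BIT_TIME;                  // start bit
      for (i = 0; i < 8; i = i + 1) begin
        line = data[i]; #BIT_TIME;             // data[0] (LSB) goes out first
      end
      line = 1'b1; #BIT_TIME;                  // stop bit, line back to idle
    end
  endtask

  // Test bench decoder: wait for the start bit, then sample mid-bit.
  task recv_byte(output [7:0] data);
    integer i;
    begin
      @(negedge line);                         // falling edge marks the start bit
      #(BIT_TIME + BIT_TIME / 2);              // skip start bit, land mid bit 0
      for (i = 0; i < 8; i = i + 1) begin
        data[i] = line;                        // first sample is bit 0 (LSB)
        #BIT_TIME;
      end
      if (line !== 1'b1) errors = errors + 1;  // stop bit must be high
    end
  endtask

  task check_byte(input [7:0] expected);
    begin
      fork
        recv_byte(rx_byte);
        #100 send_byte(expected);              // small gap so the decoder is waiting
      join
      if (rx_byte !== expected) begin
        $display("FAIL: sent %b, decoded %b", expected, rx_byte);
        errors = errors + 1;
      end
    end
  endtask

  initial begin
    line = 1'b1; errors = 0;
    #1000;
    check_byte(8'b0000_1111);   // the three bytes seen on tx above
    check_byte(8'b0000_1110);
    check_byte(8'b0000_0000);
    if (errors == 0) $display("PASS: all bytes decoded LSB first");
    $finish;
  end
endmodule
```

To decode the real design, recv_byte would watch alu_top's tx output instead of the stand-in line.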

[Figure: bytes sent on tx]

Running Linter

Linting is basically a tool that runs over your code and tells you whether good practices are being followed. I don't know Verilog's best practices yet, and even the general coding practices I do know tend to get mixed up when things don't work. It is good practice, however, to clean up after yourself, so I ran:

verilator --lint-only \
  /foss/designs/examples/lib/uart_tx.v \
  up_dw_alu_uart.v

As expected, I got some warnings:

%Warning-WIDTHEXPAND: ...uart_tx.v:87: ...
   87 | wire tick = (clk_ctr == CLKS_PER_BIT - 1);
      |                      ^~
                      ... For warning description see https://verilator.org/warn/WIDTHEXPAND?v=5.044
                      ... Use "/* verilator lint_off WIDTHEXPAND */" and lint_on around source to disable this message.
%Warning-WIDTHEXPAND: up_dw_alu_uart.v:77:19: Operator ADD expects 8 bits on the LHS, but LHS's VARREF 'out_d' generates 4 bits.
                                            : ... note: In instance 'alu_top.u_alu'
   77 |       result = out_d + out_u;
      |                      ^
%Warning-WIDTHEXPAND: up_dw_alu_uart.v:77:19: Operator ADD expects 8 bits on the RHS, but RHS's VARREF 'out_u' generates 4 bits.
                                            : ... note: In instance 'alu_top.u_alu'
   77 |       result = out_d + out_u;
      |                      ^
%Warning-WIDTHEXPAND: up_dw_alu_uart.v:83:24: Operator AND expects 8 bits on the LHS, but LHS's VARREF 'out_d' generates 4 bits.
                                            : ... note: In instance 'alu_top.u_alu'
   83 |       result = out_d & out_u ;
      |                      ^
%Warning-WIDTHEXPAND: up_dw_alu_uart.v:83:24: Operator AND expects 8 bits on the RHS, but RHS's VARREF 'out_u' generates 4 bits.
                                            : ... note: In instance 'alu_top.u_alu'
   83 |       result = out_d & out_u ;
      |                      ^
%Error: Exiting due to 5 warning(s)

I looked up the warnings, and it turned out they were caused by mixing 4-bit signals into an 8-bit result. Verilator warned about this because the widths were being expanded implicitly.

Fixing

I fixed this by explicitly extending the signals to 8 bits before performing the operation.

result = out_d + out_u;                    // bad practice
result = {4'b0000, out_d} + {4'b0000, out_u}; // good practice

This makes the width conversion explicit and removes the warning from the linter.

The first warning comes from uart_tx.v, which is part of the library module.
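For the ALU itself, one way to make every width explicit is to zero-extend the inputs once and replace the if/else chain with a case statement. The numeric results are the same as before: in an assignment, Verilog already evaluates the expression at the 8-bit width of result, and that implicit expansion is exactly what Verilator flags. This sketch (hypothetical module alu_case) only makes the widening visible; the multiply stays within 8 bits because 15 × 15 = 225.

```verilog
// Sketch: same operations as alu, with explicit widths and a case statement.
module alu_case (
  input      [3:0] out_u, out_d,
  input      [1:0] opcode,
  output reg [7:0] result);

  wire [7:0] u8 = {4'b0000, out_u};   // zero-extend 4 -> 8 bits
  wire [7:0] d8 = {4'b0000, out_d};

  always @(*) begin
    case (opcode)
      2'd0:    result = d8 + u8;      // add
      2'd1:    result = d8 * u8;      // multiply (max 15 * 15 = 225 fits in 8 bits)
      2'd2:    result = d8 & u8;      // bitwise AND
      default: result = 8'd0;         // opcode 3 is unused for now
    endcase
  end
endmodule

module tb_alu_case;
  reg  [3:0] u, d;
  reg  [1:0] op;
  wire [7:0] r;
  integer errors;

  alu_case dut (.out_u(u), .out_d(d), .opcode(op), .result(r));

  initial begin
    errors = 0;
    u = 4'd15; d = 4'd15;                                  // largest inputs
    op = 2'd0; #1 if (r !== 8'd30)  errors = errors + 1;   // 15 + 15
    op = 2'd1; #1 if (r !== 8'd225) errors = errors + 1;   // 15 * 15
    op = 2'd2; #1 if (r !== 8'd15)  errors = errors + 1;   // 1111 & 1111
    op = 2'd3; #1 if (r !== 8'd0)   errors = errors + 1;   // unused opcode
    if (errors == 0) $display("PASS: alu_case results as expected");
    $finish;
  end
endmodule
```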

Converting RTL Verilog to Gate-Level Logic

I wanted to try converting a behavioral RTL Verilog design into gate-level Verilog. At first I thought there might be an online tool for it, but then I found out that Yosys can do this.

To try it out I made a simple adder module. I then opened yosys and ran these three commands:

read_verilog adder.v
synth
write_verilog adder_gate.v

Yosys generated a gate-level version of the design without any problems. I didn’t have time to simulate or verify the generated netlist, but the conversion itself was very straightforward.

module adder(
    input  [3:0] a,
    input  [3:0] b,
    output [3:0] sum
);

assign sum = a + b;

endmodule

/* Generated by Yosys 0.62 (git sha1 7326bb7d6, g++ 13.3.0-6ubuntu2~24.04 -fPIC -O3) */

(* src = "adder.v:1.1-9.10" *)
module adder(a, b, sum);
  (* src = "adder.v:2.18-2.19" *)
  input [3:0] a;
  wire [3:0] a;
  (* src = "adder.v:3.18-3.19" *)
  input [3:0] b;
  wire [3:0] b;
  (* src = "adder.v:4.18-4.21" *)
  output [3:0] sum;
  wire [3:0] sum;
  wire _00_;
  wire _01_;
  wire _02_;
  wire _03_;
  wire _04_;
  wire _05_;
  wire _06_;
  wire _07_;
  wire _08_;
  wire _09_;
  assign sum[0] = b[0] ^ a[0];
  assign _00_ = b[1] ^ a[1];
  assign _01_ = ~(b[0] & a[0]);
  assign sum[1] = ~(_01_ ^ _00_);
  assign _02_ = b[2] ^ a[2];
  assign _03_ = ~(b[1] & a[1]);
  assign _04_ = _00_ & ~(_01_);
  assign _05_ = _03_ & ~(_04_);
  assign sum[2] = ~(_05_ ^ _02_);
  assign _06_ = ~(b[3] ^ a[3]);
  assign _07_ = ~(b[2] & a[2]);
  assign _08_ = _02_ & ~(_05_);
  assign _09_ = _07_ & ~(_08_);
  assign sum[3] = _09_ ^ _06_;
endmodule
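Since the generated netlist was not simulated above, a quick check is to simulate it against the RTL for every input. With two 4-bit inputs there are only 256 combinations, so exhaustive simulation is cheap. This sketch assumes the Yosys module is renamed to adder_gate (a hypothetical name) so both versions can be compiled together, and drops the (* src *) attributes. Note that sum is only 4 bits, so both versions discard the carry (9 + 8 gives 1).

```verilog
// RTL version (adder.v)
module adder(input [3:0] a, input [3:0] b, output [3:0] sum);
  assign sum = a + b;
endmodule

// Yosys output, renamed to adder_gate; logic copied unchanged.
module adder_gate(a, b, sum);
  input  [3:0] a;
  input  [3:0] b;
  output [3:0] sum;
  wire _00_, _01_, _02_, _03_, _04_, _05_, _06_, _07_, _08_, _09_;
  assign sum[0] = b[0] ^ a[0];
  assign _00_ = b[1] ^ a[1];
  assign _01_ = ~(b[0] & a[0]);
  assign sum[1] = ~(_01_ ^ _00_);
  assign _02_ = b[2] ^ a[2];
  assign _03_ = ~(b[1] & a[1]);
  assign _04_ = _00_ & ~(_01_);
  assign _05_ = _03_ & ~(_04_);
  assign sum[2] = ~(_05_ ^ _02_);
  assign _06_ = ~(b[3] ^ a[3]);
  assign _07_ = ~(b[2] & a[2]);
  assign _08_ = _02_ & ~(_05_);
  assign _09_ = _07_ & ~(_08_);
  assign sum[3] = _09_ ^ _06_;
endmodule

// Compare both versions for all 256 input pairs.
module tb_adder_equiv;
  reg  [3:0] a, b;
  wire [3:0] sum_rtl, sum_gate;
  integer i, errors;

  adder      rtl  (.a(a), .b(b), .sum(sum_rtl));
  adder_gate gate (.a(a), .b(b), .sum(sum_gate));

  initial begin
    errors = 0;
    for (i = 0; i < 256; i = i + 1) begin
      {a, b} = i[7:0];
      #1;  // let the combinational logic settle
      if (sum_rtl !== sum_gate) begin
        $display("mismatch: a=%0d b=%0d rtl=%0d gate=%0d", a, b, sum_rtl, sum_gate);
        errors = errors + 1;
      end
    end
    if (errors == 0) $display("PASS: netlist matches RTL for all 256 input pairs");
    $finish;
  end
endmodule
```

For larger designs exhaustive simulation stops being practical, and formal equivalence checking is the usual next step.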