Understand the parallelism and concurrency needed for hardware modeling. Learn the simulation semantics of Verilog. Draw timing diagrams and get a basic introduction to Modelsim.


To run the examples accompanying this lecture, please check out the git repository:

$ ssh <username>@eceubuntu.uwaterloo.ca
$ mkdir $HOME/ece327-lectures
$ cd $HOME/ece327-lectures
$ git clone gitlab@git.uwaterloo.ca:ece327-lectures-s19/02-sim.git
$ cd 02-sim/code

Ordering of Execution

To understand RTL design, we first need to understand the simulation semantics of the language. A hardware simulator manages the execution of parallel hardware components that are described in a sequential manner. Different always blocks execute in parallel, yet the simulator still produces consistent results regardless of which block it runs first. In hardware, these components are continuously in operation.

We first look at how the simulator processes a simple example with two always blocks and still produces consistent, stable results. The order in which the simulator selects the always blocks does not matter because of the evaluation model of non-blocking assignments: all RHS expressions are evaluated first, and the LHS signals are updated only afterwards.
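To make the claim about block order concrete, here is a minimal sketch that is not part of the lecture repository; the module name nba_swap_tb and its signals p, q and go are hypothetical. Two always blocks wake up on the same event and exchange values through non-blocking assignments. Because both RHS values are sampled before either LHS is updated, the swap succeeds whichever block the simulator happens to run first.

```verilog
// Hypothetical sketch (not from the lecture repository): two always blocks
// triggered by the same event swap p and q using non-blocking assignments.
module nba_swap_tb;
    reg       go = 1'b0;
    reg [3:0] p  = 4'd1;
    reg [3:0] q  = 4'd2;

    always @(go) begin : first
        p <= q;    // samples the old value of q
    end

    always @(go) begin : second
        q <= p;    // samples the old value of p, even if 'first' already ran
    end

    initial begin
        #1 go = 1'b1;                       // wake both blocks at t=1
        #1 $display("p=%0d q=%0d", p, q);   // prints p=2 q=1
    end
endmodule
```

Had both blocks used blocking assignments (=), the outcome would depend on execution order: whichever block ran second would copy the value the first had just written, leaving p and q equal.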

$ cat example.v
    reg  a=1'bX;
    wire b=1'b1;
    reg  c;          // starts at X

    always @(b) begin : thread_1
        a <= b + 1;
    end

    always @(a) begin : thread_2
        c <= a + 1;
    end

In this example we have two blocks, thread_1 and thread_2. The output of thread_1 is the signal a, which is an input to the second block, thread_2. As discussed in the lecture, if the blocks were allowed to execute in an ad-hoc manner, we would observe inconsistent intermediate values on the signals.

If we simulate this code in Modelsim and look at the output waveform, we can clearly see the exact sequence of assignments.

$ vsim -do example.tcl

We can generate this waveform by first zooming to 0ps simulation time. Next, we can click on Expanded Time Events Mode. Finally, click on Expand Time at Active Cursor. This should generate the waveform shown below:


From the waveform we can discern the following:

  • 0ps Initially, all signals a, b, and c are in the X (uninitialized) state.
  • 0ps+1 Next, b acquires the value 1 due to the declaration wire b=1'b1; in the code. Although both blocks executed, the RHS operands they read (b for thread_1, a for thread_2) were still X, so a and c remain at X.
  • 0ps+2 Now, based on the sensitivity lists, only the block sensitive to b (thread_1) executes. Since b changed to 1 in the previous step, a becomes 0. This makes sense, as 1+1 truncated to a 1b result is 0. The value of a did not change in the previous step (it is still X), so thread_2 does not execute and c stays at X.
  • 0ps+3 Finally, only the block sensitive to a (thread_2) executes in this artificial timestep. Since a became 0 in the previous step, c becomes 1. The value of b is unchanged, so thread_1 does not execute in this round and a keeps its value.
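A self-contained sketch of example.v can help you reproduce this ordering outside Modelsim. The module name example_tb and the $display calls are additions for illustration. As an assumption, b is driven by an initial block at time 1 instead of wire b=1'b1;, so that its change is an ordinary, visible event regardless of how a simulator initializes constant nets. Consequently, the sketch shows the thread_1 then thread_2 sequence but not the very first delta at 0ps+1.

```verilog
// Hypothetical self-contained version of the example.v fragment.
// Assumption: b changes at t=1 from an initial block instead of via
// "wire b=1'b1;", so the change is an explicit event.
module example_tb;
    reg a = 1'bX;
    reg b = 1'bX;
    reg c = 1'bX;

    initial #1 b = 1'b1;           // plays the role of wire b=1'b1

    always @(b) begin : thread_1
        a <= b + 1;                // 1 + 1 truncated to 1 bit is 0
        $display("t=%0t thread_1 ran, b=%b", $time, b);
    end

    always @(a) begin : thread_2
        c <= a + 1;                // 0 + 1 is 1
        $display("t=%0t thread_2 ran, a=%b", $time, a);
    end

    initial #2 $display("final: a=%b b=%b c=%b", a, b, c);
endmodule
```

It should print a thread_1 line followed by a thread_2 line, both at t=1, and finish with a=0 b=1 c=1, matching the values at 0ps+3 in the waveform.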

In the lecture, we showed you a 32b example. In Modelsim, the initial assignment wire b=32'h0001 takes 32 initial deltas because the simulator assigns the bits one at a time. This is a mismatch we have chosen to ignore for simplicity: for the exam, you can bundle a multi-bit assignment into a single delta. The true Modelsim waveform is shown below and can be reproduced with the example32.tcl script.


Effect of Input Changes

The simulator executes an always block only when a signal on its sensitivity list changes. If nothing changes, there is no need to recalculate the outputs.

We can understand this with the poly_combi.v example used in the lecture. This code computes y1<=a*x*x in the first block, y2<=y1+b*x in the second block, and y<=y2+c in the third block.

$ cat poly_combi.v
    always @(a or x) begin: axx
        $display("y1<=axx fired");
        y1 <= a * x * x;
    end

    always @(y1 or b or x) begin: bx
        $display("y2<=y1+bx fired");
        y2 <= y1 + b * x;
    end

    always @(y2 or c) begin: plusc
        $display("y<=y2+c fired");
        y <= y2 + c;
    end

An accompanying testbench changes one input at a time to isolate its effect. We use an initial block with timed assignments so that the updates are staggered at 2 ps intervals.

$ cat poly_combi_tb.v
    initial begin
        #2 a <= 1; b <= 1; c <= 1; x <= 10; $display("x changes");
        #2 a <= 1; b <= 1; c <= 10; x <= 10; $display("c changes");
        #2 a <= 1; b <= 10; c <= 10; x <= 10; $display("b changes");
        #2 a <= 10; b <= 10; c <= 10; x <= 10; $display("a changes");
        #2 $finish;
    end

We can use the bundled poly_combi.tcl script to run Modelsim.

$ vsim -do poly_combi.tcl

We can inspect the timing waveforms again. Now, we can use the Expand Time Deltas Mode instead to avoid showing each update as the adder carry chain propagates through the circuit.

  • For c changing
  • For x changing
    For the waveform above, note the missing +4 transition. This is likely an internal Modelsim artifact and can be ignored. For all intents and purposes, you can assume the final transition on y occurs at +4 rather than +5.
  • For a changing
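To quantify the cascade when x changes, the sketch below flattens the three poly_combi.v blocks into one hypothetical module (poly_combi_count_tb) and replaces the $display calls with counters. The 8-bit input and 16-bit output widths are assumptions. The testbench part settles all inputs, clears the counters, and then changes only x from 1 to 10.

```verilog
// Hypothetical sketch: the poly_combi.v blocks in one module, with counters
// recording how often each block fires. Widths are assumptions.
module poly_combi_count_tb;
    reg  [7:0] a, b, c, x;
    reg [15:0] y1, y2, y;
    integer n_axx   = 0;
    integer n_bx    = 0;
    integer n_plusc = 0;

    always @(a or x) begin : axx
        n_axx = n_axx + 1;
        y1 <= a * x * x;
    end

    always @(y1 or b or x) begin : bx
        n_bx = n_bx + 1;
        y2 <= y1 + b * x;
    end

    always @(y2 or c) begin : plusc
        n_plusc = n_plusc + 1;
        y <= y2 + c;
    end

    initial begin
        #1 a <= 1; b <= 1; c <= 1; x <= 1;     // settle: y = 1 + 1 + 1 = 3
        #1 n_axx = 0; n_bx = 0; n_plusc = 0;   // count only what follows
           x <= 10;                            // change x alone
        #1 $display("axx=%0d bx=%0d plusc=%0d y=%0d", n_axx, n_bx, n_plusc, y);
    end
endmodule
```

It should print axx=1 bx=2 plusc=2 y=111. bx first runs with the old y1, and plusc first runs with the resulting intermediate y2 = 11, so y briefly becomes 12 before the cascade settles at 111.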

Delay Modeling

Verilog can also be used to model delays in your RTL design. Once implemented, the gates and wires in your design will have physical delays, and Verilog offers a rich abstraction for modeling various kinds of them. The two most important kinds are:

  1. Inertial Delay -- output will be delayed a specified period from input. Input pulses shorter than the specified period will be gobbled up (not transmitted to output).
  2. Transport Delay -- output will always be delayed a specified period from input irrespective of pulse width.

Let's look at two simple examples of inertial and transport delay in inertial.v and transport.v.

$ cat inertial.v

Here we introduce inertial delay on the LHS of an assign statement.

  assign #2 b = a;

$ cat transport.v

Here, we introduce transport delay on the RHS of a non-blocking assignment.

  always @(a) begin
    b <= #2 a;
  end

The above two styles are the two prescribed ways to get the exact delays needed to model hardware components. DO NOT mix up the placement of the delay (RHS vs. LHS) or the assignment type (non-blocking vs. blocking). History and grading will not be kind to offenders.

We can run a Modelsim simulation of the above two examples:

$ vsim -do delay.tcl
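The difference between the two delay styles shows up with a pulse shorter than the delay. This hypothetical sketch (delay_tb, not one of the lecture files) places both styles side by side, drives a 1-unit pulse and then a 3-unit pulse on a, and counts the rising edges on each output.

```verilog
// Hypothetical sketch combining the inertial.v and transport.v styles.
module delay_tb;
    reg     a;
    wire    b_inertial;
    reg     b_transport;
    integer rises_inertial  = 0;
    integer rises_transport = 0;

    // inertial.v style: delay on the continuous assignment;
    // pulses shorter than 2 units are swallowed
    assign #2 b_inertial = a;

    // transport.v style: delay on the RHS of a non-blocking assignment;
    // every change of a is replayed 2 units later
    always @(a) begin
        b_transport <= #2 a;
    end

    always @(posedge b_inertial)  rises_inertial  = rises_inertial + 1;
    always @(posedge b_transport) rises_transport = rises_transport + 1;

    initial begin
        a = 1'b0;
        #5 a = 1'b1;       // short pulse: high from t=5 to t=6
        #1 a = 1'b0;
        #4 a = 1'b1;       // long pulse: high from t=10 to t=13
        #3 a = 1'b0;
        #5 $display("inertial rises=%0d, transport rises=%0d",
                    rises_inertial, rises_transport);
    end
endmodule
```

It should report one rising edge for the inertial output, since only the long pulse survives, and two for the transport output.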

It is also possible to provide a rich description of pin-to-pin delays, as shown in the example p2p.v:

$ cat p2p.v
  assign c = a ^ b; // xor

  specify
    (a => c) = (1);
    (b => c) = (2);
  endspecify

In this example, we use the specify block to supply timing parameters for each input=>output path. If a changes, c changes after 1ns. If b changes, c changes only after 2ns. Thus the two inputs influence the output transition with different delays.

We can simulate this to confirm the behavior:

$ vsim -do p2p.tcl

Clocked Circuit Simulation

Finally, we will investigate the effect of clocking on the timing diagram. We use poly_reg.v and poly_reg_tb.v to generate this waveform. The testbench produces a clk (clock) and a rst (reset) signal, a pattern you will find in pretty much every RTL design from now on. We must now use registers (the R in RTL) in our design. These are storage elements that hold their values stable for a clock cycle even if their inputs change, unlike the purely combinational blocks we have seen so far.

In poly_reg.v, the computation is performed in a single clock cycle. A designer must do this with care to ensure that the clock period is long enough to complete the work assigned to that cycle. In this example, we must be able to compute a*x*x + b*x + c within a single clock period. Also note that the signals are only updated on the posedge (positive edge) of the clock. At that instant, the RHS values of the signals used in the always block are sampled (snapshotted) and then used for the computations within the block.

The rst if-then statement ensures that the system starts off in a stable reset state for all registers. Note that rst here is a synchronous reset and only takes effect on a clock edge. If you want an asynchronous reset, you must change the sensitivity list to (posedge clk or posedge rst). This practice is frowned upon for FPGA design, but may be fine if you're building your own ASICs.

$ cat poly_reg.v

    always @(posedge clk) begin
        if(rst) begin
            y <= 0;
            a_r <= 0;
            b_r <= 0;
            c_r <= 0;
            x_r <= 0;
        end else begin
            a_r <= a;
            b_r <= b;
            c_r <= c;
            x_r <= x;
            y <= a_r * x_r * x_r + b_r * x_r + c_r;
        end
    end
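For comparison, here is a hypothetical sketch of the same register block with the asynchronous reset mentioned above. The module name poly_reg_async_tb, the 8-bit and 16-bit widths, the input values, and the 10-unit clock period are assumptions. Reset is asserted at t=32, between rising edges, so you can see it act immediately.

```verilog
// Hypothetical sketch: the poly_reg.v register block with an asynchronous
// reset. Widths, input values and the 10-unit clock period are assumptions.
module poly_reg_async_tb;
    reg        clk = 1'b0;
    reg        rst = 1'b0;
    reg  [7:0] a = 8'd1, b = 8'd1, c = 8'd1, x = 8'd10;
    reg  [7:0] a_r, b_r, c_r, x_r;
    reg [15:0] y;

    always #5 clk <= !clk;             // rising edges at t = 5, 15, 25, 35, ...

    // posedge rst in the sensitivity list lets reset act between clock edges
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            y   <= 0;
            a_r <= 0;
            b_r <= 0;
            c_r <= 0;
            x_r <= 0;
        end else begin
            a_r <= a;
            b_r <= b;
            c_r <= c;
            x_r <= x;
            y   <= a_r * x_r * x_r + b_r * x_r + c_r;
        end
    end

    initial begin
        #20 $display("t=20 y=%0d", y);   // 111: inputs captured at t=5, y updated at t=15
        #12 rst = 1'b1;                   // t=32, while clk is low
        #1  $display("t=33 y=%0d", y);   // already 0, before the next edge at t=35
        #7  $finish;
    end
endmodule
```

With the synchronous reset of poly_reg.v, y would still read 111 at t=33 and would only clear at the rising edge at t=35.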

If you look carefully, most of the testbench is nearly identical to the one used earlier in poly_combi_tb.v.

We now generate clk and rst from the testbench. The clock is generated with a delayed assignment, as shown below: the clock signal is declared with a starting value of 0 and then toggled every 1 ps, giving a period of 2 ps.

$ cat poly_reg_tb.v
    reg clk = 1'b0;
    always #1 clk <= !clk;
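The listing above shows only the clock. A common way to generate rst alongside it is sketched below; the module name clk_rst_tb, the edge counter, and holding reset for two rising edges are assumptions rather than what poly_reg_tb.v necessarily does.

```verilog
// Hypothetical clock/reset generator in the style of poly_reg_tb.v.
// Holding rst for two rising edges is an assumption.
module clk_rst_tb;
    reg clk = 1'b0;
    reg rst = 1'b1;                    // start in reset
    integer edges = 0;

    always #1 clk <= !clk;             // toggle every 1 unit: period of 2

    always @(posedge clk) edges = edges + 1;

    initial begin
        repeat (2) @(posedge clk);     // rising edges at t=1 and t=3
        rst <= 1'b0;                   // release reset just after the edge at t=3
        #7 $display("edges=%0d rst=%b", edges, rst);   // t=10
        $finish;
    end
endmodule
```

It should print edges=5 rst=0, since rising edges occur at t=1, 3, 5, 7 and 9. Releasing rst with a non-blocking assignment means any logic sampling it on the edge at t=3 still sees it asserted, avoiding a race.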

We can again simulate this code in Modelsim and look at the output waveform.

$ vsim -do poly_reg.tcl

The waveform looks similar to the earlier one. However, there is now a clock signal, and the outputs only change on its posedge (positive edge).


Unlike the cascade of always block executions in poly_combi.v, there is only one timestep per clock edge when using registers. See the figure below with expanded time deltas. Previously, a change in x caused many always blocks to execute, some of them multiple times. Now, the value of x is first captured in x_r in the first cycle, and in the following cycle we see the output y updated to 111. We have introduced a synchronous discipline to the computation, so we only have to reason about values changing around clock edges. This model of computation is fundamental to RTL design and key to organizing logic in hardware.


Simulations with Feedback Loops

When a design contains a combinational feedback loop built from zero-delay non-blocking assignments, the simulator will not converge. To understand why, consider how the simulator decomposes the computation. When a signal changes, every block sensitive to it recomputes its RHS values. These newly computed values are not yet assigned to the LHS signals; that happens only after all RHS evaluations are complete. Next, the newly computed RHS values are assigned to the LHS signals. In a feedback loop, the updated LHS signals trigger a new round of RHS evaluations, which in turn update the LHS signals again, and the cycle never ends.

Take a look at a ring oscillator example shown below:

$ cat ring_osc_bad.v
    reg a=1'b0; // initial value

    always @(a) begin: invert
        $display("Time=%0d, a=%0d", $time, a);
        a <= ~(a);
    end

Note that the non-blocking assignment updates the same signal that appears on its RHS.

Run the simulation in Modelsim:

$ vsim -c -do ring_osc_bad.tcl

You will find that simulation time never advances and the simulator quits after an internal iteration limit is reached.

# Start time: 11:13:34 on Apr 26,2019
# Loading work.ring_osc_tb
# Loading work.ring_osc
# Time=0, a=0
# Time=0, a=1
# Time=0, a=0
# Time=0, a=1
# Time=0, a=0
# Time=0, a=1
# Time=0, a=1
# Time=0, a=0
# Time=0, a=1
# Time=0, a=0
# Time=0, a=1
# Time=0, a=0
# Time=0, a=1
# Time=0, a=0
# Time=0, a=1
# Time=0, a=0
# ** Error (suppressible): (vsim-3601) Iteration limit 5000 reached at time 0 ps.

How can we resolve this error? Ring oscillators are valid circuits we may want in our design!

$ cat ring_osc_good.v
    always @(a) begin: invert
        $display("Time=%0d, a=%0d", $time, a);
        a <= #1 ~(a);
    end

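We simply use a transport-delayed assignment. Each update of a is now scheduled 1 time unit in the future, so simulation time advances between iterations instead of looping forever at time 0.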

$ vsim -c -do ring_osc_good.tcl

This generates the proper oscillation trace with advancing physical time.

# Start time: 11:16:06 on Apr 26,2019
# Loading work.ring_osc_tb
# Loading work.ring_osc
# Time=0, a=0
# Time=1, a=1
# Time=2, a=0
# Time=3, a=1
# Time=4, a=0
# Time=5, a=1
# Time=6, a=0
# Time=7, a=1
# Time=8, a=0
# Time=9, a=1
# ** Note: $finish    : ring_osc_tb.v(14)
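If you want to reproduce the good oscillator outside the lecture scripts, here is a hypothetical self-contained sketch (ring_osc_sketch). As assumptions, it adds a `timescale so it can stop at t=10.5, between toggles, and it starts the oscillation from an initial block at t=1 instead of relying on the declaration initializer, since simulators differ on whether a declaration initializer produces an event that @(a) can observe.

```verilog
`timescale 1ns/1ps
// Hypothetical self-contained version of ring_osc_good.v.
// Assumptions: the oscillation is kicked off at t=1 by an initial block,
// and simulation stops at t=10.5 so it never ends on a toggle boundary.
module ring_osc_sketch;
    reg a;

    initial #1 a = 1'b0;               // first event on a

    always @(a) begin : invert
        $display("Time=%0d, a=%0d", $time, a);
        a <= #1 ~(a);                  // transport delay: update 1 unit later
    end

    initial #10.5 $finish;             // stop between two toggles
endmodule
```

It should print Time=1, a=0 through Time=10, a=1, alternating every time unit: the same pattern as the log above, shifted by one unit.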