To run the examples accompanying this lecture, please check out the Git repository:
$ ssh <username>@eceubuntu.uwaterloo.ca
$ mkdir $HOME/ece327-lectures
$ cd $HOME/ece327-lectures
$ git clone email@example.com:ece327-lectures-s19/02-sim.git
$ cd 02-sim/code
Ordering of Execution
To understand RTL design, we first need to understand the simulation semantics of the language. Hardware simulators fundamentally manage the execution of parallel hardware components that are described in a sequential manner. Different always blocks execute in parallel but still produce consistent results, independent of which block executes first. In hardware, these components are continuously in operation.
We first look at how a simple example with two always blocks is processed by the simulator to give consistent, stable results. The order in which the always blocks are selected does not matter, due to the evaluation model of non-blocking statements.
$ cat example.v
...
reg a=1'bX;
wire b=1'b1;
always @(b) begin : thread_1
  a <= b + 1;
end
always @(a) begin : thread_2
  c <= a + 1;
end
This example has two blocks, thread_1 and thread_2. The output of thread_1 is the signal a, which is an input to the second block, thread_2. As discussed in the lecture, if the blocks were allowed to execute in an ad hoc manner, we would see spurious intermediate values on the signals.
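To illustrate why the selection order does not matter with non-blocking assignments, here is a minimal sketch that is not part of the course repository. The module name nba_swap_sketch, the signals p and q, and the trigger signal go are all made up for illustration. Both blocks sample their RHS before either LHS is updated, so the two values swap whichever block the simulator happens to pick first.

```verilog
// Hypothetical sketch (not from the lecture repository).
module nba_swap_sketch;
  reg p  = 1'b0;
  reg q  = 1'b1;
  reg go = 1'b0;   // made-up trigger signal

  // Both blocks fire on the same event. Each RHS is sampled first;
  // the LHS updates are applied only after both blocks have run.
  always @(posedge go) begin : thread_p
    p <= q;
  end

  always @(posedge go) begin : thread_q
    q <= p;
  end

  initial begin
    #1 go = 1'b1;                    // trigger both blocks together
    #1 $display("p=%b q=%b", p, q);  // prints p=1 q=0
    $finish;
  end
endmodule
```

With blocking assignments (=) in place of <=, the final values would depend on which block ran first, which is exactly the inconsistency the non-blocking evaluation model avoids.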
If we simulate this code in Modelsim, and look at the output waveform, we will see the exact sequencing of assignments clearly.
$ vsim -do example.tcl
We can generate this waveform by first zooming to 0ps simulation time. Next, we can click on Expanded Time Events Mode. Finally, click on Expand Time at Active Cursor. This should generate the waveform shown below:
From the waveform we can discern the following:
- 0ps: Initially, all signals are X.
- 0ps+1: Next, we observe that b acquires the value 1 due to the declaration wire b=1'b1; in the code. While both blocks executed, the values of a and c are unchanged, remaining at X.
- 0ps+2: Now, based on the sensitivity lists, only the block sensitive to b will execute. Since a is still X, the value of c will stay at X and the block sensitive to a will not be executed at this point. As the value of b did change to 1 in the previous step, we see that a has gone to 0. This makes sense, as the 1-bit result of 1+1 is 0.
- 0ps+3: Finally, only the block sensitive to a will execute in this artificial timestep. a got the value 0 in the previous step, hence c becomes 1. The value of b was unchanged, so the value of a will not change either and the block sensitive to b stays untouched in this round.
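The 1-bit wrap-around in the 0ps+2 step can be checked in isolation. The following is a small sketch rather than code from the repository; the module name and the 2-bit comparison register wide are assumptions added for illustration.

```verilog
// Hypothetical sketch of the 1-bit wrap-around seen at 0ps+2.
module one_bit_wrap_sketch;
  reg       narrow;  // 1 bit wide, like a in example.v
  reg [1:0] wide;    // 2 bits wide, for comparison (made up)

  initial begin
    narrow = 1'b1 + 1;  // 1 + 1 = 2 (binary 10); only the low bit fits
    wide   = 1'b1 + 1;  // both bits fit
    $display("narrow=%b wide=%b", narrow, wide);  // prints narrow=0 wide=10
    $finish;
  end
endmodule
```

Some tools may warn about the truncation in the first assignment; that warning is the simulator pointing out the same overflow discussed above.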
In the lecture, we showed you a 32-bit example. In Modelsim, the initial assignment wire b=32'h0001 takes 32 initial deltas, as the simulator assigns the bits one at a time. This is a mismatch that we have chosen to ignore for simplicity. For the exam, you can bundle multi-bit assignments into a single delta. The true Modelsim waveform is shown below and can be reproduced with
Effect of Input Changes
Always blocks in Verilog are executed by the simulator only when a signal on the sensitivity list changes. If nothing changes, there is no need to recalculate outputs.
We can understand this with the poly_combi.v example used in the lecture. This code computes y1<=a*x*x in the first block, y2<=y1+b*x in a second block, and y<=y2+c in the third block.
$ cat poly_combi.v
...
always @(a or x) begin: axx
  $display("y1<=axx fired");
  y1 <= a * x * x;
end
always @(y1 or b or x) begin: bx
  $display("y2<=y1+bx fired");
  y2 <= y1 + b * x;
end
always @(y2 or c) begin: plusc
  $display("y<=y2+c fired");
  y <= y2 + c;
end
An accompanying testbench is provided that isolates the effect of each input separately. We use an initial block and timed assignments to ensure updates are staggered at 2 ps intervals.
$ cat poly_combi_tb.v
...
initial begin
  #2 a <= 1; b <= 1; c <= 1; x <= 10;
  $display("x changes");
  #2 a <= 1; b <= 1; c <= 10; x <= 10;
  $display("c changes");
  #2 a <= 1; b <= 10; c <= 10; x <= 10;
  $display("b changes");
  #2 a <= 10; b <= 10; c <= 10; x <= 10;
  $display("a changes");
  #2 $finish;
end
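When reading the waveform, it helps to know which value of y to expect after each stimulus step. The sketch below is a hypothetical helper, not part of the repository. It evaluates the same polynomial with plain integers, which assumes the signals in poly_combi.v are wide enough to hold these results (the widths are elided in the listing above).

```verilog
// Hypothetical helper, not part of the course repository.
module poly_expected_sketch;
  // Reference model of y = a*x*x + b*x + c using integers.
  function integer poly;
    input integer a, b, c, x;
    begin
      poly = a * x * x + b * x + c;
    end
  endfunction

  initial begin
    $display("after x changes: y=%0d", poly(1, 1, 1, 10));     // 111
    $display("after c changes: y=%0d", poly(1, 1, 10, 10));    // 120
    $display("after b changes: y=%0d", poly(1, 10, 10, 10));   // 210
    $display("after a changes: y=%0d", poly(10, 10, 10, 10));  // 1110
    $finish;
  end
endmodule
```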
We can use the bundled poly_combi.tcl script to run Modelsim.
$ vsim -do poly_combi.tcl
We can inspect the timing waveforms again. Now, we can use the Expand Time Deltas Mode instead to avoid showing each update as the adder carry chain propagates through the circuit.
- For c changing
- For x changing
- For a changing
Verilog can also be used to model delays in your RTL design. Once implemented, the gates and wires in your design will have physical delays. Verilog offers a rich abstraction for modeling various kinds of delays in your design. The two most important kinds are:
- Inertial Delay -- output will be delayed a specified period from input. Input pulses shorter than the specified period will be gobbled up (not transmitted to output).
- Transport Delay -- output will always be delayed a specified period from input irrespective of pulse width.
Let's look at two simple examples of inertial and transport delay in inertial.v and transport.v.
$ cat inertial.v
Here we introduce inertial delay on the LHS of an assign statement.
...
assign #2 b = a;
$ cat transport.v
Here, we introduce transport delay on the RHS of a non-blocking assignment.
...
always @(a) begin
  b <= #2 a;
end
The above two styles are the two prescribed ways to get the exact delays needed to model hardware components. DO NOT mess up RHS and LHS, non-blocking and blocking combinations. History and grading will not be kind to offenders.
We can run a Modelsim simulation of the above two examples
$ vsim -do delay.tcl
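To see the difference between the two delay styles, one can drive both with a pulse that is shorter than the delay. The testbench below is only a sketch under stated assumptions: the module name, the signal names b_inertial and b_transport, and the stimulus timing are made up, and the testbench that delay.tcl actually runs may look different.

```verilog
`timescale 1ns/1ns
// Hypothetical testbench sketch; delay.tcl and its testbench may differ.
module delay_sketch_tb;
  reg  a = 1'b0;
  wire b_inertial;
  reg  b_transport;

  // Inertial delay: delay on the LHS of a continuous assignment.
  assign #2 b_inertial = a;

  // Transport delay: delay on the RHS of a non-blocking assignment.
  always @(a) begin
    b_transport <= #2 a;
  end

  initial begin
    $monitor("t=%0t a=%b b_inertial=%b b_transport=%b",
             $time, a, b_inertial, b_transport);
    #5 a = 1'b1;  // start of a pulse that is 1 time unit wide
    #1 a = 1'b0;  // pulse is shorter than the 2-unit delay
    #5 $finish;
  end
endmodule
```

Under these assumptions, b_transport should reproduce the pulse two time units later, while b_inertial should stay at 0 because the pulse is shorter than the specified delay.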
It is also possible to provide a rich description of pin-to-pin delays, as shown in the example:
$ cat p2p.v
...
assign c = a ^ b; // xor
specify
  (a => c) = (1);
  (b => c) = (2);
endspecify
In this example, we use the specify block to supply timing parameters to input=>output path combinations. If a changes, the output c will change after 1ns. If b changes, the output c will only change after 2ns. Thus the two inputs influence the output transition with different delays.
We can simulate this to confirm this behavior
$ vsim -do p2p.tcl
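A self-contained version of this experiment could look like the sketch below. The module names, port list, and stimulus are assumptions for illustration, since the full p2p.v and its testbench are not shown here. Depending on the simulator and its options, specify path delays may need to be enabled explicitly.

```verilog
`timescale 1ns/1ps
// Hypothetical sketch; the real p2p.v may declare its ports differently.
module p2p_sketch (input wire a, input wire b, output wire c);
  assign c = a ^ b; // xor

  // Pin-to-pin path delays from each input to the output.
  specify
    (a => c) = (1);
    (b => c) = (2);
  endspecify
endmodule

module p2p_sketch_tb;
  reg  a = 1'b0;
  reg  b = 1'b0;
  wire c;

  p2p_sketch dut (.a(a), .b(b), .c(c));

  initial begin
    $monitor("t=%0t a=%b b=%b c=%b", $time, a, b, c);
    #10 a = 1'b1;  // c is expected to follow after the a => c delay
    #10 b = 1'b1;  // c is expected to follow after the b => c delay
    #10 $finish;
  end
endmodule
```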
Clocked Circuit Simulation
Finally, we will investigate the effect of clocking on the timing diagram. We use poly_reg_tb.v to generate this waveform. We use the testbench to produce a clk (clock) and rst (reset) signal. You will find this pattern in pretty much every RTL design in the future. We must now use registers (the R in RTL) in our design. These are storage elements that hold their values stable for a clock cycle even if inputs change. This is unlike the purely combinational blocks we have seen so far.
In poly_reg.v, the computation is performed in a single clock cycle. A programmer must do this with care to ensure that the clock period is large enough to complete the task assigned to that cycle. In this example, we must be able to compute a*x*x + b*x + c all within a single clock period. The rst if-then statement is used to ensure that the system starts off with a stable reset state for all registers. Also note that the signals are only updated on the posedge (positive edge) of the clock. At that instant, the RHS values of the signals used in that always block are sampled (snapshotted) and then used for computations within the block. Also note that rst here is a synchronous reset and will only take effect on a clock edge. If you want asynchronous resets, then you must modify the sensitivity list to (posedge clk or posedge rst). This practice is frowned upon for FPGA design, but may be fine if you're building your own ASICs.
$ cat poly_reg.v
...
always @(posedge clk) begin
  if(rst) begin
    y <= 0;
    a_r <= 0;
    b_r <= 0;
    c_r <= 0;
    x_r <= 0;
  end else begin
    a_r <= a;
    b_r <= b;
    c_r <= c;
    x_r <= x;
    y <= a_r * x_r * x_r + b_r * x_r + c_r;
  end
end
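For comparison, an asynchronous reset for a single register could be sketched as follows. This is not code from the repository: the module name and the 8-bit width are assumptions, and only the x_r register is shown to keep the example short.

```verilog
// Hypothetical sketch: one register with an asynchronous reset.
// The module name and the 8-bit width are assumptions for illustration.
module x_reg_async_sketch (
  input  wire       clk,
  input  wire       rst,
  input  wire [7:0] x,
  output reg  [7:0] x_r
);
  // rst is on the sensitivity list, so the reset takes effect
  // as soon as rst rises, without waiting for a clock edge.
  always @(posedge clk or posedge rst) begin
    if (rst) begin
      x_r <= 8'd0;
    end else begin
      x_r <= x;
    end
  end
endmodule
```

The only change relative to the synchronous style is the sensitivity list; the body of the block keeps the same if-then structure.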
If you observe carefully, most of the testbench is almost identical to the one used before in poly_combi_tb.v. We are now generating clk and rst from the testbench. We generate the clock using a delayed assignment as shown below.
The clock signal is declared with a starting value of 0 and is then toggled every 1 ps to give us a period of 2 ps.
$ cat poly_reg_tb.v
...
reg clk = 1'b0;
always #1 clk <= !clk;
...
We can again simulate this code in Modelsim, and look at the output waveform.
$ vsim -do poly_reg.tcl
The waveform looks similar to earlier. However, there is now a clock signal that ensures outputs only change on posedge (positive edge).
Unlike the cascade of executions of always blocks in poly_combi.v, there is only one timestep when using registers. See the figure below with expanded time deltas. Previously, a change in x caused many always blocks to execute, and to do so multiple times. Now, the value of x is first captured in x_r in the first cycle. In the following cycle, we see the output y updated to 111. We have now introduced a synchronous discipline to the computation, so that we only have to reason about things changing around clock edges. This model of computation is fundamental to RTL design and key to organizing logic in hardware.
Simulations with Feedback Loops
When you have combinational feedback loops in the design with non-blocking statements, the simulator will not converge. To understand why, consider how the simulator decomposes the computation. When a signal on a sensitivity list changes, the simulator re-evaluates the RHS expressions of the triggered blocks. At this point, the newly computed values are not yet assigned to the LHS signals; that happens only after all RHS evaluations are complete. In a feedback loop, the newly updated LHS signals trigger a new round of RHS evaluations. This in turn updates the LHS signals again, and the cycle continues without simulation time ever advancing.
Take a look at a ring oscillator example shown below:
$ cat ring_osc_bad.v
...
reg a=1'b0; // initial value
always @(a) begin: invert
  $display("Time=%0d, a=%0d", $time, a);
  a <= ~(a);
end
Note the use of non-blocking assignment to update the same LHS signal as the one that appears in the RHS.
Run a Verilog Modelsim simulation:
$ vsim -c -do ring_osc_bad.tcl
You will find that simulation time never advances and the simulator quits after an internal iteration limit is reached.
...
# Start time: 11:13:34 on Apr 26,2019
# Loading work.ring_osc_tb
# Loading work.ring_osc
# Time=0, a=0
# Time=0, a=1
# Time=0, a=0
# Time=0, a=1
# Time=0, a=0
# Time=0, a=1
. . .
# Time=0, a=1
# Time=0, a=0
# Time=0, a=1
# Time=0, a=0
# Time=0, a=1
# Time=0, a=0
# Time=0, a=1
# Time=0, a=0
# Time=0, a=1
# Time=0, a=0
# ** Error (suppressible): (vsim-3601) Iteration limit 5000 reached at time 0 ps.
VSIM 2>
How can we resolve this error? Ring oscillators are valid circuits we may want in our design!
$ cat ring_osc_good.v
...
always @(a) begin: invert
  $display("Time=%0d, a=%0d", $time, a);
  a <= #1 ~(a);
end
We simply use a transport delayed assignment.
$ vsim -c -do ring_osc_good.tcl
This generates the proper oscillation trace with advancing physical time.
# Start time: 11:16:06 on Apr 26,2019
# Loading work.ring_osc_tb
# Loading work.ring_osc
# Time=0, a=0
# Time=1, a=1
# Time=2, a=0
# Time=3, a=1
# Time=4, a=0
# Time=5, a=1
# Time=6, a=0
# Time=7, a=1
# Time=8, a=0
# Time=9, a=1
# ** Note: $finish : ring_osc_tb.v(14)
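The delayed oscillator and its stop condition can also be combined into one self-contained file for quick experiments. The sketch below is hypothetical: the module name, the way the loop is started, and the stop time of 10 units are assumptions, and the actual ring_osc_good.v and ring_osc_tb.v in the repository may be organized differently.

```verilog
`timescale 1ns/1ns
// Hypothetical self-contained sketch of the delayed ring oscillator.
module ring_osc_sketch;
  reg a;

  // Kick off the loop with a non-blocking assignment so the X -> 0 change
  // happens after the always block below has started waiting on a.
  initial a <= 1'b0;

  always @(a) begin : invert
    $display("Time=%0d, a=%0d", $time, a);
    a <= #1 ~a;   // the delay moves each update to a later time step
  end

  // Without a stop condition the oscillator would run forever.
  initial #10 $finish;
endmodule
```

Because every update of a is pushed one time unit into the future, each round of the feedback loop lands in a new time step and the iteration limit is never reached.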