To run the examples accompanying this lecture, please check out the git repository. Also, please make sure to work through the fpga-impl lab manual to ensure you understand how the Vivado synthesis tool works.
$ ssh -X <username>@eceubuntu.uwaterloo.ca
$ mkdir $HOME/ece327-lectures
$ cd $HOME/ece327-lectures
$ git clone email@example.com:ece327-lectures-s19/03-synth.git
$ cd 03-synth/code
Your first synthesis
Congratulations, you have managed to simulate your code! Now it's time to synthesize it into hardware blocks.
Let's look at a simple polynomial example.
$ cat poly_combi.v
...
always @(*) begin
    y0 <= a * x;
    y1 <= y0 + b;
    y2 <= y1 * x;
    y_c <= y2 + c;
end
In this example, we have manually refactored a*x*x + b*x + c into (a*x + b)*x + c (Horner's rule) to save one multiplication.
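Counting operations on both sides makes the saving explicit; the right-hand form is exactly the y0 → y1 → y2 → y_c chain in poly_combi.v:

```latex
\underbrace{a \cdot x \cdot x + b \cdot x + c}_{\text{3 multiplications, 2 additions}}
\;=\;
\underbrace{(a \cdot x + b) \cdot x + c}_{\text{2 multiplications, 2 additions}}
```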
Run the following command to synthesize hardware for this example.
$ vivado -source poly_combi.tcl
It will launch a GUI, synthesize the netlist, and generate a schematic of the compiled design.
This datapath is stitched together by considering the dataflow dependencies between the signals, inferring adders and multipliers for the arithmetic operations, and creating IO ports to feed data into the hardware.
The synthesis task is to map your computation onto logic blocks. In this example, we generate an arithmetic datapath, which is internally built from logic gates.
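For experimenting outside the repository, here is a minimal self-contained sketch of the same Horner-form datapath. The module name poly_horner_sketch and the port widths (8-bit inputs, 32-bit output) are assumptions, since the port declarations of poly_combi.v are elided above. It uses blocking (=) assignments, the commonly recommended style for combinational always blocks, rather than the <= used in the excerpt; synthesis should infer the same two multipliers and two adders either way.

```verilog
// Hypothetical stand-alone version of the Horner-form datapath.
// Port widths are assumptions: 8-bit inputs, 32-bit result (wide enough
// that a = b = c = x = 255 does not overflow).
module poly_horner_sketch (
    input  wire [7:0]  a, b, c, x,
    output reg  [31:0] y
);
    reg [31:0] y0, y1, y2;  // intermediate combinational values

    always @(*) begin
        // Blocking assignments evaluate in order, mirroring the dataflow
        // a*x -> +b -> *x -> +c (two multipliers, two adders).
        y0 = a * x;
        y1 = y0 + b;
        y2 = y1 * x;
        y  = y2 + c;
    end
endmodule
```

For example, a = 2, b = 3, c = 4, x = 5 gives (2*5 + 3)*5 + 4 = 69, the same as 2*25 + 3*5 + 4.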
At this point, there are no registers in this design, which makes it impossible to do timing analysis. For timing analysis to work, we must have a timing path that starts and ends at a register.
Next, we compile a registered version of this design.
$ cat poly_reg.v
...
always @(posedge clk) begin
    if(rst) begin
        y <= 0;
        a_r <= 0;
        b_r <= 0;
        c_r <= 0;
        x_r <= 0;
    end else begin
        a_r <= a;
        b_r <= b;
        c_r <= c;
        x_r <= x;
        y <= y_c;
    end
end
The main difference from the previous combinational design is the introduction of a clocked always block that registers the inputs and the output.
$ vivado -source poly_reg.tcl
We use the _c suffix to denote combinational signals and the _r suffix to denote registered signals.
Now, because we have both input and output registers, the schematic looks a little different.
The report_timing_summary command will tell you that the design can operate with a clock period of 12.741 ns (the constraint period minus the reported worst slack, which probably won't make a lot of sense to you right now, but hold onto your seats, as the joyride is just beginning). You can inspect the vivado.log file to check the timing report messages.
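A short worked example of that arithmetic, assuming Vivado's convention that the worst negative slack (WNS) is positive when the constraint is met and negative when it is violated. The 10 ns constraint below is hypothetical, since the clock constraint in poly_reg.tcl is not shown here:

```latex
\begin{aligned}
T_{\min} &= T_{\text{constraint}} - \mathrm{WNS} \\
         &= 10\,\text{ns} - (-2.741\,\text{ns}) = 12.741\,\text{ns} \quad \text{(hypothetical 10 ns constraint)} \\
f_{\max} &= 1 / T_{\min} = 1 / 12.741\,\text{ns} \approx 78.5\,\text{MHz}
\end{aligned}
```

When the slack is negative, this amounts to adding its magnitude to the constraint; the result is approximate, since clock uncertainty and other margins also enter the real report.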
Verilog can also construct hardware through structural composition. Here, we instantiate copies of a hardware block, and each module instance gets its own physical hardware block.
In software, we cannot trivially write two functions one after another such that the first depends on the result of the second, against the flow of execution. Talk to me for a trick in C using a while loop that could achieve this outcome.
In hardware, we can easily have a cyclic dependency through a series of module instantiations, as long as the data is pipelined (registered). In this case, the registers break the cycle, so the circuit produces a stable, well-defined result every clock cycle.
Communication structures such as rings, tori, and meshes can be created in hardware through this approach.
In this toy example, we create two copies of poly_combi and wire the y output of poly_inst0 to the x input of poly_inst1. We also wire the output of poly_inst1 back to the x input of poly_inst0. We take only the 8 LSBs of each y output when connecting it to an x input.
$ cat poly_chain.v
...
poly_combi poly_inst0(.a(a),.b(b),.c(c),.x(t0[7:0]),.y(t1));
poly_combi poly_inst1(.a(a),.b(b),.c(c),.x(t1[7:0]),.y(t0));
You can synthesize this design to generate a schematic.
$ vivado -mode tcl -s poly_chain.tcl
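Note that poly_combi is purely combinational (it uses always @(*)), so the loop in poly_chain.v has no register on it. The sketch below shows the registered case described earlier: three instances of a one-bit register stage wired in a cycle, forming a ring that rotates a single 1 on each clock edge. The names ring_stage and ring3_sketch and the one-hot reset value are hypothetical, not files from the repository.

```verilog
// Hypothetical one-bit register stage: loads `init` on reset, otherwise
// copies the previous stage's output on every rising clock edge.
module ring_stage (
    input  wire clk,
    input  wire rst,   // synchronous, active-high reset
    input  wire init,  // value loaded during reset
    input  wire d,     // output of the previous stage in the ring
    output reg  q
);
    always @(posedge clk) begin
        if (rst) q <= init;
        else     q <= d;
    end
endmodule

// Three instances wired in a cycle (s0 -> s1 -> s2 -> s0). The cyclic
// dependency is legal because every edge of the cycle passes through a
// register, so there is no combinational loop.
module ring3_sketch (
    input  wire       clk,
    input  wire       rst,
    output wire [2:0] state
);
    ring_stage s0 (.clk(clk), .rst(rst), .init(1'b1), .d(state[2]), .q(state[0]));
    ring_stage s1 (.clk(clk), .rst(rst), .init(1'b0), .d(state[0]), .q(state[1]));
    ring_stage s2 (.clk(clk), .rst(rst), .init(1'b0), .d(state[1]), .q(state[2]));
endmodule
```

After reset, state reads 001, then 010, 100, and back to 001 on successive clock edges. The same wiring pattern, with wider data and more instances, is the basis of the rings and meshes mentioned above.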
Recursion, Looping for Replication
Unlike software loops, which iterate over the same piece of code, hardware loops generate multiple physical copies of the loop body. This is a key distinction between writing software and generating hardware.
If we compile poly_recurse.v, we will get a schematic that looks like the following:
$ vivado -mode tcl -s poly_loop.tcl
You see multiple physical copies of the same poly_combi block. Inside each block, you have a complete polynomial datapath.
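The contents of poly_recurse.v are not shown here, so the following is only a sketch of the looping idea using a Verilog generate for loop: each iteration elaborates a separate instance of a small polynomial stage, chained like poly_chain but without the feedback. The names poly_stage and poly_loop_sketch, the 8-bit truncated arithmetic, and the parameter N are all assumptions for illustration.

```verilog
// Hypothetical stage: 8-bit Horner-form polynomial (a*x + b)*x + c.
// Because every operand and y are 8 bits wide, the expression is evaluated
// in 8 bits, so only the 8 least-significant bits of the result survive.
module poly_stage (
    input  wire [7:0] a, b, c, x,
    output wire [7:0] y
);
    assign y = (a * x + b) * x + c;
endmodule

// Hypothetical wrapper: a generate loop elaborates N separate poly_stage
// instances, with the output of copy i driving the x input of copy i+1.
module poly_loop_sketch #(parameter N = 3) (
    input  wire [7:0] a, b, c, x,
    output wire [7:0] y
);
    // Slice i (bits 8*i+7 : 8*i) feeds copy i; slice N holds the final result.
    wire [8*(N+1)-1:0] link;
    assign link[7:0] = x;

    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : copy
            poly_stage u (
                .a(a), .b(b), .c(c),
                .x(link[8*i +: 8]),
                .y(link[8*(i+1) +: 8])
            );
        end
    endgenerate

    assign y = link[8*N +: 8];
endmodule
```

With N = 3, elaboration produces three physical poly_stage instances (copy[0].u, copy[1].u, copy[2].u). Changing N changes the amount of hardware, not the number of times a piece of code runs.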
A key construct that you need to understand is the conditional statement. In software, jumps and branches allow if-then-else statements to be executed only when required by the computation. In hardware, the resources for evaluating all branches, as well as the condition, must be provisioned together. This is key to unlocking parallelism, but it also gives rise to higher-than-expected resource use. To ensure you are generating exactly the hardware you intend, we present a few simple variations in code and their impact on the resulting hardware.
If you provide incomplete conditional information to the synthesis tool, the tool will generate latches. Latches are multiplexers with feedback. This feedback is inferred implicitly: the incomplete conditions tell the tool that the signal must retain its existing value whenever no branch assigns it. This creates a feedback edge, hence a latch. Latches are generally undesirable because they are hard to analyze, so we will aim to avoid them in our code.
$ cat ifthen.v
...
always @(a or b or c or x) begin
    if(x > 1000) begin
        y <= c;
    end else if(x > 100) begin
        y <= b;
    end else if(x > 10) begin
        y <= a;
    end else if(x > 0) begin
        y <= x;
    end
end
Note that the conditions have overlapping ranges, which means that multiple if/then conditions can be true at the same time. Following the sequential order, the if/then statement encountered first has priority.
You can run Vivado synthesis to check the resulting schematic after RTL elaboration. As expected, you will notice an RTL_LATCH component driving y.
$ vivado -mode tcl -s ifthen.tcl
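One common way to remove this latch is to give y a default value at the top of the block, so that every path through the block assigns it. The sketch below assumes 16-bit signals and uses the hypothetical module name no_latch_sketch, since the declarations in ifthen.v are elided. Blocking assignments are used, as is usual for combinational logic; a later assignment in the if/else chain overrides the default.

```verilog
// Hypothetical latch-free variant of the if/else-if chain in ifthen.v.
// The 16-bit widths are assumptions (the declarations are elided).
module no_latch_sketch (
    input  wire [15:0] x, a, b, c,
    output reg  [15:0] y
);
    always @(*) begin
        y = 16'd0;              // default: every path now assigns y
        if      (x > 1000) y = c;
        else if (x > 100)  y = b;
        else if (x > 10)   y = a;
        else if (x > 0)    y = x;
        // No final else is needed: x == 0 keeps the default value 0.
    end
endmodule
```

The priority order is unchanged; only the implicit "keep the old value" path is gone, so the tool no longer needs feedback.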
If the else clause is specified, then there is no ambiguity in the multiplexer choices. Hence, there is no need for feedback (unless explicitly specified). This is the preferred style of coding.
$ cat ifthenelse.v
...
always @(a or b or c or x) begin
    if(x > 1000) begin
        y <= c;
    end else if(x > 100) begin
        y <= b;
    end else if(x > 10) begin
        y <= a;
    end else if(x > 0) begin
        y <= x;
    end else begin
        y <= 0;
    end
end
Running Vivado will now show you an RTL_MUX component driving y. This has no feedback.
$ cd code
$ vivado -mode tcl -s ifthenelse.tcl
However, notice that the tool has created a chain of multiplexers, one for each if/then condition. This is due to prioritization. The first if statement has the highest priority, so its multiplexer sits at the very end of the chain, closest to the output, where it overrides everything before it. The remaining if statements feed into each other in the order in which they are nested. While priority is not inherently wrong, the long chain of multiplexers lengthens the critical path, which increases your clock period and reduces the maximum operating frequency of the design.
If the conditions are mutually exclusive, then we have additional information that can guide the synthesis process. Instead of an if statement, you can use a case statement, where no priority is implied. The absence of priority generates a balanced multiplexer structure instead of a long chain. This shortens the critical path, which reduces the clock period and improves the maximum clock frequency of your design.
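A rough way to quantify this is to count the levels of 2:1 multiplexers between the data inputs and y. This is an idealized model that ignores how Vivado actually maps logic onto FPGA lookup tables and ignores the delay of decoding the select signals. The if/else chain in ifthenelse.v chooses among n = 5 data inputs (c, b, a, x, and 0):

```latex
\begin{aligned}
\text{priority chain:}\quad & d_{\text{chain}}(n) = n - 1, & d_{\text{chain}}(5) &= 4 \\
\text{balanced tree:}\quad  & d_{\text{tree}}(n) = \lceil \log_2 n \rceil, & d_{\text{tree}}(5) &= 3
\end{aligned}
```

The gap widens as conditions are added: for n = 16, a chain has 15 levels while a balanced tree has 4.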
Note that the modified example matches mutually exclusive values of x.
$ cat casestmt.v
...
always @(a or b or c or x) begin
    case(x)
        8'h00 : begin y <= c; end
        8'h01 : begin y <= b; end
        8'h02 : begin y <= a; end
        8'h03 : begin y <= x; end
        default : begin y <= 0; end
    endcase
end
You will see a large flat multiplexer in the Vivado schematic for this example.
$ cd code
$ vivado -mode tcl -s casestmt.tcl
The schematic can be opened by clicking
A careless coder may forget the default clause in the case statement and still end up with a latch. This scenario is illustrated below:
$ cat casealone.v
...
always @(a or b or c or x) begin
    case(x)
        8'h00 : begin y <= c; end
        8'h01 : begin y <= b; end
        8'h02 : begin y <= a; end
        8'h03 : begin y <= x; end
    endcase
end
Here, we notice the re-introduction of an RTL_LATCH to handle the default input to the multiplexer. It is even worse, as there is also an RTL_ROM component that drives the g input of the RTL_LATCH. This ROM contains a table of values telling the latch when to use feedback, which for us is all values of x outside the range 0-3. For an 8-bit input, this translates to 2^8 - 4 = 252 entries in the table, a needlessly large hardware structure.
$ cd code
$ vivado -mode tcl -s casealone.tcl
Advice: Observe the tool warnings during synthesis (for example, warnings about inferred latches) to confirm that your code generates the hardware you anticipated.