 # 03-synth

Hardware synthesis of digital circuits from RTL descriptions. Differences over traditional software design. Understanding how hardware is generated from RTL code. Arithmetic circuits like adders and multipliers. Matrix Multiplication example.

# Synthesis

To run the examples accompanying this lecture, please check out the git repository. Also, please work through the fpga-impl lab manual to make sure you understand how the Vivado synthesis tool works.

    $ ssh -X <username>@eceubuntu.uwaterloo.ca
    $ mkdir $HOME/ece327-lectures
    $ cd $HOME/ece327-lectures
    $ git clone gitlab@git.uwaterloo.ca:ece327-lectures-s19/03-synth.git
    $ cd 03-synth/code

## Your first synthesis

Congratulations, you have managed to simulate your code! Now it's time to synthesize it into hardware blocks. Let's look at a simple polynomial example.

    $ cat poly_combi.v

    ...
    always @(*) begin
        y0 <= a * x;
        y1 <= y0 + b;
        y2 <= y1 * x;
        y_c <= y2 + c;
    end


In this example, we have manually refactored a.x.x + b.x + c into (a.x + b).x + c, which saves one multiplication (two multipliers instead of three).
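For reference, a self-contained module along these lines could look like the sketch below. The module name `poly_horner_sketch`, the 8-bit inputs, and the 32-bit output are assumptions made for illustration; the actual `poly_combi.v` in the repository may differ. The sketch uses blocking assignments, a common convention for purely combinational blocks.

```verilog
// Hypothetical sketch (names and widths assumed): combinational
// evaluation of a*x*x + b*x + c in the refactored form (a*x + b)*x + c.
module poly_horner_sketch (
    input  wire [7:0]  a,
    input  wire [7:0]  b,
    input  wire [7:0]  c,
    input  wire [7:0]  x,
    output reg  [31:0] y
);
    // Intermediate results, wide enough to hold the full products.
    reg [31:0] y0, y1, y2;

    always @(*) begin
        y0 = a * x;    // first multiplier
        y1 = y0 + b;   // first adder
        y2 = y1 * x;   // second multiplier
        y  = y2 + c;   // second adder
    end
endmodule
```

Each arithmetic operator in the block corresponds to one inferred hardware unit, so the refactored form should need two multipliers and two adders.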

Run the following command to synthesize hardware for this example.

    $ vivado -source poly_combi.tcl

It will launch a GUI, synthesize the netlist, and generate a schematic of the compiled design. This datapath is stitched together by considering the dataflow dependencies between the signals, inferring adders and multipliers for the arithmetic operations, and creating IO ports to feed data into the hardware. The synthesis task is to map your computation into logic blocks. In this example, we generate an arithmetic datapath, which is internally built from logic gates.

At this point there are no registers in the design, which makes timing analysis impossible. For timing analysis to work, we must have a timing path that starts and ends at a register. Next, we compile a registered version of this design.

    $ cat poly_reg.v

    ...
    always @(posedge clk) begin
        if(rst) begin
            y <= 0;
            a_r <= 0;
            b_r <= 0;
            c_r <= 0;
            x_r <= 0;
        end else begin
            a_r <= a;
            b_r <= b;
            c_r <= c;
            x_r <= x;
            y <= y_c;
        end
    end


The main difference from the previous combinational design is the introduction of a clocked always block that registers the inputs and the output.

    $ vivado -source poly_reg.tcl

We use _c notation to represent combinational signals, and _r notation to represent registered signals. Now, because we have both input and output registers, the schematic looks a little different. The report_timing_summary command will tell you that the design can operate with a clock period of 12.741 ns, or roughly 78.5 MHz (this is the clock constraint minus the reported slack, which probably won't make a lot of sense to you right now, but hold onto your seats, as the joyride is just beginning). You can inspect the vivado.log file to check the timing report messages.

## Structural Composition

Verilog can also construct hardware through structural composition. In this case, we instantiate copies of a hardware block, and each module instance gets its own physical hardware block.

In software, we cannot trivially have two functions, expressed one after another, where the first depends on the result of the second, against the flow of execution. Talk to me for a trick in C using static + while loop that could achieve this outcome. In hardware, we can easily create a cyclic dependency through a series of module instantiations as long as the data is pipelined (registered). In that case, the circuit will generate a stable result. Communication structures like rings, tori, and meshes can be created in hardware through this approach.

In this toy example, we create two copies of poly_combi and wire the y output of poly_inst0 to the x input of poly_inst1. We also wire the y output of poly_inst1 back to the x input of poly_inst0. We only take the 8 least significant bits when connecting x.

    $ cat poly_chain.v

    ...
    poly_combi poly_inst0(.a(a),.b(b),.c(c),.x(t0[7:0]),.y(t1));
    poly_combi poly_inst1(.a(a),.b(b),.c(c),.x(t1[7:0]),.y(t0));
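To make the "as long as data is registered" condition concrete, the sketch below places a register inside each stage of a two-stage ring. The module names (`stage_sketch`, `ring_sketch`), the 8-bit width, and the increment operation are hypothetical stand-ins chosen for illustration and are not the contents of `poly_chain.v`.

```verilog
// Hypothetical stage: the result is registered, so the output only
// changes on a rising clock edge.
module stage_sketch (
    input  wire       clk,
    input  wire       rst,
    input  wire [7:0] x,
    output reg  [7:0] y
);
    always @(posedge clk) begin
        if (rst) y <= 8'd0;
        else     y <= x + 8'd1;   // placeholder computation
    end
endmodule

// Two stages wired in a ring: each stage's output feeds the other's
// input. The registers inside the stages break the combinational loop.
module ring_sketch (
    input  wire       clk,
    input  wire       rst,
    output wire [7:0] t0,
    output wire [7:0] t1
);
    stage_sketch inst0 (.clk(clk), .rst(rst), .x(t0), .y(t1));
    stage_sketch inst1 (.clk(clk), .rst(rst), .x(t1), .y(t0));
endmodule
```

Because every path around the ring passes through a flip-flop, each timing path starts and ends at a register, which is what the timing analysis described earlier requires.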


You can synthesize this design to generate a schematic:

    $ vivado -mode tcl -s poly_chain.tcl

## Recursion, Looping for Replication

Unlike software loops, which iterate over the same piece of code, hardware loops generate multiple physical copies of the loop body. This is a key distinction when generating hardware. If we compile poly_loop.v or poly_recurse.v, we will get a schematic that looks like the following:

    $ vivado -mode tcl -s poly_loop.tcl


You see multiple physical copies of the same poly_combi block. Inside each block, you have a complete polynomial datapath.
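Since the loop source is not reproduced here, the following is a minimal sketch of how a `generate` loop replicates a block. The names `poly_loop_sketch` and `poly_stub`, the parameter `N`, the flattened port layout, and all widths are assumptions; `poly_stub` is only a stand-in for `poly_combi`, and the real `poly_loop.v` may connect its copies differently.

```verilog
// Stand-in for poly_combi (hypothetical widths): y = (a*x + b)*x + c
module poly_stub (
    input  wire [7:0]  a, b, c, x,
    output wire [31:0] y
);
    assign y = (a * x + b) * x + c;
endmodule

// N physical copies of poly_stub, one per 8-bit slice of x_flat.
module poly_loop_sketch #(
    parameter N = 4
) (
    input  wire [7:0]      a, b, c,
    input  wire [8*N-1:0]  x_flat,   // N packed 8-bit inputs
    output wire [32*N-1:0] y_flat    // N packed 32-bit outputs
);
    genvar i;
    generate
        // The loop is unrolled at elaboration time: each iteration
        // creates a separate instance named g_poly[i].u_poly.
        for (i = 0; i < N; i = i + 1) begin : g_poly
            poly_stub u_poly (
                .a(a), .b(b), .c(c),
                .x(x_flat[8*i +: 8]),
                .y(y_flat[32*i +: 32])
            );
        end
    endgenerate
endmodule
```

Changing `N` changes the amount of hardware generated, not the number of times a single block is reused.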

## Conditional Statements

A key construct that you need to understand is the use of conditional statements. In software, jumps and branches allow if-then-else statements to be executed only when required by the computation. In hardware, the resources for evaluating all branches, as well as the condition, must be provisioned together. This is key to unlocking parallelism, but it also gives rise to higher than expected resource use. To help ensure you generate exactly the hardware you intend, a few simple variations in code and their impact on the resulting hardware are presented below.

### If-then-else statements

If you provide incomplete conditional information to the synthesis tool, the tool will generate latches. Latches are multiplexers with feedback. The feedback is inferred implicitly: when no branch assigns the signal, the tool must make it retain its existing value. This creates a feedback edge, hence a latch. Latches are generally bad as they are somewhat hard to analyze. We will aspire to avoid latches in our code.

    $ cat ifthen.v
    ...
    always @(a or b or c or x) begin
        if(x > 1000) begin
            y <= c;
        end else if(x > 100) begin
            y <= b;
        end else if(x > 10) begin
            y <= a;
        end else if(x > 0) begin
            y <= x;
        end
    end

Note that the conditions have overlapping ranges, which means multiple if/then conditions can be true at the same time. As per sequential order, the if/then statements encountered first will have priority. You can run Vivado synthesis to check the resulting schematic post RTL elaboration. You will notice the RTL_LATCH component driving y, as we expect.

    $ vivado -mode tcl -s ifthen.tcl


If the else clause is specified, then there is no ambiguity in the multiplexer choices. Hence, there is no need to use feedback (unless explicitly specified). This is the preferred style of coding.
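A related way to guarantee that every path assigns the output is to give it a default value at the top of the block. The sketch below illustrates this under assumed names and widths (`ifdefault_sketch`, 16-bit signals so that the comparison against 1000 is meaningful); it is not one of the files in the lecture repository.

```verilog
// Hypothetical sketch: a default assignment at the top of the block
// ensures y is driven on every path, so no latch should be inferred.
module ifdefault_sketch (
    input  wire [15:0] a, b, c, x,
    output reg  [15:0] y
);
    always @(*) begin
        y = 16'd0;                 // default, plays the role of the final else
        if      (x > 1000) y = c;  // highest priority
        else if (x > 100)  y = b;
        else if (x > 10)   y = a;
        else if (x > 0)    y = x;
    end
endmodule
```

The priority chain is unchanged; only the handling of the uncovered case (x equal to 0) differs from the version without an else clause.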

    $ cat ifthenelse.v
    ...
    always @(a or b or c or x) begin
        if(x > 1000) begin
            y <= c;
        end else if(x > 100) begin
            y <= b;
        end else if(x > 10) begin
            y <= a;
        end else if(x > 0) begin
            y <= x;
        end else begin
            y <= 0;
        end
    end

Running Vivado will now show you an RTL_MUX component driving y. This has no feedback.

    $ cd code
    $ vivado -mode tcl -s ifthenelse.tcl

However, notice that the tool has created a chain of multiplexers, one for each if/then condition. This is due to prioritization. The first if statement has the highest priority and will drive the output at the very end of the chain (thus determining its value). Other if statements that come later follow the order of nesting and feed into each other. While priority is not inherently wrong, the long chain of multiplexers will increase your clock period and reduce the maximum operating frequency of the design.

### Case statements

If the conditions are mutually exclusive, then we have additional information that can guide the synthesis process. Instead of an if statement, you can use a case statement, where no priority is implied. The absence of priority generates a balanced multiplexer structure instead of a long chain. This helps reduce the clock period and improves the clock frequency of your design. Note that the modified example has mutually exclusive matching for values of x.

    $ cat casestmt.v
    ...
    always @(a or b or c or x) begin
        case(x)
            8'h00 : begin
                y <= c;
            end
            8'h01 : begin
                y <= b;
            end
            8'h02 : begin
                y <= a;
            end
            8'h03 : begin
                y <= x;
            end
            default : begin
                y <= 0;
            end
        endcase
    end


You will see a large flat multiplexer in the Vivado schematic for this example.

    $ cd code
    $ vivado -mode tcl -s casestmt.tcl


The schematic can be opened by clicking

A careless coder may forget the default clause in the case statement and still end up with a latch. This scenario is illustrated below:

    $ cat casealone.v
    ...
    always @(a or b or c or x) begin
        case(x)
            8'h00 : begin
                y <= c;
            end
            8'h01 : begin
                y <= b;
            end
            8'h02 : begin
                y <= a;
            end
            8'h03 : begin
                y <= x;
            end
        endcase
    end

Here, we notice the re-introduction of an RTL_LATCH to handle the default input to the multiplexer. It is even worse, as there is also an RTL_ROM component that drives the g input of the RTL_LATCH. This ROM contains a table of values telling the latch when to use feedback, which for us is all values of x outside the range 0-3. For an 8-bit input, this translates to 2^8 - 4 = 252 entries in the table, a needlessly large hardware structure.

    $ cd code
    $ vivado -mode tcl -s casealone.tcl


Advice: Observe tool warnings during synthesis to determine if you are generating correct code as you anticipated.