Post-Implementation Functional Simulation

This tutorial describes how to verify the functional correctness of a circuit implemented by VPR using Verilator, an open-source hardware simulator.

Functional simulation is useful for:

  • Verifying that the VTR flow has correctly implemented the circuit’s intended logic

  • Quickly checking correctness before running a full timing simulation

  • Debugging unexpected circuit behaviour after implementation

Unlike timing simulation, functional simulation does not require a Standard Delay Format (SDF) file; only the post-implementation Verilog netlist and VTR’s primitive definitions are needed.

Note

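This tutorial requires Verilator 5.006 or later. Verilator can be installed from your distribution’s package manager or built from source: https://github.com/verilator/verilator. Packaged versions can lag behind upstream releases, so check the installed version with verilator --version.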

On Ubuntu, install it with:

$ sudo apt install verilator
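To confirm that the installed Verilator meets the 5.006 minimum, a small check along these lines can help. It is a sketch that assumes verilator --version prints the version number as its second field, and that sort supports -V (version sort), as GNU coreutils sort does. version_ge is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check that the installed Verilator is at least 5.006.
# Assumption: `verilator --version` prints the version as its second field,
# e.g. "Verilator 5.020 2024-01-01 ...".
MIN_VERILATOR=5.006

# version_ge A B: exit status 0 if version A >= version B.
# Relies on `sort -V` (version sort), as provided by GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v verilator >/dev/null 2>&1; then
    ver=$(verilator --version | awk '{print $2}')
    if version_ge "$ver" "$MIN_VERILATOR"; then
        echo "Verilator $ver is new enough (>= $MIN_VERILATOR)"
    else
        echo "Verilator $ver is older than $MIN_VERILATOR; build it from source" >&2
    fi
else
    echo "verilator was not found on PATH" >&2
fi
```

The comparison takes the smaller of the two versions under version sort and checks that it is the minimum, so equal versions also pass.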

The Design

The circuit for this tutorial is a 4-bit ripple-carry adder with the following ports:

  • Inputs: cin (carry-in), a0-a3 and b0-b3 (4-bit operands, a0 / b0 are the least-significant bits)

  • Outputs: s0-s3 (sum bits, s0 least-significant) and cout (carry-out)

When synthesis converts this Verilog file to BLIF, each bit of a bus port becomes a separate single-bit port. Declaring individual single-bit ports in the source (rather than bus ports such as input [3:0] a) keeps the port names in the post-implementation netlist simple and easy to reference in the testbench.

Create a working directory and save the design as top.v:

$ mkdir func_sim_tut
$ cd func_sim_tut
Listing 32 4-bit ripple-carry adder top.v.
module top (
    input  cin,
    input  a0, a1, a2, a3,
    input  b0, b1, b2, b3,
    output s0, s1, s2, s3,
    output cout
);
    wire c0, c1, c2;

    // Bit 0
    assign s0 = a0 ^ b0 ^ cin;
    assign c0 = (a0 & b0) | ((a0 ^ b0) & cin);

    // Bit 1
    assign s1 = a1 ^ b1 ^ c0;
    assign c1 = (a1 & b1) | ((a1 ^ b1) & c0);

    // Bit 2
    assign s2 = a2 ^ b2 ^ c1;
    assign c2 = (a2 & b2) | ((a2 ^ b2) & c1);

    // Bit 3
    assign s3 = a3 ^ b3 ^ c2;
    assign cout = (a3 & b3) | ((a3 ^ b3) & c2);
endmodule

Generating the Post-Implementation Netlist

Copy the VTR flagship architecture into the working directory:

$ cp $VTR_ROOT/vtr_flow/arch/timing/k6_frac_N10_frac_chain_mem32K_40nm.xml .

Note

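Set the VTR_ROOT environment variable to the root directory of the VTR source tree (for example, export VTR_ROOT=/path/to/vtr-verilog-to-routing), or replace $VTR_ROOT in each command with that path.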
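Because every later command depends on VTR_ROOT, it can be worth checking it once up front. The sketch below uses a hypothetical helper, check_vtr_root (not part of VTR), that only tests for the three files this tutorial uses; a tree that passes is not guaranteed to be a complete, built VTR installation.

```shell
#!/bin/sh
# Sketch: check that a directory (default: $VTR_ROOT) contains the files this
# tutorial relies on. check_vtr_root is a hypothetical helper, not part of VTR.
check_vtr_root() {
    root=${1:-${VTR_ROOT:-}}
    if [ -z "$root" ]; then
        echo "VTR_ROOT is not set" >&2
        return 1
    fi
    # Paths taken from the commands used in this tutorial.
    for f in vtr_flow/scripts/run_vtr_flow.py \
             vtr_flow/arch/timing/k6_frac_N10_frac_chain_mem32K_40nm.xml \
             vtr_flow/primitives.v; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $root/$f" >&2
            return 1
        fi
    done
    echo "VTR_ROOT looks OK: $root"
}
```

After export VTR_ROOT=..., run check_vtr_root with no arguments, or pass a path explicitly to test a tree before exporting it.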

Run the full VTR flow using run_vtr_flow.py. The --gen_post_synthesis_netlist option is a VPR option, which run_vtr_flow.py forwards to VPR; it instructs VPR to write the post-implementation Verilog netlist (and a matching SDF file) alongside the usual place-and-route output files:

$ source $VTR_ROOT/.venv/bin/activate
$ python3 $VTR_ROOT/vtr_flow/scripts/run_vtr_flow.py \
      top.v \
      k6_frac_N10_frac_chain_mem32K_40nm.xml \
      --gen_post_synthesis_netlist on

After the flow completes, the post-implementation netlist and its SDF file are in the temp/ directory:

$ ls temp/top_post_synthesis.v temp/top_post_synthesis.sdf
temp/top_post_synthesis.sdf  temp/top_post_synthesis.v

Note

The SDF file (top_post_synthesis.sdf) contains back-annotated timing delays and is used for timing simulation. Functional simulation does not require it.

Inspecting the Post-Implementation Netlist

The generated netlist expresses the circuit using FPGA primitives defined in $VTR_ROOT/vtr_flow/primitives.v. Below is a representative snippet:

Listing 33 Snippet of top_post_synthesis.v.
module top (
    input \cin ,
    input \a0 ,
    input \a1 ,
    input \a2 ,
    input \a3 ,
    input \b0 ,
    input \b1 ,
    input \b2 ,
    input \b3 ,
    output \s0 ,
    output \s1 ,
    output \s2 ,
    output \s3 ,
    output \cout
);

    // ... wire declarations and interconnect ...

    LUT_K #(
        .K(5),
        .LUT_MASK(32'b00000000010000010000000000010100)
    ) \lut_s0  (
        .in({ \lut_s0_input_0_4 , 1'bX, \lut_s0_input_0_2 ,
              \lut_s0_input_0_1 , 1'bX }),
        .out(\lut_s0_output_0_0 )
    );

    // ... more LUT_K cells ...

endmodule

The module is named after the top-level module of the design that VTR packs, places, and routes (top in this tutorial). Port names beginning with \ are standard Verilog escaped identifiers. An escaped identifier is terminated by whitespace (hence the space before each comma), and neither the backslash nor that space is part of the name, so \cin is the same identifier as plain cin. The testbench can therefore connect to these ports using the ordinary names cin, a0, and so on.
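To confirm that the netlist exposes the port names the testbench expects, the declarations can be extracted with sed. This sketch assumes the layout shown in Listing 33 (one input or output declaration per line, each name written as an escaped identifier); list_ports is a hypothetical helper, and netlists formatted differently may need a different pattern.

```shell
#!/bin/sh
# Sketch: print "direction name" for each top-level port declaration,
# dropping the escaped-identifier backslash.
# Assumes one "input \name ," or "output \name" declaration per line,
# as in Listing 33. list_ports is a hypothetical helper.
list_ports() {
    sed -n -E 's/^[[:space:]]*(input|output)[[:space:]]+\\?([^[:space:],;]+).*/\1 \2/p' "$1"
}
```

For a netlist laid out like Listing 33, list_ports temp/top_post_synthesis.v prints lines such as input cin and output cout, which can be compared against the named connections in the testbench.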

The primitives used here, LUT_K (look-up table) and fpga_interconnect (routing wire), are defined in primitives.v. If the architecture’s carry chain is used, adder primitives from primitives.v may also appear.
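The LUT_K cell in Listing 33 can be checked by hand by listing the input patterns for which its mask holds a 1. The sketch below assumes that LUT_K computes its output as LUT_MASK[in], so mask bit i (counting from the right-most digit of the binary literal) is the output when the K-bit input vector equals i; lut_minterms is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the input patterns for which a LUT_K mask outputs 1.
# Assumption: LUT_K computes out = LUT_MASK[in], so mask bit i is the
# output when the K-bit input vector equals i.

# lut_minterms K MASK
#   K    - number of LUT inputs (the K parameter)
#   MASK - the mask digits as written after 32'b (most-significant first)
lut_minterms() {
    k=$1
    mask=$2
    n=${#mask}
    i=0
    while [ "$i" -lt "$n" ]; do
        # Mask bit i is character number n-i (cut counts characters from 1).
        c=$(printf '%s\n' "$mask" | cut -c "$(( n - i ))")
        if [ "$c" = 1 ]; then
            # Format i as in[K-1:0], most-significant input first.
            bits=
            j=$(( k - 1 ))
            while [ "$j" -ge 0 ]; do
                bits=$bits$(( (i >> j) & 1 ))
                j=$(( j - 1 ))
            done
            echo "$i in[$(( k - 1 )):0]=$bits"
        fi
        i=$(( i + 1 ))
    done
}

# The LUT_K cell from Listing 33 (K=5).
lut_minterms 5 00000000010000010000000000010100
```

For the snippet’s mask this prints four patterns, 2, 4, 16, and 22 (in[4:0] = 00010, 00100, 10000, 10110). In each of them in[3] and in[0], the inputs tied to 1'bX, are 0, and an odd number of the three connected inputs in[4], in[2], and in[1] are 1. That is a 3-input XOR, consistent with s0 = a0 ^ b0 ^ cin however the three nets are ordered.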

Creating a Testbench

The testbench tb.sv drives every possible combination of the 9 input bits (2^9 = 512 vectors), computes the expected sum using ordinary arithmetic, and flags any mismatch:

Listing 34 Testbench tb.sv.
 1`timescale 1ns/1ps
 2module tb;
 3
 4    // DUT I/O
 5    logic cin;
 6    logic a0, a1, a2, a3;
 7    logic b0, b1, b2, b3;
 8    logic s0, s1, s2, s3, cout;
 9
10    logic [4:0] expected, actual;
11    int errors;
12
13    // Instantiate the post-implementation netlist.
14    // The escaped port names (\cin, \a0, ...) are identical to the
15    // unescaped names used below, so the named connections work directly.
16    top dut (
17        .cin (cin),
18        .a0  (a0), .a1 (a1), .a2 (a2), .a3 (a3),
19        .b0  (b0), .b1 (b1), .b2 (b2), .b3 (b3),
20        .s0  (s0), .s1 (s1), .s2 (s2), .s3 (s3),
21        .cout(cout)
22    );
23
24    initial begin
25        errors = 0;
26
27        // Sweep all 2^9 = 512 input combinations
28        for (int c = 0; c < 2; c++) begin
29            cin = c[0];
30            for (int a = 0; a < 16; a++) begin
31                {a3, a2, a1, a0} = 4'(a);
32                for (int b = 0; b < 16; b++) begin
33                    {b3, b2, b1, b0} = 4'(b);
34                    #1; // allow combinational outputs to settle
35
36                    expected = 5'(a + b + c);
37                    actual   = {cout, s3, s2, s1, s0};
38
39                    if (actual !== expected) begin
40                        $display("FAIL: a=%0d b=%0d cin=%0d  expected=%0d  got=%0d",
41                                 a, b, c, expected, actual);
42                        errors++;
43                    end
44                end
45            end
46        end
47
48        if (errors == 0) begin
49            $display("All 512 tests PASSED.");
50            $finish;          // exit 0: success
51        end else begin
52            $display("%0d test(s) FAILED.", errors);
53            $fatal(1);        // non-zero exit: failure
54        end
55    end
56
57endmodule

Lines 16-22 instantiate the DUT (Device Under Test), the post-implementation netlist module top, using named port connections. Line 36 computes the expected 5-bit result using integer arithmetic, and line 37 assembles the actual 5-bit result from the individual DUT output bits. Lines 39-43 report each mismatch with $display rather than $error. $display is non-fatal, so every one of the 512 vectors is checked and all failures are reported before the simulation exits. Lines 48-54 select the final exit path: $finish ends the simulation with exit code 0 (success), while $fatal(1) ends it with a non-zero exit code (failure), making the simulation suitable for use as a CI test.
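The reference computation on lines 36-37 can be reproduced outside the simulator with shell arithmetic, which is handy for working out by hand what a given vector should produce. This is a sketch using a hypothetical helper, expected_bits; it also shows why a 5-bit result suffices, since the largest possible sum is 15 + 15 + 1 = 31 = 2^5 - 1.

```shell
#!/bin/sh
# Sketch: the testbench's reference model (tb.sv lines 36-37) in shell arithmetic.
# expected_bits A B CIN prints the expected {cout, s3, s2, s1, s0}, cout first.
# expected_bits is a hypothetical helper.
expected_bits() {
    sum=$(( $1 + $2 + $3 ))   # at most 15 + 15 + 1 = 31, so 5 bits are enough
    out=
    bit=4
    while [ "$bit" -ge 0 ]; do
        out=$out$(( (sum >> bit) & 1 ))
        bit=$(( bit - 1 ))
    done
    echo "$out"
}

echo "vectors: $(( 2 * 16 * 16 ))"   # cin x a x b, i.e. 2^9
expected_bits 3 5 0                   # 3 + 5 + 0 = 8   -> 01000
expected_bits 15 15 1                 # 15 + 15 + 1 = 31 -> 11111
```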

Simulating with Verilator

Compile the testbench, post-implementation netlist, and VTR primitives together. Verilator’s --binary flag produces a standalone simulation executable in a single step:

$ verilator --binary -sv \
$     tb.sv \
$     temp/top_post_synthesis.v \
$     $VTR_ROOT/vtr_flow/primitives.v \
$     --top-module tb \
$     --Mdir sim_build \
$     -j $(nproc)

This compiles tb.sv, the post-implementation netlist, and the VTR primitives into the standalone simulation binary sim_build/Vtb (Verilator names it V followed by the top module name). The --top-module tb option makes the testbench the top of the hierarchy, --Mdir sim_build places all generated files in sim_build/, and -j $(nproc) parallelises the build across all available CPU cores. Run the binary to execute all 512 test vectors:

$ ./sim_build/Vtb
All 512 tests PASSED.
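Since Vtb reports its verdict through its exit status, a CI job only needs to run it and check that status. The wrapper below is a minimal sketch; run_and_report is a hypothetical helper, not part of VTR or Verilator, and it works with any command, not just Vtb.

```shell
#!/bin/sh
# Sketch: run a command (e.g. ./sim_build/Vtb) and turn its exit status into
# a PASS/FAIL line, passing the status through so a CI step fails whenever
# the simulation fails. run_and_report is a hypothetical helper.
run_and_report() {
    if "$@"; then
        echo "PASS: $*"
        return 0
    else
        status=$?            # exit status of the command that just failed
        echo "FAIL: $* (exit status $status)" >&2
        return "$status"
    fi
}
```

For example, run_and_report ./sim_build/Vtb prints PASS or FAIL and returns the simulator’s own exit status, so the CI step fails whenever any test vector fails.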

Note

primitives.v uses Verilog specify blocks to model timing. Verilator ignores these for functional simulation, so no timing information is back-annotated, which is the desired behaviour here.

If any test vector fails, the simulation prints each failure, reports the total count, and exits with a non-zero return code:

FAIL: a=3 b=5 cin=0  expected=8  got=7
1 test(s) FAILED.
%Fatal: tb.sv:53: $fatal called

Such a failure would indicate a bug in the VTR flow’s implementation of the circuit, provided that the input Verilog design and the testbench are correct and consistent with each other. The failure can then be investigated by examining the post-implementation netlist or by running a full timing simulation with waveform capture.
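When a FAIL line does appear, it helps to know which output bits are wrong before opening the netlist. The sketch below XORs the expected and got values printed by tb.sv and names the differing outputs, using the testbench’s packing {cout, s3, s2, s1, s0}. diff_outputs and diff_fail_line are hypothetical helpers that assume the exact FAIL message format of Listing 34.

```shell
#!/bin/sh
# Sketch: name the DUT outputs that differ between expected and got.
# Bit order follows tb.sv: actual = {cout, s3, s2, s1, s0}, so bit 4 is cout.
# diff_outputs and diff_fail_line are hypothetical helpers.
diff_outputs() {
    xor=$(( $1 ^ $2 ))
    names=
    for bit in 0 1 2 3 4; do
        if [ $(( (xor >> bit) & 1 )) -eq 1 ]; then
            if [ "$bit" -eq 4 ]; then name=cout; else name=s$bit; fi
            names="${names:+$names }$name"
        fi
    done
    echo "${names:-none}"
}

# diff_fail_line LINE: pull expected= and got= out of a tb.sv FAIL line.
diff_fail_line() {
    exp=$(printf '%s\n' "$1" | sed -n 's/.*expected=\([0-9]*\).*/\1/p')
    got=$(printf '%s\n' "$1" | sed -n 's/.*got=\([0-9]*\).*/\1/p')
    if [ -z "$exp" ] || [ -z "$got" ]; then
        echo "not a FAIL line: $1" >&2
        return 1
    fi
    diff_outputs "$exp" "$got"
}

diff_outputs 8 7
```

For the example failure shown earlier, expected=8 and got=7 differ in s0, s1, s2, and s3. With the functions defined in the current shell, a whole run can be annotated with ./sim_build/Vtb | grep '^FAIL' | while read -r line; do diff_fail_line "$line"; done.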