.. _func_simulation_tutorial:

Post-Implementation Functional Simulation
------------------------------------------

This tutorial describes how to verify the functional correctness of a circuit implemented by :ref:`VPR` using `Verilator `__, an open-source hardware simulator.

Functional simulation is useful for:

* Verifying that the VTR flow has correctly implemented the circuit's intended logic
* Quickly checking correctness before running a full :ref:`timing simulation `
* Debugging unexpected circuit behaviour after implementation

Unlike :ref:`timing simulation `, functional simulation does not require a Standard Delay Format (SDF) file; only the post-implementation Verilog netlist and VTR's primitive definitions are needed.

.. note::

    This tutorial requires Verilator 5.006 or later. Verilator can be installed from your distribution's package manager or built from source: https://github.com/verilator/verilator

    On Ubuntu, install it with:

    .. code-block:: console

        $ sudo apt install verilator

    Distribution packages can lag behind the latest release, so confirm the installed version with ``verilator --version``.

The Design
~~~~~~~~~~

The circuit for this tutorial is a **4-bit ripple-carry adder** with the following ports:

* **Inputs**: ``cin`` (carry-in), ``a0``-``a3`` and ``b0``-``b3`` (4-bit operands, ``a0`` / ``b0`` are the least-significant bits)
* **Outputs**: ``s0``-``s3`` (sum bits, ``s0`` least-significant) and ``cout`` (carry-out)

When synthesis converts a Verilog file to a BLIF file, each bus port is split into individual single-bit signals.
Declaring single-bit ports directly (rather than bus ports such as ``input [3:0] a``) keeps the post-implementation netlist port names simple and easy to reference in the testbench.

Create a working directory and save the design as ``top.v``:

.. code-block:: console

    $ mkdir func_sim_tut
    $ cd func_sim_tut

.. code-block:: verilog
    :linenos:
    :caption: 4-bit ripple-carry adder ``top.v``.

    module top (
        input  cin,
        input  a0, a1, a2, a3,
        input  b0, b1, b2, b3,
        output s0, s1, s2, s3,
        output cout
    );

        wire c0, c1, c2;

        // Bit 0
        assign s0 = a0 ^ b0 ^ cin;
        assign c0 = (a0 & b0) | ((a0 ^ b0) & cin);

        // Bit 1
        assign s1 = a1 ^ b1 ^ c0;
        assign c1 = (a1 & b1) | ((a1 ^ b1) & c0);

        // Bit 2
        assign s2 = a2 ^ b2 ^ c1;
        assign c2 = (a2 & b2) | ((a2 ^ b2) & c1);

        // Bit 3
        assign s3   = a3 ^ b3 ^ c2;
        assign cout = (a3 & b3) | ((a3 ^ b3) & c2);

    endmodule

Generating the Post-Implementation Netlist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Copy the VTR flagship architecture into the working directory:

.. code-block:: console

    $ cp $VTR_ROOT/vtr_flow/arch/timing/k6_frac_N10_frac_chain_mem32K_40nm.xml .

.. note:: Replace :term:`$VTR_ROOT` with the root directory of the VTR source tree.

Run the full VTR flow using ``run_vtr_flow.py``.
The :option:`vpr --gen_post_synthesis_netlist` option instructs VPR to write the post-implementation Verilog netlist alongside the usual place-and-route output files:

.. code-block:: console

    $ source $VTR_ROOT/.venv/bin/activate
    $ python3 $VTR_ROOT/vtr_flow/scripts/run_vtr_flow.py \
        top.v \
        k6_frac_N10_frac_chain_mem32K_40nm.xml \
        --gen_post_synthesis_netlist on

After the flow completes, the post-implementation netlist and the SDF file are in the ``temp`` directory:

.. code-block:: console

    $ ls temp/top_post_synthesis.v temp/top_post_synthesis.sdf
    temp/top_post_synthesis.sdf  temp/top_post_synthesis.v

.. note::

    The SDF file (``top_post_synthesis.sdf``) contains back-annotated timing delays and is used for :ref:`timing simulation `.
    Functional simulation does not require it.

Inspecting the Post-Implementation Netlist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The generated netlist expresses the circuit using FPGA primitives defined in ``$VTR_ROOT/vtr_flow/primitives.v``.
Below is a representative snippet:

.. code-block:: verilog
    :caption: Snippet of ``top_post_synthesis.v``.
    :emphasize-lines: 1,20-23

    module top (
        input \cin ,
        input \a0 ,
        input \a1 ,
        input \a2 ,
        input \a3 ,
        input \b0 ,
        input \b1 ,
        input \b2 ,
        input \b3 ,
        output \s0 ,
        output \s1 ,
        output \s2 ,
        output \s3 ,
        output \cout
    );

        // ... wire declarations and interconnect ...

        LUT_K #(
            .K(5),
            .LUT_MASK(32'b00000000010000010000000000010100)
        ) \lut_s0 (
            .in({
                \lut_s0_input_0_4 ,
                1'bX,
                \lut_s0_input_0_2 ,
                \lut_s0_input_0_1 ,
                1'bX
            }),
            .out(\lut_s0_output_0_0 )
        );

        // ... more LUT_K cells ...

    endmodule

The module is named after the top-level module of the design being packed, placed, and routed by VTR (``top`` in this tutorial).
Port names beginning with ``\`` are standard Verilog *escaped identifiers*; ``\cin`` (with a trailing space) is the same identifier as plain ``cin``, so the ports connect to testbench signals named without the backslash.

The primitives used here, ``LUT_K`` (look-up table) and ``fpga_interconnect`` (routing wire), are defined in ``primitives.v``.
If the architecture's carry chain is used, ``adder`` primitives from ``primitives.v`` may also appear.

Creating a Testbench
~~~~~~~~~~~~~~~~~~~~

The testbench ``tb.sv`` drives every possible combination of the 9 input bits (2\ :sup:`9` = 512 vectors), computes the expected sum using ordinary arithmetic, and flags any mismatch:

.. code-block:: systemverilog
    :linenos:
    :caption: Testbench ``tb.sv``.
    :emphasize-lines: 16-22,36-37,39-43,48-54

    `timescale 1ns/1ps

    module tb;
        // DUT I/O
        logic cin;
        logic a0, a1, a2, a3;
        logic b0, b1, b2, b3;
        logic s0, s1, s2, s3, cout;

        logic [4:0] expected, actual;
        int errors;

        // Instantiate the post-implementation netlist.
        // The escaped port names (\cin, \a0, ...) are identical to the
        // unescaped names used below, so the named connections work directly.
        top dut (
            .cin (cin),
            .a0 (a0), .a1 (a1), .a2 (a2), .a3 (a3),
            .b0 (b0), .b1 (b1), .b2 (b2), .b3 (b3),
            .s0 (s0), .s1 (s1), .s2 (s2), .s3 (s3),
            .cout(cout)
        );

        initial begin
            errors = 0;

            // Sweep all 2^9 = 512 input combinations
            for (int c = 0; c < 2; c++) begin
                cin = c[0];
                for (int a = 0; a < 16; a++) begin
                    {a3, a2, a1, a0} = 4'(a);
                    for (int b = 0; b < 16; b++) begin
                        {b3, b2, b1, b0} = 4'(b);
                        #1; // allow combinational outputs to settle

                        expected = 5'(a + b + c);
                        actual = {cout, s3, s2, s1, s0};

                        if (actual !== expected) begin
                            $display("FAIL: a=%0d b=%0d cin=%0d expected=%0d got=%0d",
                                     a, b, c, expected, actual);
                            errors++;
                        end
                    end
                end
            end

            if (errors == 0) begin
                $display("All 512 tests PASSED.");
                $finish; // exit status 0 (success)
            end else begin
                $display("%0d test(s) FAILED.", errors);
                $fatal(1); // non-zero exit status (failure)
            end
        end

    endmodule

Lines 16-22 instantiate the DUT (Device Under Test), the post-implementation netlist module ``top``, using named port connections.

Line 36 computes the expected 5-bit result using integer arithmetic, and line 37 assembles the actual 5-bit result from the individual DUT output bits.

Lines 39-43 use ``$display`` (not ``$error``) to report each mismatch.
``$display`` is non-fatal, so every one of the 512 vectors is checked and all failures are reported before the simulation exits.

Lines 48-54 select the final exit path: ``$finish`` ends the simulation with exit status 0 (success), while ``$fatal(1)`` terminates it with a non-zero exit status (failure), making the simulation suitable for use as a CI test.
The argument ``1`` to ``$fatal`` is the standard ``finish_number``, which controls diagnostic verbosity rather than the exit code.

Simulating with Verilator
~~~~~~~~~~~~~~~~~~~~~~~~~

Compile the testbench, post-implementation netlist, and VTR primitives together.
Verilator's ``--binary`` flag produces a standalone simulation executable in a single step:

.. code-block:: console

    $ verilator --binary -sv \
        tb.sv \
        temp/top_post_synthesis.v \
        $VTR_ROOT/vtr_flow/primitives.v \
        --top-module tb \
        --Mdir sim_build \
        -j $(nproc)

The ``--Mdir sim_build`` option places the build output, including the simulation binary ``sim_build/Vtb``, under ``sim_build/``.
The ``-j $(nproc)`` flag parallelises the build across all available CPU cores.

Run the binary to execute all 512 test vectors:

.. code-block:: console

    $ ./sim_build/Vtb
    All 512 tests PASSED.

.. note::

    ``primitives.v`` uses Verilog ``specify`` blocks to model timing.
    Verilator ignores these for functional simulation, so no timing information is back-annotated, which is the desired behaviour here.

If any test vector fails, the simulation prints each failure, reports the total count, and exits with a non-zero return code, producing output similar to:

.. code-block:: console

    FAIL: a=3 b=5 cin=0 expected=8 got=7
    1 test(s) FAILED.
    %Fatal: tb.sv:53: $fatal called

Assuming the input Verilog design and the testbench are correct and consistent with each other, such a failure indicates a bug in the VTR flow's implementation of the circuit.
It can then be investigated by examining the post-implementation netlist or by running the full :ref:`timing simulation ` with waveform capture.
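
Because the simulation signals success or failure through its exit status, it can be wrapped in a small script for CI.
The following is a minimal sketch and not part of VTR: the function name ``check_sim`` and the default log file name ``sim.log`` are hypothetical.
It assumes only the behaviour described above, namely that the simulation binary exits with status 0 when all vectors pass, exits with a non-zero status otherwise, and prints one ``FAIL:`` line per mismatching vector.

.. code-block:: bash

    # Hypothetical CI helper: run a simulation binary and summarise the result.
    # Usage: check_sim <simulation-binary> [log-file]
    check_sim() {
        local sim_bin="$1"
        local log_file="${2:-sim.log}"

        # Refuse to continue if the binary was never built.
        if [ ! -x "$sim_bin" ]; then
            echo "ERROR: '$sim_bin' is missing or not executable" >&2
            return 2
        fi

        # Run the simulation, capturing stdout and stderr in the log file.
        if "$sim_bin" > "$log_file" 2>&1; then
            echo "PASS"
        else
            # Count the per-vector "FAIL:" lines printed by the testbench.
            local n_fail
            n_fail=$(grep -c '^FAIL:' "$log_file" || true)
            echo "FAIL ($n_fail mismatching vector(s), see $log_file)"
            return 1
        fi
    }

For example, after sourcing this function, ``check_sim ./sim_build/Vtb`` would print ``PASS`` and return 0 when every vector matches, so a CI job can use the function's return value directly as its own result.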