Gate Level Simulation: A Comprehensive View

Your manager has decided that post-layout netlist verification using gate level simulation (GLS) will be a gating task on your chip design project, and has assigned you to accomplish it. Where do you begin?  Let's take a look at the basic requirements for getting a GLS test bench up and running and closing timing verification in time for tape out.


What is GLS?

The term "gate level" refers to the netlist view of a circuit, usually produced by logic synthesis.  So while RTL simulation is pre-synthesis, GLS is post-synthesis.  The netlist is a complete connection list of gates and IP models with full functional and timing behavior.  RTL simulation is a zero-delay environment in which events generally occur on the active clock edge.  GLS can also run with zero delay, but it is more often run in unit-delay or full-timing mode.  Events may still be triggered by the clock, but they propagate according to the delay of each element.  The load and wire delays of the netlist can be estimated by the synthesis tools or derived from the layout.  These delays usually come in the form of an SDF (Standard Delay Format) file.
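SDF files commonly write each delay as a min:typ:max triple such as (0.12:0.15:0.20), and the simulator is told which of the three fields to annotate. The small Python function below is only a sketch of that selection for a single value; it is not an SDF parser, and the sample numbers are made up.

```python
from typing import Optional

CORNER_INDEX = {"min": 0, "typ": 1, "max": 2}

def pick_delay(rvalue: str, corner: str) -> Optional[float]:
    """Return the min, typ or max field of an SDF-style value such as "(0.12:0.15:0.20)".

    A single number, e.g. "(0.15)", applies to all three corners; an empty
    field, e.g. the typ field of "(::0.3)", yields None.
    """
    index = CORNER_INDEX[corner]
    text = rvalue.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"expected a parenthesized value, got {rvalue!r}")
    fields = text[1:-1].split(":")
    if len(fields) == 1:
        fields = fields * 3          # one value for every corner
    if len(fields) != 3:
        raise ValueError(f"expected min:typ:max, got {rvalue!r}")
    field = fields[index].strip()
    return float(field) if field else None
```

For example, pick_delay("(0.12:0.15:0.20)", "max") returns 0.2, the value a max-delay run would annotate.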


Resource requirements

Modern logic simulators are event based.  This means that the simulator engine updates the state of the DUT only when an event occurs, such as a clock edge or an input toggle.  In synchronous RTL, most of that activity is concentrated on the clock edge, roughly once per clock cycle, so RTL simulations are relatively fast.

In GLS, each element is modeled in much more detail, so there are many more events to calculate, and even in zero- or unit-delay mode simulations take much longer to run than RTL.  When you add actual timing delays from the layout, the number of events can multiply dramatically.  A single clock edge driving a dozen flip-flops can spawn hundreds or thousands of events, all at different time points.  The same test that completed RTL simulation in 5 minutes can easily take 3 hours or more to run in GLS with full timing.  Memory requirements will also be considerably larger.  If the RTL job consumed 5GB of RAM, expect the equivalent GLS job to consume at least 40GB.

You need to plan your compute resources to accommodate these longer run times. Reserve a few servers in the compute farm to be dedicated to GLS jobs.  Otherwise, you can have a situation where too many jobs get submitted to a server and not enough RAM is available for the GLS job to finish, causing it to die after several hours or days of runtime.
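A back-of-the-envelope sizing calculation can make the server request concrete. The sketch below assumes the rough ratios from the example above (5 minutes becoming 3 hours is 36x runtime, 5GB becoming 40GB is 8x memory), a hypothetical 256GB server, and 10% of RAM held back as headroom. It ignores CPU cores and license limits, so treat the result as a starting estimate only.

```python
import math

def plan_gls_servers(rtl_minutes: float, rtl_ram_gb: float, num_tests: int,
                     server_ram_gb: float, runtime_factor: float = 36.0,
                     ram_factor: float = 8.0, headroom: float = 0.10) -> dict:
    """Scale an RTL run to a GLS estimate and size a dedicated server pool by RAM."""
    gls_hours = rtl_minutes * runtime_factor / 60.0
    gls_ram_gb = rtl_ram_gb * ram_factor
    usable_gb = server_ram_gb * (1.0 - headroom)   # keep some RAM free on each server
    jobs_per_server = int(usable_gb // gls_ram_gb)
    if jobs_per_server < 1:
        raise ValueError("one GLS job does not fit in the usable RAM of a server")
    return {
        "gls_hours_per_test": gls_hours,
        "gls_ram_gb_per_job": gls_ram_gb,
        "jobs_per_server": jobs_per_server,
        # Servers needed to run every test at the same time.
        "servers_needed": math.ceil(num_tests / jobs_per_server),
    }

print(plan_gls_servers(rtl_minutes=5, rtl_ram_gb=5, num_tests=30, server_ram_gb=256))
```

For 30 tests this gives 3 hours and 40GB per GLS job, 5 concurrent jobs per server and 6 dedicated servers. Sizing by RAM rather than by job slots is deliberate, because running out of memory is what kills long GLS jobs.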

What about manpower needs? This greatly depends on the size of the chip, the maturity of the GLS environment and process, and the experience of the engineers.  If this is your group's first GLS effort, you need at least one person dedicated to getting the environment up and running and defining the process.  Once this is stable, and before you get your first layout results, you need at least one or two additional engineers to help run and debug the tests.  Three or four people can handle GLS for a reasonably large design with 30-40 test cases if the process and environment are mature.  Your requirements may vary, but I recommend a minimum of two dedicated engineers for any GLS effort.


Modifying your simulation wrapper script

Your simulation environment for GLS is going to have some non-trivial differences from your RTL environment. You'll probably need to call different memory and IP models and possibly a different version of the cell library.  You'll need to pass GLS-specific plusargs and defines to the testbench.  You need a way to annotate the timing delays and to choose which process corner to simulate.  If you have dedicated servers or a queue for GLS jobs, you need a switch for that too.  All of this needs to be added to your simulation wrapper.  Plan on spending a lot of time up front to get this working, and coordinate closely with those running RTL simulations to determine the best way to accommodate everyone.
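One way to keep these switches manageable is to have the wrapper turn a single request into a tool-neutral job description, then translate that into your simulator's real options in one place. The sketch below assumes hypothetical file-list names, a GLS define, a hypothetical UNIT_DELAY define for unit-delay cell models, and a queue called gls. None of these are actual tool options; mapping them to real simulator flags is left to your environment.

```python
from dataclasses import dataclass
from typing import Optional

MODES = ("rtl", "gls_zero", "gls_unit", "gls_sdf")
CORNERS = ("min", "typ", "max")

@dataclass
class SimRequest:
    test: str
    mode: str = "rtl"
    corner: str = "max"
    sdf_path: Optional[str] = None

def build_job(req: SimRequest) -> dict:
    """Translate one request into a tool-neutral job description (all names hypothetical)."""
    if req.mode not in MODES:
        raise ValueError(f"unknown mode {req.mode!r}")
    gls = req.mode != "rtl"
    job = {
        # GLS swaps RTL sources for the netlist plus GLS memory/IP models and cell library.
        "filelists": ["tb.f", "netlist.f", "gls_models.f"] if gls else ["tb.f", "rtl.f", "rtl_models.f"],
        "defines": ["GLS"] if gls else [],
        "plusargs": [f"+test={req.test}"],
        "sdf": None,
        "queue": "gls" if gls else "default",   # dedicated GLS servers or queue
    }
    if req.mode == "gls_unit":
        job["defines"].append("UNIT_DELAY")
    elif req.mode == "gls_sdf":
        if req.corner not in CORNERS:
            raise ValueError(f"unknown corner {req.corner!r}")
        if not req.sdf_path:
            raise ValueError("full-timing GLS needs an SDF file")
        job["sdf"] = {"file": req.sdf_path, "corner": req.corner}
    return job
```

Keeping the RTL and GLS cases in one function also makes it easier to agree on a single wrapper with the RTL team.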


Preparing the test bench

You will need to make major changes to any pre-existing RTL testbench in order to run GLS. For example, GLS is very sensitive to unknowns (X's) and initial conditions, and you will probably need to change your initialization sequences just to get the chip to initialize properly.  Every hierarchical reference from the testbench into the design will need both an RTL version and a GLS version, because synthesis can flatten hierarchy and rename signals and registers.  To make this work, you're going to have to add plusargs, and probably `ifdefs, for GLS-specific constructs in your testbench.
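A common way to manage the two sets of hierarchical references is to keep them in one table and generate an include file of `define macros selected by `ifdef GLS. The Python sketch below does that. The macro names and both paths are hypothetical, and the GLS paths assume each register became a flip-flop instance named with a _reg suffix and a Q output pin, which depends on your synthesis flow and cell library.

```python
from typing import Dict, Tuple

# Hypothetical probes: macro name -> (RTL path, GLS path).
PROBES: Dict[str, Tuple[str, str]] = {
    "CPU_HALTED": ("tb.dut.u_cpu.halted", "tb.dut.u_cpu.halted_reg.Q"),
    "BOOT_DONE": ("tb.dut.u_sysctl.boot_done", "tb.dut.u_sysctl.boot_done_reg.Q"),
}

def render_probe_header(probes: Dict[str, Tuple[str, str]]) -> str:
    """Emit a SystemVerilog include with one `define per probe, chosen by `ifdef GLS."""
    lines = ["// Generated file: edit the probe table, not this file.", "`ifdef GLS"]
    lines += [f"  `define {name} {gls}" for name, (_, gls) in sorted(probes.items())]
    lines.append("`else")
    lines += [f"  `define {name} {rtl}" for name, (rtl, _) in sorted(probes.items())]
    lines.append("`endif")
    return "\n".join(lines) + "\n"

print(render_probe_header(PROBES))
```

The testbench then uses `CPU_HALTED wherever it used to hard-code the path, so the same source compiles for both RTL and GLS.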


Cleaning The Flow

How do you know if your environment is ready for GLS? You need a basic test case.  It should be the shortest possible test that fully initializes the chip and then performs one verifiable action.  For example: come out of reset, boot the processor, and have the processor write a value to a register or memory and then read it back with a pass/fail check.  This is your benchmark, your sanity test.  Run it every time you add a feature, make a change to the environment or get a new netlist or delay file, because if it fails nothing else is going to work either.
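To make "run it every time something changes" automatic, a regression script can fingerprint its inputs and hold back other tests until the sanity test has passed on the current set. The sketch below hashes each input file with SHA-256 and compares the result against a stamp written after the last passing sanity run. Which files to include (netlist, SDF, testbench sources, models) and the stamp file name are assumptions you would set for your project.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Iterable

def fingerprint(paths: Iterable[str]) -> Dict[str, str]:
    """SHA-256 digest of each input file (netlist, SDF, testbench sources, models...)."""
    digests = {}
    for p in paths:
        h = hashlib.sha256()
        with open(p, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):   # read 1 MiB at a time
                h.update(chunk)
        digests[str(p)] = h.hexdigest()
    return digests

def sanity_needed(paths: Iterable[str], stamp_file: str) -> bool:
    """True if no stamp exists or any input differs from the last passing sanity run."""
    stamp = Path(stamp_file)
    if not stamp.exists():
        return True
    return json.loads(stamp.read_text()) != fingerprint(paths)

def record_sanity_pass(paths: Iterable[str], stamp_file: str) -> None:
    """Call only after the sanity test has passed on exactly these inputs."""
    Path(stamp_file).write_text(json.dumps(fingerprint(paths), sort_keys=True))
```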

For post-layout netlists with full timing annotation, examine the log files for any unexplained or unwaived SDF annotation errors or warnings. Many SDF warnings are trivial, for example those caused by unused ports on IP models.  Make sure they have all been reviewed and that your timing models are not compromised.
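Reviewing SDF messages goes faster if waived messages are kept as a list of patterns and a script prints only what remains. Message wording differs between simulators, so the regular expression below assumes a hypothetical "Warning/Error ... SDF ..." line format and will need adapting; the waiver patterns themselves are project-specific.

```python
import re
from typing import Iterable, List

# Assumed format: SDF messages start with "Warning" or "Error" and mention SDF.
SDF_MESSAGE = re.compile(r"^(Warning|Error)\b.*\bSDF\b", re.IGNORECASE)

def unreviewed_sdf_messages(log_lines: Iterable[str], waivers: Iterable[str]) -> List[str]:
    """Return SDF warnings/errors that no waiver pattern explains."""
    waiver_res = [re.compile(w) for w in waivers]
    return [line.rstrip("\n") for line in log_lines
            if SDF_MESSAGE.search(line) and not any(w.search(line) for w in waiver_res)]
```

An empty result means every SDF message in the log is covered by a reviewed waiver.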


Creating A Testplan

Your RTL test list may have hundreds or thousands of test cases, especially if you're using randomization (and who isn't anymore?). You cannot run all of these in GLS.  So what do you need to run?  Randomization should not be a priority in GLS.  More precisely, you can and should have random features in your GLS tests, but you won't be running 100 random variations of the same test.  It's just not practical.  Always record the random seed value in the log and have a way to set it on the simulation command line so you can reproduce any failure, but otherwise, ignore the endless random variations.
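A minimal sketch of the seed handling, assuming the testbench accepts a hypothetical +seed=<n> plusarg (your simulator and testbench may use a different option): take the seed from the command line when one is given, otherwise draw a new one, and always log a line that says how to rerun.

```python
import random
import re
from typing import Optional, Sequence

SEED_ARG = re.compile(r"\+seed=(\d+)")   # hypothetical plusarg understood by the testbench

def resolve_seed(argv: Sequence[str], rng: Optional[random.Random] = None) -> int:
    """Use +seed=<n> from the command line if present, otherwise draw a fresh seed."""
    for arg in argv:
        m = SEED_ARG.fullmatch(arg)
        if m:
            return int(m.group(1))
    if rng is None:
        rng = random.SystemRandom()
    return rng.randrange(1, 2**31)

def seed_banner(test: str, seed: int) -> str:
    """Log line that records the seed and how to reproduce the run."""
    return f"[{test}] random seed = {seed} (rerun with +seed={seed})"
```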

The focus of GLS should be to verify high priority and high risk features, and modes or sequences that are not easily verified with other methods.  Add to that some tests running the chip in its primary functional mode and some tests to check out any debug features.  Here is a list of the types of things your tests should cover:

  • reset sequences
  • parameter loading from eFuse
  • PLL locking procedures
  • processor boot up
  • exercise each external interface on the chip
  • exercise all major functional modules
  • check various clock ratios
  • verify memory map and register access
  • check DFT test modes and any key debug sequences needed for prototype bring-up

The firmware team may also have some requests for testing production ROM code. And there should be at least one fully functional end-to-end sequence that exercises most of the modules on the chip.

Choose the smallest number of tests that accomplishes the above. Then look carefully at the longest test cases and see if you can split them into shorter tests.  Assuming I have sufficient compute resources to run them in parallel, I'd rather have 20 tests that run for 2 days each than 10 that run for 4 days each.
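Picking the fewest tests that cover every item on the list is a set cover problem. The greedy heuristic below, which repeatedly takes the test covering the most still-uncovered items, is a reasonable sketch; it does not guarantee the true minimum. The test names and feature tags are hypothetical.

```python
from typing import Dict, List, Set

def pick_tests(coverage: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    """Greedy set cover: repeatedly take the test that covers the most uncovered items."""
    uncovered = set(required)
    chosen: List[str] = []
    while uncovered:
        # Ties go to the alphabetically first test so the result is repeatable.
        best = max(sorted(coverage), key=lambda t: len(coverage[t] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            raise ValueError(f"no test covers: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gain
    return chosen

# Hypothetical tests tagged with features from the list above.
COVERAGE = {
    "sanity": {"reset", "boot"},
    "efuse_load": {"reset", "efuse"},
    "pll_ratios": {"reset", "pll", "clock_ratios"},
    "ddr_traffic": {"reset", "boot", "ddr", "memmap"},
    "end_to_end": {"reset", "boot", "ddr", "pcie", "memmap"},
    "dft_modes": {"dft"},
}
print(pick_tests(COVERAGE, set().union(*COVERAGE.values())))
```

On this toy table the greedy pass picks end_to_end, pll_ratios, dft_modes and efuse_load. Splitting long tests matters for wall-clock time as well: 20 tests of 2 days and 10 tests of 4 days both cost 40 server-days, but run in parallel the first set finishes in 2 days and the second in 4.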

And just to state the obvious, all tests chosen for GLS should be stable and pass RTL regression.


Running Regressions

Your test list is likely to include several tests that run for more than a day, and maybe even a week. A production ROM test will run much longer.  Even the shortest tests will require several hours to complete.  So a nightly regression is not practical, nor should it be necessary.  RTL changes can and do occur daily until design freeze, which is why nightly regressions are necessary for RTL verification.  For GLS, however, the netlist changes only with updated layouts, which occur much less often.  Still, it's very useful to have a well-defined regression process for ease of execution, and to give yourself options to run, for example, just the shorter tests or just the longer tests.  You will be making testbench updates from time to time that should be tested thoroughly before the next netlist release.  Use a regression script or tool to do this for efficiency and consistency.
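A regression script can expose the shorter/longer split as simple runtime filters. The sketch below assumes each test carries a hand-maintained runtime estimate and that the sanity test is named sanity (both hypothetical). It always puts the sanity test first, then launches the longest selected tests so they start as early as possible.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

SANITY = "sanity"   # hypothetical name of the sanity test

@dataclass
class GlsTest:
    name: str
    est_hours: float    # hand-maintained estimate of full-timing runtime

def select_regression(tests: Sequence[GlsTest], max_hours: Optional[float] = None,
                      min_hours: Optional[float] = None) -> List[str]:
    """Filter tests by estimated runtime; sanity first, then longest first."""
    picked = [t for t in tests
              if (max_hours is None or t.est_hours <= max_hours)
              and (min_hours is None or t.est_hours >= min_hours)]
    if not any(t.name == SANITY for t in picked):
        picked += [t for t in tests if t.name == SANITY]   # always include sanity
    picked.sort(key=lambda t: (t.name != SANITY, -t.est_hours))
    return [t.name for t in picked]
```

Starting long tests first is a design choice: in a parallel regression the wall-clock time is set by the longest job, so it should not sit in the queue behind short ones.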

You should consider regression options with backdoor shortcuts that cut simulation time by speeding up lengthy phases of chip startup such as DDR training, PCIe link-up or CPU memory initialization. This lets you debug the high-priority targets of your verification more quickly, without spending limited time and resources on stable IP and low-risk functionality.  Of course, a full regression with no backdoor or shortcut options should still be run at least once on each major netlist release and for the final signoff.
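Shortcuts are easiest to control when each one is a named option that maps to a single plusarg, and when a full run refuses them outright. In the sketch below, the shortcut names and plusargs are hypothetical and would have to match what your testbench actually implements.

```python
from typing import Iterable, List

# Hypothetical shortcut names and the testbench plusargs that enable them.
SHORTCUTS = {
    "ddr_training": "+skip_ddr_training",
    "pcie_linkup": "+fast_pcie_link",
    "cpu_mem_init": "+backdoor_mem_init",
}

def shortcut_plusargs(requested: Iterable[str], full_run: bool = False) -> List[str]:
    """Return plusargs for the requested shortcuts; a full (signoff) run must use none."""
    names = sorted(set(requested))
    if full_run and names:
        raise ValueError(f"full regression must not use shortcuts: {names}")
    unknown = [n for n in names if n not in SHORTCUTS]
    if unknown:
        raise ValueError(f"unknown shortcuts: {unknown}")
    return [SHORTCUTS[n] for n in names]
```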


Debugging Failures

In the early stages of GLS, your test failures are going to be mainly due to initialization problems. The first thing to do is to make sure all inputs are known and stable at startup, then look at reset de-assertion, which is usually synchronized with the clock.  A circular state dependency or a timing problem may cause X-propagation to prevent proper chip initialization.  Make sure the proper reset sequence is observed and there are no timing violations or race conditions.  GLS is much less forgiving of initialization issues than RTL.
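When a test stalls during initialization, a quick first check is to list which signals still carry X or Z at the moment reset is released. The sketch below reads a value change dump (VCD, the text format defined in IEEE 1364) and reports those signals at a given time. It handles only a minimal subset of the format and assumes your dump includes the signals you care about.

```python
from typing import Dict, List

def unknown_signals_at(vcd_text: str, at_time: int) -> List[str]:
    """Return the signals whose value at `at_time` contains x or z (minimal VCD reader)."""
    tokens = iter(vcd_text.split())
    names: Dict[str, List[str]] = {}    # identifier code -> full hierarchical names
    scope: List[str] = []

    def skip_to_end() -> None:
        for t in tokens:
            if t == "$end":
                return

    # Header: collect $var declarations until $enddefinitions.
    for tok in tokens:
        if tok == "$scope":
            next(tokens)                     # scope type, e.g. "module"
            scope.append(next(tokens))
        elif tok == "$upscope":
            scope.pop()
        elif tok == "$var":
            next(tokens)                     # variable type, e.g. "wire"
            next(tokens)                     # bit width
            code, ref = next(tokens), next(tokens)
            names.setdefault(code, []).append(".".join(scope + [ref]))
        elif tok in ("$comment", "$date", "$version", "$timescale"):
            skip_to_end()
        elif tok == "$enddefinitions":
            break

    # Body: apply value changes up to and including `at_time`.
    values: Dict[str, str] = {}
    for tok in tokens:
        if tok.startswith("#"):
            if int(tok[1:]) > at_time:
                break
        elif tok == "$comment":
            skip_to_end()
        elif tok.startswith("$"):
            continue                                 # $dumpvars, $end and similar markers
        elif tok[0] in "bBrR":
            values[next(tokens)] = tok[1:].lower()   # vector or real change: "b01x0 &"
        else:
            values[tok[1:]] = tok[0].lower()         # scalar change: "x%"

    unknown = [name for code, val in values.items() if "x" in val or "z" in val
               for name in names.get(code, [])]
    return sorted(unknown)
```

Calling it with the time at which reset deasserts gives a short list of suspects to trace in the waveform viewer.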

If your chip has an embedded processor, and most do these days, make sure your memories are properly initialized. A processor that accesses a memory location containing X's is going to lock up in an indeterminate state.  Does your memory have ECC enabled?  If you're backdoor loading the execution code or data for the CPU, make sure you have calculated and loaded the proper ECC value for each entry.  Or you can turn off the ECC checks until you have cleaned up other initialization issues.
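If you backdoor-load code with ECC enabled, the check bits have to be computed exactly the way the memory's ECC logic computes them. As an illustration only, the sketch below encodes 32-bit words with a textbook extended Hamming (SECDED) code, producing 39-bit words formatted as hex lines for $readmemh. Real memory and ECC IP often use a different code (a Hsiao code, for example) and a different bit ordering, so take the exact encoding from the IP documentation.

```python
from typing import Iterable, List

def check_bit_count(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1, e.g. 6 for 32 data bits."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r

def secded_encode(data: int, data_bits: int = 32) -> int:
    """Extended Hamming code: bit 0 holds overall parity, check bits sit at positions 1, 2, 4, ..."""
    n = data_bits + check_bit_count(data_bits)     # Hamming positions 1..n
    bits = [0] * (n + 1)
    d = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):                 # not a power of two: data position
            bits[pos] = (data >> d) & 1
            d += 1
    p = 1
    while p <= n:                           # check bit p covers positions with bit p set
        bits[p] = 0
        for pos in range(p + 1, n + 1):
            if pos & p:
                bits[p] ^= bits[pos]
        p <<= 1
    bits[0] = sum(bits[1:]) & 1             # overall parity enables double-error detection
    return sum(bit << pos for pos, bit in enumerate(bits))

def readmemh_lines(words: Iterable[int], data_bits: int = 32) -> List[str]:
    """Encode each word and format it as a fixed-width hex line for $readmemh."""
    width = data_bits + check_bit_count(data_bits) + 1
    digits = (width + 3) // 4
    return [f"{secded_encode(w, data_bits):0{digits}x}" for w in words]
```

Alternatively, as noted above, disabling the ECC checks until other initialization issues are resolved avoids the need for this step early on.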

In later phases of GLS, test failures are likely to be timing issues. A path that violates setup or hold time on a flip-flop will create an X that can propagate through the circuit.  Tracing X's backwards to a source in the waveform viewer is straightforward, but tedious.  Usually, the source of the X can be found in the log file.  Search for the first timing violation warning message in the log file after the chip comes out of reset.  That will usually be a good place to start.  The violation may be a real issue that needs to be fixed in the layout, or it may be an asynchronous path that can be waived.  If you waive a violation, you must also turn off the timing check for that instance in order to prevent the X-propagation.  There are various methods to do this which are beyond the scope of this article.  Please refer to my other articles for a discussion of these and other advanced topics.
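Finding the first violation after reset is easy to script. The sketch below assumes two hypothetical message formats: the testbench prints RESET_RELEASED when reset is deasserted, and the simulator's timing-check messages contain the phrase "Timing violation". Adjust both patterns to what your tools actually print.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed message formats; adapt both patterns to your simulator and testbench.
RESET_RELEASED = re.compile(r"RESET_RELEASED")      # hypothetical testbench message
TIMING_VIOLATION = re.compile(r"[Tt]iming violation")

def first_violation_after_reset(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (line number, text) of the first timing violation logged after reset release."""
    reset_seen = False
    for number, line in enumerate(lines, start=1):
        if not reset_seen:
            reset_seen = bool(RESET_RELEASED.search(line))
        elif TIMING_VIOLATION.search(line):
            return number, line.rstrip("\n")
    return None
```

Passing an open log file works directly, since a file object iterates line by line.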


Signing Off For Tape Out

Your final netlist has been released, and all regressions have passed. Is there anything else you need to do before final sign off?  Yes!  First, make sure the final regressions were run without any unnecessary backdoor shortcuts or forces.  Unwarranted assumptions about chip behavior are the surest way to overlook a bug.
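If each run records its command line in the log, the check for leftover shortcuts can be automated. The sketch below assumes hypothetical plusarg prefixes for backdoor loads, skipped phases and forces, plus an explicit allow list for any shortcut the team has agreed is necessary.

```python
from typing import Dict, List, Sequence

# Hypothetical plusarg prefixes that enable backdoor loads, skipped phases or forces.
DISALLOWED_PREFIXES = ("+skip_", "+fast_", "+backdoor_", "+force_")

def signoff_audit(commands: Dict[str, Sequence[str]],
                  allowed: Sequence[str] = ()) -> Dict[str, List[str]]:
    """Map each test to the shortcut plusargs it was run with, ignoring approved ones."""
    findings: Dict[str, List[str]] = {}
    for test, argv in sorted(commands.items()):
        bad = [a for a in argv if a.startswith(DISALLOWED_PREFIXES) and a not in allowed]
        if bad:
            findings[test] = bad
    return findings
```

An empty result is the goal for the signoff regression.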

Second, you should conduct a final review of your tests to verify that you've covered all the items in your verification plan. Be sure to include the design team in this review.

Third, look carefully at all timing violations remaining in the simulation logs to make sure they are benign. Your GLS test plan cannot provide 100% functional coverage, so any timing violation may point to a real problem in a scenario your tests do not exercise, even if the test that logged it passes.  Again, include the design team in this review.
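For the review, it helps to collapse thousands of log lines into one row per unique check and instance, with an occurrence count and the tests that hit it. The sketch below assumes a hypothetical message format of the form "Timing violation $<check> in <instance>" and a set of (check, instance) pairs already reviewed as benign.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Assumed message shape: "... Timing violation $<check> in <instance> ..."
VIOLATION = re.compile(r"Timing violation \$(\w+) in (\S+)")

def summarize_violations(logs: Dict[str, Iterable[str]],
                         reviewed: Iterable[Tuple[str, str]] = ()
                         ) -> List[Tuple[str, str, int, List[str]]]:
    """One row per unreviewed (check, instance): occurrence count and the tests that hit it."""
    skip = set(reviewed)
    counts: Counter = Counter()
    hit_by: Dict[Tuple[str, str], Set[str]] = {}
    for test, lines in logs.items():
        for line in lines:
            m = VIOLATION.search(line)
            if m is None:
                continue
            key = m.group(1, 2)
            if key in skip:
                continue
            counts[key] += 1
            hit_by.setdefault(key, set()).add(test)
    return [(check, inst, n, sorted(hit_by[(check, inst)]))
            for (check, inst), n in counts.most_common()]
```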



Running GLS is a resource heavy and time consuming process. It can take from 2 to 5 months or more depending on several factors.  It comes as the last verification step before tape out so it has the attention of the entire team and management.  Needless to say, there is considerable pressure to wrap it up quickly.  But with thorough planning, adequate resource allocation, and by setting proper expectations, you can make the process much less stressful and ultimately successful.

We welcome your comments and suggestions for additional topics for future articles.  For this or for more information about Certus consultants and services, or to discuss your project needs, email us at