Optimizing Simulation Run Times With Checkpoints

Gate-level simulation (GLS) tests with full timing (SDF annotation) can take hours, days, or even weeks to run. Even RTL simulations for tasks such as performance modeling, power estimation, or production ROM testing can run for a long time. Dumping waveforms as well can easily double or triple the run time. So how do you minimize run times and still get the data you need for debug?


Save Simulation States

Each of the Big 3 logic simulators has built-in commands to save the state of a simulation for later resumption. We'll call these saved states checkpoints. Using a Tcl simulation script, you can save one or more checkpoints during the course of a simulation. If a problem occurs after a checkpoint is recorded, the test can be restarted from the saved time point instead of from time zero. This has many uses:

  • Restart the simulation later to dump waveforms for a specific time window
  • Recover from system problems (host reboot, disk full, etc.)
  • Recover from command line errors like a timeout value set too short
  • Recover from a timing violation that causes X-propagation
  • Clone a currently running simulation and use restart to dump waveforms in the clone without interrupting the original simulation
  • Save checkpoints periodically during one or more simulations, such as a regression, by using a loop in your Tcl script



For the Cadence Incisive simulator, here are the Tcl commands for saving a checkpoint after 100us:

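# Run the test for the first 100us
run 100us
# Advance to the next point at which the state can be saved cleanly
run -clean
# Save the state as snapshot worklib.save_100us, replacing any earlier copy
save -simulation -overwrite worklib.save_100us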

A restart that resumes from this checkpoint and dumps waveforms for module_A between 120us and 150us might look something like this:

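# Reload the saved state (time resumes at the clean point reached at or just after 100us)
restart worklib.save_100us
# Run to 120us absolute simulation time without dumping waveforms
run -absolute 120us
# Start probing (recording waveforms for) module_A
probe top.dut.module_A
# Run to 150us absolute simulation time with the probe active
run -absolute 150us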




Your results may vary, but here's what it took to save a checkpoint on a GLS test for one of my storage controller ASICs with four embedded processors:

  • 5GB storage space
  • 10 to 12 minutes to save the checkpoint

Compare this to the size of a typical waveform dump file, or to the extra run time spent writing one. In most cases, the save-and-restart method results in major efficiency gains.



This checkpoint methodology will not work with many SystemC models. Memory models seem to be particularly problematic. For example, a model of a large memory such as DRAM may use a backing-store approach, in which the model writes temporary data to a file in /tmp and later deletes that file. If you save a state while the file exists, the restart will also expect that file to exist. If the file has since been deleted, or you restart on a different host, the file will not be found and the restart will fail with an Internal Exception error.

If your code uses the SystemVerilog "chandle" data type, it will probably cause problems as well. A chandle holds a raw C pointer into user memory on the C side of a DPI interface. After a restart, DPI code that dereferences a chandle whose target has changed since the save can behave unpredictably.

Looping the save commands in your Tcl script can be dangerous if you have a "runaway" simulation that never finishes. With checkpoints as large as the 5GB example above, it would be easy to fill the disk and kill every test that shares it. If you use a loop, be sure to put a timeout value on each simulation as a safeguard.
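The periodic looping described above can be sketched for Incisive using only the run, run -clean, and save commands from the earlier example. This is a minimal sketch under our own assumptions: the 100us interval, the cap of 20 checkpoints, and the rotating snapshot names worklib.save_slot0 and worklib.save_slot1 are hypothetical choices, and the script assumes the test is still running at every iteration of the loop.

```tcl
# Hypothetical settings: adjust the interval, cap, and names for your test.
set interval  100us  ;# simulation time between checkpoints
set max_saves 20     ;# hard cap on the number of checkpoints taken
set keep      2      ;# number of snapshot names reused in rotation

for {set i 0} {$i < $max_saves} {incr i} {
    # Simulate one more interval
    run $interval
    # Advance to the next point at which the state can be saved cleanly
    run -clean
    # Rotate through $keep snapshot names so old checkpoints are
    # overwritten instead of accumulating on disk
    set slot [expr {$i % $keep}]
    save -simulation -overwrite worklib.save_slot$slot
    puts "Checkpoint [expr {$i + 1}] saved as worklib.save_slot$slot"
}

# Let the rest of the test run to completion
run
```

Rotating two names caps disk use at two checkpoints (about 10GB at the 5GB size above), but it means you can only restart from the two most recent saves. If you need to revisit specific time windows, time-stamped names such as worklib.save_100us keep every checkpoint at the cost of more disk. The loop cap limits the number of saves, not wall-clock time, so a per-simulation timeout is still worthwhile.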

If a test runs for less than a couple of hours, checkpoints may not be worth their overhead. Depending on how many checkpoints are needed, the break-even point is probably a test length of about 2-3 hours.
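One rough way to reason about the break-even point, under simplifying assumptions of our own (every checkpoint costs the same save time, and the checkpoints are used for a single rerun), is to compare the total save overhead with the simulation time that a restart lets you skip. Here N is the number of checkpoints, t_save is the wall-clock time to save one (10 to 12 minutes in the example above), t_skip is the wall-clock time of simulation up to the checkpoint you restart from, and t_restart is the time to reload it:

```latex
N \cdot t_{\mathrm{save}} < t_{\mathrm{skip}} - t_{\mathrm{restart}},
\qquad \text{e.g. } N = 5,\; t_{\mathrm{save}} \approx 12\ \mathrm{min}
\;\Rightarrow\; N \cdot t_{\mathrm{save}} \approx 60\ \mathrm{min}
```

With five checkpoints at about 12 minutes each, the restart would have to skip more than an hour of simulation, plus the restart time, before the checkpoints pay for themselves on a single rerun.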


Additional Topics

Please see our other articles for more tips and tricks for conquering GLS methodology issues and other verification topics. We welcome your comments and suggestions for future articles. To send them, to learn more about Certus consultants and services, or to discuss your project needs, email us at info@certuscg.com.