Report on WarpKit Performance Study and Improvement.

Z. Xiao and B. Unger. Computer Science Department Technical Report: 98/628/19, May 1995.

Abstract: This report describes the early development of WarpKit, a parallel simulation kernel for shared-memory multiprocessor architectures, developed as part of the Telesim project. The aim is to exploit the shared-memory multiprocessor paradigm in a Parallel Discrete Event Simulation package capable of delivering high performance. Three major problems have a great impact on the performance of Time Warp systems: the excessive cost of rollback computation, which results from relying solely on rollback as the basic synchronization mechanism in a distributed or parallel processing system; the large amount of memory required to run applications; and high system overheads in inter-process communication and global control (e.g. GVT computation and memory management). Shared-memory multiprocessor architectures have the potential to deliver much higher performance for Time Warp systems than can be achieved in a distributed environment. New approaches can be devised to address these problems and to realize that potential.

This report presents the results of our effort to improve WarpKit Kernel performance. Incremental State Saving has been implemented on top of the existing Kernel, reducing both the time and the space spent on state saving, which Time Warp requires. Purely asynchronous schemes have been developed and implemented for the global control mechanism. As a result, the system overhead of global control has been reduced significantly. The new global control mechanism also makes system overhead insensitive to the number of processors, in contrast to the distributed case, where system overhead increases sharply with the number of processors. A global scheduling and load balancing mechanism is expected to keep the number of rollbacks to a low percentage of the net events processed by the Kernel. With these new mechanisms in place, one may expect a close-to-linear speedup curve for parallel discrete event simulation on shared-memory multiprocessors.
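
As an illustration of why incremental state saving reduces both time and space, the following is a minimal sketch of the general technique, not WarpKit's implementation, which the abstract does not describe. It assumes each logical process keeps an undo log of overwritten values rather than copying its whole state before every event. The class and method names (IncrementalState, write, rollback, fossil_collect) and the example variables are hypothetical.

```python
class IncrementalState:
    """State of one logical process with an undo log (hypothetical names)."""

    def __init__(self, **initial):
        self.values = dict(initial)
        self.undo_log = []  # entries: (event_time, key, old_value)

    def write(self, event_time, key, new_value):
        # Save only the value being overwritten, not the whole state.
        self.undo_log.append((event_time, key, self.values[key]))
        self.values[key] = new_value

    def rollback(self, to_time):
        # Undo, newest first, every write made by events at or after to_time.
        while self.undo_log and self.undo_log[-1][0] >= to_time:
            _, key, old_value = self.undo_log.pop()
            self.values[key] = old_value

    def fossil_collect(self, gvt):
        # No rollback can reach before GVT, so older entries are reclaimed.
        self.undo_log = [e for e in self.undo_log if e[0] >= gvt]


state = IncrementalState(queue_len=0, served=0)
state.write(10, "queue_len", 1)
state.write(20, "queue_len", 2)
state.write(20, "served", 1)
state.rollback(20)  # a straggler with timestamp 20 forces a rollback
```

In this sketch the cost of saving state is proportional to the number of variables an event modifies, and the cost of a rollback is proportional to the number of writes undone. Full state copying, by contrast, pays for the entire state on every event.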