

Checkpointing and recovery for distributed shared memory applications


Jinsong Ouyang and Gernot Heiser

    School of Computer Science and Engineering
    Sydney 2052, Australia


This paper proposes an approach for adding fault tolerance, based on consistent checkpointing, to distributed shared memory applications. Two different mechanisms are presented to efficiently address the issue of message losses due to either site failures or unreliable non-FIFO channels. Both guarantee a correct and efficient recovery from a consistent distributed system state following a failure. A variant of the two-phase commit protocol is employed such that the communication overhead required to take a consistent checkpoint is the same as that of systems using a one-phase commit protocol, while our protocol utilises stable storage more efficiently. A consistent checkpoint is committed when the first phase of the protocol finishes.

BibTeX Entry

    author           = {Jinsong Ouyang and Gernot Heiser},
    title            = {Checkpointing and Recovery for Distributed Shared Memory Applications},
    month            = aug,
    year             = {1995},
    booktitle        = {Proceedings of the 4th IEEE International Workshop on Object Orientation in Operating Systems},
    pages            = {191--9},
    address          = {Lund, Sweden}


Served by Apache on Linux on seL4.