Libra: A library for reliable distributed applications
Authors
Jinsong Ouyang and Gernot Heiser
School of Computer Science and Engineering
UNSW,
Sydney 2052, Australia
Abstract
This paper describes Libra, a library to support efficient, reliable distributed applications. Libra is designed to meet two objectives: to simplify the development of reliable distributed applications, and to achieve fault tolerance at low run-time cost. The first objective is met by providing fault-tolerance transparency and a simple, easy-to-use, high-level message-passing interface. Libra provides fault tolerance to applications transparently, based on distributed consistent checkpointing and rollback recovery integrated with a user-level network communication protocol. The second objective is met by using protocols that minimise the communication overhead of taking a consistent distributed checkpoint and catching messages in transit, and that impose low overhead on running times. The paper presents measurements backing up these claims.
BibTeX Entry
@inproceedings{Ouyang_Heiser_96,
    author    = {Jinsong Ouyang and Gernot Heiser},
    month     = aug,
    year      = {1996},
    title     = {Libra: A Library for Reliable Distributed Applications},
    address   = {Sunnyvale, CA, USA},
    pages     = {801--810},
    booktitle = {International Conference on Parallel and Distributed Processing Techniques and Applications},
    paperurl  = {https://ts.data61.csiro.au/publications/papers/Ouyang_Heiser_96.ps.gz}
}