Fault tolerance through redundant execution on COTS multicores: Exploring trade-offs


Yanyan Shen, Gernot Heiser and Kevin Elphinstone


UNSW Sydney


High availability and integrity are paramount in systems deployed in life- and mission-critical scenarios. Such fault tolerance can be achieved through redundant co-execution (RCoE) on replicated hardware, which is now cheaply available in multicore processors. RCoE replicates almost all software, including the OS kernel, drivers, and applications, achieving a sphere of replication that covers everything except the minimal interfaces to non-replicated peripherals. We complement our original, loosely-coupled RCoE with a closely-coupled version that improves the transparency of replication to application code, and we investigate the trade-offs between functionality, performance, and vulnerability.


