Fault tolerance through redundant execution on COTS multicores: Exploring trade-offs

Authors

Yanyan Shen, Gernot Heiser and Kevin Elphinstone

DATA61

UNSW Sydney

Abstract

High availability and integrity are paramount in systems deployed in life- and mission-critical scenarios. Such fault tolerance can be achieved through redundant co-execution (RCoE) on replicated hardware, now cheaply available with multicore processors. RCoE replicates almost all software, including the OS kernel, drivers, and applications, achieving a sphere of replication that covers everything except the minimal interfaces to non-replicated peripherals. We complement our original, loosely-coupled RCoE with a closely-coupled version that improves the transparency of replication to application code, and investigate the functionality, performance, and vulnerability trade-offs.

BibTeX Entry

  @inproceedings{Shen_HE_19,
    author           = {Shen, Yanyan and Heiser, Gernot and Elphinstone, Kevin},
    month            = jun,
    date             = {2019-06-24},
    numpages         = {13},
    keywords         = {{seL4}; microkernel; {SEU}; replication; fault tolerance},
    year             = {2019},
    address          = {Portland, Oregon, USA},
    title            = {Fault Tolerance Through Redundant Execution on {COTS} Multicores: Exploring Trade-offs},
    booktitle        = {International Conference on Dependable Systems and Networks (DSN)},
    publisher        = {IEEE}
  }
