António Casimiro

Experiences with Fault-Injection in a Byzantine Fault-Tolerant Protocol

Rolando Martins, Rajeev Gandhi, Priya Narasimhan, Soila Pertet, António Casimiro, Diego Kreutz, Paulo Veríssimo

Proceedings of the 14th ACM/IFIP/USENIX International Middleware Conference, Beijing, China, December 2013


Abstract

The performance improvement in Byzantine fault-tolerant state machine replication algorithms has made them a viable option for critical high-performance systems. However, the construction of the proofs necessary to support these algorithms is complex, and the proofs often make assumptions that may or may not be true in a particular implementation. Furthermore, the transition from theory to practice is difficult and can lead to the introduction of subtle bugs that may break the assumptions that support these algorithms. To address these issues we have developed Hermes, a fault-injector framework that provides an infrastructure for injecting faults in a Byzantine fault-tolerant state machine. Our main goal with Hermes is to help practitioners in the complex process of debugging their implementations of these algorithms, and at the same time to increase the confidence of possible adopters (e.g., systems researchers and industry) by allowing them to test the implementations. In this paper, we discuss our experiences using Hermes to inject faults in BFT-SMaRt, a high-performance Byzantine fault-tolerant state machine replication library.

BibTeX

@incollection{Martins:13a,
  author       = {Martins, Rolando and Gandhi, Rajeev and Narasimhan, Priya and Pertet, Soila and
                  Casimiro, Ant\'{o}nio and Kreutz, Diego and Ver\'{\i}ssimo, Paulo},
  title        = {Experiences with Fault-Injection in a Byzantine Fault-Tolerant Protocol},
  booktitle    = {Middleware 2013},
  volume       = {8275},
  series       = {Lecture Notes in Computer Science},
  editor       = {Eyers, David and Schwan, Karsten},
  year         = {2013},
  month        = dec,
  isbn         = {978-3-642-45064-8},
  address      = {Beijing, China},
  pages        = {41--61},
  url          = {http://dx.doi.org/10.1007/978-3-642-45065-5_3},
  doi          = {10.1007/978-3-642-45065-5_3},
  publisher    = {Springer Berlin Heidelberg},
  abstractURL  = {http://www.di.fc.ul.pt/~casim/papers/middleware13/middleware13.html},
  documentURL  = {http://www.di.fc.ul.pt/~casim/papers/middleware13/middleware13.pdf},
  keywords     = {Byzantine fault-injector; failure diagnosis; cloud-computing; Byzantine fault-tolerance; intrusion-tolerance},
}

Paper

Download paper