Talk

Redundancy does not imply fault tolerance: Analysis of distributed storage reactions to single errors and corruptions

  • In English

We analyze how modern distributed storage systems behave in the presence of storage faults such as data corruption and read and write errors. We characterize eight popular distributed storage systems and uncover numerous bugs related to storage fault tolerance. We find that modern distributed systems do not consistently use redundancy to recover from storage faults: a single storage fault can cause catastrophic outcomes such as data loss, corruption, and unavailability. Our results have implications for the design of next-generation fault-tolerant distributed and cloud storage systems.

  • #durability
  • #fault-tolerance
  • #redundancy
  • #reliability
  • #research
