Choosing a fault-tolerant database to support cloud infrastructure

A cloud operating system must be able to self-heal in the face of failures. To do so, we need a highly resilient database that has some features of a distributed database and enough replication to tolerate single-node failures. Additional desirable features include simple setup and a highly automated recovery process.

We have evaluated two "NoSQL" databases for Nimbula Director to determine their resiliency, performance, and suitability for a cloud operating system. In this talk we will discuss Cassandra and MongoDB, the two candidates we selected after an initial cursory evaluation. The talk focuses on how we:

  • Migrated from a PostgreSQL-based solution to these databases.

  • Evaluated the performance of these databases within a prototype.

  • Evaluated the resiliency of these databases within a prototype.

We will also discuss some of the issues we encountered in practice and how we addressed them.