This is a work in progress.
While it is probably not exhaustive, my intention is to provide a quick reference to BASE systems (Basically Available, Soft state, Eventually consistent, as opposed to ACID: Atomicity, Consistency, Isolation, Durability) that offers newcomers an overview of the existing projects in the field.
So far, I have been filling in information about the following characteristics:
- Data model
- Partitioning
- Persistence
- Rebalancing (elasticity)
- Replication (clustering)
I have also included notes about the implementation language and the protocols that can be used with each solution.
If you think I should include other criteria, please do let me know.
The projects included in the list so far: Cassandra, CloudBase, CouchDB, Dynomite, HBase, Hypertable, Kai, LightCloud, LucidDB, Memcached, MemcacheDB, MonetDB, MongoDB, Neptune, Redis, Ringo, Scalaris, ThruDB, Tokyo Cabinet + Tyrant, Voldemort.
Alternative Data Storages
| Project | Data model | Partitioning | Persistence | Rebalancing | Replication |
|---|---|---|---|---|---|
| Cassandra | Column-family (BigTable[5], Dynamo[6]) | Y[n4] | disk | Y | Y |
| CloudBase | HDFS/Hadoop[n3] | Y | disk | Y | Y |
| CouchDB | Doc-oriented | ?[n2] | disk | ?[n2] | ?[n2] |
| Dynomite | Blob (Dynamo[6]) | Y | pluggable | Y | Y |
| HBase | Column-family (BigTable[5]) | Y | disk | Y | Y |
| Hypertable | Column-family (BigTable[5]) | Y | DFS (HDFS) | ? | Y |
| Kai | Blob | ? | disk | ? | ? |
| LightCloud | check Tokyo Tyrant[n5] | ||||
| LucidDB | Column-based | ? | disk | ? | N |
| Memcached[n1] | Blob | Y | RAM | Y | N |
| MemcacheDB | Blob | ? | BerkeleyDB | ? | Y |
| MonetDB | |||||
| MongoDB | Doc-oriented | Y | Y | ||
| Neptune | |||||
| Redis | |||||
| Ringo | Blob | Y | disk | Y | Y |
| Scalaris | Blob | Y | RAM | Y | |
| ThruDB | Doc-oriented | ||||
| Tokyo Cabinet + Tyrant | |||||
| Voldemort | Structured / Blob / Text | Y | pluggable | N | Y |
Notes
- [n1] Memcached: a distributed memory object caching system
- [n2] CouchDB partitioning and replication: according to a 2009 Summer of Code proposal, "While distributed deployments have been achieved with the help of proxies and smart external scripting, the core of CouchDB itself does not currently support distributing the database across multiple machines." More references about CouchDB cluster:
- [n3] All other criteria for CloudBase have been deduced based on the HDFS/Hadoop capabilities
- [n4] Cassandra: Consistent hashing vs order-preserving partitioning in distributed databases
- [n5] LightCloud seems to be a set of management scripts (Python) for Tokyo Tyrant
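Note [n4] contrasts consistent hashing with order-preserving partitioning. As a rough illustration of the first technique only, here is a minimal sketch of a consistent-hash ring. It is not Cassandra's (or any listed project's) actual implementation: the `HashRing` class, the `vnodes` parameter, and the use of MD5 for tokens are all assumptions made for this example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative names, not a real project's API)."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring at `vnodes` pseudo-random positions.
        self._ring = sorted(
            (self._token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(key):
        # Map a string to a position on the ring (a 128-bit integer).
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

    def node_for(self, key):
        # The owner is the first node position clockwise from the key's token.
        idx = bisect.bisect_right(self._tokens, self._token(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_keys(keys, before, after):
    """Count the keys whose owner differs between two rings."""
    return sum(1 for k in keys if before.node_for(k) != after.node_for(k))
```

With this scheme, adding a node only takes keys away from existing nodes and hands them to the new one; keys never shuffle between the old nodes. That is the property that makes rebalancing cheap. The trade-off is that keys end up scattered around the ring, so range scans over adjacent keys are not supported the way they are with an order-preserving partitioner.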
Implementation details
| Project | Impl. | Client protocol | Refs |
|---|---|---|---|
| Cassandra | Java | Thrift[4] | [1], [2], [3] |
| CloudBase | Java | JDBC (Java) | |
| CouchDB | Erlang | HTTP + JSON | [1], [2], [3] |
| Dynomite | Erlang | Thrift[4] | [1], [3] |
| HBase | Java | ||
| Hypertable | C++ | C++ API, Thrift[4] | |
| Kai | Erlang | ||
| LightCloud | Python + Tokyo Tyrant | Python | |
| LucidDB | Java/C++ | JDBC (Java) | |
| Memcached | C | all* | |
| MemcacheDB | C | all* (memcached protocol) | |
| MonetDB | C | ||
| MongoDB | C++ | API (Python, Java, Ruby, PHP, C++, Perl, Erlang) | |
| Neptune | Java | ||
| Redis | C | ||
| Ringo | Erlang | HTTP | |
| Scalaris | Erlang | ||
| ThruDB | C | ||
| Tokyo Cabinet + Tyrant | C | C, Perl, Ruby, Java, Lua | |
| Voldemort | Java | Java | |
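The "all*" entries for Memcached and MemcacheDB refer to the memcached protocol, which is simple enough that clients exist for most languages. As a rough sketch of why, the snippet below frames `set` and `get` commands in the memcached text protocol and parses a single-key `get` reply. The helper names (`build_set`, `build_get`, `parse_get_response`) are made up for this example and do not belong to any client library; no network connection is made.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a 'set' command: set <key> <flags> <exptime> <bytes>\\r\\n<data>\\r\\n"""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Frame a single-key 'get' command."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(data: bytes):
    """Parse a single-key 'get' reply.

    A hit looks like: VALUE <key> <flags> <bytes>\\r\\n<data>\\r\\nEND\\r\\n
    A miss is just:   END\\r\\n
    Returns the stored bytes, or None on a miss.
    """
    if data == b"END\r\n":
        return None
    header, _, rest = data.partition(b"\r\n")
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected reply: %r" % header)
    length = int(parts[3])
    # Slice by the declared length, since the payload itself may contain \r\n.
    return rest[:length]
```

A client would write the output of `build_set` or `build_get` to a TCP socket connected to the server and feed the bytes it reads back into `parse_get_response`.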
Performance
I usually do not trust micro-benchmarks, and I know that measuring performance is an art. But I also know that some people are looking for this sort of data, and sometimes even the smallest piece of information is more helpful than nothing.
| Project | reads/s | writes/s | refs |
|---|---|---|---|
| Cassandra | |||
| CloudBase | |||
| CouchDB | |||
| Dynomite | |||
| HBase | |||
| Hypertable | |||
| Kai | |||
| LightCloud | See: Tokyo Tyrant results + this | ||
| LucidDB | |||
| Memcached | here, 2007, here | ||
| MemcacheDB | benchmark data | ||
| MonetDB | |||
| MongoDB | Performance testing | ||
| Neptune | |||
| Redis | |||
| Ringo | |||
| Scalaris | |||
| ThruDB | |||
| Tokyo Cabinet + Tyrant | |||
| Voldemort | |||
Other projects
I have found a couple of other projects, but I couldn't decide whether they fit in. If you think I should include them, please do let me know (a supporting argument would be highly appreciated).
I'd also like to mention FriendFeed's usage of MySQL, which, while not a new system in itself, was designed to behave like a BASE system.

