This is a work in progress.
While it is probably not exhaustive, my intention is to provide a quick reference to BASE systems (Basically Available, Soft state, Eventually consistent, as opposed to ACID: Atomicity, Consistency, Isolation, Durability) that offers newcomers an overview of the existing projects in the field.
So far, I have tried to fill in information about the following characteristics:
- Data model
- Rebalancing (elasticity)
- Replication (clustering)
I have also included notes about the implementation language and the protocols that can be used with each solution.
If you think I should include other criteria, please do let me know.
The projects included so far in the list: Cassandra, CloudBase, CouchDB, Dynomite, HBase, Hypertable, Kai, LightCloud, LucidDB, Memcached, MemcacheDB, MonetDB, MongoDB, Neptune, Redis, Ringo, Scalaris, ThruDB, Tokyo Cabinet + Tyrant, Voldemort.
Alternative Data Storages
|Cassandra||Column-family (BigTable, Dynamo)||Y[n4]||disk||Y||Y|
|Hypertable||Column-family (BigTable)||Y||DFS (HDFS)||?||Y|
|LightCloud||check Tokyo Tyrant[n5]|
|Tokyo Cabinet + Tyrant|
|Voldemort||Structured / Blob / Text||Y||pluggable||N||Y|
- [n1] Memcached: a distributed memory object caching system
- [n2] CouchDB partitioning and replication: according to a 2009 Summer of Code proposal:
While distributed deployments have been achieved with the help of proxies and smart external scripting, the core of CouchDB itself does not currently support distributing the database across multiple machines. More references about CouchDB cluster:
- [n3] All other criteria for CloudBase have been deduced based on the HDFS/Hadoop capabilities
- [n4] Cassandra: Consistent hashing vs order-preserving partitioning in distributed databases
- [n5] LightCloud seems to be a set of management scripts (Python) for Tokyo Tyrant
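Note [n4] contrasts consistent hashing with order-preserving partitioning. As a rough illustration of the difference, here is a minimal Python sketch. It is not taken from Cassandra or any other project listed here: the class names, the node names, the split points and the choice of MD5 are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a position on the hash ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Consistent hashing: a key belongs to the first node clockwise from its hash."""

    def __init__(self, nodes):
        # Each node is placed on the ring at the hash of its (hypothetical) name.
        self._ring = sorted((ring_position(node), node) for node in nodes)
        self._positions = [position for position, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First node position strictly greater than the key hash, wrapping around.
        index = bisect.bisect_right(self._positions, ring_position(key))
        return self._ring[index % len(self._ring)][1]


class OrderPreservingPartitioner:
    """Order-preserving partitioning: each node owns a contiguous range of raw keys."""

    def __init__(self, split_points, nodes):
        # split_points must be sorted; N split points define N + 1 key ranges.
        if len(nodes) != len(split_points) + 1:
            raise ValueError("need exactly one more node than split points")
        self._split_points = list(split_points)
        self._nodes = list(nodes)

    def node_for(self, key: str) -> str:
        # Keys are compared as-is, so adjacent keys land on the same node.
        return self._nodes[bisect.bisect_right(self._split_points, key)]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    ring = ConsistentHashRing(nodes)
    ranges = OrderPreservingPartitioner(["h", "q"], nodes)
    for key in ["apple", "apricot", "kiwi", "zebra"]:
        print(key, ring.node_for(key), ranges.node_for(key))
```

With hashing, neighbouring keys such as "apple" and "apricot" may end up on different nodes, which spreads load evenly but makes range scans expensive. With order-preserving partitioning they stay together, which helps range queries but can create hot spots when keys are skewed.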
|Cassandra||Java||Thrift|
|CouchDB||Erlang||HTTP + JSON|
|Hypertable||C++||C++ API, Thrift|
|LightCloud||Python + Tokyo Tyrant||Python|
|MemcacheDB||C||all* (memcached protocol)|
|MongoDB||C++||API (Python, Java, Ruby, PHP, C++, Perl, Erlang)|
|Tokyo Cabinet + Tyrant||C||C, Perl, Ruby, Java, Lua|
I usually do not trust micro-benchmarks, and I know that measuring performance is an art. But I also know that some people are looking for this sort of data, and sometimes even the smallest piece of information is more helpful than nothing.
|LightCloud||See: Tokyo Tyrant results + this|
|Memcached||here, 2007, here|
|Tokyo Cabinet + Tyrant|
I have found a couple of other projects, but I couldn't decide whether they fit in or not. If you think I should include them, please do let me know (a supporting argument is also highly appreciated).
I'd also like to mention FriendFeed's usage of MySQL, which, while not a new system in itself, was designed to behave like a BASE system.