Quick Reference to Alternative Data Storages

Make sure you check out myNoSQL, a NoSQL blog featuring the best daily NoSQL news, articles, and links, covering all major NoSQL projects and closely following everything related to the NoSQL ecosystem. Everything you need and want to know about NoSQL.

Collaborative effort: Please help me fill in the gaps in the tables below by providing missing data, references to interesting articles, metrics, etc. Feel free to suggest new criteria to include.

This is a work in progress.

While it is probably not exhaustive, my intention is to provide a quick reference to BASE systems (Basically Available, Soft state, Eventually consistent, as opposed to ACID: Atomicity, Consistency, Isolation, Durability) that offers newcomers an overview of the existing projects in the field.

So far, I have been collecting information about the following characteristics:

  • Data model
  • Partitioning
  • Persistence
  • Rebalancing (elasticity)
  • Replication (clustering)
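
To make the partitioning and replication criteria a bit more concrete, here is a minimal sketch in Python. It assumes plain hash partitioning with replicas placed on successive nodes. The function names and node names are made up for illustration, and this is not a description of how any project in the list actually does it.

```python
import hashlib
from typing import List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash (hash partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def replicas_for(key: str, nodes: List[str], copies: int) -> List[str]:
    """Return the nodes holding a key: its home node plus the following ones."""
    start = partition_for(key, len(nodes))
    count = min(copies, len(nodes))  # cannot have more replicas than nodes
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


# Hypothetical four-node cluster, two copies of every key.
nodes = ["node-a", "node-b", "node-c", "node-d"]
owners = replicas_for("user:42", nodes, copies=2)
print(owners)
```

With plain modulo hashing like this, changing the number of nodes remaps most keys, which is one reason rebalancing is listed as a criterion of its own.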

I have also included notes about the implementation language and the protocols that can be used with each solution.

If you think I should include other criteria, please let me know.

The projects included so far in the list: Cassandra, CloudBase, CouchDB, Dynomite, HBase, Hypertable, Kai, LightCloud, LucidDB, Memcached, MemcacheDB, MonetDB, MongoDB, Neptune, Redis, Ringo, Scalaris, ThruDB, Tokyo Cabinet + Tyrant, Voldemort.

Alternative Data Storages

Project | Data model | Partitioning | Persistence | Rebalancing | Replication
Cassandra | Column-family (BigTable[5], Dynamo[6]) | Y[n4] | disk | Y | Y
CloudBase | HDFS/Hadoop[n3] | Y | disk | Y | Y
CouchDB | Doc-oriented | ?[n2] | disk | ?[n2] | ?[n2]
Dynomite | Blob (Dynamo[6]) | Y | pluggable | Y | Y
HBase | Column-family (BigTable[5]) | Y | disk | Y | Y
Hypertable | Column-family (BigTable[5]) | Y | DFS (HDFS) | ? | Y
Kai | Blob | ? | disk | ? | ?
LightCloud | check Tokyo Tyrant[n5]
LucidDB | Column-based | ? | disk | ? | N
Memcached[n1] | Blob | Y | RAM | Y | N
MemcacheDB | Blob | ? | BerkeleyDB | ? | Y
MongoDB | Doc-oriented | Y | Y
Ringo | Blob | Y | disk | Y | Y
Scalaris | Blob | Y | RAM | Y
ThruDB | Doc-oriented
Tokyo Cabinet + Tyrant
Voldemort | Structured / Blob / Text | Y | pluggable | N | Y
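
A few rows of the table above can also be written down as data and filtered by criteria. The sketch below is only an illustration: the `Store` class and `select` helper are hypothetical names, footnote markers such as [n4] are dropped, and only rows with a value in every column are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    name: str
    data_model: str
    partitioning: str
    persistence: str
    rebalancing: str
    replication: str


# Values copied from the table; footnote markers omitted.
STORES = [
    Store("Cassandra", "Column-family", "Y", "disk", "Y", "Y"),
    Store("HBase", "Column-family", "Y", "disk", "Y", "Y"),
    Store("Memcached", "Blob", "Y", "RAM", "Y", "N"),
    Store("Ringo", "Blob", "Y", "disk", "Y", "Y"),
    Store("Voldemort", "Structured / Blob / Text", "Y", "pluggable", "N", "Y"),
]


def select(stores, **criteria):
    """Return the names of stores whose fields equal all given criteria."""
    return [
        s.name
        for s in stores
        if all(getattr(s, field) == value for field, value in criteria.items())
    ]


# Projects that persist to disk and replicate.
durable_replicated = select(STORES, persistence="disk", replication="Y")
print(durable_replicated)  # ['Cassandra', 'HBase', 'Ringo']
```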


Implementation details

Project | Impl. | Client protocol | Refs
Cassandra | Java | Thrift[4] | [1], [2], [3]
CloudBase | Java | JDBC (Java)
CouchDB | Erlang | HTTP + JSON | [1], [2], [3]
Dynomite | Erlang | Thrift[4] | [1], [3]
HBase | Java
Hypertable | C++ | C++ API, Thrift[4]
Kai | Erlang
LightCloud | Python + Tokyo Tyrant | Python
LucidDB | Java/C++ | JDBC (Java)
Memcached | C | all*
MemcacheDB | C | all* (memcached protocol)
MonetDB | C
MongoDB | C++ | API (Python, Java, Ruby, PHP, C++, Perl, Erlang)
Neptune | Java
Redis | C
Ringo | Erlang | HTTP
Scalaris | Erlang
ThruDB | C
Tokyo Cabinet + Tyrant | C | C, Perl, Ruby, Java, Lua
Voldemort | Java | Java


I usually do not trust micro-benchmarks, and I know that measuring performance is an art. But I also know that some people are looking for this sort of data, and sometimes even the smallest piece of information is better than nothing.

Project | reads/s | writes/s | refs
LightCloud | | | See: Tokyo Tyrant results + this
Memcached | | | here, 2007, here
MemcacheDB | | | benchmark data
MongoDB | | | Performance testing
Tokyo Cabinet + Tyrant | | |

Other projects

I have found a couple of other projects, but I couldn't decide whether they fit in. If you think I should include them, please let me know (a supporting argument would be highly appreciated).

I'd also like to mention FriendFeed's usage of MySQL, which, while not a new system in itself, was designed to behave like a BASE system.