A Schema-less Relational Database

| | bookmark | email | 13 comments
Make sure you check myNoSQL a NoSQL blog featuring the best daily NoSQL news, articles and links covering all major NoSQL projects and following closely all things related to NoSQL ecosystem. Everything you need and want to know about NoSQL.

A relational database management system (RDBMS) imposes a fixed schema, so why would I use schema-less and relational database in the same sentence?

I will not write an introduction about what relational database are and what are the differences between them and other storage alternatives and I'll jump directly to the subject: is it be possible to use a row-based storage in a schema-less mode? Or to make it more explicit: is it possible to store free format objects/data in a fixed format?

We have to agree that at the first glance this looks like the eternal hammer hypothesis: give me a hammer and everything will look like a nail. But in this case things are a lot simpler.

So, what is the solution?

Most of the languages support one form or another of object serialization in binary or pure text format. Java has serialization in binary format, Python has marshaling and (c)pickle. Ruby has marshaling too and the list can go on and on. Both Google with protobuf and Facebook with Thrift have proposed their own high performance binary serialization mechanisms. Besides all these existing mechanisms there is also an universal object marshaling protocol: JSON.

By now the solution should become quite obvious: instead of persisting the object by storing it field by field, we would just serialize the whole object with all its properties. Basically, the table will need just 2 fields: an ID and a field that can store our serialized object. Like FriendFeed is doing it (nb Jackrabbit -- the open source implementation of Java Content Repository spec -- was using this solution for quite a while). And if you think this is not the best idea, I'll just say that most (if not all) of the alternative storage solution are using similar storage solutions.

What do you lose?

Referential integrity

Referential integrity in the sense of foreign keys will be lost. You can maintain it at the application level though.

Indexes

Serialized object in either binary or text format are pretty useless in indexes. So, if you need to search based on specific object properties then you'll have to implement your own solution. For example, Jackrabbit is using Lucene to keep an external index, while FriendFeed is using the database by creating indexes on-request.

What do you win?

A solid storage engine

You don't have to recreate or implement yourself the storage mechanism. The solution will use one that is proved to be solid.

Drivers and Tools

Most of the existing RDBMS are providing additional administration and visualization tools. Not to mention their drivers tested by tons of applications.

Last, but not least, you'll keep any other feature offered by your relational database (f.e. clustering).

I'll not end the post without throwing out the real question:

When is a relational database schema-less approach appropriate?