Friday, November 23, 2012

NoSQL and ACID: NoSQL == Not (only) SQL


Freedom from the tyranny of schemas!!!
As mentioned earlier, my main reason for venturing into data was the (perceived) democratization of the field implied by the term NoSQL, which basically meant that I would not have to learn a completely new language, technology and methodology to understand it. Within a week I am already proving myself wrong on this one, but I will explain a bit more in another post!

But it still makes sense to actually understand the subject, as well as the two parties, before taking sides :)

When NoSQL proponents mention SQL, they are actually referring to a strict RDBMS. One of the things these systems follow is the ACID properties (atomicity, consistency, isolation, durability): http://en.wikipedia.org/wiki/ACID. NoSQL proponents argue that a data store need not follow all of these properties. In fact, adhering to all of them makes their solutions harder to implement, maintain and scale. So they selectively choose which guarantees to provide, depending on the properties that are semantically important for the data.
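To make the "A" in ACID concrete, here is a minimal sketch of an atomic transfer using Python's built-in sqlite3 module. The `accounts` table, the names and the helper functions are assumptions made up for illustration; the point is only that both updates take effect together or not at all.

```python
import sqlite3

def open_demo_db():
    """Create an in-memory database with a hypothetical `accounts` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers the rollback
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A failed transfer leaves both rows exactly as they were. A store that relaxes atomicity would have to handle the half-finished case in application code instead.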

Having data completely separate from the implementation looks like a great idea from a high altitude. But it has ended up making database development a completely different and independent branch of software engineering, with the DB having its own tiers and the development team being split in two far too early in the development cycle. This compartmentalization is definitely not acceptable where agility and cost are major considerations, i.e. at startups. In my not-so-educated opinion, that is the reason for the emergence of NoSQL.

It is definitely attractive if all the code looks more or less the same everywhere, instead of having a separate partition and technology for maintaining the database. The ACID properties are then applied only as and when they are actually necessary.

There are definite technical differences and advantages, but in my opinion they are not the major reason for the emergence of NoSQL. The reason also has cultural and economic shades.
This really means that no single solution works in every context; instead, the behavior of the data store depends on the implementation. That helps at the edges and fringes, where these differences really matter for multiple reasons. The use case where economy and speed matter most, the startups, is in my opinion one of the biggest validations of the practicality of this approach.
Here are some of the reasons that I can think of. (Caveat: as I am still a novice, I am definitely either misinterpreting some of these or missing a lot more.)
  1. Exceptionally large data sizes, where the concept of a normal relational database breaks down.
  2. Monochromatic behavior of data (remember, I coined this term here): meaning the unique way the data is generated and stored, and how its relevance is calculated. Examples: data which only ever arrives in append mode, data which comes from real-time streams, data whose relevance changes with age, etc.
  3. The geographic location where the data is stored, as well as the speed and method the computing nodes use to reach it.
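The second point above can be sketched in a few lines. This is only an illustration under stated assumptions: the `AppendOnlyLog` class, the half-life decay rule and the threshold are all invented here, not taken from any particular NoSQL product.

```python
class AppendOnlyLog:
    """Hypothetical append-only store whose entries lose relevance as they age."""

    def __init__(self, half_life_seconds):
        self.half_life = half_life_seconds
        self._events = []  # (timestamp, payload) pairs, never updated or deleted

    def append(self, timestamp, payload):
        """Add an event; entries must arrive in time order, as in a stream."""
        if self._events and timestamp < self._events[-1][0]:
            raise ValueError("events must arrive in time order")
        self._events.append((timestamp, payload))

    def relevance(self, timestamp, now):
        """Relevance halves every `half_life` seconds (an assumed decay rule)."""
        age = now - timestamp
        return 0.5 ** (age / self.half_life)

    def most_relevant(self, now, threshold=0.25):
        """Return the payloads whose relevance is still at or above the threshold."""
        return [p for t, p in self._events if self.relevance(t, now) >= threshold]
```

Nothing here needs updates in place, joins or multi-row transactions, which is roughly why data of this shape is an easy fit for a store that drops some of the ACID guarantees.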

But the biggest takeaway for me from all this is really this:
NoSQL is a loosely coupled way of thinking about your data. These are techniques that you use alongside strictly relational ACID data stores. That’s why most of the high-scale data startups use both approaches, depending on need. Any further advocacy of one way over the other is just fanboyism :)
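As a rough sketch of what "both approaches, depending on need" can look like, the example below keeps payments in a strict table with constraints and keeps loosely structured events as schemaless JSON documents. SQLite stands in for both stores purely to keep the example self-contained; the table names, fields and helper functions are assumptions for illustration.

```python
import json
import sqlite3

def open_store():
    """One in-memory database playing both roles (a stand-in for two real stores)."""
    conn = sqlite3.connect(":memory:")
    # Strict, relational side: fixed columns and a constraint enforced by the DB.
    conn.execute(
        "CREATE TABLE payments ("
        " id INTEGER PRIMARY KEY,"
        " user TEXT NOT NULL,"
        " cents INTEGER NOT NULL CHECK (cents > 0))"
    )
    # Schemaless side: each row is a JSON document with whatever fields it has.
    conn.execute("CREATE TABLE events (doc TEXT NOT NULL)")
    return conn

def record_payment(conn, user, cents):
    """Insert a payment inside a transaction; invalid rows are rejected."""
    with conn:
        conn.execute("INSERT INTO payments (user, cents) VALUES (?, ?)", (user, cents))

def log_event(conn, **fields):
    """Store an event document; no schema is checked."""
    with conn:
        conn.execute("INSERT INTO events (doc) VALUES (?)", (json.dumps(fields),))

def find_events(conn, **criteria):
    """Return the documents whose fields match all of the given criteria."""
    rows = conn.execute("SELECT doc FROM events ORDER BY rowid")
    docs = (json.loads(row[0]) for row in rows)
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]
```

The payment path pays for its guarantees with a fixed schema, while the event path accepts documents of any shape and leaves interpretation to the reader of the data.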
 
