December 11, 2009

NoSQL Q and A

Neal Leavitt is writing an article for IEEE on NoSQL. So he’s circulated a long list of questions, encouraging people to answer as many or few as they choose. Unfortunately, most of the questions are technically meaningless, in that they implicitly rely on the false assumption that there is such a thing as a single or at least reasonably well-defined NoSQL technology. (I imagine most of his questions are really about key-value stores.) Nonetheless, I took a crack at a number of them before getting bored. Anybody else want to pitch in too? Questions are in italics, my answers are in plain text, and I apologize for formatting weirdnesses like some small font sizes.

I. INTRODUCTION

1. Why is it called a “NoSQL database” and is this a relatively new type of database?

Actually, NoSQL is a flippant term for a collection of technologies, some of which are DBMS (DataBase Management Systems) and some of which aren’t. All they have in common is:

Technology like that has existed since the 1960s (consider, for example, IBM’s VSAM file system).

2. Provide a brief basic description of what NoSQL databases are, how they work and how they differ from relational databases.

That’s an unanswerable question, due to its false premises. See #1 above.

3. Are all NoSQL databases open source and if so, why?

I suppose that would depend on who’s using the term. But it is certainly not the case that all closed-source DBMS rely on SQL.

4. Read that ‘NoSQL is a database movement that began in early to mid-2009 which promotes non-relational data stores that do not need a fixed schema’ and that usually avoids ‘join’ operations – what is meant by ‘fixed schema’ and ‘join’ operations here, and per latter why is avoiding them a big deal?

Avoiding joins is a big deal because a lot of programmers didn’t learn SQL in school. Also, joins can be computationally expensive. I wrote about some of the problems with fixed schemas here and, specifically in connection with NoSQL, here.
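For readers who, like the questioner, are unsure what a "join" is, here is a minimal sketch using Python's built-in sqlite3 module. The customers/orders tables, their columns, and the data are hypothetical, invented for illustration. The relational version stores each fact once under a fixed schema and combines the tables at query time with a join. The key-value-style version is also hypothetical: it stores each order already "joined" (denormalized), so a single lookup by key suffices, at the cost of repeating customer data in every order.

```python
import sqlite3

# Relational approach (hypothetical tables): a fixed schema, each fact stored once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 'disk'), (11, 1, 'tape'), (12, 2, 'cpu');
""")

# A join combines rows from the two tables at query time.
joined = conn.execute("""
    SELECT o.id, c.name, o.item
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# Key-value approach (hypothetical): each order is stored pre-joined,
# so one lookup by key returns everything, but "Alice" is duplicated.
kv_store = {
    10: {"customer": "Alice", "item": "disk"},
    11: {"customer": "Alice", "item": "tape"},
    12: {"customer": "Bob", "item": "cpu"},
}

print(joined)        # [(10, 'Alice', 'disk'), (11, 'Alice', 'tape'), (12, 'Bob', 'cpu')]
print(kv_store[11])  # {'customer': 'Alice', 'item': 'tape'}
```

If Alice's name changes, the relational version updates one row; the denormalized version must update every order that mentions her.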

5. What types of data are NoSQL databases good for?

I suppose that would depend on what particular NoSQL products you’re referring to. That said, I’ve written quite a bit about use cases for non-relational database management systems and/or HDFS (Hadoop Distributed File System).

6. Read that the movement’s chief champions are web and Java developers – why these in particular?

It’s tough to think of a lot of programmers who don’t fit into one or both of those categories.

II. IN THE BEGINNING

7. Provide a brief statement on when relational databases were first developed and by whom, when they were first introduced and by which organization.

Here is some DBMS history. You can find more here. In general, the first RDBMS projects were at IBM, with “System R” being a name to search on. The other early research project was INGRES; INGRES spawned a whole lot of products. Oracle, influenced by both, was also very early.

8. Explain what relational databases are, how they work, what they do.

9. Explain why relational databases work only with structured data and what exactly structured data is and provide a few brief examples.

10. What problems – such as speed – do organizations experience using relational databases because the databases work only with structured data? Am I correct in assuming that this has been the driving force behind NoSQL databases’ development/adoption?

If the data is structured in tabular format, then the main reasons not to use an RDBMS are performance, software license fees, and/or a reluctance to write SQL. If the data is structured in some other format, SQL-oriented DBMS may be a bad fit in general. “Unstructured data” is an oxymoronic phrase that shouldn’t be taken too literally.

11. Read that relational database strictures often make it hard to create big databases that use the cycles of a room full of machines. What specific strictures are we talking about, and why do they make it hard to do this? Why would you even want to do this?

Whoever said that was probably informed only about certain brands of relational database management system.

12. Why are relational databases slow and expensive?

Sometimes, when that is the case, it is because enterprises use the wrong brand of relational DBMS. Or the problem may truly not be suited to a relational DBMS. Also, DBMS vendors like to get revenue, which causes a certain expense in and of itself.

13. Read that SQL is an “awkward fit for procedural code and almost all code is procedural.” Why is this the case and what’s it used for?

Assuming the “it” is SQL – well, SQL is a language that, in principle, writes and retrieves sets of data rather than individual records. Procedural code, by way of contrast, most naturally operates on a single field or record at a time.
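To make the set-versus-record contrast concrete, here is a minimal sketch, again with Python's sqlite3 module and a hypothetical accounts table. The same change (doubling every balance of at least 100) is applied first as one set-oriented SQL statement, then record by record in procedural code. Both end in the same state; the difference is who does the iterating.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical 'accounts' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [(1, 100), (2, 250), (3, 40)])
    return conn

def balances(conn):
    """Return all balances ordered by account id."""
    return conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()

# Set-oriented: one statement describes the change for every qualifying row;
# the DBMS decides how to iterate.
set_db = make_db()
set_db.execute("UPDATE accounts SET balance = balance * 2 WHERE balance >= 100")

# Procedural: fetch the rows, then examine and write back one record at a time.
proc_db = make_db()
for acct_id, balance in proc_db.execute("SELECT id, balance FROM accounts").fetchall():
    if balance >= 100:
        proc_db.execute("UPDATE accounts SET balance = ? WHERE id = ?",
                        (balance * 2, acct_id))

print(balances(set_db))   # [(200,), (500,), (40,)]
print(balances(proc_db))  # [(200,), (500,), (40,)]
```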

14. Read that for data upon which users expect to do heavy, repeated manipulations, cost of mapping data into SQL is well worth paying – why is this so?

15. But if database structure is very simple, is this still the case with SQL?

16. Read that relational databases offer a big feature set and data integrity, but NoSQL proponents say the features can be much more than they need. What features, why is it more than they need and what problems – if any – does this cause?

III. A LOOK INSIDE

A. History

17. Provide a brief history of the NoSQL database.  Which organizations and researchers developed it and when? When was the first NoSQL database released for public use and when was the first NoSQL database released for commercial use?

The premises behind this question are false.

B. The Technology

18. How are NoSQL databases able to handle structured data?

I suppose that would depend on which one you’re asking about.

19. What do they do that enables them to handle unstructured data, when relational bases can’t?

I suppose that would depend on which one you’re asking about. In many cases, it wouldn’t be true.

20. Why can NoSQL databases process data faster than relational databases? Need to explain clearly NoSQL databases’ advantages and the uses for which they are superior.

I suppose that would depend on which one you’re asking about. In many cases, it wouldn’t be true.

21. Read that relational databases have controls that NoSQL databases don’t. What types of controls and what do they accomplish?

“Referential integrity” is a great example. Read up on that.
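A minimal sketch of referential integrity, using Python's sqlite3 module and hypothetical customers/orders tables. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection. With that set, the database rejects an order that points at a nonexistent customer. A plain dictionary standing in for a bare-bones key-value store (also hypothetical) accepts the same dangling reference without complaint, because it does no checking at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
                    id INTEGER PRIMARY KEY,
                    customer_id INTEGER NOT NULL REFERENCES customers(id))""")
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (10, 1)")  # OK: customer 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")  # there is no customer 99
    rdbms_rejected = False
except sqlite3.IntegrityError:
    rdbms_rejected = True  # the DBMS refused the dangling reference

# A bare key-value store does no such check: the dangling reference is stored.
kv_store = {"customer:1": {"name": "Alice"}}
kv_store["order:11"] = {"customer_id": 99}
dangling = "customer:99" not in kv_store

print(rdbms_rejected, dangling)  # True True
```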

22. How can they function without these controls and why does the lack of controls help NoSQL databases process data faster?

If you do less, you require fewer compute cycles. That’s a fundamental premise behind key-value stores.

23. Read this but don’t know what it means: “NoSQL systems often provide weak inconsistency guarantees such as eventual consistency and transactions restricted to single data items, even though one can impose full ACID guarantees by adding a supplementary middle layer.”  Please provide a brief analysis here.

“Inconsistency” may well be a typo for “consistency”.

24. Read that many NoSQL databases promote highly distributed, scalable data storage techniques – what techniques and why is this important?

24. What do NoSQL databases run on – clusters of inexpensive servers?  What do SQL databases run on?

Many SQL DBMS run on clusters of inexpensive servers. Many others run on clusters of expensive servers, or single inexpensive servers, or single expensive servers.

25. Also read this but don’t understand it – please clarify: “NoSQL databases can be easily and cheaply expanded without the complexity and cost of ‘sharding’- which involves cutting up databases into multiple tables to run on large clusters or grids.  By sidestepping the time-consuming toil of translating web or Java apps and data into a SQL-friendly format, NoSQL architectures perform faster.”  Why must SQL databases do the translation and why can NoSQL databases avoid it?

C. Implementation

26. I want to mention various NoSQL databases. Just need a sentence or two on each, but for each database, please explain which organization developed it, how it is used, and how it clearly differs from the others.

IV. CONCERNS AND DOUBTS

27. What are the technical and marketplace challenges that NoSQL databases face?  Some possible factors:

28. Do NoSQL databases come with any support?  If not, because they’re open source, does this turn off potential users?

V. FUTURE

29. How will the technology change during the next five years?

30. How will it be used?

31. How will it do in the marketplace?

CONCLUDING SECTION

Provide concluding remarks, whether optimistic or cautionary

Comments

10 Responses to “NoSQL Q and A”

  1. Fresh From Twitter | Eddie Awad's Blog on December 12th, 2009 12:14 am

    [...] Q and A http://bit.ly/84jBMI Wondering what the unprecedented change is RT @oracletechnet: In 2010 OTN will enter a phase of [...]

  2. The legit part of the NoSQL idea | DBMS2 -- DataBase Management System Services on December 12th, 2009 2:07 am

    [...] written some snarky things about the “NoSQL” concept – or at least the moniker. (Carl Olofson’s term “non-schematic [...]

  3. Ivan Novick on December 12th, 2009 2:02 pm

    Whenever I hear an advertisement that specifically mentions a competitor, I always think that if they went to all the trouble to mention the competitor, the competitor must be better. NoSQL says nothing about a specific product. It focuses on SQL, which clearly must have a lot to offer to a lot of people; otherwise the term NoSQL would not even exist.

    Cheers,
    Ivan

  4. Nati Shalom on December 12th, 2009 4:15 pm

    Recent research on disk failure published in the past few years also indicates that the school of thought behind the design of many of the existing databases is broken. Instead of relying on the idea that failure could be prevented through expensive hardware setups, NoSQL alternatives were built under the assumption that failures are inevitable and that it’s better to cope with them than to try to prevent them from happening.

    You can see more details in my recent post: Why Existing Databases (RAC) are So Breakable! http://natishalom.typepad.com/nati_shaloms_blog/2009/11/why-existing-databases-rac-are-so-breakable.html

  5. Fresh From Twitter | Eddie Awad's Blog on December 12th, 2009 11:59 pm

    [...] cheat sheets for some of the most widely used tools on the web http://ff.im/-cNt4b NoSQL Q and A http://bit.ly/84jBMI Wondering what the unprecedented change is RT @oracletechnet: In 2010 OTN will enter a phase of [...]

  6. Doug Little on December 14th, 2009 10:21 am

    Looks like someone is working on a book and looking for arguments. The questions don’t make much sense: is this a bash against relational, or something about non-relational? If it’s non-relational, there are plenty of examples if people look at history: IBM’s TPF, IMS, Adabas, the whole 4GL movement (SAS, SPSS, Natural, FOCUS, Model 204) and the newer OLAP databases (TM1, Essbase, MS SSAS, etc.). SQL and relational DBs are solving data management problems.

    My view on the current Hadoop/MapReduce trend (yes, we’re making investments) is that it’s about a bunch of junior Java programmers discovering the need to query data and being enthralled with their approach.

    Yes, cheap disk and hardware can be used, but businesses don’t really like to trust their data to gateway PCs. I can barely get engineering to buy SATA drives instead of SAS. So most of the performance limitations are policy and license issues, not technology.

    Still, I did see a demo of Splunk and it was interesting. My question was: cool, you can query 9 billion facts, but what are you going to do with 490m results? And within the relational world, we can do that without writing Java code.

    Bunch of noisemakers. Let ’em get burned, I say.

  7. Curt Monash on December 14th, 2009 10:43 am

    Doug,

    I said in my first sentence what Neal is working on. Or, if you prefer, what he’s trying to get people to do his work for him on.

  8. Mike on December 15th, 2009 4:44 pm

    Wow, you have far more patience than myself. I would have bailed out by number 7. Is this for real?

  9. Curt Monash on December 15th, 2009 4:52 pm

    Mike,

    April Fool’s Day is still many months in the future.

  10. Fresh From Twitter | Eddie Awad's Blog on December 16th, 2009 10:28 pm

    [...] cheat sheets for some of the most widely used tools on the web http://ff.im/-cNt4b NoSQL Q and A http://bit.ly/84jBMI Powered by Fresh [...]
