March 13, 2010

The Naming of the Foo

Let’s start from some reasonable premises.

*Sure, if you strain you can talk yourself into exceptions. But the point stands.

So we need a name for Foo, where Foo is what happens when lots of people want to get small amounts each of information in or out of a database at the same time. Thus, three major subcategories of more-or-less disk-based Foo are:

There may be some more purely memory-centric versions too, but let’s put those aside for the moment.

Absent a better idea, I can squeeze Foo into yet another four-letter acronym:

HVSP (High-Volume Simple Processing)

That’s as imperfect as any other category name, and an awkward mouthful to boot. So I’d love to hear a better one; if you have such, please share it! In the meantime, I think “HVSP” has merit because:

*Assuming, of course, that rows-and-tables are a good metaphor for your data structure in the first place.

Systems I’m leaving out of the HVSP and hence also NoSQL categories include:

But hey – what good is a categorization if it doesn’t leave some things out?

Comments

37 Responses to “The Naming of the Foo”

  1. Richard Tibbetts on March 13th, 2010 8:24 pm

    Good work here. I agree the category needs a positive name that says what it is rather than what it is not. However, in my experience a big part of the reluctance to define a category is because it restricts the solution space.

    In this case, how would you respond to systems like VoltDB or H-Store, which try to move or remove the boundary between the data management tier and the application tier? If you express your business logic in a language the DB understands (but not SQL), then the DB can do you a lot of favors.

    I think a lot of what is driving NoSQL is developers who learned to treat their DB as a dumb key-value store, and are now realizing that if all they want is key-value plus a bit, there are better options than MySQL. That’s a fine conclusion, but maybe they’re not solving the right problem.

  2. Curt Monash on March 14th, 2010 2:16 am

    In-memory DBMS that only use disk as an afterthought would probably be a fourth subcategory. I ducked that subject because I’m not confident I know the full range of emerging contenders out there.

    VoltDB is the one I’m most familiar with.

  3. Mark Callaghan on March 14th, 2010 12:28 pm

    Sharded MySQL implies no joins over your entire data set. But there are still joins done within the shard.

    There are dramatic differences between members of the NoSQL family. Some require sharding, others (HBase, Cassandra) do not. Most are crash safe but a few are not.

    MongoDB looks a lot like sharded MyISAM with better replication (not crash safe, tables support single-writer or multiple-readers). Is this a radical change?

    I wish there were a better way to describe systems. NoSQL versus SQL doesn’t do it. But a ‘geek code’ for the attributes of a transaction processing system might not catch on — sharded/unsharded, async/sync replication, strong/eventual consistency, …

  4. Unholyguy on March 14th, 2010 1:05 pm

    Massively Distributed Eventually Consistent Processing (MDECP)

    maybe?

    To me the key characteristics are the built-in distribution and the eventual consistency model.

  5. Curt Monash on March 14th, 2010 4:58 pm

    Eventual consistency is only one of the options. That’s true even w/in NoSQL (I’m just working on that post now), let alone when we also include RDBMS possibilities.

  6. Toward a NoSQL taxonomy | DBMS2 -- DataBase Management System Services on March 14th, 2010 7:25 pm

    […] = HVSP (High Volume Simple Processing) without joins or explicit […]

  7. Neil Hepburn on March 14th, 2010 9:04 pm

    I think the real issue in nailing down NoSQL is whether, over time, these DBMSs will evolve the same or similar capabilities as a traditional RDBMS.
    Application developers tend to view the RDBMS as a “bit bucket” to persist their application data. The problem is that developers don’t have the perspective to see the bigger picture. In particular, they don’t really concern themselves with reporting or data interoperability. Developers also don’t think in a declarative, set-based way. At one of the previous places I worked, it took me 6 months to convince the developers to use an ETL tool. They were adamant, and even coded the ETL in Java. It took a long time to develop and was eventually deemed a failure. It was only then that they decided to look at the ETL approach.
    I had the same problem with Hibernate (an ORM layer). They wanted to create all the data models in Hibernate. I allowed this since the applications were all one-offs and the data was never going to be shared outside of the app. But this approach would be problematic once new applications had to share data elements which were tailored to a legacy app.

    My big concern with the NoSQL approach is that developers will make a beeline for it, and it will become the de facto way of developing applications. Sure, it’s great if you’re the next Facebook (fat chance), but for most applications, this means putting a tremendous amount of data management back onto the developer.

    To wit, over the summer I developed a data entry application for a small pharmaceutical. They had hundreds of medical records, with a fairly complex schema, that needed entering. However, I was able to develop this application far faster than any Java or C# developer since I let the RDBMS do all the work for me purely through modeling in the RDBMS (i.e. 3NF, integrity constraints, cascading deletes, etc.). The front-end MS Access forms were purely configured, with not a single line of procedural code. Users could enter data, delete records, search by field, etc. And I was able to run all sorts of reports to monitor data quality, not to mention the extracts that would go to the statistician for analysis.

    I look at something like Cassandra, and think that it’s a huge step backwards for all but the biggest web sites out there.

    It will be interesting to see what happens. I can see a lot of different scenarios playing out.

  8. Categorizing the “Foo” fighters – making sense of NoSQL — Too much information on March 15th, 2010 1:43 pm

    […] is interesting to see fellow analyst Curt Monash facing the same problem. As he notes, while there seems to be a common theme that “NoSQL is Foo without joins and transactions,” no […]

  9. Matt Corgan on March 15th, 2010 2:19 pm

    I think the most fundamental requirement for these new systems is that they can be partitioned across multiple nodes. As Neil said, after they get the partitioning ironed out, it will be interesting to see them continue to add the same features that make relational databases popular.

    Maybe they are all trying to become a free version of Oracle RAC?

  10. Curt Monash on March 15th, 2010 2:27 pm

    Point of order:

    Partitioning data is trivially easy. What’s hard is getting the system behavior you want after you’ve partitioned it.

    GLENDOWER

    I can call spirits from the vasty deep.

    HOTSPUR

    Why, so can I, or so can any man;
    But will they come when you do call for them?
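
The point of order above can be illustrated with a minimal sketch, assuming a hypothetical table of `user`/`amount` rows hash-partitioned on `user`. None of the names below come from any system discussed in this thread. Splitting the rows takes a few lines; the behavior after the split is where the work starts, because a question about one key touches one shard while a question about the whole data set has to visit every shard.

```python
import zlib

NUM_SHARDS = 4  # assumption: a small, fixed shard count for illustration


def shard_for(key, num_shards=NUM_SHARDS):
    """Pick a shard with a stable hash of the partitioning key."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition(rows, num_shards=NUM_SHARDS):
    """The easy part: route every row to exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row["user"], num_shards)].append(row)
    return shards


def total_for_user(shards, user):
    """Single-key question: only one shard has to be consulted."""
    shard = shards[shard_for(user, len(shards))]
    return sum(r["amount"] for r in shard if r["user"] == user)


def grand_total(shards):
    """Whole-data-set question: every shard must answer (scatter-gather)."""
    return sum(sum(r["amount"] for r in shard) for shard in shards)


rows = [
    {"user": "alice", "amount": 5},
    {"user": "bob", "amount": 7},
    {"user": "alice", "amount": 3},
    {"user": "carol", "amount": 1},
]
shards = partition(rows)
```

In a real deployment each shard would be a separate node, so `grand_total` also has to cope with slow or failed nodes, which this in-process sketch leaves out.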

  11. Vlad Rodionov on March 15th, 2010 2:29 pm

    I would replace “Simple” in your definition with “Data”. “Simple” just reflects the current state of NoSQL technology. The key word here is “current”.

  12. Nuno Job on March 15th, 2010 2:29 pm

    MarkLogic Server is designed for scale. Sharding is the technique used for that, which is commonly considered the best approach in the NoSQL community. Of course there’s a lot more to its architecture, but it’s quite obvious after reading your post that you are not aware of any of this.

    So how can you call someone who got informed and wrote an article a “train wreck” when they are only guilty of doing better research than you did?

  13. Nuno Job on March 15th, 2010 2:30 pm

    @Vlad – but it’s not data anymore, it’s about information. That’s part of the change – it’s documents, it’s many things, but mostly it’s not about tables (at least not exclusively)

  14. Curt Monash on March 15th, 2010 2:36 pm

    @Vlad,

    If it’s not simple, then SQL or a substitute is harder to live without.

  15. Curt Monash on March 15th, 2010 2:41 pm

    From memory, when last I talked with Mark Logic they weren’t doing much in the way of high throughput, at least on the levels commonly associated with NoSQL. Mark Logic seemed more focused on doing complex things with decent performance than doing simple things with great performance.

    If that’s no longer true, Dave and the gang have done an uncharacteristically poor job of marketing MarkLogic’s new capabilities.

  16. Nuno Job on March 15th, 2010 3:39 pm

    @Curt I’m available to clarify, but I think these keywords will help you understand how MarkLogic Server works right now:

    sharding, high availability, strict consistency, mvcc, fragmentation, failover.

    A lot of information for one line – but probably for you it all makes sense quite easily. Take care

  17. Curt Monash on March 15th, 2010 4:49 pm

    @Nuno,

    Do you think any part of http://www.dbms2.com/2008/10/05/marklogic-architecture-deep-dive/ is wrong or out of date? If so, which?

    But upon rereading that — on the one hand, I’m wondering whether I was a little too quick to dismiss Dave’s claim. On the other, I still don’t know of Mark Logic pursuing the kinds of applications we’d normally associate w/ MySQL.

  18. Nuno Job on March 15th, 2010 5:39 pm

    Hi Curt,

    I don’t think the problem is with the analysis you made but with the focus of the presentation you saw. It was not focused on scaling.

    The focus was on the search, the indexing, getting information out of the database really fast. I grant you that integrated full text search is not something that is common in NoSQL; it normally requires integration with third-party solutions like Lucene and Solr. But I just recently saw someone from MongoDB claiming that it is among their goals to get some further degree of control over search. This is probably because they recognize how much faster their searches would be if they had a “universal index” in a sharding architecture. Like MarkLogic Server does 🙂

    I can tell you an example application that I’m dead sure would run fantastically in MarkLogic Server: Twitter

    Some key differentiators would be:
    – integrated full text search
    – enrichment of the status with common things people query
    – geospatial queries in the same indexing space
    – sharding
    – collections
    – reverse queries to find un-anticipated relations.

    I think they would actually be able to do things they assume impossible and choose not to support.

  19. Dave Kellogg on March 15th, 2010 8:59 pm

    “Train wreck” was a little harsh to describe the IEEE article, Curt, and I praised it by saying it was a good article to hand the CIO (you know, those folks who don’t spend all day worrying about database internals like some of us do).

    If you know of a better CIO-level NoSQL article, please share its URL. I don’t doubt that you could write one.

    And it would start with a good taxonomy of NoSQL systems. And it’s simply illogical to not include XQuery systems in an [un]category called NoSQL.

    Yes, NoSQL is about CAP, but not entirely. And I kind of like your HVSP but I think it will also be about analytics.

    Check out the Wikipedia “structured storage” page — perhaps a definition, as opposed to an un-definition — would solve the problem.

  20. Curt Monash on March 15th, 2010 9:30 pm

    Dave,

    I admit to some prejudice because of the horrible process that led up to the article. But:

    1. He claims that handling unstructured data better than relational systems do is central to NoSQL. Huh?

    2. He claims that it’s hard to join across nodes of an MPP RDBMS. Huh? It might be slow, but it’s not hard. Similar errors in that vein abound.

    3. He repeats the too-common columnar can’t be relational error.

    4. He calls constraints restraints, even though I corrected that error for him during the research process.

    5. He’s totally confused as to whether SQL is complicated or querying without SQL is complicated.

    6. He randomly calls NoSQL systems “applications”.

    7. He doesn’t acknowledge that some NoSQL systems — notably MongoDB — have companies behind them offering support contracts.

    Yes, the article is a train wreck. And even if that’s over-harsh, your praise was way over-effusive. 😉

  21. Vlad Rodionov on March 16th, 2010 1:50 am

    Curt:
    “If it’s not simple, then SQL or a substitute is harder to live without.”

    So what? Surely it is not going to be SQL. Think of it as a low-level, language-agnostic API that will give developers full access to distributed storage internals. But there will be no joins or transactions, so technically it is still noSQL. I just do not agree with “simplicity” in your definition of noSQL.

  22. RC on March 16th, 2010 4:57 am

    @Vlad Rodionov

    Nothing wrong with a low-level, language-agnostic API. However, I do think that some people will use that API to develop a declarative, set-based query language on top of it.

    I spent some time writing MapReduce functions on MongoDB (on the weekends I dabble in MongoDB) to find duplicates. It is certainly possible, but it is time-consuming to write all that code.

    select x, count(*)
    from mytable
    group by x
    having count(*) > 1

    isn’t so bad at all.
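
To make the comparison with the SQL above concrete, here is a minimal sketch of the same duplicate search written in map/reduce style. It is plain Python over an in-memory list, not MongoDB's MapReduce API, and the sample rows and function names are assumptions made up for illustration. `mytable` and `x` are taken from the query.

```python
from collections import defaultdict


def map_phase(rows, key):
    """Emit (value, 1) for every row, like a MapReduce map function."""
    for row in rows:
        yield row[key], 1


def reduce_phase(pairs):
    """Sum the emitted counts per value, like a MapReduce reduce function."""
    counts = defaultdict(int)
    for value, count in pairs:
        counts[value] += count
    return dict(counts)


def find_duplicates(rows, key="x"):
    """Equivalent of GROUP BY x HAVING COUNT(*) > 1."""
    counts = reduce_phase(map_phase(rows, key))
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical sample data standing in for mytable.
mytable = [{"x": "a"}, {"x": "b"}, {"x": "a"}, {"x": "c"}, {"x": "b"}, {"x": "a"}]
duplicates = find_duplicates(mytable)
```

Even in this toy form, three functions are needed to say what the four-line query states declaratively.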

  23. Curt Monash on March 16th, 2010 1:02 pm

    @Vlad,

    Ultimately, a DBMS or substitute technology is a big DML interpreter. So one of the top criteria is that the language(s) supported be conducive to efficient and effective programming. In many use cases, SQL is a fine language for that purpose. E.g., when joins are inherent to the problem (and not just artifacts of low-benefit normalization), SQL is apt to be a great choice.

    Some relational advocates would say “low-benefit normalization” is a contradiction in terms. Fine. But even if one disagrees with them, there are plenty of cases where joins come in very handy.

  24. Dave Kellogg on March 18th, 2010 9:06 am

    Thanks Curt.

    I was previously unaware of your post about the IEEE article and the process behind it.

    I do think one of the things NoSQL is about is unstructured data. The value in key, value pairs might be a tweet or a profile entry or a webpage.

    I think I will try to write a better CIO’s Guide to NoSQL myself when I get some time.

    Best,
    Dave

  25. Curt Monash on March 18th, 2010 9:43 am

    Dave,

    I thought my link to that post, in the bullet point that mentioned your name — and indeed in the very words you disputed — would have been your first clue. 😉

    I look forward to your guide.

    Best,

    CAM

  26. Search Facets » The hyping of the NoSQL foo on March 22nd, 2010 11:47 am

    […] Monash took up this naming issue in his recent “naming of the foo” post, one of three that he published in quick succession about NoSQL.  In that post he […]

  27. RYW (Read-Your-Writes) Consistency explained | DBMS2 -- DataBase Management System Services on May 1st, 2010 12:57 am

    […] And that, folks, is a big part of why the NoSQL folks are so negative about joins. […]

  28. Dominique De Vito on May 18th, 2010 3:01 pm

    Interesting post that helps one see the NoSQL offerings more clearly.
    Thanks.

    Here’s how I see NoSQL: IMHO, NoSQL databases are simply disguised object databases, with relaxed constraints (e.g. relaxed consistency, no transactions, no joins, etc.).

    Here is my post:
    http://www.jroller.com/dmdevito/entry/thinking_about_nosql_databases_classification
    for more details.

  29. I’m collecting data points on NoSQL and HVSP adoption | DBMS2 -- DataBase Management System Services on August 18th, 2010 9:09 am

    […] is up to 2, the deadline is next week and, crucially, it has been agreed that I may talk about HVSP in general, NoSQL and SQL […]

  30. Marton Trencseni on August 25th, 2010 10:57 am

    How about

    OLRP = Online Request Processing systems.

    (As in Web request processing.)

  31. Curt Monash on August 25th, 2010 2:06 pm

    That’s pretty good, actually!

  32. More on NoSQL and HVSP (or OLRP) | DBMS 2 : DataBase Management System Services on August 26th, 2010 5:15 am

    […] the column-group-architecture guys — have probably had the most bang-in-lots-of-writes HVSP production […]

  33. NoSQL Daily – Wed Sep 22 › PHP App Engine on September 22nd, 2010 4:16 am

    […] The Naming of the Foo | DBMS2 : DataBase Management System Services […]

  34. Scott R. on September 30th, 2010 10:17 pm

    Curt,

    Sorry if I missed your references to this term in other postings.

    Daniel Abadi has an interesting post (http://dbmsmusings.blogspot.com/2010/08/problems-with-acid-and-how-to-fix-them.html) that contains the quote: “In other words, NoSQL really means NoACID.”

    I don’t know if NoACID is a better term than your proposed HVSP, but in my opinion NoACID does a better job than NoSQL at implicitly conveying the premise behind the cloud (pardon the pun) of NoSQL.

    One problem with the terms NoSQL and NoACID is that they try to tell you how that product category is NOT like another product category, rather than how it is similar to yet other product categories, as Dave K. mentions above.

    Your thoughts?

    Scott R.

  35. Curt Monash on September 30th, 2010 11:15 pm

    Hi Scott,

    I certainly agree that ACID is a central issue, as per — for example — http://www.dbms2.com/2010/09/21/acid-compliant-transaction-integrity/ , especially the D.

    Marton Trencseni also had a good idea, as per http://www.dbms2.com/2010/08/26/nosql-hvsp-olrp/

    Best,

    CAM

  36. 数据仓库工作负载分类 | Alex的个人Blog on October 14th, 2011 11:13 pm

    […] high-volume simple processing […]

  37. karim on November 26th, 2011 5:27 am

    The SQL language is very powerful and much of that power depends upon table joins. I wouldn’t advise using a db without joins unless you have no choice.
