Theory and architecture
Analysis of design choices in databases and database management systems. Related subjects include:
- Database diversity
- Explicit support for specific data types
- (in Text Technologies) Text search
Toward a NoSQL taxonomy
I talked Friday with Dwight Merriman, founder of 10gen (the MongoDB company). He more or less convinced me of his definition of NoSQL systems, which in my adaptation goes:
NoSQL = HVSP (High Volume Simple Processing) without joins or explicit transactions
Within that realm, Dwight offered a two-part taxonomy of NoSQL systems, according to their data model and replication/sharding strategy. I’d be happier, however, with at least three parts to the taxonomy:
- How data looks logically on a single node
- How data is stored physically on a single node
- How data is distributed, replicated, and reconciled across multiple nodes, and whether applications have to be aware of how the data is partitioned among nodes/shards (a minimal sketch of the first and third dimensions follows this list). Read more
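To make the first and third of those dimensions concrete, here's a minimal Python sketch. Nothing here reflects any particular product; the record layouts and the `shard_for` routing function are purely illustrative:

```python
import zlib

# Dimension 1 -- how data looks logically on a single node. A key-value
# store sees an opaque value; a document store sees queryable structure.
kv_record = ("user:1001", b'{"name": "Ann", "zip": "02139"}')  # opaque blob
doc_record = {"_id": 1001, "name": "Ann", "zip": "02139"}      # named fields

# Dimension 3 -- how data is partitioned across nodes. A simple hash-based
# sharding rule; whether the application must know about it varies by system.
NUM_SHARDS = 4

def shard_for(key: str) -> int:
    """Route a key to one of NUM_SHARDS nodes by hashing it."""
    return zlib.crc32(key.encode()) % NUM_SHARDS

print(shard_for("user:1001"))  # which node/shard holds this record
```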
Categories: Cassandra, Data models and architecture, NoSQL, Parallelization, RDF and graphs, Structured documents, Theory and architecture | 13 Comments |
The Naming of the Foo
Let’s start from some reasonable premises. Read more
Categories: Data models and architecture, Database diversity, Hadoop, MapReduce, MarkLogic, NoSQL, OLTP, Theory and architecture | 37 Comments |
Some NoSQL links
I plan to post a few things soon about MongoDB, Cassandra, and NoSQL in general. So I’m poking around a bit reading stuff on the subjects. Here are some links I found. Read more
Categories: Amazon and its cloud, Cassandra, Continuent, Google, MySQL, NoSQL, Open source, RDF and graphs, Tokutek and TokuDB | 5 Comments |
Cassandra and the NoSQL scalable OLTP argument
Todd Hoff put up a provocative post on High Scalability called MySQL and Memcached: End of an Era? The post itself focuses on observations like:
- Facebook invented and is adopting Cassandra.
- Twitter is adopting Cassandra.
- Digg is adopting Cassandra.
- LinkedIn invented and is adopting Voldemort.
- Gee, it seems as if the super-scalable website biz has moved beyond MySQL/Memcached.
But in addition, he provides a lot of useful links, which DBMS-oriented folks such as myself might have previously overlooked. Read more
Categories: Cassandra, Data models and architecture, NoSQL, OLTP, Open source, Parallelization, Specific users, Theory and architecture | 16 Comments |
Another reason to expect number-crunching and big-data management to converge
Dan Olds argues that Oracle is likely to pursue commercially-substantive high performance computing (HPC), emphasis mine: Read more
Categories: Analytic technologies, Data warehousing, Exadata, Oracle, Theory and architecture | Leave a Comment |
Chris Bird’s blog is brilliant, and update-in-place is increasingly passé
I wouldn’t say every post in Chris Bird’s occasionally-updated blog is brilliant. I wouldn’t even say every post is readable. But I’d still recommend his blog to just about anybody who reads here as, at a minimum, a consciousness-raiser.
One of the two posts inspiring me to mention this is a high-level one on “technical debt”, reminding us why things don’t always get done right the first time, and further reminding us that circling back to fix them sooner rather than later is usually wise. The other connects two observations that individually have great merit (at least if you don’t take them to extremes):
- Update-in-place is passé
- So is elaborate up-front database design
Specific points of interest here include: Read more
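To make the first of those observations concrete, here's a minimal sketch of the append-only alternative to update-in-place, using a toy in-memory log; `put` and `get` are hypothetical names, not any real system's API:

```python
import time

# Append-only storage: instead of overwriting a record, append a new
# timestamped version and read back the latest one. History is preserved.
log = []  # list of (timestamp, key, value) tuples, only ever appended to

def put(key, value):
    """Record a new version of key; never overwrite an old one."""
    log.append((time.time(), key, value))

def get(key):
    """Return the most recent version of key, or None if absent."""
    for _, k, v in reversed(log):
        if k == key:
            return v
    return None

put("account:7", {"balance": 100})
put("account:7", {"balance": 80})  # a correction, with full history kept
print(get("account:7"))            # {'balance': 80}
```

The same idea scales up to log-structured storage engines, where "fixing" data means appending a correction rather than mutating the original record.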
Categories: Theory and architecture | 7 Comments |
Vertica 4.0
Vertica briefed me last month on its forthcoming Vertica 4.0 release. I think it’s fair to say that Vertica 4.0 is mainly a cleanup/catchup release, washing away some of the tradeoffs Vertica had previously made in support of its innovative DBMS architecture.
For starters, there’s a lot of new analytic functionality. It isn’t ambitious in the Aster/Netezza style; rather, there’s a lot more SQL-99 functionality, plus some time series extensions of the sort that financial services firms – an important market for Vertica – need and love. Vertica did suggest that a couple of these time series extensions are innovative, but I haven’t yet gotten details about them.
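For flavor, here's the kind of SQL-99 windowed analytics at issue, sketched with SQLite (which supports window functions as of version 3.25) rather than Vertica's actual engine or syntax; the `ticks` table and its numbers are made up:

```python
import sqlite3

# Requires SQLite >= 3.25 for window function support.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticks (symbol TEXT, ts INTEGER, price REAL);
INSERT INTO ticks VALUES
  ('IBM', 1, 130.0), ('IBM', 2, 131.5), ('IBM', 3, 129.0),
  ('HPQ', 1, 51.0),  ('HPQ', 2, 52.5);
""")

# A moving average per symbol, ordered by time -- a simple stand-in for
# the time series queries financial services firms run.
rows = conn.execute("""
SELECT symbol, ts, price,
       AVG(price) OVER (PARTITION BY symbol ORDER BY ts
                        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS mov_avg
FROM ticks
ORDER BY symbol, ts
""").fetchall()

for row in rows:
    print(row)
```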
Perhaps even more important, Vertica is cleaning up a lot of its previous SQL optimization and execution weirdnesses. In no particular order, I was told: Read more
Categories: Analytic technologies, Columnar database management, Data warehousing, Vertica Systems | 12 Comments |
Open issues in database and analytic technology
The last part of my New England Database Summit talk was on open issues in database and analytic technology. This was closely intertwined with the previous section, and also relied on a lot that I’ve posted here. So I’ll just put up a few notes on that part, with lots of linkage to prior discussion of the same points. Read more
Interesting trends in database and analytic technology
My project for the day is blogging based on my “Database and analytic technology: State of the union” talk of a few days ago. (I called it that because of when it was given, because it mixed prescriptive and descriptive elements, and because I wanted to call attention to the fact that I cover the union of database and analytic technologies – the intersection of those two sectors is an area of particular focus, but is far from the whole of my coverage.)
One section covered recent/ongoing/near-future trends that I thought were particularly interesting, including: Read more
Flash, other solid-state memory, and disk
If there’s one subject on which the New England Database Summit changed or at least clarified my thinking,* it’s future storage technologies. Here’s what I now think:
- Solid-state memory will soon be the right storage technology for a large fraction of databases, OLTP and analytic alike. I’m not sure whether the initial cutoff in database size is best thought of as terabytes or 10s of terabytes, but it’s in that range. And it will increase over time, for the usual cheaper-parts reasons.
- That doesn’t necessarily mean flash. PCM (Phase-Change Memory) is coming down the pike, with perhaps 100X the durability of flash, in terms of the total number of writes it can tolerate. On the other hand, PCM has issues in the face of heat. More futuristically, IBM is also high on magnetic racetrack memory. IBM likes the term storage-class memory to cover all this — which I find regrettable, since the acronym SCM is way overloaded already. 🙂
- Putting a disk controller in front of solid-state memory is really wasteful. It wreaks havoc on I/O rates (see the back-of-the-envelope sketch below).
- Generic PCIe interfaces don’t suffice either, in many analytic use cases. Their I/O is better, but still not good enough. (Doing better yet is where Petascan – the stealth-mode company I keep teasing about – comes in.)
- Disk will long be useful for very large databases. Kryder’s Law improvement in disk capacity is at least as rapid annually as Moore’s Law improvement in chip capacity, the disk rotation speed bottleneck notwithstanding. Disk will long be much cheaper than silicon for data storage. And cheaper silicon in sensors will lead to ever more machine-generated data that fills up a lot of disks.
- Disk will long be useful for archiving. Disk is the new tape.
*When the first three people to the question microphone include both Mike Stonebraker and Dave DeWitt, your thinking tends to clarify in a hurry.
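Here's the back-of-the-envelope arithmetic behind the interface bullets above; every number is an illustrative order-of-magnitude assumption, not a measurement:

```python
MS, US = 1e-3, 1e-6

disk_latency = 8 * MS     # ~8 ms seek + rotation for a 2010-era disk
flash_latency = 50 * US   # ~50 microsecond flash page read

# Serial random-read rates each medium could sustain on its own:
print(f"disk:  ~{1 / disk_latency:,.0f} random reads/sec")   # ~125
print(f"flash: ~{1 / flash_latency:,.0f} random reads/sec")  # ~20,000

# A controller designed around disk's ~hundreds of IOPS squanders most of
# flash's ~160x latency advantage; a generic PCIe interface recovers much
# of it, but analytic scan rates can still saturate that too.
```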
Related links
- A slide deck on storage-class memories by Mohan of IBM, similar to the one he presented at the NEDB Summit.
- A much more detailed IBM presentation on storage-class memories.
- Oracle’s and Teradata’s beliefs about the importance of solid-state memory.
Other posts based on my January, 2010 New England Database Summit keynote address