Theory and architecture
Analysis of design choices in databases and database management systems. Related subjects include:
- Database diversity
- Explicit support for specific data types
- (in Text Technologies) Text search
ObjectGrid versus H-Store
Billy Newport of IBM sees a lot of similarities between his app-server-based product ObjectGrid and H-Store. In both cases, constrained tree schemas are assumed, and OLTP performance goodness ensues. A couple of points I noted on a quick skim through his blog:
- He calls out RAM consumption as a challenge for this kind of architecture.
- He points out that it’s a big advantage to have data accessed and used in the same address space.
Being based in RAM is obviously a huge part of the H-Store scheme. But so is having transaction execution be close to the database.
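That same-address-space point can be made concrete with a minimal sketch (all names hypothetical; this is neither ObjectGrid’s nor H-Store’s actual API): when the transaction logic runs in the same process as the in-memory data, a “transaction” is just a local function call, with no per-statement network round trips.

```python
# Minimal sketch of in-process transaction execution against in-memory data.
# All names are hypothetical; this is not ObjectGrid's or H-Store's API.

accounts = {"alice": 100, "bob": 50}  # the entire "database" lives in RAM

def transfer(src: str, dst: str, amount: int) -> bool:
    """Runs in the same address space as the data: one local call,
    not a sequence of client/server SQL round trips."""
    if accounts[src] < amount:
        return False
    accounts[src] -= amount
    accounts[dst] += amount
    return True

transfer("alice", "bob", 30)
```

Contrast that with a conventional client/server OLTP flow, where each statement of the transaction crosses a network boundary and a query-parsing layer.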
IBM now has both ObjectGrid and a memory-centric DBMS (solidDB) that they’ve been using as a front end for other DBMS. Integration of the two could be pretty interesting.
Categories: Cache, IBM and DB2, Memory-centric data management, OLTP, solidDB, Theory and architecture, VoltDB and H-Store | Leave a Comment |
The architectural assumptions of H-Store
I wrote yesterday about the H-Store project, the latest from the team of researchers who also brought us C-Store and its commercialization Vertica. H-Store is designed to drastically improve efficiency in OLTP database processing, in two ways. First, it puts everything in RAM. Second, it tries to gain an additional order of magnitude on in-memory performance versus today’s DBMS designs by, for example, taking a very different approach to ensuring ACID compliance.
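One such departure can be sketched in a few lines (a simplified illustration; see the H-Store paper for the actual design): if each partition’s transactions are executed serially on a single thread, they are trivially serializable, so the lock manager and latching overhead of a conventional DBMS disappear.

```python
# Sketch of serial, single-threaded execution per partition: transactions
# are queued and run to completion one at a time, so no locks or latches
# are needed for isolation. Simplified illustration, not H-Store's code.
from queue import Queue

class Partition:
    def __init__(self):
        self.data = {}     # this partition's slice of the database, in RAM
        self.queue = Queue()

    def submit(self, txn):
        self.queue.put(txn)

    def run_pending(self):
        # A single thread drains the queue; each transaction sees a
        # consistent state because nothing else touches self.data.
        results = []
        while not self.queue.empty():
            txn = self.queue.get()
            results.append(txn(self.data))
        return results

p = Partition()
p.submit(lambda d: d.update(x=1))   # a write transaction
p.submit(lambda d: d.get("x"))      # a read transaction
results = p.run_pending()           # the read observes the prior write
```

The price, of course, is that any transaction touching multiple partitions needs coordination, which is why the constrained-tree-schema assumption matters.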
Today I had the chance to talk with two more of the H-Store researchers, Sam Madden and Daniel Abadi. Read more
Categories: Database diversity, In-memory DBMS, Memory-centric data management, OLTP, VoltDB and H-Store | 5 Comments |
Mike Stonebraker may be oversimplifying data warehousing just a tad
Mike Stonebraker has now responded to the second post in my five-part database diversity series. Takeaways and rejoinders include: Read more
Categories: Analytic technologies, Columnar database management, Data warehousing, Database diversity, Michael Stonebraker, Theory and architecture, Vertica Systems | 2 Comments |
Kalido — CASE for complex data warehouses
Kalido briefed me last week, under pre-TDWI embargo. To a first approximation, their story is confusingly buzzword-laden, as is evident from their product names. The Kalido suite is called the Kalido Information Engine, and it comprises:
- Kalido Business Information Modeler (the newest part)
- Kalido Dynamic Information Warehouse
- Kalido Universal Information Director
- Kalido Master Data Management
But those mouthfuls aside, Kalido has some pretty interesting things to say about data warehouse schema complexity and change.
Categories: Data integration and middleware, Data models and architecture, Data warehousing, EAI, EII, ETL, ELT, ETLT, Kalido, Theory and architecture | 1 Comment |
ParAccel technical highlights
I recently caught up with ParAccel’s CTO Barry Zane and Marketing VP Kim Stanick for a long technical discussion, which they have graciously continued by email. It would be impolitic in the extreme to comment on what led up to that. Let’s just note that many things I’ve previously written about ParAccel are now inoperative, and go straight to the highlights.
Categories: Columnar database management, Data warehousing, Emulation, transparency, portability, Microsoft and SQL*Server, ParAccel | 5 Comments |
Mike Stonebraker calls for the complete destruction of the old DBMS order
Last week, Dan Weinreb tipped me off to something very cool: Mike Stonebraker and a group of MIT/Brown/Yale colleagues are calling for a complete rewrite of OLTP DBMS. And they have a plan for how to do it, called H-Store, as per a paper and an associated slide presentation.
Categories: Database diversity, In-memory DBMS, Memory-centric data management, Michael Stonebraker, OLTP, Theory and architecture, VoltDB and H-Store | 36 Comments |
Mike Stonebraker’s DBMS taxonomy
In a response to my recent five-part series on DBMS diversity, Mike Stonebraker has proposed his own taxonomy of data management technologies over on Vertica’s Database Column blog. (Edit: Some good stuff disappeared when Vertica nuked that blog.)
- OLTP DBMSs focused on fast, reliable transaction processing
- Analytic/Data Warehouse DBMSs focused on efficient load and ad-hoc query performance
- Science DBMSs — after all, MATLAB does not scale to disk-sized arrays
- RDF stores focused on efficiently storing semi-structured data in this format
- XML stores focused on semi-structured data in this format
- Search engines — the big players all use proprietary engines in this area
- Stream Processing Engines focused on real-time StreamSQL
- “Lean and Mean,” less-than-a-database engines focused on doing a small number of things very well (embedded databases are probably in this category)
- MapReduce and Hadoop — after all Google has enough “throw weight” to define a category
He goes on to say that each will be architected differently, except that — as he already convinced me back in July — RDF will be well-managed by specialty data warehouse DBMS. Read more
Categories: Data types, Database diversity, Michael Stonebraker, Mid-range, OLTP, RDF and graphs, Theory and architecture | 6 Comments |
Database management system choices – beyond relational
This is the fifth of a five-part series on database management system choices. For the first post in the series, please click here.
Relational database management systems have three essential elements:
- Rows and columns. Theoretically, rows and columns may be inessential to the relational model. But in reality, they are built into the design of every real-world relational product. If you don’t have rows and columns, you’re not using the product for what it was designed to do well.
- Predicate logic. Theoretically, everything can be fitted into a predicate Procrustean bed. But if you’re looking for relevancy rankings on a text search, binary logic is a highly convoluted way to get them.
- Fixed schemas. Database theorists commonly assume that databases have fixed schemas. If this means that 90%+ of all information is null or missing, they have elegant ways of dealing with that. Even so, as computing gets ever more concerned with individuals — each with his/her/its unique “profile(s)” — fixed schemas get ever harder to maintain.
If any of these three elements is missing or inappropriate, then a traditional relational database management system may not be the best choice.
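The fixed-schema point can be made concrete (illustrative data only): when most attributes apply to only a few individuals, a relational “profile” table is mostly NULLs, while a schema-flexible representation stores only what is actually present.

```python
# Illustrative only: the same three user "profiles" as a fixed-schema
# table versus a schema-flexible, document-style representation.

columns = ["name", "fav_team", "shoe_size", "ham_radio_callsign"]
rows = [  # fixed schema: every row carries every column, mostly NULL
    ("alice", "Red Sox", None, None),
    ("bob",   None,      10.5, None),
    ("carol", None,      None, "KC1XYZ"),
]

docs = [  # schema-flexible: each profile stores only its own attributes
    {"name": "alice", "fav_team": "Red Sox"},
    {"name": "bob", "shoe_size": 10.5},
    {"name": "carol", "ham_radio_callsign": "KC1XYZ"},
]

nulls = sum(v is None for row in rows for v in row)
print(nulls)  # half the cells are NULL even in this tiny example
```

Add a new attribute for one user and the fixed-schema version changes for everybody; the document version changes for one record.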
Categories: Data types, Database diversity, Theory and architecture | 3 Comments |
Database management system choices — mid-range-relational
This is the fourth of a five-part series on database management system choices. For the first post in the series, please click here.
The other threat to the high-end relational DBMS vendors aims squarely at the heart of their business. It’s the mid-range relational database management systems, which are doing an ever-larger fraction of what their high-end cousins can. That said, different products do different things well. So if you’re not blindly paying up for the security of an all-things-to-all-people high-end DBMS, there are a number of factors you might want to consider.
Categories: Database diversity, EnterpriseDB and Postgres Plus, Mid-range, MySQL, OLTP, PostgreSQL, Theory and architecture | 3 Comments |
Database management system choices – relational data warehouse
This is the third of a five-part series on database management system choices. For the first post in the series, please click here.
High-end OLTP relational database management system vendors try to offer one-stop shopping for almost all data management needs. But as I noted in my prior post, their product category is facing two major competitive threats. One comes from specialty data warehouse database management system products. I’ve covered those extensively in this blog, with key takeaways including:
- Specialty data warehouse products offer huge cost advantages versus less targeted DBMS. This applies to purchase/maintenance and administrative costs alike. And it’s true even when the general-purpose DBMS boast data warehousing features such as star indexes, bitmap indexes, or sophisticated optimizers.
- The larger the database, the bigger the difference. It’s almost inconceivable to use Oracle for a 100+ terabyte data warehouse. But if you only have 5 terabytes, Oracle is a perfectly viable – albeit annoying and costly – alternative.
- Most specialty data warehouse products have a shared-nothing architecture. Smaller parts are cheaper per unit of capacity. Hence shared nothing/grid architectures are inherently cheaper, at least in theory. In data warehousing, that theoretical possibility has long been made practical.
- Specialty data warehouse products with row-based architectures are commonly sold in appliance formats. In particular, this is true of Teradata, Netezza, DATAllegro, and Greenplum. One reason is that they’re optimized to stream data off of disk fairly sequentially, as opposed to relying on random seeks.
- Specialty data warehouse products with columnar architectures are commonly available in software-only formats. Even so, Vertica and ParAccel also boast appliance deals, with HP and Sun respectively.
- There is tremendous technical diversity and differentiation in the specialty data warehouse system market.
Let me expand on that last point. Different features may or may not be important to you, depending on whether your precise application needs include: Read more
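The row-versus-columnar distinction running through the list above can be sketched in a few lines (a toy illustration, not any vendor’s implementation): to aggregate one column, a row store must read every record in full, while a column store streams just that column’s contiguous values.

```python
# Toy contrast of row-store vs column-store scans; not any vendor's code.

# Row layout: each record is stored contiguously.
row_store = [
    {"id": 1, "region": "east", "revenue": 100},
    {"id": 2, "region": "west", "revenue": 250},
    {"id": 3, "region": "east", "revenue": 175},
]

# Column layout: each attribute is stored contiguously.
col_store = {
    "id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "revenue": [100, 250, 175],
}

# SELECT SUM(revenue): the row store touches every field of every record...
row_sum = sum(rec["revenue"] for rec in row_store)
# ...while the column store scans one contiguous array.
col_sum = sum(col_store["revenue"])

assert row_sum == col_sum == 525
```

On disk the difference is I/O volume, which is why the row-based appliances optimize for sequential streaming and the columnar products simply read fewer bytes.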