Theory and architecture
Analysis of design choices in databases and database management systems. Related subjects include:
- Database diversity
- Explicit support for specific data types
- (in Text Technologies) Text search
More grist for the column vs. row mill
Daniel Abadi and Sam Madden are at it again, following up on their blog posts of six months ago arguing for the general superiority of column stores over row stores (for analytic query processing). The gist is to recite a number of bases for that superiority, beyond the two standard ones of less I/O and better compression; the argument seems to be based largely on Section 5 of a SIGMOD paper they wrote with Nabil Hachem.
A big part of their argument is that if you carry the processing of columnar and/or compressed data all the way through in memory, you get lots of advantages, especially because everything’s smaller and hence fits better into Level 2 cache. There is also some kind of join algorithm enhancement, which seems to be based on noticing when the result winds up falling into a contiguous range along some dimension, and perhaps on choosing dictionary encodings in a way that helps induce such an outcome.
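To make the in-memory compression point concrete, here is a minimal sketch of dictionary encoding, my own toy code rather than anything from the paper: a predicate over a dictionary-encoded column compares small integer codes instead of raw strings, so far more of the column fits in L2 cache.

```python
# Toy sketch of dictionary encoding for a string column.
# The point: a filter scans small integer codes, not raw strings,
# so much more of the column fits in CPU cache.

def dictionary_encode(values):
    """Map each distinct string to a small integer code."""
    dictionary = {}          # value -> code
    codes = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        codes.append(dictionary[v])
    return dictionary, codes

def filter_eq(dictionary, codes, predicate_value):
    """Evaluate `column = predicate_value` without decompressing.

    The predicate string is translated to its code once; the scan
    then compares integers only."""
    target = dictionary.get(predicate_value)
    if target is None:            # value never occurs in the column
        return []
    return [i for i, c in enumerate(codes) if c == target]

column = ["US", "DE", "US", "UK", "DE", "US"]
d, codes = dictionary_encode(column)
print(filter_eq(d, codes, "US"))   # -> [0, 2, 5]
```

If the dictionary is sorted, range predicates can likewise be mapped to ranges of codes, which is one way an encoding choice could help induce the kind of range-friendly join outcome described above.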
The main enemy here is row-store vendors who say, in effect, “Oh, it’s easy to shoehorn almost all the benefits of a column-store into a row-based system.” They also take a swipe — for being insufficiently purely columnar — at unnamed columnar Vertica competitors, described in terms that seemingly apply directly to ParAccel.
Categories: Columnar database management, Data warehousing, Database compression, ParAccel, Vertica Systems | 2 Comments |
Database archiving and information preservation
Two similar companies reached out to me recently – SAND Technology and Clearpace. Their current market focus is somewhat different: Clearpace talks mainly of archiving, and sells first and foremost into the compliance market, while SAND has the most traction providing “near-line” storage for SAP databases.* But both stories boil down to pretty much the same thing: Cheap, trustworthy data storage with good-enough query capabilities. E.g., I think both companies would agree the following is a not-too-misleading first-approximation characterization of their respective products:
- Fully functional relational DBMS.
- Claims of fast query performance, but that’s not how they’re sold.
- Huge compression.
- Careful attention to time-stamping and auditability.
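On that last point, here is a minimal sketch, entirely my own construction and not either vendor’s design, of why timestamping plus hash-chaining supports auditability: altering any archived record breaks every subsequent link in the chain.

```python
import hashlib, json, time

def archive_record(prev_hash, payload):
    """Append-only archive entry: each record is timestamped and
    chained to its predecessor's hash, so any later edit breaks
    every subsequent link."""
    record = {
        "ts": time.time(),
        "payload": payload,
        "prev": prev_hash,
    }
    digest = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record, digest

def verify(chain):
    """Recompute the chain; True only if no record was altered."""
    prev = "0" * 64
    for record, digest in chain:
        if record["prev"] != prev:
            return False
        recomputed = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        if recomputed != digest:
            return False
        prev = digest
    return True

chain, prev = [], "0" * 64
for row in ({"acct": 1, "bal": 100}, {"acct": 1, "bal": 90}):
    rec, prev = archive_record(prev, row)
    chain.append((rec, prev))
print(verify(chain))                    # True
chain[0][0]["payload"]["bal"] = 999
print(verify(chain))                    # False -- tampering detected
```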
Categories: Archiving and information preservation, Database compression, Rainstor, SAND Technology | 3 Comments |
Introduction to SAND Technology
SAND Technology has a confused history. For example:
- SAND has been around in some form or other since 1982, starting out as a Hitachi reseller in Canada.
- In 1992 SAND acquired a columnar DBMS product called Nucleus, which originally was integrated with hardware (in the form of a card). Notwithstanding what development chief Richard Grondin views as various advantages vs. Sybase IQ, SAND has had only limited success in that market.
- Thus, SAND introduced a second, similarly named product, which could also be viewed as a columnar DBMS. (As best I can tell, both are called SAND/DNA.) But it’s actually focused on archiving, aka the clunkily named “near-line storage.” And it’s evidently not the same code line; e.g., the newer product isn’t bit-mapped, while the older one is.
- The near-line product was originally focused on the SAP market. Now it’s moving beyond.
- Canada-based SAND had offices in Germany and the UK before it did in the US. This leads to an oddity – SAND is less focused on the SAP aftermarket in Germany than it still is in the US.
SAND is publicly traded, so its numbers are on display. It turns out to be doing $7 million in annual revenue, and losing money.
OK. I just wanted to get all that out of the way. My main thoughts about the DBMS archiving market are in a separate post.
Categories: Archiving and information preservation, Columnar database management, Data warehousing, SAND Technology | 6 Comments |
Data warehouse load speeds in the spotlight
Syncsort and Vertica combined to devise and run a benchmark in which a data warehouse got loaded at 5 ½ terabytes per hour, several times faster than any figure cited in other vendors’ similar press releases to date. Takeaways include:
- Syncsort isn’t just a mainframe sort utility company, but also does data integration. Who knew?
- Vertica’s design for overcoming the traditionally slow load speeds of columnar DBMSs works.
The latter is unsurprising. Back in February, I wrote at length about how Vertica makes rapid columnar updates. I don’t have a lot of subsequent new detail, but the design made sense then and it still does. Read more
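For readers who missed the February post: the core idea, as I understood it, is a small write-optimized in-memory store that absorbs inserts cheaply, with a background process bulk-merging its contents into the sorted, compressed on-disk columns. Here is a toy sketch of that split, with invented names and none of Vertica’s actual code:

```python
import bisect

class ToyColumnStore:
    """Toy model of a write-optimized / read-optimized split.

    Inserts land in an unsorted in-memory buffer (cheap), and a
    background "tuple mover" merges the buffer into the sorted
    representation in bulk, amortizing the sort/compress cost."""

    def __init__(self, flush_threshold=4):
        self.wos = []      # write-optimized: unsorted, append-only
        self.ros = []      # read-optimized: kept sorted (stand-in
                           # for sorted, compressed column files)
        self.flush_threshold = flush_threshold

    def insert(self, value):
        self.wos.append(value)             # O(1) per row
        if len(self.wos) >= self.flush_threshold:
            self._move_tuples()

    def _move_tuples(self):
        # Bulk merge: one sort per batch, not one per row.
        self.ros = sorted(self.ros + self.wos)
        self.wos = []

    def contains(self, value):
        # Queries must consult both stores.
        if value in self.wos:
            return True
        i = bisect.bisect_left(self.ros, value)
        return i < len(self.ros) and self.ros[i] == value

store = ToyColumnStore()
for v in [42, 7, 99, 7, 13]:
    store.insert(v)
print(store.contains(13), store.contains(8))   # True False
```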
Another dubious “end of computer history” argument
In a typically snarky Register article, Chris Mellor raises a caution about the use of future many-core chips in IT. In essence, he says that today’s apps each run in a relatively small number of threads, and modifying them to run in many threads is too difficult. Hence, most of the IT use for many-core chips will be via hypervisors that assign whole apps to cores as makes sense.
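To put rough numbers on that point, here is a back-of-envelope calculation; the per-app thread count is an assumption for illustration, not a measurement:

```python
# Back-of-envelope version of Mellor's point: an app with a fixed
# thread count can't exploit a growing core count, so per-chip
# utilization falls unless something (e.g., a hypervisor)
# consolidates more apps onto the chip.
THREADS_PER_APP = 4            # assumed typical app parallelism

for cores in (8, 16, 64, 256):
    utilization = min(THREADS_PER_APP, cores) / cores
    apps_needed = cores // THREADS_PER_APP   # to fill the chip
    print(f"{cores:3d} cores: one app uses {utilization:6.1%}; "
          f"~{apps_needed} consolidated apps fill the chip")
```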
Mellor has a point, but he’s overstating it. Read more
Categories: Parallelization, Theory and architecture | 3 Comments |
Update on Aster Data Systems and nCluster
On my West Coast swing last week, I spent a few hours at Aster Data, which has now officially put out Version 3 of nCluster. Highlights included: Read more
Introduction to Kickfire
I’ve spent a few hours visiting or otherwise talking with my new clients at Kickfire recently, so I think I have a better feel for their story. A few details are still missing, however, either because I didn’t get around to asking about them, or because an unexplained accident corrupted my notes (and I wasn’t even using Office 2007). Highlights include: Read more
Categories: Columnar database management, Data warehouse appliances, Data warehousing, Kickfire, MySQL, Theory and architecture | Leave a Comment |
Oracle notes
I spent about six hours at Oracle today — talking with Andy Mendelsohn, Ray Roccaforte, Juan Loaiza, Cetin Ozbutun, et al. — and plan to write more later. For now, let me pass along a few quick comments. Read more
Categories: Data warehousing, Exadata, Oracle, Parallelization, Pricing, Storage, Theory and architecture | 10 Comments |
Teradata’s Petabyte Power Players
As previously hinted, Teradata has now announced 4 of the 5 members of its “Petabyte Power Players” club. These are enterprises with 1+ petabyte of data on Teradata equipment. As is commonly the case when Teradata discusses such figures, there’s some confusion as to how they’re actually counting. But as best I can tell, Teradata is counting: Read more
Categories: Data warehousing, eBay, Market share and customer counts, Petabyte-scale data management, Specific users, Teradata | 11 Comments |
Schema flexibility and XML data management
Conor O’Mahony, marketing manager for IBM’s DB2 pureXML, talks a lot about one of my favorite hobbyhorses — schema flexibility — as a reason to use an XML data model. In a number of industries he sees use cases based around ongoing change in the information being managed:
- Tax authorities change their rules and forms every year, but don’t want to do total rewrites of their electronic submission and processing software.
- The financial services industry keeps inventing new products, which don’t just have different terms and conditions, but may also have different kinds of terms and conditions.
- The same, to some extent, goes for the travel industry, which also keeps adding different kinds of offers and destinations.
- The energy industry keeps adding new kinds of highly complex equipment it has to manage.
Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange. Read more
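To illustrate the schema-flexibility argument in miniature, here is a generic sketch in Python rather than pureXML syntax, with invented element names: records stored as XML documents can gain new kinds of terms without a schema change, while old queries keep working.

```python
import xml.etree.ElementTree as ET

# Two "financial product" records with different kinds of terms;
# no relational schema change was needed to store the second one.
docs = [
    "<product><name>PlainBond</name><coupon>5.0</coupon></product>",
    """<product><name>ExoticNote</name><coupon>3.5</coupon>
         <knockIn>
           <trigger>0.8</trigger><barrierType>continuous</barrierType>
         </knockIn>
       </product>""",
]

for raw in docs:
    p = ET.fromstring(raw)
    # The original query still works on both shapes...
    print(p.findtext("name"), p.findtext("coupon"))
    # ...and newer code can probe for terms that may or may not exist.
    knock_in = p.find("knockIn")
    if knock_in is not None:
        print("  knock-in trigger:", knock_in.findtext("trigger"))
```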