Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data.
TwinFin(i) – Netezza’s version of a parallel analytic platform
Much like Aster Data did in Aster 4.0 and now Aster 4.5, Netezza is announcing a general parallel big data analytic platform strategy. It is called Netezza TwinFin(i), it is a chargeable option for the Netezza TwinFin appliance, and many announced details are on the vague side, with Netezza promising more clarity at or before its Enzee Universe conference in June. At a high level, the Aster and Netezza approaches compare/contrast as follows: Read more
Categories: Aster Data, Data warehouse appliances, Data warehousing, Hadoop, MapReduce, Netezza, Predictive modeling and advanced analytics, SAS Institute, Teradata | 10 Comments |
Aster Data nCluster 4.5
Like Vertica, Netezza, and Teradata, Aster is using this week to pre-announce a forthcoming product release, Aster Data nCluster 4.5. Aster is really hanging its identity on “Big Data Analytics” or some variant of that concept, and so the two major named parts of Aster nCluster 4.5 are:
- Aster Data Analytic Foundation, a set of analytic packages prebuilt in Aster’s SQL-MapReduce
- Aster Data Developer Express, an Eclipse-based IDE (Integrated Development Environment) for developing and testing applications built on Aster nCluster, Aster SQL-MapReduce, and Aster Data Analytic Foundation
And in other Aster news:
- Along with the development GUI in Aster nCluster 4.5, there is also a new administrative GUI.
- Aster has certified that nCluster works with Fusion I/O boards, because at least one retail industry prospect cares. However, that in no way means that arm’s-length Fusion I/O certification is Aster’s ultimate solid-state memory strategy.
- I had the wrong impression about how far Aster/SAS integration has gotten. So far, it’s just at the connector level.
Aster Data Developer Express evidently does some cool stuff, like providing some sort of parallelism testing right on your desktop. It also generates lots of stub code, saving humans from the tedium of doing that. Useful, obviously.
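To give a feel for what those stubs wrap, here is a minimal, plain-Python sketch of the kind of per-partition row function that SQL-MapReduce parallelizes across an nCluster, sessionization in this case. The table shape, column names, and 30-minute timeout are all made up for illustration; Aster's actual generated code and packaged functions surely differ in their particulars.

```python
from itertools import groupby
from operator import itemgetter

TIMEOUT = 1800  # seconds; an assumed 30-minute session boundary

def sessionize(rows):
    """Toy stand-in for a SQL-MapReduce row function. rows is a sequence of
    (userid, ts) pairs pre-sorted by (userid, ts). Each user's partition is
    processed independently, which is what makes the real thing
    embarrassingly parallel across an nCluster."""
    for userid, clicks in groupby(rows, key=itemgetter(0)):
        session, last_ts = 0, None
        for _, ts in clicks:
            if last_ts is not None and ts - last_ts > TIMEOUT:
                session += 1  # a long gap starts a new session
            last_ts = ts
            yield (userid, session, ts)

clicks = [("u1", 0), ("u1", 600), ("u1", 3000), ("u2", 100)]
for row in sessionize(clicks):
    print(row)  # ('u1', 0, 0), ('u1', 0, 600), ('u1', 1, 3000), ('u2', 0, 100)
```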
But mainly, I want to write about the analytic packages. Read more
Categories: Aster Data, Data warehousing, Investment research and trading, Predictive modeling and advanced analytics, RDF and graphs, SAS Institute, Teradata | 9 Comments |
Vertica 4.0
Vertica briefed me last month on its forthcoming Vertica 4.0 release. I think it’s fair to say that Vertica 4.0 is mainly a cleanup/catchup release, washing away some of the tradeoffs Vertica had previously made in support of its innovative DBMS architecture.
For starters, there’s a lot of new analytic functionality. This isn’t Aster/Netezza-style ambitious. Rather, there’s a lot more SQL-99 functionality, plus some time series extensions of the sort that financial services firms, an important market for Vertica, need and love. Vertica did suggest that a couple of these time series extensions are innovative, but I haven’t yet gotten detail about those.
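Pending those details, here is a plain-Python sketch of the operation financial services firms typically want from time series extensions: regularizing irregular tick data onto fixed time slices, carrying the last observed value forward. Everything here, including the tick values, is illustrative; it is not Vertica syntax or Vertica's implementation.

```python
ticks = [(0.0, 10.00), (0.4, 10.05), (2.7, 10.01)]  # (seconds, bid), made up

def slice_series(ticks, step=1.0, end=4.0):
    """Emit (slice_time, bid) at regular intervals, using
    last-value-carried-forward interpolation."""
    out, i, last, t = [], 0, None, 0.0
    while t <= end:
        # advance past every tick at or before this slice boundary
        while i < len(ticks) and ticks[i][0] <= t:
            last = ticks[i][1]
            i += 1
        out.append((t, last))
        t += step
    return out

for slice_time, bid in slice_series(ticks):
    print(slice_time, bid)  # 0.0 10.0 / 1.0 10.05 / 2.0 10.05 / 3.0 10.01 / 4.0 10.01
```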
Perhaps even more important, Vertica is cleaning up a lot of its previous SQL optimization and execution weirdnesses. In no particular order, I was told: Read more
Categories: Analytic technologies, Columnar database management, Data warehousing, Vertica Systems | 12 Comments |
Intelligent Enterprise’s Editors’/Editor’s Choice list for 2010
As he has before, Intelligent Enterprise Editor Doug Henschen
- Personally selected annual lists of 12 “Most influential” companies and 36 “Companies to watch” in analytics- and database-related sectors.
- Made it clear that these are his personal selections.
- Nonetheless called it an Editors’ Choice list, rather than Editor’s Choice. 🙂
(Actually, he’s really called it an “award.”)
Comments on the Gartner 2009/2010 Data Warehouse Database Management System Magic Quadrant
February, 2011 edit: I’ve now commented on Gartner’s 2010 Data Warehouse Database Management System Magic Quadrant as well.
At intervals of a little over a year, Gartner Group publishes a Data Warehouse Database Management System Magic Quadrant. Gartner’s 2009 data warehouse DBMS Magic Quadrant — actually, January 2010 — is now out.* For many reasons, including those I noted in my comments on Gartner’s 2008 Data Warehouse DBMS Magic Quadrant, the Gartner quadrant pictures are a bad use of good research. Rather than rehash that this year, I’ll merely call out some points in the surrounding commentary that I find interesting or just plain strange. Read more
The Sybase Aleri RAP
Well, I got a quick Sybase/Aleri briefing, along with multiple apologies for not being prebriefed. (Main excuse: News was getting out, which accelerated the announcement.) Nothing badly contradicted my prior post on the Sybase/Aleri deal.
To understand Sybase’s plans for Aleri and CEP, it helps to understand Sybase’s current CEP-oriented offering, Sybase RAP. So far as I can tell, Sybase RAP has to date only been sold in the form of Sybase RAP: The Trading Edition. In that guise, Sybase RAP has been sold to >40 outfits since its May, 2008 launch, mainly big names in the investment banking and stock exchange sectors. If I understood correctly, the next target market for Sybase RAP is telcos, for real-time network tuning and management.
In addition to any domain-specific applications, Sybase RAP has three layers:
- CEP (Complex Event Processing). Sybase RAP CEP is based on a version of the Coral8 engine, which Sybase licensed and has subsequently been developing.
- In-memory DBMS. Sybase’s IMDB is part of (but, I gather, separable from) Sybase’s OLTP DBMS Adaptive Server Enterprise (ASE, aka Sybase Classic), and it has the same API as ASE.
- Sybase IQ. Actually, Sybase used the phrase “based on Sybase IQ,” but I’m guessing it’s just Sybase IQ.
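To make the division of labor concrete, here is a conceptual sketch of how events might flow through those three layers. This is my inference from the briefing, not Sybase's API; every name in it is invented for illustration.

```python
from collections import deque

WORKING_SET = deque(maxlen=100_000)  # stand-in for the in-memory DBMS layer
archive = []                          # stand-in for the Sybase IQ layer

def cep_filter(event):
    """Layer 1, CEP: keep only trades above an assumed notional threshold."""
    return event["qty"] * event["price"] > 1_000_000

def ingest(event):
    if cep_filter(event):
        WORKING_SET.append(event)     # layer 2: recent events stay in memory
    if len(WORKING_SET) == WORKING_SET.maxlen:
        archive.extend(WORKING_SET)   # layer 3: bulk-load into the columnar store
        WORKING_SET.clear()

ingest({"symbol": "SY", "qty": 50_000, "price": 25.0})
print(len(WORKING_SET), len(archive))  # 1 0
```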
Open issues in database and analytic technology
The last part of my New England Database Summit talk was on open issues in database and analytic technology. This was closely intertwined with the previous section, and also relied on a lot that I’ve posted here. So I’ll just put up a few notes on that part, with lots of linkage to prior discussion of the same points. Read more
Interesting trends in database and analytic technology
My project for the day is blogging based on my “Database and analytic technology: State of the union” talk of a few days ago. (I called it that because of when it was given, because it mixed prescriptive and descriptive elements, and because I wanted to call attention to the fact that I cover the union of database and analytic technologies – the intersection of those two sectors is an area of particular focus, but is far from the whole of my coverage.)
One section covered recent/ongoing/near-future trends that I thought were particularly interesting, including: Read more
Flash, other solid-state memory, and disk
If there’s one subject on which the New England Database Summit changed or at least clarified my thinking,* it’s future storage technologies. Here’s what I now think:
- Solid-state memory will soon be the right storage technology for a large fraction of databases, OLTP and analytic alike. I’m not sure whether the initial cutoff in database size is best thought of as terabytes or 10s of terabytes, but it’s in that range. And it will increase over time, for the usual cheaper-parts reasons (a back-of-envelope sketch follows below).
- That doesn’t necessarily mean flash. PCM (Phase-Change Memory) is coming down the pike, with perhaps 100X the durability of flash, in terms of the total number of writes it can tolerate. On the other hand, PCM has issues in the face of heat. More futuristically, IBM is also high on magnetic racetrack memory. IBM likes the term storage-class memory to cover all this — which I find regrettable, since the acronym SCM is way overloaded already. 🙂
- Putting a disk controller in front of solid-state memory is really wasteful. It wreaks havoc on I/O rates.
- Generic PCIe interfaces don’t suffice either, in many analytic use cases. Their I/O is better, but still not good enough. (Doing better yet is where Petascan – the stealth-mode company I keep teasing about – comes in.)
- Disk will long be useful for very large databases. Kryder’s Law, about disk capacity, has at least as high an annual improvement as Moore’s Law shows for chip capacity, the disk rotation speed bottleneck notwithstanding. Disk will long be much cheaper than silicon for data storage. And cheaper silicon in sensors will lead to ever more machine-generated data that fills up a lot of disks.
- Disk will long be useful for archiving. Disk is the new tape.
*When the first three people to the question microphone include both Mike Stonebraker and Dave DeWitt, your thinking tends to clarify in a hurry.
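As promised above, a back-of-envelope sketch of the cheaper-parts point: at any steady annual price decline, the absolute cost of putting a fixed-size database on solid-state memory keeps falling, so the size cutoff where it makes sense keeps rising. The starting price and decline rate below are assumptions for illustration, not quotes from any vendor.

```python
flash_per_tb = 3_000.0   # assumed $/TB for enterprise flash today
annual_decline = 0.35    # assumed steady price decline per year
db_tb = 10               # a 10 TB analytic database

for year in range(6):
    cost = db_tb * flash_per_tb * (1 - annual_decline) ** year
    print(f"year {year}: ~${cost:,.0f} of flash for a {db_tb} TB database")
```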
Related links
- A slide deck by Mohan of IBM similar to the one he presented at the NEDB Summit about storage-class memories.
- A much more detailed IBM presentation on storage-class memories.
- Oracle’s and Teradata’s beliefs about the importance of solid-state memory.
Other posts based on my January, 2010 New England Database Summit keynote address
- Data-based snooping — a huge threat to liberty that we’re all helping make worse
- Interesting trends in database and analytic technology
- Open issues in database and analytic technology
Categories: Data warehousing, Michael Stonebraker, Presentations, Solid-state memory, Storage, Theory and architecture | 3 Comments |
The disk rotation speed bottleneck
I’ve been referring to the disk (rotation) speed bottleneck for years, but I don’t really have a clean link for it. Let me fix that right now.
The first hard disks ever were introduced by IBM in 1956. They rotated 1,200 times per minute. Today’s state-of-the-art disk drives rotate 15,000 times per minute. That’s a 12.5-fold improvement since the first term of the Eisenhower Administration. (I understand that the reason for this slow improvement is aerodynamic — a disk that spins too fast literally flies off the spindle.)
Unfortunately, the rotational-latency component of a random access averages 1/2 of a disk’s rotation time. 15,000 RPM = 250 rotations/second, which implies 4 milliseconds/rotation. Hence average disk access times can never get below 2 milliseconds.
From that, much about modern analytic DBMS design follows.
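Working the arithmetic through: rotational latency alone caps the random I/O rate of a single spindle at roughly 500 operations per second, and actual seek time, ignored here, only makes the real numbers worse. That cap is a big part of why analytic DBMS designs favor large sequential scans over random reads.

```python
rpm = 15_000
rotations_per_sec = rpm / 60                 # 250 rotations/second
ms_per_rotation = 1_000 / rotations_per_sec  # 4 ms/rotation
avg_latency_ms = ms_per_rotation / 2         # 2 ms average rotational latency
max_random_iops = 1_000 / avg_latency_ms     # ~500 random I/Os per second per spindle

print(ms_per_rotation, avg_latency_ms, max_random_iops)  # 4.0 2.0 500.0
```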