Theory and architecture
Analysis of design choices in databases and database management systems. Related subjects include:
- Database diversity
- Explicit support for specific data types
- (in Text Technologies) Text search
Netezza’s silicon balance
As I’ve mentioned in a couple of other posts, Netezza is stressing that the most recent wave of its technology is software-only, with no hardware upgrades made or needed. In other words, Netezza boxes already have all the silicon they need. But of course, there are really at least three major aspects to the Netezza silicon story – FPGA (Field-Programmable Gate Array), CPU, and RAM.
- Netezza deliberately planned “generous” FPGA capacity into the original TwinFin, anticipating software upgrades like the ones it’s introducing now. It is satisfied that this strategy worked. More on this below.
- The same surely applies to CPU.
- What’s more, I get the sense that the CPU turned out in practice to be even more over-provisioned than they anticipated …
- … at least when one just considers Netezza’s base NPS software.
- However, I suspect that if the advanced analytics capability takes off, Netezza will decide it wants more CPU after all.
- And by the way, NEC is making versions of Netezza appliances with more advanced chips than Netezza itself uses. So if anybody should really, really need more CPU in their Netezza boxes, there’s a very straightforward way to make that happen. (And if there were nontrivial demand for that, appropriate support plans could surely be structured.)
- Everybody needs to be careful about RAM. Netezza is surely no exception.
The major parts of Netezza’s FPGA software are:
- Compress Engine 2. This is Netezza’s new way of doing compression.
- Compress Engine 1. This is Netezza’s old way of doing compression. It is being kept around so that existing Netezza tables don’t suddenly have to be changed or reloaded.
- Project Engine. Guess what this does.
- Restrict Engine. Ditto.
- Visibility Engine. This enforces ACID and handles row-level security. It is “sort of a corner of” the Restrict Engine. (Actually, Netezza seems to waver as to whether to describe “Restrict” and “Visibility” as being two engines or one.)
- Miscellaneous plumbing.
If I understood correctly, each Netezza FPGA contains two copies of each engine, running in parallel.
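To make the division of labor concrete, here is a toy software model of the engine sequence. It is my illustration, not Netezza's code: rows stream through visibility, restrict, and project stages, and I omit the Compress Engines (in the real FPGA, decompression would come first).

```java
import java.util.*;
import java.util.stream.*;

// Toy model of rows streaming through FPGA-style engines:
// Visibility (ACID/row visibility) -> Restrict (WHERE) -> Project (SELECT list).
// Decompression by the Compress Engines is omitted for brevity.
public class FpgaPipelineSketch {
    record Row(Map<String, Object> fields, long createdTxn, long deletedTxn) {}

    public static void main(String[] args) {
        List<Row> scanned = List.of(
            new Row(Map.of("id", 1, "amount", 250), 10, Long.MAX_VALUE),
            new Row(Map.of("id", 2, "amount", 90), 12, 15)); // deleted by txn 15
        long currentTxn = 20;

        List<Object> result = scanned.stream()
            // Visibility Engine: keep only rows this transaction may see.
            .filter(r -> r.createdTxn() <= currentTxn && r.deletedTxn() > currentTxn)
            // Restrict Engine: apply the WHERE clause.
            .filter(r -> (int) r.fields().get("amount") > 100)
            // Project Engine: keep only the requested column.
            .map(r -> r.fields().get("id"))
            .collect(Collectors.toList());

        System.out.println(result); // prints [1]
    }
}
```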
Related link
- An August 2009 post on what Netezza does in its FPGA
Categories: Data warehouse appliances, Data warehousing, Database compression, Netezza, Theory and architecture
A partial overview of Netezza database software technology
Netezza is having its user conference Enzee Universe in Boston Monday–Wednesday, June 21-23, and naturally will be announcing new products there, and otherwise providing hooks and inducements to get itself written about. (The preliminary count is seven press releases in all.) To get a head start, I stopped by Netezza Thursday for meetings that included a 3 ½ hour session with 10 or so senior engineers, and have exchanged some clarifying emails since. Read more
Categories: Data warehouse appliances, Data warehousing, Netezza, Theory and architecture, Workload management
The underlying technology of QlikView
QlikTech* finally decided both to become a client and, surely not coincidentally, to give me more technical detail about QlikView than it had when last we talked a couple of years ago. Indeed, I got to spend a couple of hours on the phone not just with Anthony Deighton, but also with QlikTech’s Hakan Wolge, who wrote 70-80% of the code in QlikView 1.0, and remains in effect QlikTech’s chief architect to this day.
*Or, as it now appears to be called, Qlik Technologies.
Let’s start with some quick reminders:
- QlikTech makes QlikView, a widely popular business intelligence (BI) tool suite.
- QlikView is distinguished by the flexibility of navigation through its user interface.
- To support this flexibility, QlikView preloads all data you might want to query into memory.
Let’s also dispose of one confusion right up front, namely QlikTech’s use of the word associative: Read more
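Whatever one calls it, the user-visible behavior is easy to sketch: make a selection in one field, and the set of possible values in every other field instantly narrows to match. Below is a toy Java illustration of just that behavior. It is my sketch; QlikTech's actual in-memory structures (which are heavily compressed) look nothing like a list of row maps.

```java
import java.util.*;
import java.util.stream.*;

// Toy "associative selection" over an in-memory table: clicking
// region = EMEA narrows the admissible values of the product field.
public class AssociativeSelectionSketch {
    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
            Map.of("region", "EMEA", "product", "Widget"),
            Map.of("region", "EMEA", "product", "Gadget"),
            Map.of("region", "APAC", "product", "Widget"));

        Set<String> associatedProducts = rows.stream()
            .filter(r -> r.get("region").equals("EMEA")) // the user's selection
            .map(r -> r.get("product"))                  // surviving values elsewhere
            .collect(Collectors.toSet());

        System.out.println(associatedProducts); // [Widget, Gadget] (order may vary)
    }
}
```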
Categories: Business intelligence, Database compression, Memory-centric data management, QlikTech and QlikView
Ingres VectorWise technical highlights
After working through problems with travel, cell phones, and so on, Peter Boncz of VectorWise finally caught up with me for a regrettably brief call. Peter gave me the strong impression that what I’d written in the past about VectorWise had been and remained accurate, so I focused on filling in the gaps. Highlights included: Read more
Categories: Actian and Ingres, Analytic technologies, Benchmarks and POCs, Columnar database management, Data warehousing, Database compression, Open source, VectorWise
Algebraix
I talked Friday with Chris Piedemonte and Gary Sherman, respectively the Cofounder/CTO and Chief Mathematician of Algebraix, who teamed up for this project back in 2003 or 2004. (Algebraix is the company formerly known as XSPRADA.) Algebraix makes an analytic DBMS, somewhat based on the ideas of extended set theory, that runs on SMP (Symmetric MultiProcessing) boxes. Like all analytic DBMS vendors, Algebraix can point to occasions when it ran queries orders of magnitude faster than the systems users were looking to replace.
Algebraix’s secret sauce is that the DBMS keeps reorganizing and recopying the data on disk, to optimize performance in response to expected query patterns (automatically inferred from queries it’s seen so far). This sounds a lot like the Infobright story, with some of the more obvious differences being: Read more
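To be clear about what such adaptivity might look like mechanically, here is a deliberately crude sketch. The trigger threshold, the counting scheme, and the names are all my inventions; Algebraix's actual inference over query history is surely far more sophisticated.

```java
import java.util.*;

// Crude sketch of adaptive reorganization: count the columns appearing in
// WHERE clauses, and once a column is clearly hot, schedule a background
// job that writes an extra copy of the data sorted on that column.
public class AdaptiveReorgSketch {
    private final Map<String, Integer> predicateCounts = new HashMap<>();
    private final Set<String> sortedCopies = new HashSet<>();

    void observeQuery(List<String> whereColumns) {
        for (String col : whereColumns) {
            int hits = predicateCounts.merge(col, 1, Integer::sum);
            if (hits >= 3 && sortedCopies.add(col)) { // threshold is arbitrary
                System.out.println("background job: copy data sorted by " + col);
            }
        }
    }

    public static void main(String[] args) {
        AdaptiveReorgSketch optimizer = new AdaptiveReorgSketch();
        for (int i = 0; i < 3; i++) {
            optimizer.observeQuery(List.of("ship_date")); // a repeated pattern
        }
    }
}
```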
Categories: Algebraix, Data warehousing, Database compression, Infobright, Theory and architecture
Extended set theory, aka “What is a tuple anyway?”
The Algebraix folks are trying to repopularize David Childs’ idea of “Extended set theory.” In a nutshell, the extended set theory idea is:
A tuple is a set of (field-name, field-value) pairs.
I’ve been fairly negative about the extended set theory concept – but in fairness, that may be because I misunderstood how other people thought of tuples. Any time I’ve had to formalize my own notion of a tuple, I came up with something very much like the above, except that if one wants to be relational one needs a requirement like:
In any one tuple, each field-name must be unique.
In line with that definition, I’d say a table is something like: Read more
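For what it's worth, the tuple definition and uniqueness requirement above can be written down symbolically. This is my notation; Childs' formalism is considerably more elaborate.

```latex
t = \{(\mathit{name}, \text{``Alice''}),\ (\mathit{age}, 30)\}
\qquad
\forall\, (a, v), (a', v') \in t :\ a = a' \implies v = v'
```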
VoltDB finally launches
VoltDB is finally launching today. As is common for companies in sectors I write about, VoltDB — or just “Volt” — has discovered the virtues of embargoes that end 12:01 am. Let’s go straight to the technical highlights:
- VoltDB is based on the H-Store technology, which I wrote about in February, 2009. Most of what I said about H-Store then applies to VoltDB today.
- VoltDB is a no-apologies ACID relational DBMS, which runs entirely in RAM.
- VoltDB has rather limited SQL. (One example: VoltDB can’t do SUMs in SQL.) However, VoltDB guy Tim Callaghan (Mark Callaghan’s lesser-known but nonetheless smart brother) asserts that if you code up the missing functionality, it’s almost as fast as if it were present in the DBMS to begin with, because there’s no added I/O from the handoff between the DBMS and the procedural code. (The data’s in RAM one way or the other.)
- VoltDB’s Big Conceptual Performance Story is that it does away with most locks, latches, logs, etc., and also most context switching.
- In particular, you’re supposed to partition your data and architect your application so that most transactions execute on a single core. When you can do that, you get VoltDB’s performance benefits. To the extent you can’t, you’re in two-phase-commit performance land. (More precisely, you’re doing 2PC for multi-core writes, which is surely a major reason that multi-core reads are a lot faster in VoltDB than multi-core writes.)
- VoltDB has a little less than one DBMS thread per core. When the data partitioning works as it should, you execute a complete transaction in that single thread. Poof. No context switching.
- A transaction in VoltDB is a Java stored procedure; a sketch of what one might look like follows this list. (The early idea of Ruby on Rails in lieu of the Java/SQL combo didn’t hold up performance-wise.)
- Solid-state memory is not a viable alternative to RAM for VoltDB. Too slow.
- Instead, VoltDB lets you snapshot data to disk at tunable intervals. “Continuous” is one of the options, wherein a new snapshot starts being made as soon as the last one completes.
- In addition, VoltDB will also spool a kind of transaction log to the target of your choice. (Obvious choice: An analytic DBMS such as Vertica, but there’s no such connectivity partnership actually in place at this time.)
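As promised above, here is a sketch of a VoltDB-style stored procedure, doing in Java the SUM that the SQL dialect lacks. The table and column names are hypothetical; VoltProcedure, SQLStmt, voltQueueSQL, and voltExecuteSQL are, as best I can tell, the actual shape of the procedure API.

```java
import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

// Hypothetical single-partition procedure: SUM isn't in VoltDB's SQL, so we
// fetch the matching rows and total them in Java. Since the data is already
// in RAM in the same process, the extra loop costs very little.
public class SumOrderAmounts extends VoltProcedure {
    public final SQLStmt selectAmounts =
        new SQLStmt("SELECT amount FROM orders WHERE customer_id = ?;");

    public long run(long customerId) {
        voltQueueSQL(selectAmounts, customerId);
        VoltTable rows = voltExecuteSQL()[0];
        long total = 0;
        while (rows.advanceRow()) {
            total += rows.getLong(0);
        }
        return total;
    }
}
```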
More on Sybase IQ, including Version 15.2
Back in March, Sybase was kind enough to give me permission to post a slide deck about Sybase IQ. Well, I’m finally getting around to doing so. Highlights include but are not limited to:
- Slide 2 has some market success figures and so on. (>3100 copies at >1800 users, >200 sales last year)
- Slides 6-11 give more detail on Sybase’s indexing and data access methods than I put into my recent technical basics of Sybase IQ post.
- Slide 16 reminds us that in-database data mining is quite competitive with what SAS has actually delivered with its DBMS partners, even if it doesn’t have the nice architectural approach of Aster or Netezza. (I.e., Sybase IQ’s more-than-SQL advanced analytics story relies on C++ UDFs — User Defined Functions — running in-process with the DBMS.) In particular, there’s a data mining/predictive analytics library — modeling and scoring both — licensed from a small third party.
- A number of the other later slides also have quite a bit of technical crunch. (More on some of those points below too.)
Sybase IQ may have a bit of a funky architecture (e.g., no MPP), but the age of the product and the substantial revenue it generates have allowed Sybase to put in a bunch of product features that newer vendors haven’t gotten around to yet.
More recently, Sybase volunteered permission for me to preannounce Sybase IQ Version 15.2 by a few days (it’s scheduled to come out this week). Read more
Technical basics of Sybase IQ
The Sybase IQ folks had been rather slow about briefing me, at least with respect to crunch. They finally fixed that in February. Since then, I’ve been slow about posting based on those briefings. But what with Sybase being acquired by SAP, Sybase having an analyst meeting this week, and other reasons – well, this seems like a good time to post about Sybase IQ. 🙂
For starters, Sybase IQ is not just a bitmapped system, but it’s also not all that closely akin to C-Store or Vertica. In particular,
- Sybase IQ stores data in columns – like, for example, Vertica.
- Sybase IQ relies on indexes to retrieve data – unlike, for example, Vertica, in which the column pretty much is the index.
- However, columns themselves can be used as indexes in the usual Vertica-like way.
- Most of Sybase IQ’s indexes are bitmaps, or a lot like bitmaps, à la the original IQ product.
- Some of Sybase IQ’s indexes are not at all like bitmaps, but more like B-trees.
- In general, Sybase recommends that you put multiple indexes on each column because — what the heck — each one of them is pretty small. (In particular, the bitmap-like indexes are highly compressible.) Together, indexes tend to take up <10% of Sybase IQ storage space.
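For readers who haven't played with bitmap indexes: the core idea is one bit-vector per distinct column value, with bit i set when row i holds that value. Here is a minimal toy version (my illustration, not Sybase's on-disk format), which also hints at why such indexes compress so well and why keeping several per column stays cheap.

```java
import java.util.*;

// Toy bitmap index on one low-cardinality column: one BitSet per distinct
// value. A predicate becomes a bitmap lookup, and multi-predicate queries
// become BitSet.and() calls, with no need to touch the table itself.
public class BitmapIndexSketch {
    public static void main(String[] args) {
        String[] state = {"MA", "CA", "MA", "NY", "CA", "MA"};

        Map<String, BitSet> index = new HashMap<>();
        for (int row = 0; row < state.length; row++) {
            index.computeIfAbsent(state[row], v -> new BitSet()).set(row);
        }

        // WHERE state = 'MA': just read off the bitmap.
        System.out.println(index.get("MA")); // {0, 2, 5}

        // Long runs of identical bits are what make these bitmaps so
        // compressible, and hence cheap enough to keep several per column.
    }
}
```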
Categories: Columnar database management, Data warehousing, Database compression, Sybase, Theory and architecture
Further quick SAP/Sybase reactions
Raj Nathan of Sybase has been calling around to chat quickly about the SAP/Sybase deal and related matters. Talking with Raj didn’t change any of my initial reactions to SAP’s acquisition of Sybase. I also didn’t bother Raj with too many hard questions, as he was clearly in call-and-reassure mode, reaching out to customers and influencers alike.
That said, Read more