Theory and architecture
Analysis of design choices in databases and database management systems. Related subjects include:
- Database diversity
- Explicit support for specific data types
- (in Text Technologies) Text search
Netezza’s silicon balance
As I’ve mentioned in a couple of other posts, Netezza is stressing that the most recent wave of its technology is software-only, with no hardware upgrades made or needed. In other words, Netezza boxes already have all the silicon they need. But of course, there are really at least three major aspects to the Netezza silicon story – FPGA (Field-Programmable Gate Array), CPU, and RAM.
- Netezza deliberately planned “generous” FPGA capacity into the original TwinFin, anticipating software upgrades like the ones it’s introducing now. It is satisfied that this strategy worked. More on this below.
- The same surely applies to CPU.
- What’s more, I get the sense that the CPU turned out in practice to be even more over-provisioned than they anticipated …
- … at least when one just considers Netezza’s base NPS software.
- However, I suspect that if the advanced analytics capability takes off, Netezza will decide it wants more CPU after all.
- And by the way, NEC is making versions of Netezza appliances with more advanced chips than Netezza itself uses. So if anybody should really, really need more CPU in their Netezza boxes, there’s a very straightforward way to make that happen. (And if there were nontrivial demand for that, appropriate support plans could surely be structured.)
- Everybody needs to be careful about RAM. Netezza is surely no exception.
The major parts of Netezza’s FPGA software are:
- Compress Engine 2. This is Netezza’s new way of doing compression.
- Compress Engine 1. This is Netezza’s old way of doing compression. It is being kept around so that existing Netezza tables don’t suddenly have to be changed or reloaded.
- Project Engine. Guess what this does.
- Restrict Engine. Ditto.
- Visibility Engine. This enforces ACID and handles row-level security. It is “sort of a corner of” the Restrict Engine. (Actually, Netezza seems to waver as to whether to describe “Restrict” and “Visibility” as being two engines or one.)
- Miscellaneous plumbing.
If I understood correctly, each Netezza FPGA contains two copies of each engine, running in parallel.
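To make the division of labor concrete, here is a toy software model of the engine sequence. It is my illustration, not Netezza's code: rows stream through visibility, restrict, and project stages, and I omit the Compress Engines (in the real FPGA, decompression would come first).

```java
import java.util.*;
import java.util.stream.*;

// Toy model of rows streaming through FPGA-style engines:
// Visibility (ACID/row visibility) -> Restrict (WHERE) -> Project (SELECT list).
// Decompression by the Compress Engines is omitted for brevity.
public class FpgaPipelineSketch {
    record Row(Map<String, Object> fields, long createdTxn, long deletedTxn) {}

    public static void main(String[] args) {
        List<Row> scanned = List.of(
            new Row(Map.of("id", 1, "amount", 250), 10, Long.MAX_VALUE),
            new Row(Map.of("id", 2, "amount", 90), 12, 15)); // deleted by txn 15
        long currentTxn = 20;

        List<Object> result = scanned.stream()
            // Visibility Engine: keep only rows this transaction may see.
            .filter(r -> r.createdTxn() <= currentTxn && r.deletedTxn() > currentTxn)
            // Restrict Engine: apply the WHERE clause.
            .filter(r -> (int) r.fields().get("amount") > 100)
            // Project Engine: keep only the requested column.
            .map(r -> r.fields().get("id"))
            .collect(Collectors.toList());

        System.out.println(result); // prints [1]
    }
}
```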
Related link
- An August 2009 post on what Netezza does in its FPGA
Categories: Data warehouse appliances, Data warehousing, Database compression, Netezza, Theory and architecture
A partial overview of Netezza database software technology
Netezza is having its user conference Enzee Universe in Boston Monday–Wednesday, June 21-23, and naturally will be announcing new products there, and otherwise providing hooks and inducements to get itself written about. (The preliminary count is seven press releases in all.) To get a head start, I stopped by Netezza Thursday for meetings that included a 3 ½ hour session with 10 or so senior engineers, and have exchanged some clarifying emails since. Read more
Categories: Data warehouse appliances, Data warehousing, Netezza, Theory and architecture, Workload management
The underlying technology of QlikView
QlikTech* finally decided both to become a client and, surely not coincidentally, to give me more technical detail about QlikView than it had when last we talked a couple of years ago. Indeed, I got to spend a couple of hours on the phone not just with Anthony Deighton, but also with QlikTech’s Hakan Wolge, who wrote 70-80% of the code in QlikView 1.0, and remains in effect QlikTech’s chief architect to this day.
*Or, as it now appears to be called, Qlik Technologies.
Let’s start with some quick reminders:
- QlikTech makes QlikView, a widely popular business intelligence (BI) tool suite.
- QlikView is distinguished by the flexibility of navigation through its user interface.
- To support this flexibility, QlikView preloads all data you might want to query into memory.
Let’s also dispose of one confusion right up front, namely QlikTech’s use of the word associative: Read more
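Whatever one calls it, the user-visible behavior is easy to sketch: make a selection in one field, and the set of possible values in every other field instantly narrows to match. Below is a toy Java illustration of just that behavior. It is my sketch; QlikTech's actual in-memory structures (which are heavily compressed) look nothing like a list of row maps.

```java
import java.util.*;
import java.util.stream.*;

// Toy "associative selection" over an in-memory table: clicking
// region = EMEA narrows the admissible values of the product field.
public class AssociativeSelectionSketch {
    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
            Map.of("region", "EMEA", "product", "Widget"),
            Map.of("region", "EMEA", "product", "Gadget"),
            Map.of("region", "APAC", "product", "Widget"));

        Set<String> associatedProducts = rows.stream()
            .filter(r -> r.get("region").equals("EMEA")) // the user's selection
            .map(r -> r.get("product"))                  // surviving values elsewhere
            .collect(Collectors.toSet());

        System.out.println(associatedProducts); // [Widget, Gadget] (order may vary)
    }
}
```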
Categories: Business intelligence, Database compression, Memory-centric data management, QlikTech and QlikView
Ingres VectorWise technical highlights
After working through problems with travel, cell phones, and so on, Peter Boncz of VectorWise finally caught up with me for a regrettably brief call. Peter gave me the strong impression that what I’d written in the past about VectorWise had been and remained accurate, so I focused on filling in the gaps. Highlights included: Read more
Categories: Actian and Ingres, Analytic technologies, Benchmarks and POCs, Columnar database management, Data warehousing, Database compression, Open source, VectorWise
Algebraix
I talked Friday with Chris Piedemonte and Gary Sherman, respectively the Cofounder/CTO and Chief Mathematician of Algebraix, who teamed up for this project back in 2003 or 2004. (Algebraix is the company formerly known as XSPRADA.) Algebraix makes an analytic DBMS, somewhat based on the ideas of extended set theory, that runs on SMP (Symmetric MultiProcessing) boxes. Like all analytic DBMS vendors, Algebraix can point to occasions when it ran queries orders of magnitude faster than the systems users were looking to replace.
Algebraix’s secret sauce is that the DBMS keeps reorganizing and recopying the data on disk, to optimize performance in response to expected query patterns (automatically inferred from queries it’s seen so far). This sounds a lot like the Infobright story, with some of the more obvious differences being: Read more
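To be clear about what such adaptivity might look like mechanically, here is a deliberately crude sketch. The trigger threshold, the counting scheme, and the names are all my inventions; Algebraix's actual inference over query history is surely far more sophisticated.

```java
import java.util.*;

// Crude sketch of adaptive reorganization: count the columns appearing in
// WHERE clauses, and once a column is clearly hot, schedule a background
// job that writes an extra copy of the data sorted on that column.
public class AdaptiveReorgSketch {
    private final Map<String, Integer> predicateCounts = new HashMap<>();
    private final Set<String> sortedCopies = new HashSet<>();

    void observeQuery(List<String> whereColumns) {
        for (String col : whereColumns) {
            int hits = predicateCounts.merge(col, 1, Integer::sum);
            if (hits >= 3 && sortedCopies.add(col)) { // threshold is arbitrary
                System.out.println("background job: copy data sorted by " + col);
            }
        }
    }

    public static void main(String[] args) {
        AdaptiveReorgSketch optimizer = new AdaptiveReorgSketch();
        for (int i = 0; i < 3; i++) {
            optimizer.observeQuery(List.of("ship_date")); // a repeated pattern
        }
    }
}
```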
Categories: Algebraix, Data warehousing, Database compression, Infobright, Theory and architecture
Extended set theory, aka “What is a tuple anyway?”
The Algebraix folks are trying to repopularize David Childs’ idea of “Extended set theory.” In a nutshell, the extended set theory idea is:
A tuple is a set of (field-name, field-value) pairs.
I’ve been fairly negative about the extended set theory concept – but in fairness, that may be because I misunderstood how other people thought of tuples. Any time I’ve had to formalize my own notion of a tuple, I came up with something very much like the above, except that if one wants to be relational one needs a requirement like:
In any one tuple, each field-name must be unique.
In line with that definition, I’d say a table is something like: Read more
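For what it's worth, the tuple definition and uniqueness requirement above can be written down symbolically. This is my notation; Childs' formalism is considerably more elaborate.

```latex
t = \{(\mathit{name}, \text{``Alice''}),\ (\mathit{age}, 30)\}
\qquad
\forall\, (a, v), (a', v') \in t :\ a = a' \implies v = v'
```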
VoltDB finally launches
VoltDB is finally launching today. As is common for companies in sectors I write about, VoltDB — or just “Volt” — has discovered the virtues of embargoes that end 12:01 am. Let’s go straight to the technical highlights:
- VoltDB is based on the H-Store technology, which I wrote about in February, 2009. Most of what I said about H-Store then applies to VoltDB today.
- VoltDB is a no-apologies ACID relational DBMS, which runs entirely in RAM.
- VoltDB has rather limited SQL. (One example: VoltDB can’t do SUMs in SQL.) However, VoltDB guy Tim Callaghan (Mark Callaghan’s lesser-known but nonetheless smart brother) asserts that if you code up the missing functionality, it’s almost as fast as if it were present in the DBMS to begin with, because there’s no added I/O from the handoff between the DBMS and the procedural code. (The data’s in RAM one way or the other.)
- VoltDB’s Big Conceptual Performance Story is that it does away with most locks, latches, logs, etc., and also most context switching.
- In particular, you’re supposed to partition your data and architect your application so that most transactions execute on a single core. When you can do that, you get VoltDB’s performance benefits. To the extent you can’t, you’re in two-phase-commit performance land. (More precisely, you’re doing 2PC for multi-core writes, which is surely a major reason that multi-core reads are a lot faster in VoltDB than multi-core writes.)
- VoltDB has a little less than one DBMS thread per core. When the data partitioning works as it should, you execute a complete transaction in that single thread. Poof. No context switching.
- A transaction in VoltDB is a Java stored procedure; a sketch of what one might look like follows this list. (The early idea of Ruby on Rails in lieu of the Java/SQL combo didn’t hold up performance-wise.)
- Solid-state memory is not a viable alternative to RAM for VoltDB. Too slow.
- Instead, VoltDB lets you snapshot data to disk at tunable intervals. “Continuous” is one of the options, wherein a new snapshot starts being made as soon as the last one completes.
- In addition, VoltDB will also spool a kind of transaction log to the target of your choice. (Obvious choice: An analytic DBMS such as Vertica, but there’s no such connectivity partnership actually in place at this time.)
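As promised above, here is a sketch of a VoltDB-style stored procedure, doing in Java the SUM that the SQL dialect lacks. The table and column names are hypothetical; VoltProcedure, SQLStmt, voltQueueSQL, and voltExecuteSQL are, as best I can tell, the actual shape of the procedure API.

```java
import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

// Hypothetical single-partition procedure: SUM isn't in VoltDB's SQL, so we
// fetch the matching rows and total them in Java. Since the data is already
// in RAM in the same process, the extra loop costs very little.
public class SumOrderAmounts extends VoltProcedure {
    public final SQLStmt selectAmounts =
        new SQLStmt("SELECT amount FROM orders WHERE customer_id = ?;");

    public long run(long customerId) {
        voltQueueSQL(selectAmounts, customerId);
        VoltTable rows = voltExecuteSQL()[0];
        long total = 0;
        while (rows.advanceRow()) {
            total += rows.getLong(0);
        }
        return total;
    }
}
```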
More on Sybase IQ, including Version 15.2
Back in March, Sybase was kind enough to give me permission to post a slide deck about Sybase IQ. Well, I’m finally getting around to doing so. Highlights include but are not limited to:
- Slide 2 has some market success figures and so on. (>3100 copies at >1800 users, >200 sales last year)
- Slides 6-11 give more detail on Sybase’s indexing and data access methods than I put into my recent technical basics of Sybase IQ post.
- Slide 16 reminds us that in-database data mining is quite competitive with what SAS has actually delivered with its DBMS partners, even if it doesn’t have the nice architectural approach of Aster or Netezza. (I.e., Sybase IQ’s more-than-SQL advanced analytics story relies on C++ UDFs — User Defined Functions — running in-process with the DBMS.) In particular, there’s a data mining/predictive analytics library — modeling and scoring both — licensed from a small third party.
- A number of the other later slides also have quite a bit of technical crunch. (More on some of those points below too.)
Sybase IQ may have a bit of a funky architecture (e.g., no MPP), but the age of the product and the substantial revenue it generates have allowed Sybase to put in a bunch of product features that newer vendors haven’t gotten around to yet.
More recently, Sybase volunteered permission for me to preannounce Sybase IQ Version 15.2 by a few days (it’s scheduled to come out this week). Read more
Technical basics of Sybase IQ
The Sybase IQ folks had been rather slow about briefing me, at least with respect to crunch. They finally fixed that in February. Since then, I’ve been slow about posting based on those briefings. But what with Sybase being acquired by SAP, Sybase having an analyst meeting this week, and other reasons – well, this seems like a good time to post about Sybase IQ. 🙂
For starters, Sybase IQ is not just a bitmapped system, but it’s also not all that closely akin to C-Store or Vertica. In particular,
- Sybase IQ stores data in columns – like, for example, Vertica.
- Sybase IQ relies on indexes to retrieve data – unlike, for example, Vertica, in which the column pretty much is the index.
- However, columns themselves can be used as indexes in the usual Vertica-like way.
- Most of Sybase IQ’s indexes are bitmaps, or a lot like bitmaps, à la the original IQ product.
- Some of Sybase IQ’s indexes are not at all like bitmaps, but more like B-trees.
- In general, Sybase recommends that you put multiple indexes on each column because — what the heck — each one of them is pretty small. (In particular, the bitmap-like indexes are highly compressible.) Together, indexes tend to take up <10% of Sybase IQ storage space.
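For readers who haven't played with bitmap indexes: the core idea is one bit-vector per distinct column value, with bit i set when row i holds that value. Here is a minimal toy version (my illustration, not Sybase's on-disk format), which also hints at why such indexes compress so well and why keeping several per column stays cheap.

```java
import java.util.*;

// Toy bitmap index on one low-cardinality column: one BitSet per distinct
// value. A predicate becomes a bitmap lookup, and multi-predicate queries
// become BitSet.and() calls, with no need to touch the table itself.
public class BitmapIndexSketch {
    public static void main(String[] args) {
        String[] state = {"MA", "CA", "MA", "NY", "CA", "MA"};

        Map<String, BitSet> index = new HashMap<>();
        for (int row = 0; row < state.length; row++) {
            index.computeIfAbsent(state[row], v -> new BitSet()).set(row);
        }

        // WHERE state = 'MA': just read off the bitmap.
        System.out.println(index.get("MA")); // {0, 2, 5}

        // Long runs of identical bits are what make these bitmaps so
        // compressible, and hence cheap enough to keep several per column.
    }
}
```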
Categories: Columnar database management, Data warehousing, Database compression, Sybase, Theory and architecture
Further quick SAP/Sybase reactions
Raj Nathan of Sybase has been calling around to chat quickly about the SAP/Sybase deal and related matters. Talking with Raj didn’t change any of my initial reactions to SAP’s acquisition of Sybase. I also didn’t bother Raj with too many hard questions, as he was clearly in call-and-reassure mode, reaching out to customers and influencers alike.
That said, Read more