Kickfire’s FPGA-based technical strategy
Kickfire’s basic value proposition is that, if you have a data warehouse in the 100s of gigabytes, they’ll sell you – for $32,000 – a tiny box that solves all your query performance problems, as per the Kickfire spec sheet. And Kickfire backs that up with a pretty cool product design. However, thanks in no small part to what was heretofore Kickfire’s penchant for self-defeating secrecy, the Kickfire story is not widely appreciated.
Fortunately, Kickfire is getting over its secrecy kick. And so, here are some Kickfire technical basics.
- Kickfire is MySQL-based, with all the SQL functionality and lack of functionality that entails.
- The Kickfire/MySQL DBMS is columnar, with the usual benefits in compression and I/O reduction.
- Kickfire is based on FPGAs (Field-Programmable Gate Arrays).
- The Kickfire DBMS is ACID-compliant.
- Kickfire runs only as a single-box appliance.
- While Kickfire earlier estimated that, at least for data sets that compressed well, a Kickfire box could hold 3-10 terabytes of user data, more recent figures I’ve heard from Kickfire have been in the 1 1/2 terabyte range. (Edit: Karl Van Der Bergh subsequently wrote in to say that the 1 1/2 TB is a raw disk figure, not user data.)
The new information there is that Kickfire relies on an FPGA; Read more
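To illustrate the columnar point in the list above, here is a minimal sketch of why storing a column's values together tends to help compression. It is a generic run-length encoding example, not Kickfire's actual storage format (which isn't described here); the table, column names, and helper functions are all invented for illustration.

```python
from itertools import groupby

# Hypothetical fact table, stored row by row: (order_id, region, status).
rows = [
    (1, "EAST", "shipped"),
    (2, "EAST", "shipped"),
    (3, "EAST", "open"),
    (4, "WEST", "open"),
    (5, "WEST", "open"),
    (6, "WEST", "shipped"),
]

def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]

# Columnar layout: one list per column instead of one tuple per row.
region_column = [row[1] for row in rows]
encoded_region = run_length_encode(region_column)

print(encoded_region)  # [('EAST', 3), ('WEST', 3)]
```

Six stored values shrink to two pairs here because the column happens to be sorted; in a row layout the same values are interleaved with other columns, so runs like this never form.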
Categories: Analytic technologies, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Kickfire, MySQL, Theory and architecture | 16 Comments |
Sorting out Netezza and Oracle Exadata data warehouse appliance pricing
Netezza recently announced a new generation of data warehouse appliance called TwinFin. TwinFin’s clearest stated list price is “a little under $20,000 per terabyte of user data,” which in my opinion immediately became the new industry reference point for discussing prices in the data warehouse appliance category. Vigorous discussion ensued, especially in the comment thread to the first of the two posts linked above. Here’s some followup.
Netezza should not have claimed a “10-15X price/performance improvement,” based on a 3-5X performance improvement and a 3X decrease in price/terabyte, and I should have grilled Netezza harder when it first made the claim. In fact, there is no unit of performance that you can, in a reasonable blended average, get 10-15X more of per dollar in TwinFin than you can in the predecessor NPS series.
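The excerpt doesn't spell out how the “10-15X” figure was derived. As a quick check, and assuming it came from simply multiplying the two quoted factors (an assumption, not something stated above), the arithmetic looks like this:

```python
# Factors quoted in the post.
performance_gain = (3, 5)   # "3-5X performance improvement"
price_per_tb_drop = 3       # "3X decrease in price/terabyte"

# Assumption: the headline figure is the product of the two factors.
naive_low = performance_gain[0] * price_per_tb_drop
naive_high = performance_gain[1] * price_per_tb_drop

print(f"naive product: {naive_low}-{naive_high}X")  # naive product: 9-15X
```

A product like this is a meaningful price/performance figure only if both factors are measured on the same basis, for example performance per terabyte alongside price per terabyte. If the performance gain is measured on some other basis, multiplying the two mixes units.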
Categories: Data warehousing, Exadata, Netezza, Oracle, Pricing | 19 Comments |
What does Netezza do in the FPGAs anyway, and other questions
The news of Netezza’s new TwinFin product family has generated a lot of comments and questions, some pretty reasonable, some quite silly. E.g., I’ve seen it suggested privately or publicly that
- Netezza’s older products only handle one query at a time (nonsense, and I’m going to loyally protect the identity of the person who emailed that odd suggestion to me)
- A Netezza node can be a single point of failure (also nonsense, although performance degradation from a node failure might be considerable)
- Netezza has a cache consistency problem (also hardly true, except insofar as it’s an issue to overcome in future development as Netezza moves toward parallelizing bulk loads, transactional updates, and/or trickle feeds).
Netezza’s Phil Francisco addressed some points of this nature in a recent blog post.
More reasonable is the question:
Now that Netezza has changed its architecture, what are all those FPGAs (Field-Programmable Gate Arrays) being used for anyway?
The short answer is: Read more
Categories: Data warehouse appliances, Data warehousing, Netezza, Theory and architecture | 6 Comments |
Dataupia is officially for sale
Dataupia marketing VP Samantha Stone — who by the way has been one heck of a trooper through Dataupia’s troubles — is joining the exodus from the company. General graciousness aside, the heart of Samantha’s farewell email reads:
Unfortunately, we have had to reduce our burn rate as we seek an acquirer for our technology.
We have a group of loyal employees remaining on staff focused on current production customers and the acquisition efforts.
As part of the most recent staff reductions I will be leaving Dataupia.
Two years ago I wrote:
[Dataupia would] make a great acquisition for a BI company or DBMS vendor who could then say “Oh, no, this isn’t a DBMS appliance – it’s merely a data warehouse accelerator.” When you look at it that way, their chances of prospering look distinctly higher.
But at this point I think there probably would be more appealing ways for those vendors to meet the same needs.
Categories: Data warehouse appliances, Data warehousing, Dataupia, Emulation, transparency, portability | 14 Comments |
Please ping me if one of your comments doesn’t appear
I just found two comments that Akismet had wrongly flagged as spam: one because the author (Marcin Zukowski) pinged me, and one because I searched my spam folder for “Netezza” and there it was.
If one of your comments doesn’t go up, please ping me, and also suggest a keyword I could search on to find it.
I’m sorry for any inconvenience!
Categories: About this blog | Leave a Comment |
FlexStore and the rest of Vertica 3.5
Today, Vertica is announcing its 3.5 release, timed in line with a TDWI conference. Vertica 3.5 is scheduled to go into beta test in mid-August and be released to general availability in early October. Vertica 3.5 highlights include:
- Vertica/MapReduce integration, which I’m covering in a separate post.
- A new storage architecture called Vertica FlexStore, which seems to boil down essentially to three things:
  - A sort of row/column hybridization — Vertica would probably prefer to call it something like a column clustering feature — that I’m also covering in a separate post.
  - The beginnings of a multi-temperature capability, somewhat akin to Teradata Virtual Storage.
  - Enhancements to Vertica’s WOS (Write-Optimized Store, the in-memory part of Vertica that first receives updates). I don’t understand WOS architecture well enough to write about that yet.
- Load-balancing, to route queries evenly among Vertica nodes — probably just round-robin — rather than having them just be processed by whichever node happens to receive them.
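For readers unfamiliar with the term, round-robin routing can be sketched in a few lines. This is a generic illustration, not Vertica's implementation (the post only guesses that round-robin is what Vertica does); the class name and node names are hypothetical.

```python
from itertools import cycle

class RoundRobinRouter:
    """Hand each incoming query to the next node in a fixed rotation."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._rotation = cycle(list(nodes))

    def route(self, query):
        """Return (node, query) for the node that should run this query."""
        return next(self._rotation), query

# Hypothetical three-node cluster and six incoming queries.
router = RoundRobinRouter(["node1", "node2", "node3"])
assignments = [router.route(f"query-{i}")[0] for i in range(6)]

print(assignments)
# ['node1', 'node2', 'node3', 'node1', 'node2', 'node3']
```

Round-robin ignores how expensive each query is, so it evens out query counts rather than actual load.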
PAX Analytica? Row- and column-stores begin to come together
Column-store proponents are prone to argue, in effect, that the only reason to implement an analytic DBMS with row-based storage is laziness. Their case generally runs along the lines:
- Analytic queries commonly return only a fraction of all possible columns.
- Only returning the columns needed
  - Saves I/O
  - Saves cache space
  - Reduces processing
  - Facilitates compression
- Presumably all those row-based MPP vendors just went row-based because they had a fine row-based DBMS (usually but not always PostgreSQL) to build on.
Pushbacks to this argument from row-based vendors include:
- Yes, but it’s harder to update a column store
- Yes, but there are more steps to retrieving a bunch of columns than there are to retrieving the same information from row stores
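Both sides of the argument can be seen in a deliberately simplified cost model. The sketch below assumes a hypothetical four-column table with fixed-width values and no compression or indexing; the names and sizes are invented for illustration and don't describe any particular product.

```python
# Hypothetical table: 4 columns, fixed 8 bytes per value, 1,000 rows.
COLUMNS = ["order_id", "customer_id", "amount", "discount"]
BYTES_PER_VALUE = 8
ROW_COUNT = 1_000

def bytes_scanned_row_store(needed_columns):
    """A row store reads whole rows, whichever columns the query needs."""
    return ROW_COUNT * len(COLUMNS) * BYTES_PER_VALUE

def bytes_scanned_column_store(needed_columns):
    """A column store reads only the columns the query references."""
    return ROW_COUNT * len(needed_columns) * BYTES_PER_VALUE

def column_reads_per_output_row(needed_columns):
    """Separate column lookups needed to reassemble one row of output."""
    return len(needed_columns)

narrow_query = ["amount"]
print(bytes_scanned_row_store(narrow_query))      # 32000
print(bytes_scanned_column_store(narrow_query))   # 8000

wide_query = list(COLUMNS)
print(column_reads_per_output_row(wide_query))    # 4
```

The narrow query shows the column-store case: a quarter of the bytes scanned. The wide query shows the row-store pushback: the column store scans just as many bytes as the row store and must stitch each output row together from four separate column reads.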
Categories: Analytic technologies, Columnar database management, Data warehousing, Theory and architecture, VectorWise, Vertica Systems | 11 Comments |
Vertica’s version of MapReduce integration
I talked with Omer Trajman of Vertica Monday night about Vertica’s MapReduce integration, part of its Vertica 3.5 release. Highlights included:
- By “integrating Vertica and MapReduce,” Vertica means “integrating Vertica and Hadoop.”
- Vertica’s Hadoop integration is based on Cloudera’s DBInputFormat.
- Omer called out for me several features of Vertica’s Hadoop integration that didn’t just come from Cloudera, namely:
  - Cloudera’s DBInputFormat assumes the database runs on a single computer, or a single head node of an MPP system. Vertica’s technology, however, runs on peer parallel nodes with no head, and so Vertica adapted the DBInputFormat technology accordingly.
  - Vertica lets you push down Map functions to the database. Omer reports a roughly even division among users and prospects between those who want to do this and those who don’t.
  - Vertica lets you run Reduce functions (or Map functions, if you don’t push them down to the database) on a different cluster from the one running the database software. Vertica asserts that its customers and prospects all want to do this. Right here is the big difference between Vertica’s MapReduce integration and Aster’s or Greenplum’s. (Aster would say that Vertica’s weaker MapReduce/SQL programming integration is a big difference as well.)
  - Indeed, Vertica lets you Reduce into a DBMS other than Vertica, if you choose.
  - Vertica gives you flexibility on the size of the Map and Reduce clusters. Omer agreed with me when I said there were some limits on how fast one can add or subtract nodes in a Vertica grid, because there’s data redistribution involved. But one can add/change/delete Hadoop clusters extremely quickly.
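The Map pushdown choice described above can be illustrated with a toy model in plain Python. This is not Hadoop or Vertica code and uses none of their APIs; the table, the `map_fn`/`reduce_fn` functions, and the `run` helper are hypothetical stand-ins that only show where the Map step executes and how many records leave the database in each case.

```python
from collections import defaultdict

# Hypothetical click records in the database: (user, page, seconds_on_page).
TABLE = [
    ("ann", "/home", 5), ("ann", "/buy", 40), ("bob", "/home", 3),
    ("bob", "/help", 12), ("cat", "/buy", 25), ("cat", "/home", 2),
]

def map_fn(record):
    """Map step: emit (user, seconds) only for visits to the /buy page."""
    user, page, seconds = record
    if page == "/buy":
        yield user, seconds

def reduce_fn(pairs):
    """Reduce step: total seconds per user."""
    totals = defaultdict(int)
    for user, seconds in pairs:
        totals[user] += seconds
    return dict(totals)

def run(push_map_down):
    """Return (result, number of records shipped out of the database)."""
    if push_map_down:
        # Map runs next to the data; only its output leaves the database.
        shipped = [pair for record in TABLE for pair in map_fn(record)]
        mapped = shipped
    else:
        # Raw rows leave the database; Map runs on the separate cluster.
        shipped = list(TABLE)
        mapped = [pair for record in shipped for pair in map_fn(record)]
    return reduce_fn(mapped), len(shipped)

print(run(push_map_down=True))   # ({'ann': 40, 'cat': 25}, 2)
print(run(push_map_down=False))  # ({'ann': 40, 'cat': 25}, 6)
```

The result is the same either way; what changes is where the Map work happens and how much data crosses the wire. With a selective Map function like this one, pushdown ships fewer records, but it also puts the Map workload on the database nodes, which may be one reason users are split on it.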
Apparently, the use cases for Vertica/Hadoop integration to date lie in algorithmic trading and two kinds of web analytics. Specifically: Read more
VectorWise, Ingres, and MonetDB
I talked with Peter Boncz and Marcin Zukowski of VectorWise last Wednesday, but didn’t get around to writing about VectorWise immediately. Since then, VectorWise and its partner Ingres have gotten considerable coverage, especially from an enthusiastic Daniel Abadi. Basic facts that you may already know include:
- VectorWise, the product, will be an open-source columnar analytic DBMS. (But that’s not quite true. Pending productization, it’s more accurate to call the VectorWise technology a row/column hybrid.)
- VectorWise is due to be introduced in 2010. (Peter Boncz said that to me more clearly than I’ve seen in other coverage.)
- VectorWise and Ingres have a deal in which Ingres will at least be the exclusive seller of the VectorWise technology, and hopefully will buy the whole company.
- Notwithstanding that it was once named something like “MonetDB,” VectorWise actually is not the same thing as MonetDB, another open source columnar analytic DBMS from the same research group.
- The MonetDB and VectorWise research groups consist in large part of academics in Holland, specifically at CWI (Centrum voor Wiskunde en Informatica). But Ingres has a research group working on the project too. (Right now there are about seven “highly experienced” people each on the VectorWise and Ingres sides, although at least the VectorWise folks aren’t all full-time. More are being added.)
- Ingres and VectorWise haven’t agreed exactly how VectorWise and Ingres Classic will play together in the Ingres product line. (All of the obvious possibilities are still on the table.)
- VectorWise is shared-everything, just as Ingres is. But plans — still tentative — are afoot to integrate VectorWise with MapReduce in Daniel Abadi’s HadoopDB project.
Categories: Actian and Ingres, Analytic technologies, Columnar database management, Data warehousing, Database compression, MonetDB, Open source, Theory and architecture, VectorWise | 12 Comments |
The Boston Globe had an article on VoltDB
The Boston Globe article has more detail than Vertica and VoltDB have ever OKed me to put out, and some business details they’ve never given me.