Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data.
Kickfire kicks off
I chatted with Raj Cherabuddi and others on the Kickfire (formerly C2) team for over an hour on Monday, and now have a better sense of their story. There are some very basic questions I still don’t have answers to; I’ll fill those in when I can.
Highlights of what I have and haven’t figured out so far include:
- Kickfire’s technology has two main parts: a SQL co-processor chip and a MySQL storage engine.
- Kickfire makes a Type 0 appliance. If I understood correctly, it contains the chip, a couple of standard CPU cores, and 64 gigs of RAM. Or else it contains just the chip, and is meant to be hooked up to a 2U box with 64 gigs of RAM. I’m confused.
- The Kickfire box can handle up to 3 terabytes of user data. The disk required for that is 4-5 terabytes without redundancy, 2X with. Based on that formulation and other clues, I’m guessing Kickfire — unlike other appliance vendors — doesn’t build storage into the appliance itself.
- I don’t know whether the Kickfire chip is true custom silicon or an FPGA emulation.
- The essential idea of the chip is dataflow programming for SQL, with pipelining between operations. This eliminates the overhead of registers and context switching. I don’t know what the trade-offs are, if any. (See the first sketch after this list for the general idea.)
- Kickfire’s database software is columnar, operating on compressed data even in RAM. In that, Kickfire’s story is most similar to Vertica’s, although I’m guessing Exasol may do something similar as well. Like Vertica, Kickfire uses multiple compression methods (they’re reluctant to give detail, but agreed it would be fair to say they use both something like dictionary/token and something like delta compression; see the second sketch after this list).
- Kickfire’s software is ACID-compliant. You can do incremental loads or trickle feeds. Bulk load speed is 100 GB/hour. Kickfire’s solution to the traditional problem of updating column stores is called “snapshots.” Without giving details, they position that as similar to Vertica’s solution.
- Like other MySQL storage engines, Kickfire inherits whatever data connectivity, stored procedure capabilities, user-defined function support, etc. MySQL has.
- Kickfire has no paying customers, but does have a slide showing many logos of “prospects and beta customers.”
- Kickfire has no MPP capabilities at this time, but says adding them is “on the roadmap” and will be “easy.”
- Kickfire submitted a 100 GB TPC-H result, in which it beat the previous leaders — Exasol, ParAccel, and Microsoft — on price-performance, and lagged only Exasol and ParAccel on absolute performance. Kickfire is extremely proud of this. Indeed, I don’t recall another vendor ascribing that much weight to a single result in the entire history of the TPCs.* Kickfire seems unfazed by the fact that its result is for a system listed with a ship date 6 months in the future (I’m guessing that’s the latest the TPC will allow), while the other results are for systems available today.
*Somebody — perhaps adman extraordinaire Rick Bennett? — may want to check my memory on this, but I think Oracle’s famed “Gentlemen, start your snails” ad in the early 1990s was about PC World tests, not TPCs. Oracle also had an ad about WW1-style planes nosediving, but I don’t think those referenced TPCs either.
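Kickfire hasn’t disclosed how its chip actually works, but the general dataflow/pipelining idea is easy to illustrate in software. Below is a minimal sketch using Python generators as stand-ins for hardware pipeline stages; the operators, table, and query are all invented for illustration.

```python
# A minimal sketch of pipelined, dataflow-style SQL execution, using Python
# generators in place of hardware pipeline stages. Illustrates the general
# idea only; it is not based on Kickfire's actual design.

def scan(table):
    """Source operator: stream rows out one at a time."""
    for row in table:
        yield row

def filter_op(rows, predicate):
    """Selection operator: pass through only matching rows."""
    for row in rows:
        if predicate(row):
            yield row

def aggregate_sum(rows, column):
    """Sink operator: fold the incoming stream into a single value."""
    total = 0
    for row in rows:
        total += row[column]
    return total

# Equivalent of: SELECT SUM(amount) FROM sales WHERE region = 'EU'
sales = [
    {"region": "EU", "amount": 100},
    {"region": "US", "amount": 250},
    {"region": "EU", "amount": 75},
]
pipeline = filter_op(scan(sales), lambda r: r["region"] == "EU")
print(aggregate_sum(pipeline, "amount"))  # 175
```

The point of the dataflow style is that each row streams through all the operators without intermediate results ever being materialized; in hardware, the stages would also run concurrently.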
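Likewise, here is a toy illustration of the two compression styles mentioned above, dictionary/token and delta encoding, as they might apply to a single column. This is generic column-store technique, not anything Kickfire has confirmed.

```python
# Toy illustrations of dictionary/token and delta compression for one column.
# Generic column-store techniques; not Kickfire's disclosed implementation.

def dictionary_encode(values):
    """Replace each value with a small integer token into a shared dictionary."""
    dictionary = {}
    tokens = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        tokens.append(dictionary[v])
    return dictionary, tokens

def delta_encode(numbers):
    """Store the first value, then only the successive differences."""
    if not numbers:
        return []
    return [numbers[0]] + [b - a for a, b in zip(numbers, numbers[1:])]

dictionary, tokens = dictionary_encode(["Boston", "Boston", "Austin", "Boston"])
print(tokens)                                  # [0, 0, 1, 0]
print(delta_encode([1000, 1005, 1007, 1020]))  # [1000, 5, 2, 13]
```

Both encodings shrink the column while leaving it in a form that operators can often work on directly, which fits the claim of querying compressed data even in RAM.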
Kickfire is de-cloaking
Kickfire, the renamed C2, is doing one of those buzz-building rollouts in which the first word comes from golly-gee-whizzing people on their payroll. You can see examples at Xarpb and Diamond Notes, as well as in a forthcoming article in MySQL magazine. Farhan Mashraqi also appears to be involved. Kickfire is also sponsoring the MySQL user conference next week.
I plan to write more after I get some substance, but a few things seem clear:
1. Kickfire’s product is an appliance that functions as a MySQL storage engine.
2. There’s a custom chip involved.
3. Kickfire plans to throw around the “stream processing” buzzphrase a lot.
Now, “stream processing” means a lot of different things to different people. E.g., Netezza uses the phrase just because their FPGA throws away a lot of data before ever routing it to more conventional SQL processing. But pending a briefing, I’m guessing that Kickfire’s sense is similar to what underlies the case for using CEP in BI.
Edit: Here’s an update after an actual Kickfire briefing.
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Kickfire, MySQL
Positioning the data warehouse appliances and specialty DBMS
There are now four hardware vendors that each offer, or seem about to announce, two different tiers of data warehouse appliances: Sun, HP, EMC, and Teradata. Specifically:
- Sun partners with both Greenplum and ParAccel.
- HP sells Neoview, and is also partnered with Vertica.
- EMC (together with Dell in North America and Bull in Europe) sells DATAllegro. Now EMC is also entering a partnership with ParAccel.
- Teradata is pretty far down the road toward releasing a low-end product.
EMC is partnering with ParAccel
A talk about a ParAccel/EMC partnership has been promised for a forthcoming EMC user conference. Otherwise, ParAccel is disclosing no useful information on the matter.*
*So what else is new?
The talk is called “Highly Scalable Analytic Appliance Powered by EMC and ParAccel,” and the abstract says: Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, EMC, ParAccel
Netezza’s April Fool press release
Short and cute. It even makes a genuine marketing point (low power consumption), and ties into past marketing gimmicks (Netezza has played Pimp My SPU before, with dramatic paint jobs).
Netezza Corporation (NYSE Arca: NZ), the global leader in data warehouse and analytic appliances, today introduced a limited-edition range of its award-winning Netezza system. Expected to become an instant industry collectible, the systems can now be purchased in a variety of color finishes – pink, blue, red or silver. The standard gun-metal gray unit will continue to be the default option for orders requiring eight or more units, to ensure availability.
Affectionately known as ‘the Netezza’ by customers and partners, the systems not only offer unparalleled processing performance, but the secret sauce of its innovative design is also leading the way in effective power and cooling management – making it a truly green option for any data center.
Not earth-shaking — even if it purports to be earth-saving — but unless I’ve overlooked a biggie, there isn’t much competition in this rather lame April Fool’s year.
Categories: Data warehouse appliances, Data warehousing, Humor, Netezza
The illuminate guys have a CTO blog
If you want to know more about illuminate’s data warehouse offerings, CTO Joe Foley has a blog. A good starting point might be the post on value-based storage. Two key points seem to be:
The VBS also provides some data access features that can not be duplicated in any other structure. A search can be executed starting with a data value in the pool. By going from the value pool back to the index, it is possible to quickly locate every use of the value wherever it may be used in the logical record structures.
which makes sense, and
This structure also enables our incremental query capability. As the result of a query, the database returns a set of instance identifiers rather than a set of records. This is because there are no records, only pointers and values. With the response being a set of pointers, it is a simple matter to perform the next query step and then get the union or difference between the two sets of pointers for the result of the second query step. This process can be continued indefinitely with the result set shrinking or growing as the new results are merged with the old.
which still sounds like gobbledygook to me. Read more
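For what it’s worth, here is my best guess at what Foley is describing, sketched in Python: each query step returns a set of instance identifiers rather than records, and successive steps are combined with ordinary set operations. The schema and data are invented for illustration.

```python
# A guess at what "incremental query" over pointer sets could mean: each
# query step yields a set of instance identifiers, and steps are combined
# with set algebra. The records here are invented for illustration.

records = {
    1: {"color": "red",  "size": "L"},
    2: {"color": "blue", "size": "M"},
    3: {"color": "red",  "size": "M"},
}

def query(column, value):
    """One query step: return the set of matching instance identifiers."""
    return {rid for rid, rec in records.items() if rec[column] == value}

step1 = query("color", "red")           # {1, 3}
step2 = step1 & query("size", "M")      # narrow the result set: {3}
step3 = step2 | query("color", "blue")  # widen it again: {2, 3}
print(step1, step2, step3)
```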
Categories: Analytic technologies, Business intelligence, Data warehousing, illuminate Solutions
iLuminate’s correlation/associative approach to data warehousing
illuminate Solutions (small “i”) is an interesting little company, still rough around the edges. (E.g., the Press Release Archive page at i-lluminate.com says, in its entirety, “We are in the process of loading our historical press releases. Please check back the second week in March!” And I only got that much when I corrected an obvious typo in the URL in the menu bar.) According to CTO Joe Foley, illuminate has 37 or so employees and 40+ customers, ¾ of whom are in their home country of Spain, with half of the rest in Latin America. Now they’re entering the US.
illuminate’s basic idea is one I’ve heard before, but mainly from companies with more of a search orientation*, such as Attivio: Take a collection of tables, create a big inverted index on all the values in all columns at once, and do queries on that. This, illuminate claims, obviates all sorts of database design problems and similar hassles you otherwise might have. illuminate’s buzzword for all this is “CDBMS”, where the “C” stands for correlation. The actual CDBMS product is called iLuminate; related business intelligence tools are called iCorrelate and iAnalyze. What iLuminate actually indexes is a token that holds four pieces of information: Instance identifier, table identifier, column identifier, and value. Read more
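Based on that description, a toy version of such an index might look like the following. The four-part token layout matches what illuminate describes; the construction and query code are my own invention, for illustration only.

```python
# A toy inverted index over four-part tokens
# (instance identifier, table identifier, column identifier, value).
# The token layout follows illuminate's description; the rest is invented.

from collections import defaultdict

rows = [
    ("customers", 1, {"name": "Acme", "country": "Spain"}),
    ("customers", 2, {"name": "Initech", "country": "US"}),
    ("orders",    7, {"customer": "Acme", "total": 100}),
]

# Map each value to the set of (instance, table, column) places it occurs.
index = defaultdict(set)
for table, instance, record in rows:
    for column, value in record.items():
        index[value].add((instance, table, column))

# Find every use of a value, across all tables and columns at once:
print(index["Acme"])
# {(1, 'customers', 'name'), (7, 'orders', 'customer')}
```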
Categories: Analytic technologies, Business intelligence, Data warehousing, illuminate Solutions
GridSQL: What EnterpriseDB is and is not doing in Postgres-based MPP data warehousing
While talking with EnterpriseDB about today’s Postgres Plus announcements, I took the chance to clear up a point of confusion. Somebody told Seth Grimes that EnterpriseDB is out to compete with Greenplum, but that person was wrong. EnterpriseDB fondly hopes to manage multi-terabyte data warehouses, just as Oracle and Microsoft do with their respective general-purpose DBMS. However, EnterpriseDB is not going after the data warehouses of tens to hundreds of terabytes that are the province of specialists such as Greenplum, Teradata, Netezza, or the columnar DBMS vendors.
Even so, in GridSQL EnterpriseDB does seem to be open-sourcing MPP shared-nothing basics. There’s a lightweight optimizer that does a little (but only a little) more to minimize data movement beyond just optimizing queries on each node. And GridSQL knows how to replicate small tables to each node, a key aspect of many MPP designs. (Partition your facts; replicate your dimensions; see the sketch below.)
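To illustrate that principle: hash-partition the big fact table so each node holds a disjoint slice, and copy each small dimension table to every node, so that fact-to-dimension joins run locally with no inter-node data movement. A schematic sketch, not GridSQL code:

```python
# Schematic sketch of "partition your facts; replicate your dimensions"
# in a shared-nothing cluster. Illustrates the design principle only;
# this is not GridSQL code.

NUM_NODES = 4

fact_rows = [  # big table: partitioned
    {"order_id": 101, "cust_id": 1, "amount": 50},
    {"order_id": 102, "cust_id": 2, "amount": 75},
    {"order_id": 103, "cust_id": 1, "amount": 20},
]
dimension = {1: "Acme", 2: "Initech"}  # small table: replicated

# Every node gets a disjoint slice of facts plus a full copy of the dimension.
nodes = [{"facts": [], "dim": dict(dimension)} for _ in range(NUM_NODES)]
for row in fact_rows:
    nodes[hash(row["order_id"]) % NUM_NODES]["facts"].append(row)

# Because the dimension is local everywhere, each node joins its own slice
# without moving rows between nodes.
for i, node in enumerate(nodes):
    for f in node["facts"]:
        print(f"node {i}: order {f['order_id']} -> {node['dim'][f['cust_id']]}")
```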
Categories: Analytic technologies, Data warehousing, EnterpriseDB and Postgres Plus, Greenplum, Open source, Parallelization
Data warehousing with paper clips and duct tape
An interesting part of my conversation with Dataupia’s CTO John O’Brien came when we talked about data warehousing in general. On the one hand, he endorsed the view that using Oracle probably isn’t a good idea for data warehouses larger than 10 terabytes, with SQL Server’s limit being well below that. On the other hand, he said he’d helped build 50-60 terabyte warehouses in Oracle years ago.
The point is that to build warehouses that big in Oracle or other traditional DBMS, you have to pull out a large bag of tricks. Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Microsoft and SQL*Server, Oracle
Dataupia catch-up
I had a catch-up phone meeting with Dataupia, as I hadn’t spoken with the company since the middle of last year. Like several other companies in the data warehouse specialist market, Dataupia can be annoyingly secretive. On the plus side — and this is very refreshing — Dataupia doesn’t seem to expect credit for accomplishments beyond those they’re willing to provide actual evidence for.
What I’ve gleaned about Dataupia’s customer activity to date amounts to: Read more