Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data. Related subjects include:
Infobright’s Release 3.4
Infobright called a couple weeks ago to discuss, among other subjects, its subsequently-released Infobright Release 3.4. I made no effort to distinguish between community/open source and professional/chargeable editions, but leaving that aside, it seems fair to characterize Infobright 3.4 as having two overlapping primary themes:
- Performance and bottleneck cleanup.
- “Omigod, you mean you didn’t have that feature before?” cleanup.
That said, the traditional release for cleaning up the last huge gaps in an analytic DBMS product seems to have become 4.0; recent examples include Aster Data, Vertica and Greenplum. Infobright seems on track to be another example of that rule.
Ack. Now that I’ve said that, other vendors are going to be tempted to accelerate their numbering so as to reach the 4.0 mark sooner …
A lot of Infobright performance enhancements are in the vein “We used to rely on generic MySQL for that, but now we do it ourselves, and it works a lot better.” Examples include:
Categories: Data warehousing, Infobright, MySQL, Workload management
Netezza’s version of EnterpriseDB-based Oracle compatibility
EnterpriseDB has some deplorable business practices (my stories of being screwed by EnterpriseDB have been met by “Well, you’re hardly the only one”). But a couple of more successful DBMS vendors have happily partnered with EnterpriseDB even so, to help pick off Oracle users. IBM’s approach was in the vein of an EnterpriseDB-infused version of SQL handling within DB2.* Netezza just announced an EnterpriseDB-based Netezza Migrator that is rather different.
*The comment threads are the most informative parts of those posts.
I’m a little unclear as to the Netezza Migrator details, not least because Netezza folks don’t seem to care too much about Netezza Migrator themselves. That said, the core ideas of Netezza Migrator are:
Categories: Data integration and middleware, Data warehousing, Emulation, transparency, portability, EnterpriseDB and Postgres Plus, Netezza, Oracle
Flash is coming, well …
I really, really wanted to title this post “Flash is coming in a flash.” That seems a little exaggerated — but only a little.
- Netezza now intends to come out with a flash-based appliance earlier than it originally expected.
- Indeed, Netezza has suspended — by which I mean “scrapped” — prior plans for a RAM-heavy disk-based appliance. It will use a RAM/flash combo instead.*
- Tim Vincent of IBM told me that customers seem ready to adopt solid-state memory. One interesting comment he made is that flash isn’t really all that much more expensive than high-end storage area networks.
Uptake of solid-state memory (i.e. flash) for analytic database processing will probably stay pretty low in 2010, but in 2011 it should be a notable (b)leading-edge technology, and it should get mainstreamed pretty quickly after that.
Categories: Data integration and middleware, Data warehousing, IBM and DB2, Memory-centric data management, Netezza, Solid-state memory, Theory and architecture
My talk this morning
Netezza’s Enzee Universe conference is now almost over, and I still haven’t figured out what my gig as “conference blogger” entails. More precisely, I’m operating from our unspoken fallback plan, namely “If all else fails, do what you’d do anyway, but do more of it.” For me to live up to that, all Netezza had to do was find interesting things to write about. As far as I’m concerned, they already did that in spades last Thursday; the five interesting meetings they set up for me with users and partners on Tuesday were just gravy.
Another part of the deal was that I’d give a talk this morning at 9:30 am. And when I give talks, I like to put up posts that cover whatever material I haven’t written up before, while also offering the talk’s listeners convenient links to materials I have already covered at length.
Categories: Analytic technologies, Business intelligence, Data warehousing, Netezza, Presentations
What kinds of data warehouse load latency are practical?
I took advantage of my recent conversations with Netezza and IBM to discuss what kinds of data warehouse load latency were practical. In both cases I got the impression:
- Subsecond load latency is substantially impossible. Doing that amounts to OLTP.
- 5 seconds or so is doable with aggressive investment and tuning.
- Load latency of several minutes is pretty easy.
- Latency of 10-15 minutes or longer is now very routine.
There’s generally a throughput/latency tradeoff, so if you want very low latency with good throughput, you may have to throw a lot of hardware at the problem.
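To make the throughput/latency tradeoff concrete, here is a toy micro-batch load model in Python. The names and numbers (`LoadModel`, the per-batch overhead, the per-row cost) are hypothetical assumptions of mine, not figures from Netezza or IBM; the point is only the shape of the curve. Every batch pays a fixed overhead, so shrinking the batch interval to cut latency also cuts the maximum sustainable load rate, unless you add loaders, i.e. hardware.

```python
from dataclasses import dataclass


@dataclass
class LoadModel:
    """Toy micro-batch load model; all parameters are hypothetical."""

    fixed_overhead_s: float  # assumed per-batch cost (e.g., commit and metadata work)
    per_row_s: float         # assumed per-row cost on a single loader
    loaders: int = 1         # parallel loaders: more hardware divides the per-row cost

    def batch_time(self, rows: int) -> float:
        """Seconds to load one batch of `rows` rows."""
        return self.fixed_overhead_s + rows * self.per_row_s / self.loaders

    def max_rows_per_s(self, interval_s: float) -> float:
        """Highest arrival rate that a given batch interval can keep up with."""
        # Each batch must finish within its interval, and the fixed overhead
        # eats into that window before any rows get loaded.
        usable_s = interval_s - self.fixed_overhead_s
        if usable_s <= 0:
            return 0.0
        return usable_s * self.loaders / (self.per_row_s * interval_s)

    def worst_latency_s(self, interval_s: float, rows_per_s: float) -> float:
        """A row arriving just after a batch closes waits a full interval, then the load."""
        return interval_s + self.batch_time(int(rows_per_s * interval_s))


if __name__ == "__main__":
    model = LoadModel(fixed_overhead_s=2.0, per_row_s=1e-5)
    for interval in (5.0, 60.0, 600.0):
        print(f"{interval:>6.0f}s interval: max {model.max_rows_per_s(interval):,.0f} rows/s")
```

With a 2-second per-batch overhead and 100,000 rows/second per loader, a 5-second interval sustains at most 60,000 rows/second, while a 60-second interval sustains roughly 96,700; quadrupling the loaders at the 5-second interval raises the ceiling to 240,000.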
I’d expect to hear similar things from any other vendor with reasonably mature analytic DBMS technology. Low-latency load is a problem for columnar systems, but both Vertica and ParAccel designed in workarounds from the get-go. Aster Data probably didn’t meet these criteria until Version 4.0, its old “frontline” positioning notwithstanding, but I think it does now.
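As for what such a workaround can look like: a common pattern (Vertica’s write-optimized store sitting in front of its read-optimized columnar storage is one example) is to land new rows in a row-oriented buffer and convert them to columns in bulk later. Here is a toy sketch under that assumption; `HybridStore`, `moveout_threshold` and the other names are hypothetical, and I’m not claiming it reflects how Vertica or ParAccel actually implement anything.

```python
from typing import Dict, List, Tuple


class HybridStore:
    """Toy hybrid table: a row-oriented write buffer plus a columnar main store."""

    def __init__(self, columns: List[str], moveout_threshold: int = 1000):
        self.columns = columns
        self.moveout_threshold = moveout_threshold  # hypothetical batch size for conversion
        self.write_buffer: List[Tuple] = []  # cheap row-at-a-time appends
        self.column_store: Dict[str, List] = {c: [] for c in columns}  # scan-friendly arrays

    def insert(self, row: Tuple) -> None:
        # New rows become queryable immediately, without touching the column arrays.
        self.write_buffer.append(row)
        if len(self.write_buffer) >= self.moveout_threshold:
            self.moveout()

    def moveout(self) -> None:
        # Convert buffered rows to columns in one batch, amortizing the columnar write cost.
        for row in self.write_buffer:
            for name, value in zip(self.columns, row):
                self.column_store[name].append(value)
        self.write_buffer.clear()

    def scan(self, column: str) -> List:
        # Queries read both parts, so freshly loaded rows are visible right away.
        idx = self.columns.index(column)
        return self.column_store[column] + [row[idx] for row in self.write_buffer]
```

The design choice is simply to keep the expensive part of columnar loading (rewriting column arrays) off the latency-critical path.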
Related link
- Just what is your need for speed anyway?
Categories: Analytic technologies, Aster Data, Columnar database management, Data warehousing, IBM and DB2, Netezza, ParAccel, Vertica Systems
The Netezza and IBM DB2 approaches to compression
Thursday, I spent 3 ½ hours talking with 10 of Netezza’s more senior engineers. Friday, I talked for 1 ½ hours with IBM Fellow and DB2 Chief Architect Tim Vincent, and we agreed we needed at least 2 hours more. In both cases, the compression part of the discussion seems like a good candidate to split out into a separate post. So here goes.
When you sell a row-based DBMS, as Netezza and IBM do, there are a couple of approaches you can take to compression. First, you can compress the blocks of rows that your DBMS naturally stores. Second, you can compress the data in a column-aware way. Both Netezza and IBM have chosen completely column-oriented compression, with no block-based techniques entering the picture to my knowledge. But that’s about as far as the similarity between Netezza and IBM compression goes.
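For readers who want the distinction spelled out, here is a toy contrast in Python between the two approaches, applied to rows that stay in row format. The column-aware encodings (dictionary encoding for a string column, delta encoding for an integer column) are my illustrative assumptions, not a description of what Netezza or DB2 actually does; a real system would also bit-pack the resulting integer codes.

```python
import zlib
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (region, quantity)


def block_compress(rows: List[Row]) -> bytes:
    """Approach 1: serialize a block of rows as-is, then compress the bytes generically."""
    raw = "\n".join(f"{region},{qty}" for region, qty in rows).encode()
    return zlib.compress(raw)


def column_aware_encode(rows: List[Row]) -> Tuple[List[str], List[int], List[int]]:
    """Approach 2: keep row storage, but encode each field using knowledge of its column."""
    dictionary: Dict[str, int] = {}  # region string -> small integer code
    codes: List[int] = []
    deltas: List[int] = []
    prev = 0
    for region, qty in rows:
        codes.append(dictionary.setdefault(region, len(dictionary)))
        deltas.append(qty - prev)  # store the difference from the previous row's value
        prev = qty
    return list(dictionary), codes, deltas


def column_aware_decode(dictionary: List[str], codes: List[int], deltas: List[int]) -> List[Row]:
    """Invert the encoding: look up codes, and re-accumulate the deltas."""
    rows: List[Row] = []
    total = 0
    for code, delta in zip(codes, deltas):
        total += delta
        rows.append((dictionary[code], total))
    return rows
```

The generic approach knows nothing about the data’s structure; the column-aware one exploits per-column properties (few distinct values, slowly changing numbers) even though rows are still stored together.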
Categories: Data warehousing, Database compression, IBM and DB2, Microsoft and SQL*Server, Netezza
Netezza’s silicon balance
As I’ve mentioned in a couple of other posts, Netezza is stressing that the most recent wave of its technology is software-only, with no hardware upgrades made or needed. In other words, Netezza boxes already have all the silicon they need. But of course, there are really at least three major aspects to the Netezza silicon story – FPGA (Field-Programmable Gate Array), CPU, and RAM.
- Netezza planned to be “generous” in its original TwinFin FPGA capacity, anticipating software upgrades like the ones it’s introducing now. It is satisfied that this strategy worked. More on this below.
- The same surely applies to CPU.
- What’s more, I get the sense that the CPU turned out in practice to be even more over-provisioned than they anticipated …
- … at least when one just considers Netezza’s base NPS software.
- However, I suspect that if the advanced analytics capability takes off, Netezza will determine that more CPU is always better.
- And by the way, NEC is making versions of Netezza appliances with more advanced chips than Netezza is. So if anybody should really, really need more CPU in their Netezza boxes, there’s a very straightforward way to make that happen. (And if there were nontrivial demand for that, appropriate support plans could surely be structured.)
- Everybody needs to be careful about RAM. Netezza is surely no exception.
The major parts of Netezza’s FPGA software are:
- Compress Engine 2. This is Netezza’s new way of doing compression.
- Compress Engine 1. This is Netezza’s old way of doing compression. It is being kept around so that existing Netezza tables don’t suddenly have to be changed or reloaded.
- Project Engine. Guess what this does.
- Restrict Engine. Ditto.
- Visibility Engine. This enforces ACID and handles row-level security. It is “sort of a corner of” the Restrict Engine. (Actually, Netezza seems to waver as to whether to describe “Restrict” and “Visibility” as being two engines or one.)
- Miscellaneous plumbing.
If I understood correctly, each Netezza FPGA has two copies of each of these engines, running in parallel.
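A rough way to picture how those engines chain together is as a streaming pipeline that decompresses, filters and trims data on its way from disk to the CPU. This Python sketch is only an assumption-laden illustration: the engine ordering, the simplified MVCC visibility rule based on hypothetical `created_xid`/`deleted_xid` fields, and the dictionary-code decompression step are my stand-ins. Row-level security is omitted, and the two-copies-per-FPGA parallelism is not modeled.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def decompress(blocks: Iterable[List[Record]], region_dict: List[str]) -> Iterator[Record]:
    """Stand-in for a compress engine: expand a dictionary-coded column."""
    for block in blocks:
        for rec in block:
            out = dict(rec)  # copy so the stored block is left untouched
            out["region"] = region_dict[rec["region"]]
            yield out


def visibility(records: Iterable[Record], snapshot_xid: int) -> Iterator[Record]:
    """Stand-in for the visibility engine: a simplified MVCC snapshot check."""
    for rec in records:
        created, deleted = rec["created_xid"], rec["deleted_xid"]
        if created <= snapshot_xid and (deleted is None or deleted > snapshot_xid):
            yield rec


def restrict(records: Iterable[Record], predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Restrict engine: drop rows that fail the WHERE-clause predicate."""
    for rec in records:
        if predicate(rec):
            yield rec


def project(records: Iterable[Record], columns: List[str]) -> Iterator[Record]:
    """Project engine: keep only the columns the query needs."""
    for rec in records:
        yield {c: rec[c] for c in columns}


def scan(blocks, region_dict, snapshot_xid, predicate, columns) -> List[Record]:
    # Assumed order: decompress -> visibility -> restrict -> project.
    stream = decompress(blocks, region_dict)
    stream = visibility(stream, snapshot_xid)
    stream = restrict(stream, predicate)
    return list(project(stream, columns))
```

Because each stage is a generator, rows flow through one at a time, loosely mimicking a hardware stream rather than materializing intermediate results.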
Related link
- An August 2009 post on what Netezza does in its FPGA
Categories: Data warehouse appliances, Data warehousing, Database compression, Netezza, Theory and architecture
A partial overview of Netezza database software technology
Netezza is having its user conference Enzee Universe in Boston Monday–Wednesday, June 21-23, and naturally will be announcing new products there, and otherwise providing hooks and inducements to get itself written about. (The preliminary count is seven press releases in all.) To get a head start, I stopped by Netezza Thursday for meetings that included a 3 ½ hour session with 10 or so senior engineers, and have exchanged some clarifying emails since.
Categories: Data warehouse appliances, Data warehousing, Netezza, Theory and architecture, Workload management
Notes on a spate of Netezza-related blog posts
Fearing that last year’s tight travel budgets would hamper attendance, Netezza – like a number of other vendors – decided to forgo a traditional user conference. Instead, it took its Enzee Universe show on the road, essentially spreading the conference across eight cities. I was asked to keynote six of the installments.
After the first one, Netezza Marketing VP Tim Young took me aside for two pieces of constructive criticism. The surprising one* was that he felt I had been INSUFFICIENTLY critical of Netezza. Since then, every other conversation we’ve had about content creation has also featured ringing reassurances that Tim truly wants independent, non-pandering work.
*The unsurprising one was that I’d rushed. Well, duh. After months of telling me I had a 1 hour slot, Netezza cut me to ½ hour a few days beforehand. And my talk had been designed to be high-speed even in the longer time slot …
As a result, I accepted a subsequent gig from Netezza that I would barely consider from most other vendors. Namely, for this year’s Enzee Universe – June 21-23, aka Monday-Wednesday of this week, at the Westin Waterfront Hotel in Boston – I would do some contemporaneous blogging. The parameters we agreed on included:
Categories: Data warehouse appliances, Data warehousing, Netezza, Presentations
Best practices for analytic DBMS POCs
When you are selecting an analytic DBMS or appliance, most of the evaluation boils down to two questions:
- How quickly and cost-effectively does it execute SQL?
- What analytic functionality, SQL or otherwise, does it do a good job of executing?
And so, in undertaking such a selection, you need to start by addressing three issues:
- What does “speed” mean to you?
- What does “cost” mean to you?
- What analytic functionality do you need anyway?
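One way to pin down what “speed” means is to insist on latency percentiles and throughput over a representative query mix, rather than a single average. This helper is a minimal sketch assuming you have already collected per-query wall-clock timings during a POC run; the function names and the nearest-rank percentile method are my own choices, not any vendor’s tooling.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile, for 0 < p <= 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(timings_s: Dict[str, List[float]], wall_clock_s: float) -> Dict[str, float]:
    """Roll per-query timings from one POC run into numbers worth comparing across vendors."""
    runs = [t for per_query in timings_s.values() for t in per_query]
    return {
        "p50_s": percentile(runs, 50),
        "p95_s": percentile(runs, 95),
        "queries_per_hour": len(runs) * 3600 / wall_clock_s,
    }
```

Running the same summary at several concurrency levels shows whether tail latency holds up as throughput rises, which a single-user timing test can hide.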
Categories: Benchmarks and POCs, Data warehousing, Exadata, Netezza, ParAccel, Teradata