To a first approximation, Infobright (maker of BrightHouse) is yet another data warehouse DBMS specialist with a columnar architecture, boasting great compression, running on commodity hardware, and emphasizing easy set-up, simple administration, great price/performance, and hence generally low TCO. BrightHouse isn't actually MPP yet, but Infobright confidently promises a generally available MPP version by the end of 2008. The company says experience shows that better than 10:1 compression of user data is realistic; i.e., the compressed database commonly occupies less than a tenth of the raw data's size. Accordingly, despite the lack of shared-nothing parallelism, Infobright claims a sweet spot of 1-10 terabyte warehouses, and makes occasional references to figures up to 30 terabytes or so of user data.
BrightHouse is essentially a MySQL storage engine, and hence gets a lot of connectivity and BI tool support from MySQL for "free." Beyond that, Infobright's core technical idea is to chop columns of data into chunks of 64K (i.e., 65,536) values, called data packs, and then store concise information about what's in the packs. The more basic information is stored in data pack nodes,* one per data pack. If you're familiar with Netezza zone maps, data pack nodes sound like zone maps on steroids. They store maximum values, minimum values, and (where meaningful) aggregates, and also encode information as to which intervals between the min and max values do or don't contain actual data values.
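To make the idea concrete, here's a minimal sketch of data-pack-node pruning in the spirit of what's described above. The names, layout, and choice of aggregate are my own illustrations, not Infobright's actual implementation: each pack holds up to 65,536 values of one column, and a range predicate can skip any pack whose stored min/max rules it out.

```python
# Hypothetical sketch (not Infobright's actual code) of data pack nodes:
# per-pack min/max/sum metadata that lets a query skip whole packs.

from dataclasses import dataclass
from typing import List

PACK_SIZE = 65_536  # one data pack = up to 64K values of a column


@dataclass
class DataPackNode:
    minimum: int
    maximum: int
    total: int   # an aggregate kept when meaningful; here, the sum
    count: int


def build_nodes(column: List[int]) -> List[DataPackNode]:
    """Chop a column into 64K-value packs and summarize each one."""
    nodes = []
    for i in range(0, len(column), PACK_SIZE):
        pack = column[i:i + PACK_SIZE]
        nodes.append(DataPackNode(min(pack), max(pack), sum(pack), len(pack)))
    return nodes


def packs_to_scan(nodes: List[DataPackNode], lo: int, hi: int) -> List[int]:
    """Indices of packs that might hold values in [lo, hi]; packs whose
    [min, max] range misses the predicate are pruned without decompression."""
    return [i for i, n in enumerate(nodes)
            if not (n.maximum < lo or n.minimum > hi)]
```

On data with any natural clustering (timestamps, sequence numbers), most packs' ranges miss a selective predicate entirely, which is where the "zone maps on steroids" payoff comes from.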
*Infobright makes regrettably confusing use of the terms “node” and “grid.” The other annoying aspect to their jargon is mumbo-jumbo about rough sets, complete with a team of four Polish mathematicians/computer scientists in Warsaw; I’m ignoring that part as hard as I can.
In addition to data pack nodes, there are knowledge nodes, which store information such as which pairs of data packs (on columns in different tables) would have hits in the case of a join on their respective columns. What sounds particularly cool is that the whole thing is very dynamic. Knowledge nodes are created only as and to the extent needed, but are then persisted in case they'll ever help with future queries. Query plans are reoptimized on the fly, depending on what the early results (e.g., from checking the knowledge nodes) turn up.
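A join knowledge node of the kind described above could look something like the following sketch. This is my own hypothetical rendering, assuming only the per-pack min/max metadata from data pack nodes: for an equijoin, any pair of packs whose value ranges don't overlap can't possibly produce a match, so only the overlapping pairs need ever be decompressed and compared.

```python
# Hypothetical sketch (my own, not Infobright's) of a join knowledge node:
# for an equijoin on columns A and B, record which (pack_a, pack_b) pairs
# could possibly match, judging by overlapping [min, max] ranges.

from typing import List, Set, Tuple

Range = Tuple[int, int]  # (minimum, maximum) for one data pack


def join_knowledge_node(a_ranges: List[Range],
                        b_ranges: List[Range]) -> Set[Tuple[int, int]]:
    hits = set()
    for i, (a_lo, a_hi) in enumerate(a_ranges):
        for j, (b_lo, b_hi) in enumerate(b_ranges):
            if a_hi >= b_lo and b_hi >= a_lo:  # ranges overlap, so a hit is possible
                hits.add((i, j))
    return hits
```

Computed once, the pair list can be persisted, so a later join on the same two columns starts from the surviving pack pairs rather than from scratch, which fits the "created as needed, then kept" behavior described above.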
Since there’s very little in the way of indexing or anything index-like, BrightHouse isn’t automatically subject to the traditional columnar database problems with load speeds. Indeed, Infobright claims load speed as a major strength. There are even a couple of customers streaming in ticker data or the like, which leads me to anticipate some direct competition between Infobright and the StreamBase/Vertica partnership.
As befits a small company’s product, BrightHouse is still primitive in a number of ways, even beyond the lack of MPP. Despite the focus on “load-and-go” simplicity, there doesn’t seem to be any actual quasi-appliance packaging. (Actually, I’m not sure that’s particularly important or relevant except in an MPP version.) True insert/update/delete (as opposed to batch loads) is a coming-soon feature, promised for GA in Q1 2008.
Hmm. I’m forgetting what parts of company history (number of customers, specific customer names, etc.) are or aren’t confidential. So I’ll just say that actual customer evidence is definitely more than zero, and leave it to company management to add whatever else they want along those lines in the comment thread below.