Infobright
Analysis of Infobright and its MySQL-based data warehouse DBMS, formerly known as Brighthouse.
My current customer list among the analytic DBMS specialists
(This is an updated version of an August, 2008 post.)
One of my favorite pages on the Monash Research website is the list of many current and a few notable past customers. (Another favorite page is the one for testimonials.) For a variety of reasons, I won’t undertake to be more precise about my current customer list than that. But I don’t think it would hurt anything to list the analytic/data warehouse DBMS/appliance specialists in the group. They are:
- Aster Data
- Greenplum
- Infobright
- Kickfire
- Kognitio
- Microsoft
- Netezza (my biggest client this year, probably, because of all the Enzee Universe appearances)
- Sybase
- Teradata
- Vertica
- Attivio, which may or may not be construed as being in the analytic DBMS business
- Clearpace, ditto
All of those are Monash Advantage members.
If you care about all this, you may also be interested in the rest of my standards and disclosures.
| Categories: About this blog, Aster Data, Data warehousing, Greenplum, Infobright, Kickfire, Microsoft and SQL*Server, Netezza, Sybase, Teradata, Vertica Systems | 2 Comments |
Daniel Abadi on Kickfire and related subjects
Daniel Abadi has a new blog, whose first post centers around Kickfire. The money quote is (emphasis mine):
In order for me to get excited about Kickfire, I have to ignore Mike Stonebraker’s voice in my head telling me that DBMS hardware companies have been launched many times in the past and ALWAYS fail (the main reasoning is that Moore’s law allows for commodity hardware to catch up in performance, eventually making the proprietary hardware overpriced and irrelevant). But given that Moore’s law is transforming into increased parallelism rather than increased raw speed, maybe hardware DBMS companies can succeed now where they have failed in the past.
Good point.
More generally, Abadi speculates about the market for MySQL-compatible data warehousing. My responses include:
- OF COURSE there are many MySQL users who need to move to a serious analytic DBMS.
- What’s less clear is whether there’s any big advantage to those users in remaining MySQL-compatible when they do move. I’m not sure what MySQL-specific syntax or optimizations they’d have that would be difficult to port to a non-MySQL system.
- It’s nice to see Abadi speaking well of Infobright and its technology.
- To say that Infobright went open source because it was “desperate” is overstated. That said, I don’t think Infobright was on track to prosper without going open source.
- While open source and MySQL go together, an appliance like Kickfire loses many (not all) of the benefits of open source.
- Calpont has indeed never disclosed a customer win. Any year now … (Just kidding, Vogel!)
- In general, seeing Abadi be so favorable toward Vertica competitors adds credibility to the recent Hadoop vs. DBMS paper.
Anyhow, as previously noted, I’m a big Daniel Abadi fan. I look forward to seeing what else he posts in his blog, and am optimistic he’ll live up to or exceed its stated goals.
| Categories: Calpont, Columnar database management, DBMS product categories, Data warehouse appliances, Data warehousing, Infobright, Kickfire, MySQL, Open source, Theory and architecture | 2 Comments |
This week is a REALLY good time to actively strengthen the MySQL forkers
As my first three posts on the Oracle/Sun merger suggested, I think Oracle will do a better job with MySQL product development than Sun has. But of course that’s a low hurdle. And so it leaves open the questions:
What should and/or will be the most widely adopted code lines of MySQL (or other open source DBMS), especially for the types of users and vendors who are engaged with MySQL (as opposed to principal alternative PostgreSQL) today?
As much as I’ve bashed MySQL/MyISAM and MySQL/InnoDB for being low-quality general-purpose DBMS, I’d still hate to see MySQL-based development stall out. There are a number of MySQL engine providers with rather unique technology, that deserve a good front-end partner to build their products with. The high-volume sharding guys deserve the chance to continue down their current path as well. And so does the low-end mass market — although I’m least worried about them, as I can’t imagine any realistic scenario in which Oracle doesn’t offer a version of MySQL fully suited to support 10s of millions of WordPress and Joomla installations.
So far as I can tell, there are only four real and currently active candidates for MySQL code coordinator:
- MySQL itself, soon to be owned by Oracle.
- MariaDB, Monty Widenius’ proposed mainstream MySQL alternative.
- Percona, which seems to have some fans as a superior alternative to vendor-supplied MySQL/InnoDB.
- Drizzle, which is aimed squarely at web-centric MySQL users who never wanted a robust DBMS in the first place.
Patrick Galbraith and Steven Vaughan-Nichols did good jobs of illustrating the turmoil.
Oracle isn’t a very comfortable partner long term for the storage engine vendors, and Drizzle doesn’t seem to be what they need. So I think that Infobright, Kickfire, Tokutek, Calpont, et al. need to get aligned in a hurry with an outside MySQL provider such as Percona or MariaDB or a newcomer, preferably all with the same one. Yes, I understand that Infobright is getting a lot of marketing help from Sun these days, that Kickfire just got a nice-sounding Sun marketing announcement as well, and so on. But the time to start working toward the inevitable future is now.
And by “now” I mean “right now,” since the MySQL community is at this moment gathered together for its annual conference.
| Categories: Infobright, Kickfire, MySQL, Open source | 12 Comments |
MySQL storage engine round-up, with Oracle-related thoughts
Here’s what I know about MySQL storage engines, more or less.
- MySQL with MyISAM is fast. But it’s not transactional. Except for limited purposes, MySQL with MyISAM is a pretty crummy DBMS. Nothing can change that.
- MySQL with InnoDB is transactional. But it’s not particularly fast. MySQL with InnoDB is a pretty mediocre DBMS. Oracle could fix that, at least partially, over time.
- I don’t know much about Falcon, Maria, and so on. With Oracle winding up owning both MySQL and InnoDB, the motivation for those engines (except as Oracle-free forks) might fade.
- Infobright is the most established of the rest. At the moment I’m not recommending it for most industrial-strength uses unless the user is particularly cash-constrained. But I wouldn’t be surprised if that changed soon. A cheap, fast, simple columnar analytic DBMS has a place in the world.
- Kickfire is next in line, offering a hardware-based growth path for users who’ve maxed out on what unaided MySQL can do. It remains to be seen for how many users the desire to keep things simple and stay with MySQL outweighs the desire to avoid custom hardware. Having Oracle salespeople all over those accounts surely wouldn’t help. Kickfire also has a second market, namely OEM vendors who are mainly interested in the superfast chip. That would probably be pretty unaffected by Oracle.
- Tokutek offers a technical proposition that’s hard to match head-on without going the CEP route. Users who care are likely to be MySQL shops. Tokutek’s main challenge is to prove that it sufficiently outdoes competing technical strategies for sufficiently many users. Oracle ownership of MySQL seems pretty irrelevant to Tokutek’s success or failure.
- Calpont offers a kind of lightweight Exadata alternative. With Calpont’s packaging and positioning perennially unclear, it’s difficult to predict the effect of a particular change — i.e., Oracle buying MySQL — in Calpont’s market environment.
- I haven’t heard from transactionally-oriented ScaleDB since I wrote about them a year ago. Apparently, they’re rolling out beta product this week, and their venerable techie guru sadly passed away earlier this month.
| Categories: Calpont, Columnar database management, Data warehousing, Exadata, Infobright, Kickfire, MySQL, Open source, Oracle, Tokutek | 13 Comments |
Infobright update
For the past couple of quarters, Infobright has been MySQL’s partner of choice for larger data warehousing applications. Infobright’s stated business metrics, and I quote, include:
> 50 Customers in 7 Countries
> 25 Partners on 4 continents
A vibrant open source community
+1 million visitors
Approaching 10,000 downloads
2,000 active community participants
These may be compared with analogous metrics Infobright offered in February.
Infobright has also made or promised a variety of technological enhancements. Ones that are either shipping now or promised soon include:
| Categories: Columnar database management, Data warehousing, Infobright, MySQL, Open source | 4 Comments |
Database implications if IBM acquires Sun
Reported or rumored merger discussions between IBM and Sun are generating huge amounts of discussion today (some links below). Here are some quick thoughts around the subject of how the IBM/Sun deal — if it happens — might affect the database management system industry.
Infobright update
Infobright briefed me, and I thought it would be best to invite them to provide a write-up themselves of what customer and other information they did and didn’t want to disclose, for me to publish. Read more
| Categories: Application areas, Data warehousing, Infobright, Open source, Telecommunications, Web analytics | 2 Comments |
Draft slides on how to select an analytic DBMS
I need to finalize an already-too-long slide deck on how to select an analytic DBMS by late Thursday night. Anybody see something I’m overlooking, or just plain got wrong?
Edit: The slides have now been finalized.
Introduction to Pentaho
I finally caught up with Pentaho, which along with Jaspersoft is one of the two most visible open source business intelligence companies, Actuate perhaps excepted. Highlights included:
- Much like Jaspersoft, Pentaho’s initial focus was mainly on embedded, operational BI.
- However, Pentaho now feels it has a decent end-user GUI as well, and traditional-BI is a bigger part of sales.
- Also, some sales are focused on data integration, perhaps in support of more traditional BI products. Pentaho has even had an Ab Initio replacement in data integration. (Can there be any change more extreme than going from Ab Initio to open source?)
- As an example of technical breadth, Pentaho says that its Mondrian OLAP engine is used by Jaspersoft.
- Pentaho has Excel output, but not in the form of live formulas.
- Pentaho does XQuery.
- Industries with more Pentaho adoption than average include:
  - Financial services (traditionally open-source-friendly, according to Pentaho)
  - Government (ditto)
  - Web 2.0 (obviously ditto)
  - Travel/transportation (cash-strapped)
- Frontier Airlines is a Pentaho/Greenplum customer.
- TradeDoubler is a Pentaho/Infobright customer. (Pentaho thinks that TradeDoubler reloads its warehouse every day, which if true frankly casts some doubt on Infobright’s architecture.)
- Data mining is something of a Pentaho sideline. There’s some university in New Zealand that built data mining capabilities in Pentaho, and some data mining research is done in that. Separately, Pentaho has been integrated with R.
- Community contributions are concentrated in the areas you’d expect — features some user or system integrator needs for a specific project, connectors, bug reports, and the like.
| Categories: Ab Initio Software, Application areas, Business intelligence, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Greenplum, Infobright, Jaspersoft, Pentaho, Pricing | 5 Comments |
Database compression is heavily affected by the kind of data
I’ve written often of how different kinds or brands of data warehouse DBMS get very different compression figures. But I haven’t focused enough on how much compression figures can vary among different kinds of data. This was really brought home to me when Vertica told me that web analytics/clickstream data can often be compressed 60X in Vertica, while at the other extreme — some kind of floating point data, whose details I forget for now — they could only do 2.5X. Edit: Vertica has now posted much more accurate versions of those numbers. Infobright’s 30X compression reference at TradeDoubler seems to be for a clickstream-type app. Greenplum’s customer getting 7.5X — high for a row-based system — is managing clickstream data and related stuff. Bottom line:
When evaluating compression ratios — especially large ones — it is wise to inquire about the nature of the data.
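To make the point concrete, here is a minimal sketch using Python's general-purpose `zlib` compressor. It is not how Vertica, Infobright, or Greenplum compress data, and both data sets are synthetic assumptions: one mimics repetitive clickstream rows, the other is a stream of random floating point values. The only thing it is meant to show is that the same compressor produces very different ratios on different kinds of data.

```python
import random
import struct
import zlib


def compression_ratio(raw: bytes) -> float:
    """Return uncompressed size divided by zlib-compressed size."""
    return len(raw) / len(zlib.compress(raw, 9))


rng = random.Random(42)  # fixed seed so repeated runs use the same data

# Hypothetical clickstream-like rows: a handful of URLs and user agents,
# repeated many times, which is what makes such data highly compressible.
urls = ["/home", "/search", "/product/17", "/cart", "/checkout"]
agents = ["Mozilla/5.0 (Windows)", "Mozilla/5.0 (Macintosh)"]
clicks = "\n".join(
    f"2009-04-{rng.randint(1, 28):02d},{rng.choice(urls)},{rng.choice(agents)}"
    for _ in range(50_000)
).encode()

# Hypothetical measurement data: random 64-bit floats with almost no repetition.
floats = b"".join(struct.pack("<d", rng.random()) for _ in range(50_000))

clickstream_ratio = compression_ratio(clicks)
float_ratio = compression_ratio(floats)
print(f"clickstream-like text: {clickstream_ratio:.1f}X")
print(f"random floats:         {float_ratio:.1f}X")
```

The absolute numbers here say nothing about any vendor's product; the gap between the two figures is the point.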
| Categories: Data warehousing, Database compression, Greenplum, Infobright, Vertica Systems, Web analytics | 4 Comments |
Web analytics — clickstream and network event data
It should surprise nobody that web analytics – and specifically clickstream data — is one of the biggest areas for high-end data warehousing. For example:
- I believe that both of the previously mentioned petabyte+ databases on Greenplum will feature clickstream data.
- Aster Data’s largest disclosed database, by almost two orders of magnitude, is at MySpace.
- Clickstream analytics is a big application area for Vertica Systems.
- Clickstream analytics is a big application area for Netezza.
- Infobright’s customer success stories appear to be concentrated in clickstream analytics.
- Coral8 tells me that CEP is also being used for clickstream data, although I suspect that a lot of Coral8’s evidence in that regard comes from a single flagship account. Edit: Actually, Coral8 has a bunch of clickstream customers.
| Categories: Aleri and Coral8, Aster Data, Complex event processing (CEP), Greenplum, Infobright, Netezza, Vertica Systems, Web analytics | 2 Comments |
Infobright update
In connection with the announcements that:
- Infobright is open sourcing its analytical DBMS product (which is a really good idea)
- Infobright raised a $10 million VC round, with Sun as a new investor
I got my first real Infobright update since January. Highlights included:
| Categories: Data warehousing, Infobright, MySQL, Open source | 2 Comments |
Infobright’s open source move has a lot of potential
Infobright announced today that it’s going full-bore into open source – specifically in the MySQL ecosystem — with the licensing approach, pricing, distribution strategy, and VC money from Sun that such a move naturally entails. I think this is a great idea, for a number of reasons: Read more
| Categories: Data warehousing, Infobright, MySQL, Open source, Uncategorized | 4 Comments |
Infobright goes open source — sound bites
As has recently become my custom when there is industry news, I herewith provide quotable sound bites about Infobright and its move to an open source strategy. Weather permitting, I’ll be on a plane to the Netezza conference this afternoon. And I’ve only slept about 10 hours since Thursday. So I hope these suffice, although if they don’t and you email me I’ll try to respond by some time Tuesday morning.
- For almost anybody in the MySQL world who needs high-performance analytics, Infobright is the first good solution.
- Infobright’s product strengths and use cases are a great match for open source.
- Most leading analytic DBMS have open source roots, but they generally haven’t been open sourced themselves. Infobright immediately becomes one of the premier open source analytic database offerings. The only serious open source rival that’s coming to mind is MonetDB.
- Storage engines are MySQL’s achilles heel. Each good MySQL storage engine is precious.
- Infobright has enough production references to show that it can get the job done for many data mart uses. It won’t meet everybody’s needs, but it’s well worth an experimental download.
- If you want to build a little data mart and run it yourself, most good products are too complicated or expensive. But in the right use cases, Infobright pretty much runs itself, and there’s no arguing with the Community Edition price (free).
- So Infobright is a great fit for the individual downloader – i.e., for the stereotypical open source user.
- Netezza, Vertica, ParAccel, Greenplum, and Aster Data are all based in one way or another on PostgreSQL (even though Vertica includes no PostgreSQL code). DATAllegro was based on Ingres. Infobright and Kickfire are based on MySQL.
- If Infobright doesn’t get the job done, try downloading Vertica, which – while closed source – is also free for download and development.
- The “rough set” part of Infobright’s story is a lot of mumbo-jumbo, but the “knowledge grid” part is more real.
- When you compare Infobright to Teradata, Netezza, Greenplum, or even Vertica, it’s kind of a toy. But when you compare it to generic MySQL, it’s more like rocket science.
- Infobright was too little, too late in the mainstream analytic DBMS market. They had to do something different. Kudos to them for recognizing that.
- The Infobright product has some serious limitations. If you want a market that’s willing to adopt a DBMS with serious limitations, the MySQL world is the place for you.
Posts today on open source DBMS
- Infobright’s smart move to open source
- General Infobright update
- Infobright sound bites
- The many faces of open source DBMS
| Categories: Data warehousing, Infobright, MySQL, Open source | 3 Comments |
My current customer list among the data warehouse specialists
One of my favorite pages on the Monash Research website is the list of many current and a few notable past customers. (Another favorite page is the one for testimonials.) For a variety of reasons, I won’t undertake to be more precise about my current customer list than that. But I don’t think it would hurt anything to list the data warehouse DBMS/appliance specialists in the group. They are:
- Aster Data
- Calpont
- DATAllegro
- Greenplum
- Infobright
- Netezza
- ParAccel
- Teradata
- Vertica
All of those are Monash Advantage members.
If you care about all this, you may also be interested in the rest of my standards and disclosures.
| Categories: About this blog, Aster Data, Calpont, DATAllegro, Data warehousing, Greenplum, Infobright, Netezza, ParAccel, Teradata, Vertica Systems | 3 Comments |
How is MySQL’s join performance these days?
In a comment thread on a recent post comparing MySQL to Postgres, Jonathon Moore chimed in based on experience with both products. His characterization of some MySQL problems: Read more
| Categories: Infobright, MySQL, Open source | 6 Comments |
Outsourced data marts
Call me slow on the uptake if you like, but it’s finally dawned on me that outsourced data marts are a nontrivial segment of the analytics business. For example:
- I was just briefed by Vertica, and got the impression that data mart outsourcers may be Vertica’s #3 vertical market, after financial services and telecom. Certainly it seems like they are Vertica’s #3 market if you bundle together data mart outsourcers and more conventional OEMs.
- When Netezza started out, a bunch of its early customers were credit data-based analytics outsourcers like Acxiom.
- After nagging DATAllegro for a production reference, I finally got a good one — TEOCO. TEOCO specializes in figuring out whether inter-carrier telecom bills are correct. While there’s certainly a transactional invoice-processing aspect to this, the business seems to hinge mainly around doing calculations to figure out correct charges.
- I was talking with Pervasive about Pervasive Datarush, a beta product that lets you do super-fast analytics on data even if you never load it into a DBMS in the first place. I challenged them for use cases. One user turns out to be an insurance claims rule-checking outsourcer.
- One of Infobright’s references is a French CRM analytics outsourcer, 1024 Degres.
- 1010data has built up a client base of 50-60, including a number of financial and retail blue-chippers, with a soup-to-nuts BI/analysis/columnar database stack.
- I haven’t heard much about Verix in a while, but their niche was combining internal sales figures with external point-of-sale/prescription data to assess retail (especially pharma) microtrends.
To a first approximation, here’s what I think is going on. Read more
Positioning the data warehouse appliances and specialty DBMS
There now are four hardware vendors that each offer or seem about to announce two different tiers of data warehouse appliances: Sun, HP, EMC, and Teradata. Specifically:
- Sun partners with both Greenplum and ParAccel.
- HP sells Neoview, and also is partnered with Vertica.
- EMC (together with Dell in North America and Bull in Europe) sells DATAllegro. Now EMC is also entering a partnership with ParAccel.
- Teradata is pretty far down the road toward releasing a low-end product.
Will Brighthouse become the MySQL data warehouse of choice?
As I’ve previously noted:
- Infobright is about to make more noise about its MySQL-based data warehouse software, Brighthouse.
- Brighthouse has some very interesting technical features.
- A Sun/Infobright partnership would make a lot of sense.
Talking with Infobright today, I was again struck by how close their relationship with MySQL (the company) is. Stay tuned.
| Categories: Analytic technologies, Data warehousing, Infobright, MySQL | Leave a Comment |
Infobright is gearing up for a press push
There’s another TDWI conference coming up, so it’s time for data warehouse-related press rollouts. Infobright (one of my many clients in this area) will be doing one of them, and ran an early version by me. Customer announcements, vendor partnerships, and so on are still being finalized, but anyhow Infobright has 7 revenue-recognized customers and a bunch more that are sold and in the implementation cycle. There’s a Release 3 of Brighthouse coming up. As one would expect, Release 3’s major claims to fame are the general addition of features (including some which elicit a “You didn’t have that already?” reaction), plus huge performance improvements in some queries (i.e., the biggest bottlenecks in Brighthouse Release 2).
On that level, it’s all standard stuff, as is Infobright’s core pitch — ease, simplicity, low cost, etc., and the benefits of same. But drilling down, there are some rather unique technical claims. Read more
| Categories: Analytic technologies, Data warehousing, Infobright | 1 Comment |
Things could get interesting for Infobright
Of the many new specialty data warehouse DBMS and appliances, Infobright’s BrightHouse is the only leading one based on MySQL. I expect Sun and Infobright to have some interesting conversations now. Conversely, I wouldn’t be optimistic about any partnering discussions Infobright might have with, say, HP.
The most directly competitive relationship Sun now has to any future Infobright partnership is with ParAccel.
| Categories: Analytic technologies, Data warehousing, Infobright, MySQL, Open source, ParAccel | 2 Comments |
Infobright responds
An Infobright employee posted something quite reasonable-looking in response to my inaugural post about BrightHouse. Even so, Infobright asked if they could substitute something with a slightly different tone. I agreed. Here’s what they sent in.
Curt, thanks for the write-up and the opportunity to talk about our customer success stories. As you say, our customer story is definitely “more than zero.” We are addressing a number of critical customer issues with our unique approach to data warehousing.
Infobright currently has 5 customers - customers that have bucked the trend of throwing hardware at the problem. To be perfectly braggadocio about this, we have never lost a competitive proof of concept in which we’ve been engaged. This is accomplished with the horsepower of one box (though for redundancy customers may deploy multiple boxes with a load balancer).
| Categories: Analytic technologies, Columnar database management, Data warehousing, Database compression, Infobright | Leave a Comment |
Infobright BrightHouse — columnar, VERY compressed, simple, and related to MySQL
To a first approximation, Infobright – maker of BrightHouse — is yet another data warehouse DBMS specialist with a columnar architecture, boasting great compression and running on commodity hardware, emphasizing easy set-up, simple administration, great price-performance, and hence generally low TCO. BrightHouse isn’t actually MPP yet, but Infobright confidently promises a generally available MPP version by the end of 2008. The company says that experience shows >10:1 compression of user data is realistic – i.e., an expansion ratio that’s fractional, and indeed better than 1/10:1. Accordingly, despite the lack of shared-nothing parallelism, Infobright claims a sweet spot of 1-10 terabyte warehouses, and makes occasional references to figures up to 30 terabytes or so of user data.
BrightHouse is essentially a MySQL storage engine, and hence gets a lot of connectivity and BI tool support features from MySQL for “free.” Beyond that, Infobright’s core technical idea is to chop columns of data into 64K chunks, called data packs, and then store concise information about what’s in the packs. The more basic information is stored in data pack nodes,* one per data pack. If you’re familiar with Netezza zone maps, data pack nodes sound like zone maps on steroids. They store maximum values, minimum values, and (where meaningful) aggregates, and also encode information as to which intervals between the min and max values do or don’t contain actual data values. Read more
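The zone-map-like idea can be sketched in a few lines of Python. This is an illustration of the general technique only, not Infobright's implementation: the names `DataPackNode`, `build_nodes`, and `count_greater_than` are hypothetical, the column holds plain integers, and the sketch keeps just min, max, sum, and count per pack, leaving out the interval information described above.

```python
from dataclasses import dataclass
from typing import List, Tuple

PACK_SIZE = 65_536  # "64K" values per data pack, as described in the text


@dataclass
class DataPackNode:
    """Summary kept for one data pack (hypothetical structure)."""
    minimum: int
    maximum: int
    total: int
    count: int


def build_nodes(column: List[int], pack_size: int = PACK_SIZE) -> List[DataPackNode]:
    """Chop a column into packs and record one summary node per pack."""
    nodes = []
    for start in range(0, len(column), pack_size):
        pack = column[start:start + pack_size]
        nodes.append(DataPackNode(min(pack), max(pack), sum(pack), len(pack)))
    return nodes


def count_greater_than(
    column: List[int],
    nodes: List[DataPackNode],
    threshold: int,
    pack_size: int = PACK_SIZE,
) -> Tuple[int, int]:
    """Count values > threshold, reading only packs the nodes cannot resolve.

    Returns (matching row count, number of packs actually opened).
    """
    matched = 0
    packs_opened = 0
    for i, node in enumerate(nodes):
        if node.maximum <= threshold:
            continue  # no value in this pack can match; skip it entirely
        if node.minimum > threshold:
            matched += node.count  # every value matches; answer from metadata
            continue
        # Only "suspect" packs, whose range straddles the threshold, are read.
        packs_opened += 1
        pack = column[i * pack_size:(i + 1) * pack_size]
        matched += sum(1 for value in pack if value > threshold)
    return matched, packs_opened


column = list(range(300_000))  # synthetic, sorted column for illustration
nodes = build_nodes(column)
matched, packs_opened = count_greater_than(column, nodes, 100_000)
print(len(nodes), matched, packs_opened)
```

On this sorted synthetic column only one of the five packs has to be read; the rest are answered from the per-pack summaries. Real data is rarely this well ordered, so how much gets skipped depends heavily on how values are distributed across packs.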
