Analytic technologies
Discussion of technologies related to information query and analysis. Related subjects include:
- Business intelligence
- Data warehousing
- (in Text Technologies) Text mining
- (in The Monash Report) Data mining
- (in The Monash Report) General issues in analytic technology
eBay thinks MPP DBMS clobber MapReduce
I talked with Oliver Ratzesberger and his team at eBay last week. I already knew they were MapReduce non-fans; this time I got more detail.
Oliver believes that, on the whole, MapReduce is 6-8X slower than native functionality in an MPP DBMS, and hence should be used only sporadically. This view is based in part on simulations eBay ran of the Terasort benchmark. On 72 Teradata nodes, or on 96 lower-powered nodes running another MPP DBMS (currently unnamed, as per yet another of my PR fire drills), a simulation of Terasort executed in 78 and 120 seconds respectively. Those times are very comparable to the ones Google and Yahoo got on 1000 nodes or more.
And by the way, if you use many fewer nodes, you also consume much less floor space and electric power.
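To make the footprint argument concrete, here is a rough back-of-the-envelope sketch using node-seconds (node count times elapsed time) as a crude proxy for hardware, floor space, and power. The `node_seconds` helper is hypothetical, and the 1000-node comparison point is an assumption for illustration only, since Google's and Yahoo's exact times aren't given in the post.

```python
# Node-seconds (nodes x elapsed time) as a crude proxy for the hardware,
# floor space, and power a run ties up. It ignores per-node differences,
# e.g. that the 96-node system used lower-powered nodes.
def node_seconds(nodes: int, elapsed_s: float) -> float:
    return nodes * elapsed_s

teradata = node_seconds(72, 78)     # 72 Teradata nodes, 78 s
other_mpp = node_seconds(96, 120)   # 96 lower-powered nodes, 120 s

# Hypothetical comparison point: a 1000-node cluster that merely matched the
# 78-second Teradata time. (Google's and Yahoo's actual times aren't given here.)
big_cluster = node_seconds(1000, 78)

print(teradata, other_mpp)                # 5616 11520
print(round(big_cluster / teradata, 1))   # 13.9
```

At equal elapsed time, the ratio is just the ratio of node counts, so even a 1000-node cluster that matched eBay's times would tie up roughly 10-14X the node-seconds.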
| Categories: Analytic technologies, eBay, Hadoop, MapReduce, Parallelization, Teradata | 11 Comments |
Stonebraker, DeWitt, et al. compare MapReduce to DBMS
Along with five other coauthors (the lead author seems to be Andy Pavlo), famous MapReduce non-fans Mike Stonebraker and David DeWitt have posted a SIGMOD 2009 paper called “A Comparison of Approaches to Large-Scale Data Analysis.” The heart of the paper is a set of benchmarks of Hadoop, Vertica, and “DBMS-X” on identical clusters of 100 low-end nodes, across a series of tests including (if I understood correctly):
- A couple of different flavors of a Grep task originally proposed in a Google MapReduce paper.
- A database query on simulated clickstream data.
- A join on the same clickstream data.
- Two aggregations on the clickstream data.
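For readers who haven't seen the two styles side by side, here is a minimal single-process sketch of how a Grep task and a clickstream aggregation look when expressed as map and reduce functions, with the rough SQL equivalent in comments. This is not the paper's benchmark code; the `map_reduce` helper, the record layouts, the "XYZ" pattern, and the table and column names are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

# A toy, single-process MapReduce: map every record, group the emitted
# (key, value) pairs by key (the "shuffle"), then optionally reduce each group.
def map_reduce(
    records: Iterable,
    mapper: Callable[[object], Iterator[Tuple[str, object]]],
    reducer: Optional[Callable[[str, List[object]], object]] = None,
) -> Dict[str, object]:
    groups: Dict[str, List[object]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    if reducer is None:  # map-only job, as in a Grep task
        return dict(groups)
    return {key: reducer(key, values) for key, values in groups.items()}

# Grep task. Assumed record layout: (key, value) strings; assumed pattern "XYZ".
# Roughly the SQL: SELECT * FROM data WHERE value LIKE '%XYZ%'
def grep_mapper(record):
    key, value = record
    if "XYZ" in value:
        yield key, value

# Aggregation on hypothetical clickstream rows: (source_ip, ad_revenue).
# Roughly the SQL: SELECT source_ip, SUM(ad_revenue) FROM visits GROUP BY source_ip
def revenue_mapper(visit):
    source_ip, ad_revenue = visit
    yield source_ip, ad_revenue

def sum_reducer(_key, values):
    return sum(values)

rows = [("k1", "abcXYZdef"), ("k2", "nothing here"), ("k3", "XYZ")]
visits = [("10.0.0.1", 0.5), ("10.0.0.2", 1.25), ("10.0.0.1", 2.0)]

print(map_reduce(rows, grep_mapper))                    # {'k1': ['abcXYZdef'], 'k3': ['XYZ']}
print(map_reduce(visits, revenue_mapper, sum_reducer))  # {'10.0.0.1': 2.5, '10.0.0.2': 1.25}
```

The SQL versions are one declarative line each, which is a large part of the DBMS camp's argument; the MapReduce versions leave grouping and execution strategy to hand-written code and the framework.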
| Categories: Analytic technologies, Hadoop, MapReduce, Michael Stonebraker, Parallelization, Vertica Systems | 6 Comments |
Ingres update
I talked with Ingres today. Much of the call was fluff: open-source rah-rah, plus some numbers showing purported success, but so finely parsed as to be pretty meaningless. (To Ingres’ credit, they did offer to let me talk with their CFO, though they made no promises as to whether he’d offer any more substantive information.) Highlights included:
| Categories: Actian and Ingres, Data warehousing, EnterpriseDB and Postgres Plus, MySQL, Open source, Oracle, PostgreSQL, Sybase | 6 Comments |
Donald Farmer knocks the April Fool 8-ball out of the park
Donald Farmer has an excellently-crafted April Fool post about a revolution in business intelligence. Look at the character names, for example.
I wonder whether Donald learned operations research from that textbook where two main decision-making characters were Mark Off and his father Pop, an example company was Edifice Wrecks, and an example CEO was Dawn Shirley Light …
| Categories: Analytic technologies, Business intelligence, Humor | 1 Comment |
Business intelligence notes and trends
I keep not finding the time to write as much about business intelligence as I’d like to. So I’m going to do one omnibus post here covering a lot of companies and trends, then circle back in more detail when I can. Top-level highlights include:
- Jaspersoft has a new v3.5 product release. Highlights include multi-tenancy-for-SaaS and another in-memory OLAP option. Otherwise, things sound qualitatively much as I wrote last September.
- Inforsense has a cool composite-analytical-applications story. More precisely, they said my phrase “analytics-oriented EAI” was an “exceptionally good” way to describe their focus. Inforsense’s biggest target market seems to be health care, research and clinical alike. Financial services is next in line.
- Tableau Software “gets it” a little bit more than other BI vendors about the need to decide for yourself how to define metrics. (Of course, it’s possible that other “exploration”-oriented new-style vendors are just as clued-in, but I haven’t asked in the right way.)
- Jerome Pineau’s favorable view of Gooddata and unfavorable view of Birst are in line with other input I trust. I’ve never actually spoken with the Gooddata folks, however.
- Seth Grimes suggests the qualitative differences between open-source and closed-source BI are no longer significant. He has a point, although I’d frame it more as being about the difference between the largest (but acquisition-built) BI product portfolios and the smaller (but more home-grown) ones, counting open source in the latter group.
- I’ve discovered about five different in-memory OLAP efforts recently, and no doubt that’s just the tip of the iceberg.
- I’m hearing ever more about public-facing/extranet BI. Information Builders is a leader here, but other vendors are talking about it too.
| Categories: Application areas, Business intelligence, Information Builders, Inforsense, Jaspersoft, QlikTech and QlikView, Scientific research, Tableau Software | 8 Comments |
Lots of analytic DBMS vendors are hiring
After I wrote about a Twitter jobs page, it occurred to me to check whether analytic DBMS vendors are still hiring. Based on the Careers pages on their websites, I determined that Aster, Greenplum, Kickfire, and ParAccel all evidently are, in various mixes of (mainly) technical and field positions. At that point I got bored and stopped.
I didn’t choose those vendors entirely at random. If I had to name three vendors who are said to have had small layoffs at some point over the past few quarters, it would be ParAccel, Greenplum, and Kickfire. So if even they are hiring, the analytic DBMS sector is still pretty healthy … or at least thinks it is. 😉
| Categories: Aster Data, Data warehousing, Greenplum, Kickfire, ParAccel | 5 Comments |
Somebody is spreading Teradata acquisition rumors again
Today someone forwarded me a mass email from Tom Coffing that starts:
I have heard from reliable sources that both HP and SAP have purchased more than 5% of Teradata stock. My sources tell me that both companies appear to be positioning themselves for a bid.
I got my version of the same email from Coffing yesterday with a different introduction but otherwise the same substance (he’s pushing a new product of his). It also had a different From address.
Possible explanations include but are not limited to:
- Coffing knows something (seems unlikely, but I haven’t actually checked www.sec.gov to confirm or disconfirm)
- Coffing thinks he knows something
- Coffing just made this up (I hope not)
- There’s an April Fool’s Day prank going on (not by me — after my bizarre March, I’m recusing myself from April Fool’s pranks this year)
| Categories: Data warehousing, HP and Neoview, SAP AG, Teradata | 4 Comments |
Twitter is considering using MapReduce
From a Twitter job listing (formatting mine). The most interesting section is “Additional preferred experience.”
| Categories: Analytic technologies, Data warehousing, MapReduce, Specific users, Web analytics | 6 Comments |
What you learn in statistics class
xkcd does it again. Previous links to xkcd here and here.
| Categories: Analytic technologies, Fun stuff, Humor | 2 Comments |
Aleri update
My skeptical remarks on the Aleri/Coral8 merger generated some pushback. Today I actually got around to talking with John Morell, who was marketing chief at Coral8 and has remained with the combined company. First, some quick metrics:
- The combined Aleri has around 100 employees, 60-40 from Aleri vs. Coral8.
- The combined Aleri has around 80 customers. All of Aleri’s, with one sort-of exception at Banks.com, were in financial services. A large minority of Coral8’s were in financial services too.
- However, half of Aleri’s marketing spend going forward is budgeted outside the financial services markets. Not unreasonably, John presents this as a proof point that Aleri is serious about selling to other markets.
- Aleri had 12-14 people in the UK pre-merger. Coral8 had none in Europe.
- Coral8 had 15 OEMs pre-merger, some actually generating revenue. Aleri had substantially none.
- Coral8 had been closing a “couple” of customers per quarter in online commerce. Recently, however, that rate has ramped up to a “few.”
- Aleri’s engine is used to handle “many” hundreds of thousands of messages per second. Coral8’s highest-throughput user processes 100,000-150,000 messages per second.
John is sticking by the company line that there will be an integrated Aleri/Coral8 engine in around 12 months, with all the performance optimization of Aleri and flexibility of Coral8, that compiles and runs code from any of the development tools either Aleri or Coral8 now has. While this is a lot faster than, say, the Informix/Illustra or Oracle/IRI Express integrations, John insists that integrating CEP engines is a lot easier. We’ll see.
I focused most of the conversation on Aleri’s forthcoming efforts outside the financial services market. John sees these as being focused around Coral8’s old “Continuous (Business) Intelligence” message, enhanced by Aleri’s Live OLAP. Aleri Live OLAP is an in-memory OLAP engine, real-time/event-driven, fed by CEP. Queries can be submitted via ODBO/MDX today; XMLA support is coming. John reports that quite a few Coral8 customers are interested in Live OLAP, and positions the capability as one Coral8 would have had to develop had the company remained independent.
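As a rough illustration of what "in-memory OLAP fed by CEP" means architecturally, here is a minimal sketch in which each incoming event is folded into pre-aggregated cube cells as it arrives, so queries always see current totals without a batch reload. It does not model Aleri's engine or its ODBO/MDX interface; the `Order` event type, the `LiveCube` class, and its dimensions and measures are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple

@dataclass(frozen=True)
class Order:
    """Hypothetical event type arriving from a CEP stream."""
    region: str
    product: str
    units: int
    revenue: float

class LiveCube:
    """Hypothetical in-memory cube with dimensions (region, product) and
    measures [units, revenue], updated once per event rather than
    rebuilt by a batch load."""

    def __init__(self) -> None:
        self.cells: DefaultDict[Tuple[str, str], List[float]] = defaultdict(lambda: [0, 0.0])

    def on_event(self, order: Order) -> None:
        # Incrementally fold the event into its cell; queries see it immediately.
        cell = self.cells[(order.region, order.product)]
        cell[0] += order.units
        cell[1] += order.revenue

    def rollup_by_product(self, product: str) -> Tuple[int, float]:
        # Aggregate away the region dimension for one product.
        units, revenue = 0, 0.0
        for (_region, prod), (u, r) in self.cells.items():
            if prod == product:
                units += u
                revenue += r
        return units, revenue

cube = LiveCube()
for event in [Order("East", "Widgets", 3, 30.0),
              Order("West", "Widgets", 2, 20.0),
              Order("East", "Gadgets", 1, 15.0)]:
    cube.on_event(event)

print(cube.rollup_by_product("Widgets"))  # (5, 50.0)
```

A real engine would add more dimensions, hierarchies, and windowing; the point of the sketch is only that aggregation happens on arrival, which is what distinguishes an event-driven cube from one refreshed on a schedule.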
