Fox Interactive Media’s multi-hundred terabyte database running on Greenplum
Greenplum’s largest named account is Fox Interactive Media — the parent organization of MySpace — which has a multi-hundred terabyte database that it uses for hardcore data mining/analytics. Greenplum has been engaging in regrettable business practices, claiming that it is in the process of supplanting Aster Data at Fox/MySpace. In fact, MySpace’s use of Aster is more mission-critical than Fox’s use of Greenplum, and is increasing significantly.
Still, as Greenplum’s gushing customer video with Fox Interactive Media* illustrates, the Fox/Greenplum database is impressive on its own merits. Read more
| Categories: Analytic technologies, Aster Data, Data warehousing, Fox and MySpace, Greenplum, Specific users, Theory and architecture, Web analytics | 3 Comments |
MySpace’s multi-hundred terabyte database running on Aster Data
Aster Data has put up a blog post embedding and summarizing a video about its MySpace account. Basic metrics include:
The combined Aster deployment now has 200+ commodity hardware servers working together to manage 200+ TB of data that is growing at 2-3TB per day by collecting 7-10B events that happen on one of the world's largest websites every day.
I’m pretty sure that’s counting correctly (i.e., user data).* Read more
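Those figures imply an average event size of a few hundred bytes. A minimal back-of-the-envelope check (assuming, which the quote doesn't say outright, that the 2-3 TB/day and 7-10 billion events/day describe the same daily ingest):

```python
# Rough sanity check: implied bytes per event from the quoted figures.
# Assumption (mine, not Aster's): both figures describe the same daily ingest.

TB = 10**12  # decimal terabytes

for tb_per_day, events_per_day in [(2, 10e9), (3, 7e9)]:
    bytes_per_event = tb_per_day * TB / events_per_day
    print(f"{tb_per_day} TB over {events_per_day:.0e} events "
          f"-> ~{bytes_per_event:.0f} bytes/event")
```

That works out to roughly 200-430 bytes per event, which is a plausible size for a clickstream/event record with user data attached.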
| Categories: Analytic technologies, Application areas, Aster Data, Data warehousing, Fox and MySpace, Specific users, Theory and architecture, Web analytics | 11 Comments |
Ideas for BI POCs
Kevin Spurway of Altosoft has a post up offering his suggestions on how to do business intelligence POCs (Proofs-of-Concept). Among the best ideas in his post are:
- Do POCs.
- Don’t let the vendors prepare the details of the POCs in advance.
- Get your hands on the actual SQL generated in the queries.
- Try to understand the actual development and deployment processes.
The post’s worst, or at least most self-serving, idea is:
- Restrict POCs to single-day toy projects.
Of course, he didn’t phrase it exactly that way, but that was the gist.
Actually, the more realistically your POC models:
- Full query workloads and throughput
- Repositories jammed full with a lot of messy detail
the more reliable it will be.
| Categories: Analytic technologies, Benchmarks and POCs, Business intelligence, Buying processes | 1 Comment |
Closing the book on the DATAllegro customer base
I’m prepared to call an end to the “Guess DATAllegro’s customers” game. Bottom line is that there are three in all, two of which are TEOCO and Dell, and the third of which is a semi-open secret. I wrote last week:
The number of DATAllegro production references is expected to double imminently, from one to two. Few will be surprised at the identity of the second reference. I imagine the number will then stay at two, as DATAllegro technology is no longer being sold, and the third known production user has never been reputed to be particularly pleased with it.
Dell did indeed disclose at TDWI that it was a large DATAllegro user, notwithstanding that Dell is a huge Teradata user as well. No doubt, Dell is gearing up to be a big user of Madison too.
Also at TDWI, I talked with some former DATAllegro employees who now work for rival vendors. None thinks DATAllegro has more than three customers. Neither do I.
Edit: Subsequently, the DATAllegro customer count declined to 1.
| Categories: Data warehouse appliances, Data warehousing, DATAllegro, Market share and customer counts, Microsoft and SQL*Server, Specific users | 10 Comments |
Named customer silliness
Neither Greenplum nor eBay will say for the record that eBay is a Greenplum customer. Indeed, saying that is quite verboten. On the other hand, Greenplum’s press release boilerplate says that Skype is a Greenplum customer, and Skype is of course a subsidiary of eBay. (Edit: Speaking of silliness, fixed a typo there.)
The point of such distinctions is sometimes lost on me.
In related news, of Greenplum’s two customers who back in August were supposedly heading into production soon with petabyte-plus databases, one hasn’t yet made it to that size. (“As we speak” turned out to be a longer conversation than I might have anticipated ….) The other (of course unnamed) customer has, Greenplum assures me, made it that high. But upon checking with that (unnamed, in case I forgot to mention the point) customer, I don’t detect a whole lot of enthusiasm about Greenplum.
| Categories: Data warehousing, eBay, Greenplum, Specific users | 3 Comments |
Data warehousing business trends
I’ve talked with a whole lot of vendors recently, some here at TDWI, as well as users, fellow analysts, and so on. Repeated themes include: Read more
| Categories: Analytic technologies, Application areas, Data mart outsourcing, Data warehousing, eBay, Microsoft and SQL*Server, MySQL, Oracle, Teradata | Leave a Comment |
HP and Neoview update
I had lunch with some HP folks at TDWI. Highlights (burgers and jokes aside) included:
- HP’s BI consulting (especially the former Knightsbridge) and analytic product groups (including Neoview) are now tightly integrated.
- HP is trying to develop and pitch “solutions” where it has particular “intellectual property.” This IP can come from ordinary product engineering or internal use, because HP Labs serves both sides of the business. Specific examples offered included:
- Telecom. Apparently, HP made specialized data warehouse devices for CDRs (Call Detail Records) long ago, and claims this has been an area of particular expertise ever since.
- Supply chain – based on HP’s internal experiences.
- Customer relationship – ditto
- The main synergy suggested between consulting and Neoview is that HP’s experts work on talking buyers into such a complex view of their requirements that only Neoview (supposedly) can fit the bill.
- HP insists there are indeed new Neoview sales.
- Neoview sales seem to be concentrated in what Aster might call “frontline” applications — i.e., low latency, OLTP-like uptime requirements, etc.
- HP says it did an actual 80 TB POC. I asked whether this was for an 80 TB app or something a lot bigger, but didn’t get a clear answer.
Given the emphasis on trying to exploit HP’s other expertise in the data warehousing business, I suggested it was a pity that HP spun off Agilent (HP’s instrumentation division, aka HP Classic). Nobody much disagreed.
| Categories: Analytic technologies, Business intelligence, Data warehouse appliances, Data warehousing, HP and Neoview, Telecommunications | 4 Comments |
Even more final version of my TDWI slide deck
My TDWI talk on How to Select an Analytic DBMS starts in less than an hour. So the latest version of my slide deck should prove truly final, unlike my prior two.
I won’t have printouts or other access to my notes, so those aren’t a good guide to the actual verbiage I’ll use.
| Categories: Benchmarks and POCs, Buying processes, Presentations | 4 Comments |
Partial overview of Ab Initio Software
Ab Initio is an absurdly secretive company, as per a couple of prior posts and the comment threads on same. But yesterday at TDWI I actually found civil people staffing an Ab Initio trade show booth. Based on that conversation and other tidbits, I think it’s fairly safe to say: Read more
| Categories: Ab Initio Software, Analytic technologies, Benchmarks and POCs, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Expressor, Pricing, Talend | 14 Comments |
Introduction to Expressor Software
I’ve chatted a few times with marketing chief Michael Waclawiczek and others at data integration startup Expressor Software. Highlights of the Expressor story include:
- Expressor was founded in 2003 and funded in 2007. Two rounds of funding raised $16 million.
- Expressor’s first product release was in May, 2008; before that Expressor built custom integration tools for a couple of customers.
- Michael believes Expressor will have achieved 5 actual sales by the end of this quarter, as well as being in 25 “highly active” sales cycles.
- Whatever Expressor’s long-term vision, right now it’s selling mainly on the basis of performance and affordability.
- In particular, Expressor believes it is superior to Ab Initio in both performance and ease of use.
- Expressor says that parallelism (unsurprisingly, a key aspect of data integration performance) took a long time to develop. Obviously, they feel they got it right.
- Expressor is written in C, so as to do hard-core memory management for best performance.
- Expressor founder John Russell seems to have cut his teeth at Info USA, which he left in the 1990s. Other stops on his journey include Trilogy (briefly) and then Knightsbridge, before he branched out on his own.
Expressor’s real goals, I gather, have little to do with the performance + price positioning. Rather, John Russell had a vision of the ideal data integration tool, with a nice logical flow from step to step, suitable integrated metadata management, easy role-based UIs, and so on. But based on what I saw during an October visit, most of that is a ways away from fruition.
