May 8th, 2008 Curt Monash
Call me slow on the uptake if you like, but it’s finally dawned on me that outsourced data marts are a nontrivial segment of the analytics business. For example:
- I was just briefed by Vertica, and got the impression that data mart outsourcers may be Vertica’s #3 vertical market, after financial services and telecom. Certainly it seems like they are Vertica’s #3 market if you bundle together data mart outsourcers and more conventional OEMs.
- When Netezza started out, a bunch of its early customers were credit data-based analytics outsourcers like Acxiom.
- After nagging DATAllegro for a production reference, I finally got a good one: TEOCO. TEOCO specializes in figuring out whether inter-carrier telecom bills are correct. While there’s certainly a transactional invoice-processing aspect to this, the business seems to hinge mainly around doing calculations to figure out correct charges.
- I was talking with Pervasive about Pervasive Datarush, a beta product that lets you do super-fast analytics on data even if you never load it into a DBMS in the first place. I challenged them for use cases. One user turns out to be an insurance claims rule-checking outsourcer.
- One of Infobright’s references is a French CRM analytics outsourcer, 1024 Degres.
- 1010data has built up a client base of 50-60, including a number of financial and retail blue-chippers, with a soup-to-nuts BI/analysis/columnar database stack.
- I haven’t heard much about Verix in a while, but their niche was combining internal sales figures with external point-of-sale/prescription data to assess retail (especially pharma) microtrends.
To a first approximation, here’s what I think is going on. Read the rest of this entry »
Posted in 1010data, Analytics and analytic technologies, Business intelligence, Cloud computing, Data warehousing, Infobright and Brighthouse, Netezza, Pervasive Software, SaaS, Specific users, TEOCO, Vertica Systems | 1 Comment »
March 25th, 2008 Curt Monash
Oliver Ratzesberger and his crew have started a blog, focusing on xldb analytics. Naturally, one of the early posts gives a quick overview of their system stats. Highlights include:
Incoming data volumes exceed 40TB per day, with more than 10^11 new items/lines/records being added per day. Our analytical processing infrastructure exceeds 6PB of physical storage with over 2.9PB(1.4+1.5) in our largest cluster.
We leverage compression technologies wherever possible and are achieving compression ratios as high as 99% on our highest volume data feeds.
On any given day our massive parallel systems process more than 27PB of data, not factoring in various levels of caches that serve similar activities or processes and reduce the amount of physical IOs significantly.
We execute millions of requests on a daily basis, spanning from near realtime highly localized access to enormous jobs that span 100s of TB in a single or series of models.
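A few back-of-envelope ratios fall out of those quoted figures. The sketch below is only arithmetic on the numbers above, assuming decimal units (1 TB = 10^12 bytes) and reading "compression ratios as high as 99%" as 99% space saved. Since the quoted figures are all lower bounds ("exceed", "more than"), the results are rough indications, not measurements.

```python
# Back-of-envelope figures derived from the quoted eBay statistics.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 PB = 10**15 bytes).
TB = 10**12
PB = 10**15

daily_ingest_bytes = 40 * TB       # "exceed 40TB per day" (a lower bound)
daily_new_records = 10**11         # "more than 10^11 new items/lines/records"
storage_bytes = 6 * PB             # "exceeds 6PB of physical storage"
daily_processed_bytes = 27 * PB    # "more than 27PB of data" processed per day
best_saving_percent = 99           # "as high as 99%", read here as space saved

# Average size of one incoming record, in bytes.
avg_record_bytes = daily_ingest_bytes / daily_new_records

# How many times per day the equivalent of all physical storage is processed.
scan_multiple = daily_processed_bytes / storage_bytes

# A 99% saving means the compressed feed is 1/100th of its original size.
best_compression_factor = 100 / (100 - best_saving_percent)

print(f"average record size: {avg_record_bytes:.0f} bytes")
print(f"daily processing vs. storage: {scan_multiple:.1f}x")
print(f"best-case compression: {best_compression_factor:.0f}:1")
```

In other words, the typical incoming record is on the order of a few hundred bytes, and the systems churn through the equivalent of their entire physical storage several times a day.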
Posted in Specific users, eBay | No Comments »
March 13th, 2008 Curt Monash
Twitter commonly has the problem of duplicate tweets. That is, if you post a message, it shows up twice. After a little while, the dupe disappears, but if you delete the dupe manually, the original is gone too.
I presume what’s going on is that tweets are cached, the tweets are eventually batched to disk, and they don’t always get deleted from cache until some time after they’re persisted. If you happen to check the page of your recent tweets in between, then boom, you get two hits. But what I don’t understand is why the two versions have different timestamps.
Presumably, this could be explained at a MySQL User Conference session next month, one of whose topics will be Intelligent caching strategies using a hybrid MemCache / MySQL approach. I’m so glad they don’t use stupid strategies to do this … Read the rest of this entry »
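The conjecture above can be modeled in a few lines. This is a toy sketch of the guessed mechanism only, not Twitter's actual design: the class and method names are hypothetical, the "disk" is just a dictionary, and it makes no attempt to explain the differing timestamps.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    text: str


@dataclass
class WriteBehindStore:
    """Toy model: writes land in a cache and are batched to 'disk' later."""
    cache: dict = field(default_factory=dict)
    disk: dict = field(default_factory=dict)

    def post(self, tweet):
        self.cache[tweet.tweet_id] = tweet

    def flush(self, evict=True):
        # Persist cached tweets; with evict=False the cached copy lingers.
        self.disk.update(self.cache)
        if evict:
            self.cache.clear()

    def delete(self, tweet_id):
        # Deleting by id removes the tweet wherever it lives.
        self.cache.pop(tweet_id, None)
        self.disk.pop(tweet_id, None)

    def timeline_naive(self):
        # Concatenates both sources without deduplicating by id.
        return list(self.cache.values()) + list(self.disk.values())

    def timeline_deduped(self):
        # Merging by id collapses the cached and persisted copies.
        merged = {**self.disk, **self.cache}
        return list(merged.values())


store = WriteBehindStore()
store.post(Tweet(1, "hello"))
store.flush(evict=False)             # persisted, but not yet evicted from cache

naive = store.timeline_naive()       # the same tweet shows up twice
deduped = store.timeline_deduped()   # one copy once ids are merged

store.delete(1)                      # "deleting the dupe" deletes by id
after_delete = store.timeline_naive()
```

Under these assumptions the duplicate is really one record seen through two sources, which would be consistent with the original vanishing when the dupe is deleted, and with the dupe going away on its own once the cache entry is finally evicted.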
Posted in Cache, MySQL, OLTP database management, Specific users | 3 Comments »
March 4th, 2008 Curt Monash
Intelligent Enterprise has an article on Sybase IQ and columnar systems that leaves me shaking my head. E.g., it ends by saying Netezza has a columnar architecture (uh, no). It also quotes an IBM exec as saying only 10-20% of what matters in a data warehouse DBMS is performance (already an odd claim), and then has him saying columnar only provides a 10% performance gain (let’s be generous and hope that’s a misquote).
Also from the article — and this part seems more credible — is:
“Sybase IQ revenues were up 70% last year,” said Richard Pledereder, VP of engineering. … Sybase now claims 1,200 Sybase IQ customers. It runs large data warehouses powered by big, multiprocessor servers. Priced at $45,000 per CPU, those IQ customers now account for a significant share of Sybase’s revenues, although the company won’t break down revenues by market segment.
Read the rest of this entry »
Posted in Analytics and analytic technologies, Columnar architectures, Data warehousing, Relational database management systems, Specific users, Sybase | 1 Comment »
February 27th, 2008 Curt Monash
I’ve posted a couple times about eBay’s analytics side. As a companion, Don Burleson pointed me at a fascinating November, 2006 slide presentation outlining eBay’s transactional architecture and evolution. Highlights include:
- A whole lot of manual slicing of Oracle databases, so as not to exceed their capacity.
- A whole lot of careful design and ordering of transactions.
- Putting all the business logic in the application tier, with a custom O/R mapper. There’s lots of caching there, but very little state.
The presentation has a bunch of specific numbers, in case anybody wants to dive in.
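For readers who haven't seen manual slicing done from the application tier, here is a minimal sketch of the general idea. Everything in it is hypothetical: the split points, the database names, and the choice of range-based routing on a user id are illustrative assumptions, not details taken from eBay's presentation.

```python
import bisect

# Hypothetical split points and database names, chosen by hand so that no
# single database outgrows its capacity. Illustrative only.
SPLIT_POINTS = [1_000_000, 2_000_000]   # exclusive upper bounds of the first two slices
DATABASES = ["users_db_0", "users_db_1", "users_db_2"]


def database_for(user_id: int) -> str:
    """Return the name of the database slice that holds this user id."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    # bisect_right finds the index of the first split point above user_id.
    return DATABASES[bisect.bisect_right(SPLIT_POINTS, user_id)]


print(database_for(999_999))     # users_db_0
print(database_for(1_000_000))   # users_db_1
print(database_for(5_000_000))   # users_db_2
```

The appeal of keeping this routing in the application tier is that the databases themselves stay simple; the cost is that re-slicing means changing the split table and moving data by hand.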
Please subscribe to our feed!
Technorati Tags: transaction processing, OLTP
Posted in OLTP database management, Specific users, eBay | No Comments »
February 26th, 2008 Curt Monash
I had a non-technical introduction today to Exasol, a data warehouse specialist that has gotten a little buzz recently for publishing TPC-H results even faster than ParAccel’s. Here are some highlights:
- Exasol was founded back in 2000.
- Exasol is a German company, with 60 employees. While I didn’t ask, the vast majority are surely German.
- Exasol has two customers. 6-8 more are Coming Real Soon. Most or all of those are in Germany, although one may be in Asia.
- Karstadt (big German retailer) has had Exasol deployed for 3 years. The other deployed customer is the German subsidiary of data provider IMS Health.
- [Redacted for confidentiality] is a strategic investor in and partner of Exasol. [Redacted for confidentiality]’s only competing partnership is with Oracle.
- Exasol’s system is more completely written from scratch than many. E.g., all they use from Linux are some drivers, and maybe a microkernel.
- Exasol runs in-memory. There doesn’t seem to be a disk-centric mode.
- Exasol’s data access methods are sort of like columnar, but not exactly. I look forward to a more technical discussion to sort that out.
- Exasol’s claimed typical compression is 5-7X. As in the Vertica story, database operations are carried out on compressed data.
- Exasol says it has performed a very fast TPC-H in-house at the 30 terabyte level. However, its deployed sites are probably a lot smaller than that. IMS Health is cited in its literature as 145 gigabytes.
- Oracle and Microsoft are listed as Exasol partners, so there may be some kind of plug-compatibility or back-end processing story.
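The remark above about carrying out database operations on compressed data can be made concrete with a small sketch. Run-length encoding is used here purely as an assumption for illustration; nothing above says which compression scheme Exasol or Vertica actually uses.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_sum(runs):
    """SUM over the column, computed without expanding the runs."""
    return sum(value * length for value, length in runs)


def rle_count_equal(runs, target):
    """COUNT(*) WHERE column = target, again without decompressing."""
    return sum(length for value, length in runs if value == target)


column = [5, 5, 5, 5, 7, 7, 9, 9, 9, 9, 9, 9]   # 12 stored values
runs = rle_encode(column)                        # 3 runs

total = rle_sum(runs)
nines = rle_count_equal(runs, 9)
```

Each aggregate touches one entry per run rather than one per row, so on sorted or low-cardinality columns the work shrinks along with the storage.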
Posted in Analytics and analytic technologies, Data warehousing, Exasol, Relational database management systems, Specific users | No Comments »
February 26th, 2008 Curt Monash
There’s been some confusion over my post about eBay’s multiple petabytes of data. So to clarify, let me say:
- eBay’s figure of >1.4 petabytes of data — for its largest single analytic database — counts disks or something, not raw user data.
- I previously published a strong conjecture that the database vendor in question was Teradata, which is definitely an eBay supplier. In particular, it is definitely not an Oracle data warehouse.
- While eBay isn’t saying who it is either — not even off-the-record — the 50%ish compression figures they experience just happen to map well to Teradata’s usual range.
- Edit: Just to be clear (not that there was any doubt), I have reconfirmed that eBay is a Teradata user, in or including eBay’s PayPal division.
Posted in Analytics and analytic technologies, Data warehouse appliances, Data warehousing, Relational database management systems, Specific users, Teradata, eBay | No Comments »
February 11th, 2008 Curt Monash
Single largest database >1.4 petabytes.
From Oliver Ratzesberger’s LinkedIn profile:
Our systems process in excess of 10 billion records per day, serving thousands of users and delivering hundreds of millions of queries per month in a true global 24×7 operation with distributed teams around the globe on systems over 5 PB in size (largest single system >1.4PB).
Posted in Specific users, eBay | 3 Comments »
January 25th, 2008 Curt Monash
Spinn3r crawls and indexes blogs. It says it covers 1 million blogs and 25K posts/hour, doing thousands of write transactions per second. And it does this into federated MySQL — but with a lot of software built on top. To wit: Read the rest of this entry »
Posted in MySQL, Specific users | 1 Comment »
October 19th, 2007 Curt Monash
I was at the Business Objects conference this week, and as usual went to very few sessions. But one I did stroll into was on “Managing Rapid Growth With the Right BI Strategy.” This was by Reliance Telecommunications, an outfit in India that is adding telecom subscribers very quickly, and consequently banging 100-150 gigs of data per day into a 35 terabyte warehouse.
The beginning of the talk astonished me, as the presenter seemed to be saying they were doing all this on Oracle. Hah. Oracle is what they moved away from; instead, they got Greenplum. I couldn’t get details; indeed, as a BI guy he was far enough away from DBMS to misspeak and say that Greenplum was brought in by ‘HP’, before quickly correcting himself when prompted. Read the rest of this entry »
Posted in Analytics and analytic technologies, Business Objects, Data warehouse appliances, Data warehousing, Greenplum, Oracle, Specific users | No Comments »
October 9th, 2007 Curt Monash
Usually, I don’t engage in the kind of high-speed quick-response blogging I have over the past couple of days from the Teradata Partners conference (and more generally have for the past week or so). And I’m not sure it’s working out so well.
For example, the claim that Teradata has surpassed the one-petabyte mark comes as quite a surprise to a variety of Teradata folks, not to mention at least one reliable outside anonymous correspondent. That claim may indeed be true about raw disk space on systems sold. But the real current upper limit, according to CTO Todd Walter,* is 500-700 terabytes of user data. He thinks half a dozen or so customers are in that range. I’d guess quite strongly that three of those are Wal-Mart, eBay, and an unspecified US intelligence agency.
*Teradata seems to have quite a few CTOs. But I’ve seen things much sillier than that in the titles department, and accordingly shan’t scoff further — at least on that particular subject.
On the other hand, if anybody did want to buy a 10 petabyte system, Teradata could ship them one. And by the way, the Teradata people insist Sybase’s claims in the petabyte area are quite bogus. Teradata claims to have had bigger internal systems tested earlier than the one Sybase writes about.
Technorati Tags: Teradata, petabyte, data warehouse, Sybase, Wal-Mart, eBay
Posted in Data warehouse appliances, Data warehousing, Specific users, Sybase, Teradata | No Comments »
October 8th, 2007 Curt Monash
According to a hurried conversation I had with Chief Marketing Officer Darryl MacDonald, Teradata has customers with over 1 petabyte of user data in a single instance. He wouldn’t disclose any names, but I’d guess one is eBay, who he did confirm is a customer. The intelligence area is another one where I’d speculate there are Very Large Databases.
However, since Darryl mentioned testing systems internally up to 4 petabytes, I’d guess the upper limit of Teradata deployments is in the 1-2 petabyte range.
EDIT: I’m now guessing that Teradata’s largest classified database — which previously was the largest overall — isn’t much over a petabyte in size. And there’s a strong chance this is larger than any unclassified one.
Update: That wasn’t really 1+ petabyte of user data.
Technorati Tags: Teradata, petabyte, data warehouse
Posted in Analytics and analytic technologies, Data warehouse appliances, Data warehousing, Specific users, Teradata, eBay | No Comments »
August 8th, 2006 Curt Monash
Every sufficiently large or agile enterprise needs to follow the DBMS2 approach. The following is from an article on eBay’s version:
“eBay has built a software-based Integration Tier. This contains both a data access layer (DAL) and a services framework. The Integration Tier acts as an abstraction layer for software engineers to work with many disparate back-end data sources through a consistent set of abstractions.”
Posted in EII, ETL, and/or EAI, Specific users, eBay | No Comments »
July 25th, 2006 Curt Monash
Last year, I pointed out that Amazon has a highly diversified DBMS strategy. Now Mike Vizard has a great interview with Werner Vogels, Amazon’s CTO, where he unearths a lot more detail. And it turns out that Amazon has been a hardcore adopter of DBMS2, since long before DBMS2 was named.
Read the rest of this entry »
Posted in Amazon, SimpleDB, and S3, Database diversity, Database theory and practice, Specific users | No Comments »
October 10th, 2005 Curt Monash
I don’t know for a fact that the Amazon.com bookstore is the world’s biggest OLTP application — but if it isn’t, it’s close.
And the thing is — that’s never been an entirely relational application. Oh, the ordering part surely is. But the inventory lookup is currently driven by an OODBMS (from Progress). The personalization used to be done in Red Brick (I knew which software replaced it, but I’m forgetting at the moment — it may even be one of the relational warehouse appliance vendors). And of course the full-text search is a custom in-house system.
Posted in Amazon, SimpleDB, and S3, Cache, Database theory and practice, Memory-centric data management, OLTP database management, Objects, Progress, Apama, and DataDirect, Specialized data management in general, Specific users | 3 Comments »