eBay OLTP architecture
I’ve posted a couple of times about eBay’s analytics side. As a companion, Don Burleson pointed me at a fascinating November 2006 slide presentation outlining eBay’s transactional architecture and evolution. Highlights include:
- A whole lot of manual slicing of Oracle databases, so as not to exceed their capacity.
- A whole lot of careful design and ordering of transactions.
- Putting all the business logic in the application tier, with a custom O/R mapper. There’s lots of caching there, but very little state.
The presentation has a bunch of specific numbers, in case anybody wants to dive in.
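To make the "manual slicing" idea concrete, here is a minimal sketch of application-tier routing, where the application decides which database slice holds a given user's rows. The split points, database names, and the choice of user ID as the slicing key are all hypothetical, invented for illustration; they are not taken from eBay's presentation.

```python
from bisect import bisect_right

# Hypothetical values: manually chosen upper bounds (exclusive) for the
# first two slices, and one database name per slice.
SPLIT_POINTS = [1_000_000, 2_000_000]
DATABASES = ["users_db_0", "users_db_1", "users_db_2"]


def database_for_user(user_id: int) -> str:
    """Return the name of the database slice that holds this user's rows."""
    # bisect_right counts how many split points are <= user_id,
    # which is exactly the index of the slice the ID falls into.
    return DATABASES[bisect_right(SPLIT_POINTS, user_id)]
```

With a scheme like this, re-slicing means editing the split points and moving rows by hand, which is presumably part of why it counts as "manual."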
Categories: OLTP, Specific users, eBay
Introduction to Exasol
I had a non-technical introduction today to Exasol, a data warehouse specialist that has gotten a little buzz recently for publishing TPC-H results even faster than ParAccel’s. Here are some highlights:
- Exasol was founded back in 2000.
- Exasol is a German company, with 60 employees. While I didn’t ask, the vast majority are surely German.
- Exasol has two customers. 6-8 more are Coming Real Soon. Most or all of those are in Germany, although one may be in Asia.
- Karstadt (big German retailer) has had Exasol deployed for 3 years. The other deployed customer is the German subsidiary of data provider IMS Health.
- [Redacted for confidentiality] is a strategic investor in and partner of Exasol. [Redacted for confidentiality]’s only competing partnership is with Oracle.
- Exasol’s system is more completely written from scratch than many. E.g., all they use from Linux are some drivers, and maybe a microkernel.
- Exasol runs in-memory. There doesn’t seem to be a disk-centric mode.
- Exasol’s data access methods are sort of like columnar, but not exactly. I look forward to a more technical discussion to sort that out.
- Exasol’s claimed typical compression is 5-7X. As in the Vertica story, database operations are carried out on compressed data.
- Exasol says it has performed a very fast TPC-H in-house at the 30 terabyte level. However, its deployed sites are probably a lot smaller than that. IMS Health is cited in its literature as 145 gigabytes.
- Oracle and Microsoft are listed as Exasol partners, so there may be some kind of plug-compatibility or back-end processing story.
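For readers unfamiliar with the idea of carrying out database operations on compressed data, here is a minimal sketch using run-length encoding. Run-length encoding is my choice for illustration only; nothing above says which compression scheme Exasol (or Vertica) actually uses, and the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a column into (value, run_length) pairs."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]


def rle_sum(runs):
    """Sum the column without expanding the runs back into rows."""
    return sum(v * n for v, n in runs)


def rle_count_where(runs, predicate):
    """Count matching rows, testing the predicate once per run, not per row."""
    return sum(n for v, n in runs if predicate(v))


column = [5, 5, 5, 7, 7, 9]
runs = rle_encode(column)          # three runs instead of six rows
total = rle_sum(runs)
matches = rle_count_where(runs, lambda v: v > 5)
```

The point is that both the aggregate and the filter touch one entry per run, so the work shrinks along with the data.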
Categories: Analytic technologies, Data warehousing, Exasol, Specific users
The biggest eBay database
There’s been some confusion over my post about eBay’s multiple petabytes of data. So to clarify, let me say:
- eBay’s figure of >1.4 petabytes of data — for its largest single analytic database — counts disks or something, not raw user data.
- I previously published a strong conjecture that the database vendor in question was Teradata, which is definitely an eBay supplier. In particular, it is definitely not an Oracle data warehouse.
- While eBay isn’t saying who it is either — not even off-the-record — the 50%ish compression figures they experience just happen to map well to Teradata’s usual range.
- Edit: Just to be clear (not that there was any doubt), I have reconfirmed that eBay is a Teradata user, in or including eBay’s PayPal division.
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Specific users, Teradata, eBay
All should be functioning again
The server move has completed. The brief outage is behind us. Comments have been turned back on. All SHOULD be well.
I plan to write a little more soon about web hosting over on the Monash Report, if for no other reason than that what’s there is not wholly accurate and needs updating.
Categories: About this blog
Comments off Friday night
I’m moving servers again. In connection with that, I’m turning comments off for a few hours.
Everything SHOULD be fine again by Saturday.
Categories: About this blog
ObjectGrid versus H-Store
Billy Newport of IBM sees a lot of similarities between his app-server-based product ObjectGrid and H-Store. In both cases, constrained tree schemas are assumed, and OLTP performance goodness ensues. A couple of points I noted on a quick skim through his blog:
- He calls out RAM consumption as a challenge for this kind of architecture.
- He points out that it’s a big advantage to have data called and used in the same address space.
Being based in RAM is obviously a huge part of the H-Store scheme. But so is having transaction execution be close to the database.
IBM now has both ObjectGrid and a memory-centric DBMS (solidDB) that it has been using as a front end for other DBMS. Integration of the two could be pretty interesting.
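The constrained-tree-schema assumption can be sketched roughly as follows: every row hangs off a single root key (here a customer), so partitioning by that root key keeps each transaction inside one partition and one address space. This is an illustrative toy, not ObjectGrid's or H-Store's actual API; the class, function names, and partition count are all hypothetical.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; each partition stands in for one in-RAM node


class Partition:
    """Holds a root table (customers) and its child rows (orders) together."""

    def __init__(self):
        self.customers = {}               # customer_id -> name
        self.orders = defaultdict(list)   # customer_id -> list of order dicts


partitions = [Partition() for _ in range(NUM_PARTITIONS)]


def partition_for(customer_id: int) -> Partition:
    # Every table in the tree is partitioned on the root key, so a customer
    # and all of that customer's orders always land in the same partition.
    return partitions[customer_id % NUM_PARTITIONS]


def add_customer(customer_id: int, name: str) -> None:
    partition_for(customer_id).customers[customer_id] = name


def place_order(customer_id: int, item: str, qty: int) -> int:
    """A transaction that reads the root row and writes a child row.

    It only ever touches one partition, so no cross-partition
    coordination is needed. Returns the customer's order count.
    """
    p = partition_for(customer_id)
    if customer_id not in p.customers:
        raise KeyError(f"unknown customer {customer_id}")
    p.orders[customer_id].append({"item": item, "qty": qty})
    return len(p.orders[customer_id])
```

The RAM-consumption challenge shows up here too: everything above lives in ordinary in-process dictionaries.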
Categories: Cache, IBM and DB2, Memory-centric data management, OLTP, Theory and architecture, VoltDB and H-Store, solidDB
The architectural assumptions of H-Store
I wrote yesterday about the H-Store project, the latest from the team of researchers who also brought us C-Store and its commercialization Vertica. H-Store is designed to drastically improve efficiency in OLTP database processing, in two ways. First, it puts everything in RAM. Second, it tries to gain an additional order of magnitude on in-memory performance versus today’s DBMS designs by, for example, taking a very different approach to ensuring ACID compliance.
Today I had the chance to talk with two more of the H-Store researchers, Sam Madden and Daniel Abadi.
Categories: Database diversity, In-memory DBMS, Memory-centric data management, OLTP, VoltDB and H-Store
Mike Stonebraker may be oversimplifying data warehousing just a tad
Mike Stonebraker has now responded to the second post in my five-part database diversity series. Takeaways and rejoinders include: Read more
Categories: Analytic technologies, Columnar database management, Data warehousing, Database diversity, Michael Stonebraker, Theory and architecture, Vertica Systems
Kalido — CASE for complex data warehouses
Kalido briefed me last week, under pre-TDWI embargo. To a first approximation, their story is confusingly buzzword-laden, as is evident from their product names. The Kalido suite is called the Kalido Information Engine, and it comprises:
- Kalido Business Information Modeler (the newest part)
- Kalido Dynamic Information Warehouse
- Kalido Universal Information Director
- Kalido Master Data Management
But those mouthfuls aside, Kalido has some pretty interesting things to say about data warehouse schema complexity and change.
Categories: Data integration and middleware, Data models and architecture, Data warehousing, EAI, EII, ETL, ELT, ETLT, Kalido, Theory and architecture
ParAccel technical highlights
I recently caught up with ParAccel’s CTO Barry Zane and Marketing VP Kim Stanick for a long technical discussion, which they have graciously continued by email. It would be impolitic in the extreme to comment on what led up to that. Let’s just note that many things I’ve previously written about ParAccel are now inoperative, and go straight to the highlights.
