February 27, 2008

eBay OLTP architecture

I’ve posted a couple of times about eBay’s analytics side. As a companion, Don Burleson pointed me at a fascinating November 2006 slide presentation outlining eBay’s transactional architecture and evolution. Highlights include:

A whole lot of manual slicing of Oracle databases, so as not to exceed their capacity.
A whole lot of careful design and ordering of transactions.
Putting all the business logic in the application tier, with a custom O/R mapper. There’s lots of caching there, but very little state.
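
To make the "manual slicing" idea concrete, here is a minimal sketch of routing a key to one of several databases through a hand-maintained table of split points. Everything in it is a hypothetical illustration: the split points, the database names, and the `database_for` helper are assumptions, and nothing here is taken from the eBay presentation.

```python
from bisect import bisect_right

# Hypothetical, hand-maintained split points: ids below 1,000,000 live in the
# first database, ids from 1,000,000 up to 1,999,999 in the second, and so on.
SPLIT_POINTS = [1_000_000, 2_000_000, 3_000_000]

# Hypothetical database names, one more than the number of split points.
DATABASES = ["items_a", "items_b", "items_c", "items_d"]


def database_for(item_id: int) -> str:
    """Return the database that owns item_id under the split table above."""
    return DATABASES[bisect_right(SPLIT_POINTS, item_id)]


def group_by_database(item_ids: list[int]) -> dict[str, list[int]]:
    """Group ids by owning database so each database is visited only once."""
    grouped: dict[str, list[int]] = {}
    for item_id in item_ids:
        grouped.setdefault(database_for(item_id), []).append(item_id)
    return grouped
```

With an explicit table like this, re-slicing a database that is nearing capacity means adding a split point and moving the affected rows, which is the kind of manual work the first highlight alludes to.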

The presentation has a bunch of specific numbers, in case anybody wants to dive in.

Categories: eBay, OLTP, Specific users


February 26, 2008

Introduction to Exasol

I had a non-technical introduction today to Exasol, a data warehouse specialist that has gotten a little buzz recently for publishing TPC-H results even faster than ParAccel’s. Here are some highlights:

Exasol was founded back in 2000.
Exasol is a German company, with 60 employees. While I didn’t ask, the vast majority are surely German.
Exasol has two customers. 6-8 more are Coming Real Soon. Most or all of those are in Germany, although one may be in Asia.
Karstadt (big German retailer) has had Exasol deployed for 3 years. The other deployed customer is the German subsidiary of data provider IMS Health.
[Redacted for confidentiality] is a strategic investor in and partner of Exasol. [Redacted for confidentiality]’s only competing partnership is with Oracle.
Exasol’s system is more completely written from scratch than many. E.g., all they use from Linux are some drivers, and maybe a microkernel.
Exasol runs in-memory. There doesn’t seem to be a disk-centric mode.
Exasol’s data access methods are sort of like columnar, but not exactly. I look forward to a more technical discussion to sort that out.
Exasol’s claimed typical compression is 5-7X. As in the Vertica story, database operations are carried out on compressed data.
Exasol says it has performed a very fast TPC-H in-house at the 30 terabyte level. However, its deployed sites are probably a lot smaller than that. IMS Health is cited in its literature as 145 gigabytes.
Oracle and Microsoft are listed as Exasol partners, so there may be some kind of plug-compatibility or back-end processing story.
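
As a rough illustration of what "operations carried out on compressed data" can mean, here is a minimal sketch using run-length encoding. The choice of run-length encoding is an assumption made for the example; I don't know which compression schemes Exasol actually uses, and the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a column into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def count_equal(runs, target):
    """Count rows equal to target without expanding the runs."""
    return sum(length for value, length in runs if value == target)


def sum_column(runs):
    """Sum a numeric column directly from its runs."""
    return sum(value * length for value, length in runs)


# Hypothetical sample column: six rows stored as three runs.
country = rle_encode(["DE", "DE", "DE", "US", "US", "DE"])
german_rows = count_equal(country, "DE")
```

The filter and the aggregate each touch one entry per run rather than one per row, so the work shrinks along with the data.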

Categories: Analytic technologies, Data warehousing, Exasol, Specific users


February 26, 2008

The biggest eBay database

There’s been some confusion over my post about eBay’s multiple petabytes of data. So to clarify, let me say:

eBay’s figure of >1.4 petabytes of data — for its largest single analytic database — counts disks or something, not raw user data.
I previously published a strong conjecture that the database vendor in question was Teradata, which is definitely an eBay supplier. In particular, the database in question is definitely not an Oracle data warehouse.
While eBay isn’t saying who it is either — not even off-the-record — the 50%ish compression figures they experience just happen to map well to Teradata’s usual range.
Edit: Just to be clear (not that there was any doubt), I have reconfirmed that eBay is a Teradata user, in or including eBay’s PayPal division.

Categories: Analytic technologies, Data warehouse appliances, Data warehousing, eBay, Specific users, Teradata

1 Comment

February 23, 2008

All should be functioning again

The server move has completed. The brief outage is behind us. Comments have been turned back on. All SHOULD be well.

I plan to write a little more soon about web hosting over on the Monash Report, if for no other reason than that what’s there is not wholly accurate and needs updating.

Categories: About this blog


February 22, 2008

Comments off Friday night

I’m moving servers again. In connection with that, I’m turning comments off for a few hours.

Everything SHOULD be fine again by Saturday.

Categories: About this blog


February 20, 2008

ObjectGrid versus H-Store

Billy Newport of IBM sees a lot of similarities between his app-server-based product ObjectGrid and H-Store. In both cases, constrained tree schemas are assumed, and OLTP performance goodness ensues. A couple of points I noted on a quick skim through his blog:

He calls out RAM consumption as a challenge for this kind of architecture.
He points out that it’s a big advantage to have data called and used in the same address space.

Being based in RAM is obviously a huge part of the H-Store scheme. But so is having transaction execution be close to the database.

IBM now has both ObjectGrid and a memory-centric DBMS (solidDB) that they’ve been using as a front end for DBMS. Integration of the two could be pretty interesting.

Categories: Cache, IBM and DB2, Memory-centric data management, OLTP, solidDB, Theory and architecture, VoltDB and H-Store


February 19, 2008

The architectural assumptions of H-Store

I wrote yesterday about the H-Store project, the latest from the team of researchers who also brought us C-Store and its commercialization Vertica. H-Store is designed to drastically improve efficiency in OLTP database processing, in two ways. First, it puts everything in RAM. Second, it tries to gain an additional order of magnitude on in-memory performance versus today’s DBMS designs by, for example, taking a very different approach to ensuring ACID compliance.

Today I had the chance to talk with two more of the H-Store researchers, Sam Madden and Daniel Abadi.

Categories: Database diversity, In-memory DBMS, Memory-centric data management, OLTP, VoltDB and H-Store

5 Comments

February 19, 2008

Mike Stonebraker may be oversimplifying data warehousing just a tad

Mike Stonebraker has now responded to the second post in my five-part database diversity series. Takeaways and rejoinders include: Read more

Categories: Analytic technologies, Columnar database management, Data warehousing, Database diversity, Michael Stonebraker, Theory and architecture, Vertica Systems

2 Comments

February 19, 2008

Kalido — CASE for complex data warehouses

Kalido briefed me last week, under pre-TDWI embargo. To a first approximation, their story is confusingly buzzword-laden, as is evident from their product names. The Kalido suite is called the Kalido Information Engine, and it comprises:

Kalido Business Information Modeler (the newest part)
Kalido Dynamic Information Warehouse
Kalido Universal Information Director
Kalido Master Data Management

But those mouthfuls aside, Kalido has some pretty interesting things to say about data warehouse schema complexity and change.

Categories: Data integration and middleware, Data models and architecture, Data warehousing, EAI, EII, ETL, ELT, ETLT, Kalido, Theory and architecture

1 Comment

February 18, 2008

ParAccel technical highlights

I recently caught up with ParAccel’s CTO Barry Zane and Marketing VP Kim Stanick for a long technical discussion, which they have graciously continued by email. It would be impolitic in the extreme to comment on what led up to that. Let’s just note that many things I’ve previously written about ParAccel are now inoperative, and go straight to the highlights.

Categories: Columnar database management, Data warehousing, Emulation, transparency, portability, Microsoft and SQL*Server, ParAccel

5 Comments

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.
