Oracle’s hardware strategy
Larry Ellison stated clearly in an email interview with Reuters (links here and here) that Oracle intends to keep Sun’s hardware business, and indeed to invest in the SPARC chip. Naturally, I have a few thoughts about this.
As Stephen O’Grady points out, Sun’s main strength lay in selling to the large enterprise market. Well, that’s Oracle’s overwhelming focus too. As I noted two years ago:
One Oracle response is to provide lots of add-on technologies for high-end customers, on the database and middle tiers alike. In app servers it’s done surprisingly well against BEA. It’s sold a lot of clustering. And it’s bought into and tried to popularize niche technologies like TimesTen and Tangosol’s.
This all makes perfect sense – it’s a great fit for Oracle’s best customers, and a way to get thousands of extra dollars per server from enterprises that may already have bought all-you-can-eat licenses to the Oracle DBMS. And being so sensible, it fits into the Clayton Christensen disruption story in two ways:
- Oracle may be helpless against mid-tier competition, but it sure has the high-end core of its market locked up.
- As one type of technology is commoditized, value is created in other parts of the technology stack.
Oracle’s ongoing acquisition spree in system software, application software, and now hardware just supports that story. MySQL, embedded Java, and so on may be welcome to Oracle as yet more opportunities to tap additional markets — but Oracle’s emphasis is and surely will remain on the large enterprise market.
The next notable point may be found in Larry’s key quote: Read more
37 Ways To Get More From Analytics, Version 2.0
As I hoped, there were some very helpful responses to my post listing ways to improve analytic effectiveness. Here’s a second draft incorporating them. Comments continue to be very welcome. I need to finalize this soon. Read more
eBay’s two enormous data warehouses
A few weeks ago, I had the chance to visit eBay, meet briefly with Oliver Ratzesberger and his team, and then catch up later with Oliver for dinner. I’ve already alluded to those discussions in a couple of posts, specifically on MapReduce (which eBay doesn’t like) and the astonishingly great difference between high- and low-end disk drives (to which eBay clued me in). Now I’m finally getting around to writing about the core of what we discussed, which is two of the very largest data warehouses in the world.
Metrics on eBay’s main Teradata data warehouse include:
- >2 petabytes of user data
- Tens of thousands of users
- Millions of queries per day
- 72 nodes
- >140 GB/sec of I/O, or about 2 GB/node/sec (although that may be a peak figure for scan-heavy workloads)
- Hundreds of production databases feeding in
Metrics on eBay’s Greenplum data warehouse (or, if you like, data mart) include:
- 6.5 petabytes of user data
- 17 trillion records
- 150 billion new records/day, which seems to suggest an ingest rate well over 50 terabytes/day (see the sanity-check sketch after this list)
- 96 nodes
- 200 MB/node/sec of I/O (that’s the order of magnitude difference that triggered my post on disk drives)
- 4.5 petabytes of storage
- 70% compression
- A small number of concurrent users
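As a sanity check, these numbers hang together arithmetically. Here’s a quick back-of-the-envelope sketch in Python; the ~350-byte average record size is purely my assumption, as is reading “70% compression” as 70% space savings:

```python
# Back-of-the-envelope checks on the eBay metrics above.
teradata_total_io_gb_per_sec = 140      # ">140 GB/sec of I/O"
teradata_nodes = 72
print(teradata_total_io_gb_per_sec / teradata_nodes)  # ~1.94, i.e. the "2 GB/node/sec" figure

greenplum_records_per_day = 150e9       # "150 billion new records/day"
assumed_bytes_per_record = 350          # ASSUMPTION, not an eBay figure
print(greenplum_records_per_day * assumed_bytes_per_record / 1e12)  # ~52.5 TB/day of ingest

greenplum_user_data_pb = 6.5            # "6.5 petabytes of user data"
assumed_space_savings = 0.70            # ASSUMPTION: "70% compression" = 70% space savings
print(greenplum_user_data_pb * (1 - assumed_space_savings))  # ~1.95 PB, well within 4.5 PB of storage
```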
37 Ways To Get More From Analytics
I posted several stages of my thinking in connection with a February presentation on how to buy an analytic DBMS. The whole process seemed like a success, with good input early on, and at least one new client directly attracted by the uploaded slide presentation. So now I’m trying the same idea again, starting at an even earlier stage of the process.
I’m going to be speaking this September at six of the seven installments of Netezza’s 2009 traveling regional user conference, namely those in London, Milan, and the United States. (Edited for schedule changes.) The topic is going to be something like “N Ways to Get More From Analytics”, for N a decent-sized two-digit integer. The talk is meant to be more conceptual, upbeat, rah-rah, and/or inspirational than is my usual style, at the cost of perhaps being less complete, detailed, or carefully organized. Right now I’m at the point of sharing an initial list of ideas, and throwing open the question: What did I leave out?
The initial list is: Read more
Data warehouse storage options — cheap, expensive, or solid-state disk drives
This is a long post, so I’m going to recap the highlights up front. In the opinion of somebody I have high regard for, namely Carson Schmidt of Teradata:
- There’s currently a huge — one order of magnitude — performance difference between cheap and expensive disks for data warehousing workloads.
- New disk generations coming soon will have best-of-both-worlds aspects, combining high-end performance with lower-end cost and power consumption.
- Solid-state drives will likely add one or two orders of magnitude to performance a few years down the road. Echoing the most famous logjam in VC history — namely the 60+ hard disk companies that got venture funding in the 1980s — 20+ companies are vying to cash in.
In other news, Carson likes 10 Gigabit Ethernet, dislikes Infiniband, and is “ecstatic” about Intel’s Nehalem, which will be the basis for Teradata’s next generation of servers.
The SAP/Teradata deal explained
When I first saw the press release about the latest SAP/Teradata deal, I thought it sounded very Barney (i.e., an “I love you, you love me” exchange of pleasantries with little real substance). But it turns out there’s a little bit of substance as well. Amazingly, SAP BW doesn’t really run on Teradata right now. This deal will fix that. The time frame seems to be that SAP-BW-on-Teradata will ship with SAP BW 7.2 whenever that goes out. (First half of 2010?) Early adopters may be able to get their hands on it as early as Q3 2009.
Note: It surely would be more precise to insert “NetWeaver” a few times into that paragraph.
Just to be clear — I still don’t see this as a big deal. It doesn’t portend any grand SAP/Teradata joint mission to smite Oracle, IBM, and/or Microsoft. Nor is it a telling first step toward an SAP/Teradata merger. It just removes a particular competitive disadvantage Teradata had vs. Oracle et al., from which Teradata’s smaller specialist competitors still suffer. And it offers SAP BW customers another high-quality DBMS option.
Vertica pricing and customer metrics
Since last fall, Vertica’s stated pricing has been “$100K per terabyte of user data.” Vertica hastens to point out that unlike, for example, appliance vendors or Sybase, it only charges for deployment licenses; development and test are free (although of course you have to Bring Your Own hardware). Over the past few weeks, I’ve gotten other pricing comments from Vertica to the effect that:
- Of course, Vertica offers substantial negotiated quantity discounts. (Specifics that Vertica told me are confidential.)
- Actually, Vertica’s official price list (unpublished but apparently freely available to prospects) contains quantity discounts too.
- Finally, Vertica told me that its actual average price is around $25K/terabyte, and gave me permission to publish same.
I didn’t press my luck and ask exactly what “average” means in this context.
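To put the gap between list and average pricing in perspective, here’s a quick illustrative calculation; the 40-terabyte deployment size is purely hypothetical:

```python
# Illustrative arithmetic only; the 40 TB deployment size is made up.
list_price_per_tb = 100_000   # Vertica's stated list price
avg_price_per_tb = 25_000     # Vertica's stated actual average
deployment_tb = 40            # HYPOTHETICAL deployment size

print(deployment_tb * list_price_per_tb)         # $4,000,000 at list
print(deployment_tb * avg_price_per_tb)          # $1,000,000 at the actual average
print(1 - avg_price_per_tb / list_price_per_tb)  # 0.75, i.e. ~75% off list on average
```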
As for customers, metrics I got include: Read more
Some DB2 highlights
I chatted with IBM Thursday, about recent and imminent releases of DB2 (9.5 through 9.7). Highlights included:
- DB2 is getting Oracle emulation, which I posted about separately.
- IBM says that it had >50 new DB2 data warehouse customers last year. I neglected to ask how many of these had been general-purpose DB2 customers all along.
- By “data warehouse customer” I mean a user of InfoSphere Warehouse, which previously was called DB2’s DPF (Database Partitioning Feature). Apparently, this includes both logical and physical partitioning. For example, DB2 isn’t shared-nothing without this feature.
- IBM is proud of DB2’s compression, which it claims commonly reaches 70-80%. It calls this “industry-leading” in comparison to Oracle, SQL Server, and other general-purpose relational DBMS.
- DB2 compression’s overall effect on performance stems from a trade-off between I/O (lessened) and CPU burden (increased). For OLTP workloads, this is about a wash. For data warehousing workloads, IBM says 20% performance improvement from compression is average.
- DB2 now has its version of one of my favorite Oracle security features, called Label Based Access Control (LBAC). A label-control feature can make it much easier to secure data on a row-by-row, value-by-value basis. The obvious big user is national intelligence, followed by financial services. IBM says the health care industry also has interest in LBAC. (A toy sketch of the label idea follows this list.)
- Also in the security area, IBM reworked DB2’s audit feature for 9.5.
- I think what I heard in our discussion of DB2 virtualization is:
- Increasingly, IBM is seeing production use of VMware, rather than just test/development.
- IBM believes it is a much closer partner to VMware than Oracle or Microsoft is, because it’s not pushing its own competing technology.
- Generally, virtualization is more important for OLTP workloads than data warehousing ones, because OLTP apps commonly only need part of the resources of a node while data warehousing often wants the whole node.
- AIX data warehousing is an exception. I think this is because AIX equates to big SMP boxes, and virtualization lets you spread out the data warehousing processing across more nodes, with the usual parallel I/O benefits.
- When IBM talks of new autonomic/self-tuning features in DB2, it means features used mainly for databases under 1 terabyte in size. Indeed, the self-tuning feature set doesn’t work with InfoSphere Warehouse.
- Even with the self-tuning feature it sounds as if you need at least a couple of DBA hours per instance per week, on average.
- DB2 on Linux/Unix/Windows has introduced some enhanced workload management features analogous to those long found in mainframe DB2. (The point of workload management is to allocate resources such as CPU or I/O among the simultaneous queries or other tasks that contend for them.) For example, resource allocation rules can be scheduled by time. Workload management rules can also have thresholds for amounts of resources consumed, after which the priority of a task can go up (“Get it over with!”) or down (“Stop hogging my system!”). (A toy sketch of such threshold rules follows this list.)
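Two of those items lend themselves to toy illustrations. First, the label idea behind LBAC. Here is a minimal conceptual sketch in Python, with made-up labels and clearances; it is emphatically not DB2’s or Oracle’s actual syntax, and in a real system the labels and the enforcement live inside the DBMS engine rather than in application code:

```python
# Toy sketch of label-based access control: each row carries a label, and a
# user sees only rows whose label falls within their clearance.
LEVELS = {"public": 0, "confidential": 1, "secret": 2}   # made-up label hierarchy

rows = [
    {"account": "A-1", "balance": 100,  "label": "public"},
    {"account": "A-2", "balance": 5000, "label": "secret"},
]

def visible_rows(rows, user_clearance):
    # Row-by-row, value-by-value filtering against the user's clearance level
    return [r for r in rows if LEVELS[r["label"]] <= LEVELS[user_clearance]]

print(visible_rows(rows, "confidential"))   # only the "public" row is returned
```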
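Second, threshold-based workload management. Here is a similarly minimal sketch of the “priority goes up or down once a resource threshold is crossed” idea; the thresholds and priority adjustments are invented for illustration:

```python
# Toy sketch of threshold-based workload management rules; my own illustration
# of the concept, not DB2's actual mechanism or API.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int            # higher = larger share of CPU and I/O
    cpu_seconds_used: float  # resources consumed so far

def apply_threshold_rules(task: Task) -> Task:
    if task.cpu_seconds_used > 3600:
        task.priority -= 2   # "Stop hogging my system!" (demote a runaway task)
    elif task.cpu_seconds_used > 600:
        task.priority += 1   # "Get it over with!" (boost it toward completion)
    return task

print(apply_threshold_rules(Task("big_scan", priority=5, cpu_seconds_used=900)))
```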
IBM’s Oracle emulation strategy reconsidered
I’ve now had a chance to talk with IBM about its recently-announced Oracle emulation strategy for DB2. (This is for DB2 9.7, which I gather was quasi-announced in April, will be re-announced in May, and will be re-re-announced as being in general availability in June.)
Key points include:
- This really is more like Oracle emulation than it is transparency, a term I carelessly used before.
- IBM’s Oracle emulation effort is focused on two technological goals:
- Making it easy for an Oracle application to be ported to DB2.
- Making it easy for an Oracle developer to develop for DB2.
- The initial target market for DB2’s Oracle emulation is ISVs (Independent Software Vendors) much more than it is enterprises. IBM suggested there were a couple hundred early adopters, primarily in the ISV area.
Because of Oracle’s market share, many ISVs focus on Oracle as the underlying database management system for their applications, whether or not they actually resell it along with their own software. IBM proposed three reasons why such ISVs might want to support DB2: Read more
Clearing some of my buffer
I have a large number of posts still in backlog. For starters, there are ones based on recent visits with Aster, Greenplum, Sybase, Vertica, and a Very Large User. I suspect I’ll write more soon on Oracle as well. Plus there’s my whole future-of-online-media area. And quite a bit more will grow out of planned research.
So there are a whole lot of other worthy subjects I doubt I’ll be getting to any time soon. In some cases, of course, other people are doing a great job of writing about same. Here are a few links that I am glad to recommend:
- I wrote recently that I’ve discovered a number of different in-memory OLAP engines. Cindi Howson far outdid that, writing at length for Intelligent Enterprise on in-memory analytics, in an article that itself seems to be a teaser for a longer, free white paper on the subject.
- The CouchDB folks posted an eye-catching, risqué slide presentation promoting CouchDB and, more generally, key-value stores, at least for internet applications. And yes, they’ve integrated MapReduce.
- Merv Adrian posted favorably about Birst, with special reference to its OEM efforts. As previously noted, I was highly unimpressed with Birst’s end-user BI story at the time of its September roll-out, and Jerome Pineau’s recent examination did nothing to reassure me. But perhaps OEM is a different matter.
- Merv also offers an interesting post about data integration upstart Expressor, and a highly favorable one about “visualization” vendor Tableau.
- Ann All interviewed Nigel Pendse, who grumped that BI features are overrated, and what end users really want is great query performance. I’m not so sure about the features side of that, but I’m hugely in agreement about the performance. That’s a big part of why the analytic DBMS industry is so vibrant. It’s also why in-memory OLAP is suddenly so hot.