Analytic technologies
Discussion of technologies related to information query and analysis. Related subjects include:
- Business intelligence
- Data warehousing
- (in Text Technologies) Text mining
- (in The Monash Report) Data mining
- (in The Monash Report) General issues in analytic technology
The Sybase Aleri RAP
Well, I got a quick Sybase/Aleri briefing, along with multiple apologies for not being prebriefed. (Main excuse: News was getting out, which accelerated the announcement.) Nothing badly contradicted my prior post on the Sybase/Aleri deal.
To understand Sybase’s plans for Aleri and CEP, it helps to understand Sybase’s current CEP-oriented offering, Sybase RAP. So far as I can tell, Sybase RAP has to date only been sold in the form of Sybase RAP: The Trading Edition. In that guise, Sybase RAP has been sold to >40 outfits since its May, 2008 launch, mainly big names in the investment banking and stock exchange sectors. If I understood correctly, the next target market for Sybase RAP is telcos, for real-time network tuning and management.
In addition to any domain-specific applications, Sybase RAP has three layers:
- CEP (Complex Event Processing). Sybase RAP CEP is based on a version of the Coral8 engine that Sybase licensed and has subsequently been developing.
- In-memory DBMS. Sybase’s IMDB is part of (but I guess separable from) Sybase’s OLTP DBMS Adaptive Server Enterprise (ASE, aka Sybase Classic), and has the same API.
- Sybase IQ. Actually, Sybase used the phrase “based on Sybase IQ,” but I’m guessing it’s just Sybase IQ.
Quick thoughts on Sybase/Aleri
Sybase announced an asset purchase that amounts to a takeover of CEP (Complex Event Processing) vendor Aleri. Perhaps not coincidentally, Sybase already had technology under the hood from Aleri predecessor/acquiree Coral8, for financial services uses (notwithstanding that, of Aleri Classic and Coral8, Aleri Classic was the one more focused on financial services). Quick reactions include:
- The folks at Sybase still haven’t figured out when to prebrief me. (Edit: I’ve been briefed subsequently.)
- Sybase/Aleri is a potentially powerful combination, if they can effectively address the point I just made about integrating disparate latencies. That said, I’m not expecting a lot, because the CEP industry always disappoints me.
- Microsoft, IBM, and (somewhat less clearly) Oracle are all trying to do CEP inhouse. Sybase is making a good choice in having serious CEP inhouse itself.
- Surely the main focus and financial justification for the Sybase/Aleri acquisition is the financial services market.
- Specifically, I expect the focus of technical integration between Aleri and Sybase’s DBMS products to start with Sybase IQ.
- Coral8 had some interesting ideas about how to integrate CEP with OLTP/operational BI, but I’m not aware that they got much traction.
- I bet there are use cases where Sybase tries and fails to sell Adaptive Server SQL Anywhere, and for which CEP would be a better technical fit, but I don’t immediately see much practical business significance to that observation.
- While this deal could easily strengthen the Vertica/StreamBase partnership, I don’t see any reason why it would lead those two companies to actually merge.
| Categories: Aleri and Coral8, Analytic technologies, Complex event processing (CEP), Investment research and trading, Sybase | 7 Comments |
Open issues in database and analytic technology
The last part of my New England Database Summit talk was on open issues in database and analytic technology. This was closely intertwined with the previous section, and also relied on a lot that I’ve posted here. So I’ll just put up a few notes on that part, with lots of linkage to prior discussion of the same points. Read more
Interesting trends in database and analytic technology
My project for the day is blogging based on my “Database and analytic technology: State of the union” talk of a few days ago. (I called it that because of when it was given, because it mixed prescriptive and descriptive elements, and because I wanted to call attention to the fact that I cover the union of database and analytic technologies – the intersection of those two sectors is an area of particular focus, but is far from the whole of my coverage.)
One section covered recent/ongoing/near-future trends that I thought were particularly interesting, including: Read more
Flash, other solid-state memory, and disk
If there’s one subject on which the New England Database Summit changed or at least clarified my thinking,* it’s future storage technologies. Here’s what I now think:
- Solid-state memory will soon be the right storage technology for a large fraction of databases, OLTP and analytic alike. I’m not sure whether the initial cutoff in database size is best thought of as terabytes or 10s of terabytes, but it’s in that range. And it will increase over time, for the usual cheaper-parts reasons.
- That doesn’t necessarily mean flash. PCM (Phase-Change Memory) is coming down the pike, with perhaps 100X the durability of flash, in terms of the total number of writes it can tolerate. On the other hand, PCM has issues in the face of heat. More futuristically, IBM is also high on magnetic racetrack memory. IBM likes the term storage-class memory to cover all this — which I find regrettable, since the acronym SCM is way overloaded already.
- Putting a disk controller in front of solid-state memory is really wasteful. It wreaks havoc on I/O rates.
- Generic PCIe interfaces don’t suffice either, in many analytic use cases. Their I/O is better, but still not good enough. (Doing better yet is where Petascan – the stealth-mode company I keep teasing about – comes in.)
- Disk will long be useful for very large databases. Kryder’s Law, about disk capacity, has at least as high an annual improvement as Moore’s Law shows for chip capacity, the disk rotation speed bottleneck notwithstanding. Disk will long be much cheaper than silicon for data storage. And cheaper silicon in sensors will lead to ever more machine-generated data that fills up a lot of disks.
- Disk will long be useful for archiving. Disk is the new tape.
*When the first three people to the question microphone include both Mike Stonebraker and Dave DeWitt, your thinking tends to clarify in a hurry.
Related links
- A slide deck by C. Mohan of IBM similar to the one he presented at the NEDB Summit about storage-class memories.
- A much more detailed IBM presentation on storage-class memories.
Other posts based on my January, 2010 New England Database Summit keynote address
- Data-based snooping — a huge threat to liberty that we’re all helping make worse
- Interesting trends in database and analytic technology
- Open issues in database and analytic technology
| Categories: Data warehousing, Michael Stonebraker, Presentations, Solid-state memory, Storage, Theory and architecture | 2 Comments |
The disk rotation speed bottleneck
I’ve been referring to the disk (rotation) speed bottleneck for years, but I don’t really have a clean link for it. Let me fix that right now.
The first hard disks ever were introduced by IBM in 1956. They rotated 1,200 times per minute. Today’s state-of-the-art disk drives rotate 15,000 times per minute. That’s a 12.5-fold improvement since the first term of the Eisenhower Administration. (I understand that the reason for this slow improvement is aerodynamic — a disk that spins too fast literally flies off the spindle.)
Unfortunately, random seek time is bounded below, on average, by 1/2 of a disk’s rotation time. At 15,000 rotations per minute, one rotation takes 4 milliseconds. Hence disk seek times can never get below 2 milliseconds.
From that, much about modern analytic DBMS design follows.
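The arithmetic behind those figures can be checked in a few lines. This is only a sketch of the calculation described above; the function names are mine, and it models nothing beyond rotation speed (no head movement, no caching).

```python
# Sketch of the rotation-speed arithmetic from the post.
# Function and constant names are illustrative, not from any library.

MS_PER_MINUTE = 60_000.0


def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return MS_PER_MINUTE / rpm


def avg_rotational_wait_ms(rpm: float) -> float:
    """On average the wanted sector is half a rotation away."""
    return rotation_time_ms(rpm) / 2


FIRST_DISK_RPM = 1_200   # IBM, 1956 (figure from the post)
MODERN_DISK_RPM = 15_000  # state of the art when the post was written

speedup = MODERN_DISK_RPM / FIRST_DISK_RPM

if __name__ == "__main__":
    print(f"Rotation speed improvement: {speedup}x")
    for rpm in (FIRST_DISK_RPM, MODERN_DISK_RPM):
        print(f"{rpm} RPM: full rotation {rotation_time_ms(rpm)} ms, "
              f"average wait {avg_rotational_wait_ms(rpm)} ms")
```

Whatever else improves in a disk drive, that half-rotation wait stays in every random read for as long as rotation speed is stuck.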
Data-based snooping — a huge threat to liberty that we’re all helping make worse
Every year or two, I get back on my soapbox to say:
- Database and analytic technology, as they evolve, will pose tremendous danger to individual liberties.
- We in the industry who are creating this problem also have a duty to help fix it.
- Technological solutions alone won’t suffice. Legal changes are needed.
- The core of the needed legal changes is tight restrictions on governmental use of data, because relying on restrictions about data acquisition and retention clearly won’t suffice.
But this time I don’t plan to be so quick to shut up.
My best writing about the subject of liberty to date is probably in a November, 2008 blog post. My best public speaking about the subject was undoubtedly last Thursday, early in my New England Database Summit keynote address; I got a lot of favorable feedback on that part from the academics and technologists in attendance.
My emphasis is on data-based snooping rather than censorship, for several reasons:
- My work and audience are mainly in the database and analytics sectors. Censorship is more a concern for security, networking, and internet-technology folks.
- After censorship, I think data-based snooping is the second-worst technological threat to liberty.
- In the US and other fairly free countries, data-based snooping may well be the #1 threat.
| Categories: Analytic technologies, Data warehousing, Presentations | 3 Comments |
Netezza Skimmer
As I previously complained, last week wasn’t a very convenient time for me to have briefings. So when Netezza emailed to say it would release its new entry-level Skimmer appliance this morning, I asked for and got a Friday afternoon briefing, but kept it quick and basic.
That said, highlights of my Netezza Skimmer briefing included:
- In essence, Netezza Skimmer is 1/3 of Netezza’s previously smallest appliance, for 1/3 the price.
- I.e., Netezza Skimmer has 1 S-blade and 9 disks, vs. 3 S-blades and 24 disks on the Netezza TwinFin 3.
- With 1 disk reserved as a hot spare, that boils down to a 1:1:1 ratio among CPU cores, FPGA cores, and 1-terabyte disks on Netezza Skimmer. The same could pretty much be said of Netezza TwinFin, the occasional hot-spare disk notwithstanding.
- Netezza Skimmer costs $125K.
- With 2.8 or so TB of space for user data before compression, that’s right in line with the Netezza price point of slightly <$20K/terabyte of user data.
- That assumes Netezza’s usual 2.25X compression. I forgot to ask when 4X compression was actually being shipped.
- I forgot to ask, but it seems obvious that Netezza Skimmer uses identical or substantially similar components to Netezza TwinFin’s.
- Netezza Skimmer is 7 rack units high.
- In place of the SMP hosts on TwinFin Systems, Netezza Skimmer has a host blade.
- Netezza (specifically Phil Francisco) mentioned that when Kalido uses Netezza Skimmer for its appliance, there will be an additional host computer, but when it uses TwinFin for the same software, the built-in host will suffice. (Even so, I suspect it might be too strong to say that Skimmer’s built-in host computer is underpowered.)
- Netezza also suggested that more appliance OEMs are coming down the pike specifically focused on the affordable Skimmer.
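The pricing claim above is easy to sanity-check. The sketch below just reruns the numbers quoted in the briefing notes; the variable names are mine, and 2.8 TB is the "or so" figure, so the result is approximate.

```python
# Back-of-the-envelope check of the Skimmer price per terabyte.
# All inputs are the figures quoted in the post; names are illustrative.

SKIMMER_PRICE_USD = 125_000
TOTAL_DISKS = 9
HOT_SPARE_DISKS = 1
USER_DATA_TB_UNCOMPRESSED = 2.8   # "2.8 or so TB" before compression
COMPRESSION_RATIO = 2.25          # Netezza's usual figure, per the post

# Disks actually holding data once the hot spare is set aside.
data_disks = TOTAL_DISKS - HOT_SPARE_DISKS

# User data capacity once compression is taken into account.
user_data_tb = USER_DATA_TB_UNCOMPRESSED * COMPRESSION_RATIO

price_per_tb = SKIMMER_PRICE_USD / user_data_tb

if __name__ == "__main__":
    print(f"Data disks: {data_disks}")
    print(f"User data capacity: about {user_data_tb:.1f} TB")
    print(f"Price: about ${price_per_tb:,.0f} per TB of user data")
```

With 4X compression instead of 2.25X, the same calculation would give a noticeably lower per-terabyte figure, which is why the shipping date of that feature matters.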
| Categories: Data mart outsourcing, Data warehouse appliances, Data warehousing, Netezza, Pricing | 1 Comment |
Two cornerstones of Oracle’s database hardware strategy
After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:
- Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
- Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.
And by the way, Oracle doesn’t make its storage-tier software available to run on anything other than Oracle-designed boxes. At the moment, that means Exadata Versions 1 and 2. Since Exadata is by far Oracle’s best DBMS offering (at least in theory), that means Oracle’s best database offering only runs on specific Oracle-sold hardware platforms. Read more
Vertica slaughters Sybase in patent litigation
Back in August, 2008, I pooh-poohed Sybase’s patent lawsuit against Vertica. Filed in the notoriously patent-holder-friendly East Texas courts, the suit basically claimed patent rights over the whole idea of a columnar RDBMS. It was pretty clear that this suit was meant to be a model for claims against other columnar RDBMS vendors as well, should they ever achieve material marketplace success.
If a recent Vertica press release is to be believed, Sybase got clobbered. The meat is:
… Sybase has admitted that under the claim construction order issued by the Court on November 9, 2009, “Vertica does not infringe Claims 1-15 of U.S. Patent No. 5,794,229.” Sybase further acknowledged that because the Court ruled that all the remaining claims in the patent (claims 16-24) were invalid, “Sybase cannot prevail on those claims.”
For those counting along at home — the patent only has 24 claims in total.
I have no idea whether Sybase can still cobble together grounds for appeal, or claims under some other patent. But for now, this sounds like a total victory for Vertica.
Edit: I’ve now seen a PDF of a filing suggesting the grounds under which Sybase will appeal. Basically, it alleges that the judge erred in defining a “page” of data too narrowly. Note that if Sybase prevails on appeal on that point, Vertica has a bunch of other defenses that haven’t been litigated yet. It further seems that Sybase may have recently filed another patent case against Vertica, in a different venue, based on a different patent.
One annoying blog troll excepted, is anybody surprised at this outcome?
| Categories: Columnar database management, Data warehousing, Sybase, Vertica Systems | 4 Comments |
