Merv Adrian and Doug Henschen both reported more details about Amazon Redshift than I intend to; see also the comments on Doug’s article. I did talk with Rick Glick of ParAccel a bit about the project, and he noted:
- Amazon Redshift is missing parts of ParAccel, notably the extensibility framework.
- ParAccel did some engineering to make its DBMS run better in the cloud.
- Amazon did some engineering in the areas it knows better than ParAccel — cloud provisioning, cloud billing, and so on.
Comments from other companies along the lines of “We didn’t want to do the deal on those terms” suggest that ParAccel’s main financial gain from the deal is an already-reported venture investment.
The cloud-related engineering was mainly around communications, e.g. strengthening error detection/correction to make up for the lack of dedicated switches. In general, Rick seemed more positive on running in the (Amazon) cloud than analytic RDBMS vendors have been in the past.
So who should and will use Amazon Redshift? For starters, I’d say:
- If data isn’t already in the Amazon cloud, getting it there remains a pain. Locating your analytic RDBMS on the same premises where the data is created makes life simpler.
- Over 3 years ago, $20,000/terabyte was a great list price for purchasing a data warehouse appliance that required little administration. Imagine negotiated discounts and further declines from there. Even so, Amazon’s <$1K/terabyte/year is a low figure.
- Amazon’s marketing suggests companies should put their entire data warehouse on Redshift. In practice, that almost never happens, even with ParAccel.
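To put the pricing point above in rough numbers, here is a minimal back-of-the-envelope sketch. The $20,000/terabyte list price and the $1K/terabyte/year ceiling come from the text; the three-year service life and the 50% negotiated discount are purely illustrative assumptions of mine, and the comparison ignores administration, power, and data-transfer costs entirely.

```python
def amortized_cost_per_tb_year(purchase_price_per_tb: float, years: float) -> float:
    """Spread a one-time per-terabyte purchase price evenly over a service life."""
    if years <= 0:
        raise ValueError("years must be positive")
    return purchase_price_per_tb / years


# Figures taken from the text.
APPLIANCE_LIST_PRICE_PER_TB = 20_000.0  # one-time list price, dollars per terabyte
REDSHIFT_PER_TB_YEAR = 1_000.0          # stated upper bound, dollars per terabyte per year

# Illustrative assumptions, not figures from the text.
ASSUMED_SERVICE_LIFE_YEARS = 3
ASSUMED_DISCOUNT = 0.50                 # hypothetical negotiated discount off list

list_per_tb_year = amortized_cost_per_tb_year(
    APPLIANCE_LIST_PRICE_PER_TB, ASSUMED_SERVICE_LIFE_YEARS
)
discounted_per_tb_year = amortized_cost_per_tb_year(
    APPLIANCE_LIST_PRICE_PER_TB * (1 - ASSUMED_DISCOUNT), ASSUMED_SERVICE_LIFE_YEARS
)

print(f"Appliance at list price:  ${list_per_tb_year:,.0f}/TB/year")
print(f"Appliance at 50% off:     ${discounted_per_tb_year:,.0f}/TB/year")
print(f"Redshift (upper bound):   ${REDSHIFT_PER_TB_YEAR:,.0f}/TB/year")
print(f"Ratio at list price:      {list_per_tb_year / REDSHIFT_PER_TB_YEAR:.1f}x")
print(f"Ratio at 50% off:         {discounted_per_tb_year / REDSHIFT_PER_TB_YEAR:.1f}x")
```

Under these assumptions the appliance still costs several times as much per terabyte per year, even after a steep discount. A longer service life or a deeper discount narrows the gap, so treat the ratios as an order-of-magnitude illustration only.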
Also — if Amazon Redshift is your analytic RDBMS, what’s the rest of your analytic environment? I can think of three possibilities that could work pretty straightforwardly:
- Business intelligence and just BI.
- Statistics and just statistics.
- Hadoop (i.e. Elastic MapReduce) plus a lot of hand-coding.
Anything else would seem hard to stitch together at this time.
Putting that together, I see three kinds of users for whom Amazon Redshift might make sense:
- Web startups, whose data is all in the Amazon cloud anyway, and who need better analytic SQL performance than they can get from Hadoop.
- Data mart outsourcers/data sellers, again probably startups, whose whole business is in the cloud.
- Individual analysts with small budgets, or very small analytic groups within enterprises or other organizations.
All three of those are “traditional” markets for new-generation analytic DBMS and data warehouse appliances, except that those DBMS are rarely put into production in the cloud. But for the most part, vendors have moved upscale — enterprise users, analytic platform features, etc. So the biggest threat from Amazon Redshift is to markets that other vendors have somewhat left behind.
So how should and will the analytic RDBMS industry respond? My thoughts on that begin:
- Doing nothing would be a poor choice.
- Vendors are already open to having cheap or free low-end offerings, such as Vertica Community Edition, open-source Infobright, and so on.
- Tweaking their systems to work well in the cloud becomes easier all the time, as cloud platforms mature.
- A natural solution would be something like a Starter/Standard/Enterprise Edition split, with at least the Starter and Standard Editions being cloud-friendly.