Data warehouse appliances
Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:
7-10 years ago, I repeatedly argued the viewpoints:
- Relational DBMS were the right choice in most cases.
- Multiple kinds of relational DBMS were needed, optimized for different kinds of use case.
- There were a variety of specialized use cases in which non-relational data models were best.
Since then, however:
- Hadoop has flourished.
- NoSQL has flourished.
- Graph DBMS have matured somewhat.
- Much of the action has shifted to machine-generated data, of which there are many kinds.
So it’s probably best to revisit all that in a somewhat organized way.
While I don’t find the Open Data Platform thing very significant, an associated piece of news seems cooler — Pivotal is open sourcing a bunch of software, with Greenplum as the crown jewel. Notes on that start:
- Greenplum has been an on-again/off-again low-cost player since before its acquisition by EMC, but open source is basically a commitment to having low license cost be permanently on.
- In most regards, “free like beer” is what’s important here, not “free like speech”. I doubt non-Pivotal employees are going to do much hacking on the long-closed Greenplum code base.
- That said, Greenplum forked PostgreSQL a long time ago, and the general PostgreSQL community might gain ideas from some of the work Greenplum has done.
- The only other bit of newly open-sourced stuff I find interesting is HAWQ. Redis was already open source, and I’ve never been persuaded to care about GemFire.
Greenplum, let us recall, is a pretty decent MPP (Massively Parallel Processing) analytic RDBMS. Various aspects of it were oversold at various times, and I’ve never heard that they actually licked concurrency. But Greenplum has long had good SQL coverage and petabyte-scale deployments and a columnar option and some in-database analytics and so on; i.e., it’s legit. When somebody asks me about open source analytic RDBMS to consider, I expect Greenplum to consistently be on the short list.
Further, the low-cost alternatives for analytic RDBMS are adding up. Read more
|Categories: Amazon and its cloud, Citus Data, Data warehouse appliances, EAI, EII, ETL, ELT, ETLT, EMC, Greenplum, Hadoop, Infobright, MonetDB, Open source, Pricing||5 Comments|
I spent a day with Teradata in Rancho Bernardo last week. Most of what we discussed is confidential, but I think the non-confidential parts and my general impressions add up to enough for a post.
First, let’s catch up with some personnel gossip. So far as I can tell:
- Scott Gnau runs most of Teradata’s development, product management, and product marketing, the big exception being that …
- … Darryl McDonald run the apps part (Aprimo and so on), and no longer is head of marketing.
- Oliver Ratzesberger runs Teradata’s software development.
- Jeff Carter has returned to his roots and runs the hardware part, in place of Carson Schmidt.
- Aster founders Mayank Bawa and Tasso Argyros have left Teradata (perhaps some earn-out period ended).
- Carson is temporarily running Aster development (in place of Mayank), and has some sort of evangelism role waiting after that.
- With the acquisition of Hadapt, Teradata gets some attention from Dan Abadi. Also, they’re retaining Justin Borgman.
The biggest change in my general impressions about Teradata is that they’re having smart thoughts about the cloud. At least, Oliver is. All details are confidential, and I wouldn’t necessarily expect them to become clear even in October (which once again is the month for Teradata’s user conference). My main concern about all that is whether Teradata’s engineering team can successfully execute on Oliver’s directives. I’m optimistic, but I don’t have a lot of detail to support my good feelings.
In some quick-and-dirty positioning and sales qualification notes, which crystallize what we already knew before:
- The Teradata 1xxx series is focused on cost-per-bit.
- The Teradata 2xxx series is focused on cost-per-query. It is commonly Teradata’s “lead” product, at least for new customers.
- The Teradata 6xxx series is supposed to be able to do “everything”.
- The Teradata Aster “Discovery Analytics” platform is sold mainly to customers who have a specific high-value problem to solve. (Randy Lea gave me a nice round dollar number, but I won’t share it.) I like that approach, as it obviates much of the concern about “Wait — is this strategic for us long-term, given that we also have both Teradata database and Hadoop clusters?”
Also: Read more
|Categories: Aster Data, Data warehouse appliances, Data warehousing, Hadapt, Hadoop, MapReduce, Solid-state memory, Teradata||2 Comments|
As part of my series on the keys to and likelihood of success, I outlined some examples from the DBMS industry. The list turned out too long for a single post, so I split it up by millennia. The part on 20th Century DBMS success and failure went up Friday; in this one I’ll cover more recent events, organized in line with the original overview post. Categories addressed will include analytic RDBMS (including data warehouse appliances), NoSQL/non-SQL short-request DBMS, MySQL, PostgreSQL, NewSQL and Hadoop.
DBMS rarely have trouble with the criterion “Is there an identifiable buying process?” If an enterprise is doing application development projects, a DBMS is generally chosen for each one. And so the organization will generally have a process in place for buying DBMS, or accepting them for free. Central IT, departments, and — at least in the case of free open source stuff — developers all commonly have the capacity for DBMS acquisition.
In particular, at many enterprises either departments have the ability to buy their own analytic technology, or else IT will willingly buy and administer things for a single department. This dynamic fueled much of the early rise of analytic RDBMS.
Buyer inertia is a greater concern.
- A significant minority of enterprises are highly committed to their enterprise DBMS standards.
- Another significant minority aren’t quite as committed, but set pretty high bars for new DBMS products to cross nonetheless.
- FUD (Fear, Uncertainty and Doubt) about new DBMS is often justifiable, about stability and consistent performance alike.
A particularly complex version of this dynamic has played out in the market for analytic RDBMS/appliances.
- First the newer products (from Netezza onwards) were sold to organizations who knew they wanted great performance or price/performance.
- Then it became more about selling “business value” to organizations who needed more convincing about the benefits of great price/performance.
- Then the behemoth vendors became more competitive, as Teradata introduced lower-price models, Oracle introduced Exadata, Sybase got more aggressive with Sybase IQ, IBM bought Netezza, EMC bought Greenplum, HP bought Vertica and so on. It is now hard for a non-behemoth analytic RDBMS vendor to make headway at large enterprise accounts.
- Meanwhile, Hadoop has emerged as serious competitor for at least some analytic data management, especially but not only at internet companies.
Otherwise I’d say: Read more
The Spark buzz keeps increasing; almost everybody I talk with expects Spark to win big, probably across several use cases.
Disclosure: I’ll soon be in a substantial client relationship with Databricks, hoping to improve their stealth-mode marketing.
The “real-time analytics” gold rush I called out last year continues. A large fraction of the vendors I talk with have some variant of “real-time analytics” as a central message.
Hadapt laid off its sales and marketing folks, and perhaps some engineers as well. In a nutshell, Hadapt’s approach to SQL-on-Hadoop wasn’t selling vs. the many alternatives, and Hadapt is doubling down on poly-structured data*/schema-on-need.
*While Hadapt doesn’t to my knowledge use the term “poly-structured data”, some other vendors do. And so I may start using it more myself, at least when the poly-structured/multi-structured distinction actually seems significant.
WibiData is partnering with DataStax, WibiData is of course pleased to get access to Cassandra’s user base, which gave me the opportunity to ask why they thought Cassandra had beaten HBase in those accounts. The answer was performance and availability, while Cassandra’s traditional lead in geo-distribution wasn’t mentioned at all.
Disclosure: My fingerprints are all over that deal.
In other news, WibiData has had some executive departures as well, but seems to be staying the course on its strategy. I continue to think that WibiData has a really interesting vision about how to do large-data-volume interactive computing, and anybody in that space would do well to talk with them or at least look into the open source projects WibiData sponsors.
I encountered another apparently-popular machine-learning term — bandit model. It seems to be glorified A/B testing, and it seems to be popular. I think the point is that it tries to optimize for just how much you invest in testing unproven (for good or bad) alternatives.
I had an awkward set of interactions with Gooddata, including my longest conversations with them since 2009. Gooddata is in the early days of trying to offer an all-things-to-all-people analytic stack via SaaS (Software as a Service). I gather that Hadoop, Vertica, PostgreSQL (a cheaper Vertica alternative), Spark, Shark (as a faster version of Hive) and Cassandra (under the covers) are all in the mix — but please don’t hold me to those details.
I continue to think that computing is moving to a combination of appliances, clusters, and clouds. That said, I recently bought a new gaming-class computer, and spent many hours gaming on it just yesterday.* I.e., there’s room for general-purpose workstations as well. But otherwise, I’m not hearing anything that contradicts my core point.
*The last beta weekend for The Elder Scrolls Online; I loved Morrowind.
From time to time I like to do “what I’m working on” posts. From my recent blogging, you probably already know that includes:
- Hadoop (always, and please see below).
- Analytic RDBMS (ditto).
- NoSQL and NewSQL.
- Specifically, SQL-on-Hadoop
- Spark and other memory-centric technology, including streaming.
- Public policy, mainly but not only in the area of surveillance/privacy.
- General strategic advice for all sizes of tech company.
Other stuff on my mind includes but is not limited to:
1. Certain categories of buying organizations are inherently leading-edge.
- Internet companies have adopted Hadoop, NoSQL, NewSQL and all that en masse. Often, they won’t even look at things that are conventional or expensive.
- US telecom companies have been buying 1 each of every DBMS on the market since pre-relational days.
- Financial services firms — specifically algorithmic traders and broker-dealers — have been in their own technical world for decades …
- … as have national-security agencies …
- … as have pharmaceutical research departments.
Fine. But what really intrigues me is when more ordinary enterprises also put leading-edge technologies into production. I pester everybody for examples of that.
I think that most sufficiently large enterprise SaaS vendors should offer an appliance option, as an alternative to the core multi-tenant service. In particular:
- SaaS appliances address customer fears about security, privacy, compliance, performance isolation, and lock-in.
- Some of these benefits occur even if the appliance runs in the same data centers that host the vendor’s standard multi-tenant SaaS. Most of the rest occur if the customer can choose a co-location facility in which to place the appliance.
- Whether many customers should or will use the SaaS appliance option is somewhat secondary; it’s a check-mark item. I.e., many customers and prospects will be pleased that the option at least exists.
How I reached them
Core reasons for selling or using SaaS (Software as a Service) as opposed to licensed software start:
- The SaaS vendor handles all software upgrades, and makes them promptly. In principle, this benefit could also be achieved on a dedicated system on customer premises (or at the customer’s choice of co-location facility).
- In addition, the SaaS vendor handles all the platform and operational stuff — hardware, operating system, computer room, etc. This benefit is antithetical to direct customer control.
- The SaaS vendor only has to develop for and operate on a tightly restricted platform stack that it knows very well. This benefit is also enjoyed in the case of customer-premises appliances.
Conceptually, then, customer-premises SaaS is not impossible, even though one of the standard Big Three SaaS benefits is lost. Indeed:
- Microsoft Windows and many other client software packages already offer to let their updates be automagically handled by the vendor.
- In that vein, consumer devices such as game consoles already are a kind of SaaS appliance.
- Complex devices of any kind, including computers, will see ever more in the way of “phone-home” features or optional services, often including routine maintenance and upgrades.
But from an enterprise standpoint, that’s all (relatively) simple stuff. So we’re left with a more challenging question — does customer-premises SaaS make sense in the case of enterprise applications or other server software?
|Categories: Data warehouse appliances, HP and Neoview, salesforce.com, Software as a Service (SaaS), Surveillance and privacy||6 Comments|
Generalizing about SaaS (Software as a Service) is hard. To prune some of the confusion, let’s start by noting:
- SaaS has been around for over half a century, and at times has been the dominant mode of application delivery.
- The term multi-tenancy is being used in several different ways.
- Multi-tenancy, in the purest sense, is inessential to SaaS. It’s simply an implementation choice that has certain benefits for the SaaS provider. And by the way, …
- … salesforce.com, the chief proponent of the theory that true multi-tenancy is the hallmark of true SaaS, abandoned that position this week.
- Internet-based services are commonly, if you squint a little, SaaS. Examples include but are hardly limited to Google, Twitter, Dropbox, Intuit, Amazon Web Services, and the company that hosts this blog (KnownHost).
- Some of the core arguments for SaaS’ rise, namely the various efficiencies of data center outsourcing and scale, apply equally to the public cloud, to SaaS, and to AEaaS (Anything Else as a Service).
- These benefits are particularly strong for inherently networked use cases. For example, you really don’t want to be hosting your website yourself. And salesforce.com got its start supporting salespeople who worked out of remote offices.
- In theory and occasionally in practice, certain SaaS benefits, namely the outsourcing of software maintenance and updates, could be enjoyed on-premises as well. Whether I think that could be a bigger deal going forward will be explored in future posts.
For smaller enterprises, the core outsourcing argument is compelling. How small? Well:
- What’s the minimum level of IT operations headcount needed for mission-critical systems? Let’s just say “several”.
- What does that cost? Fully burdened, somewhere in the six figures.
- What fraction of the IT budget should such headcount be? As low a double digit percentage as possible.
- What fraction of revenues should be spent on IT? Some single-digit percentage.
So except for special cases, an enterprise with less than $100 million or so in revenue may have trouble affording on-site data processing, at least at a mission-critical level of robustness. It may well be better to use NetSuite or something like that, assuming needed features are available in SaaS form.*
|Categories: Amazon and its cloud, Buying processes, Cloud computing, Data mart outsourcing, Data warehouse appliances, Data warehousing, Infobright, Netezza, Pricing, salesforce.com, Software as a Service (SaaS), Workday||4 Comments|
Relational DBMS used to be fairly straightforward product suites, which boiled down to:
- A big SQL interpreter.
- A bunch of administrative and operational tools.
- Some very optional add-ons, often including an application development tool.
Now, however, most RDBMS are sold as part of something bigger.
- Oracle has hugely thickened its stack, as part of an Innovator’s Solution strategy — hardware, middleware, applications, business intelligence, and more.
- IBM has moved aggressively to a bundled “appliance” strategy. Even before that, IBM DB2 long sold much better to committed IBM accounts than as a software-only offering.
- Microsoft SQL Server is part of a stack, starting with the Windows operating system.
- Sybase was an exception to this rule, with thin(ner) stacks for both Adaptive Server Enterprise and Sybase IQ. But Sybase is now owned by SAP, and increasingly integrated as a business with …
- … SAP HANA, which is closely associated with SAP’s applications.
- Teradata has always been a hardware/software vendor. The most successful of its analytic DBMS rivals, in some order, are:
- Netezza, a pure appliance vendor, now part of IBM.
- Greenplum, an appliance-mainly vendor for most (not all) of its existence, and in particular now as a part of EMC Pivotal.
- Vertica, more of a software-only vendor than the others, but now owned by and increasingly mainstreamed into hardware vendor HP.
- MySQL’s glory years were as part of the “LAMP” stack.
- Various thin-stack RDBMS that once were or could have been important market players … aren’t. Examples include Progress OpenEdge, IBM Informix, and the various strays adopted by Actian.
Some subjects just keep coming up. And so I keep saying things like:
Most generalizations about “Big Data” are false. “Big Data” is a horrific catch-all term, with many different meanings.
Most generalizations about Hadoop are false. Reasons include:
- Hadoop is a collection of disparate things, most particularly data storage and application execution systems.
- The transition from Hadoop 1 to Hadoop 2 will be drastic.
- For key aspects of Hadoop — especially file format and execution engine — there are or will be widely varied options.
Hadoop won’t soon replace relational data warehouses, if indeed it ever does. SQL-on-Hadoop is still very immature. And you can’t replace data warehouses unless you have the power of SQL.
Note: SQL isn’t the only way to provide “the power of SQL”, but alternative approaches are just as immature.
Most generalizations about NoSQL are false. Different NoSQL products are … different. It’s not even accurate to say that all NoSQL systems lack SQL interfaces. (For example, SQL-on-Hadoop often includes SQL-on-HBase.)