Teradata

Analysis of data warehousing giant Teradata. Related subjects include:

June 16, 2017

Generally available Kudu

I talked with Cloudera about Kudu in early May. Besides giving me a lot of information about Kudu, Cloudera also helped confirm some trends I’m seeing elsewhere, including:

Security is an ever bigger deal.
There’s a lot of interest in data warehouses (perhaps really data marts) that are updated in human real-time.
- Prospects for that respond well to the actual term “data warehouse”, at least when preceded by some modifier to suggest that it’s modern/low-latency/non-batch or whatever.
- Flash is often — but not yet always — preferred over disk for that kind of use.
- Sometimes these data stores are greenfield. When they’re migrations, they come more commonly from analytic RDBMS or data warehouse appliance (the most commonly mentioned ones are Teradata, Netezza and Vertica, but that’s perhaps just due to those product lines’ market share), rather than from general purpose DBMS such as Oracle or SQL Server.
Intel is making it ever easier to vectorize CPU operations, and analytic data managers are increasingly taking advantage of this possibility.

Now let’s talk about Kudu itself. As I discussed at length in September 2015, Kudu is:

A data storage system introduced by Cloudera (and subsequently open-sourced).
Columnar.
Updatable in human real-time.
Meant to serve as the data storage tier for Impala and Spark.

Kudu’s adoption and roll-out story starts: Read more

Categories: Cloudera, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Databricks, Spark and BDAS, Hadoop, Market share and customer counts, Netezza, NoSQL, Open source, Solid-state memory, SQL/Hadoop integration, Teradata, Vertica Systems

1 Comment

August 28, 2016

Are analytic RDBMS and data warehouse appliances obsolete?

I used to spend most of my time — blogging and consulting alike — on data warehouse appliances and analytic DBMS. Now I’m barely involved with them. The most obvious reason is that there have been drastic changes in industry structure:

Many of the independent vendors were swooped up by acquisition.
- IBM bought Netezza.
- Microsoft bought DATAllegro.
- HP bought Vertica.
- Greenplum went to EMC/VMware/Pivotal.
- Teradata bought Aster.
- Actian bought both ParAccel and Vectorwise.
None of those acquisitions was a big success.
- Microsoft did little with DATAllegro.
- Netezza struggled with R&D after being bought by IBM. An IBMer recently told me that their main analytic RDBMS engine was BLU.
- I hear about Vertica more as a technology to be replaced than as a significant ongoing market player.
- Pivotal open-sourced Greenplum. I have detected few people who care.
- Ditto for Actian’s offerings.
- Teradata claimed a few large Aster accounts, but I never hear of Aster as something to compete or partner with.
Smaller vendors fizzled too. Hadapt and Kickfire went to Teradata as more-or-less acquihires. InfiniDB folded. Etc.
Impala and other Hadoop-based alternatives are technology options.
Oracle, Microsoft, IBM and to some extent SAP/Sybase are still pedaling along … but I rarely talk with companies that big. 🙂

Simply reciting all that, however, begs the question of whether one should still care about analytic RDBMS at all.

My answer, in a nutshell, is:

Analytic RDBMS — whether on premises in software, in the form of data warehouse appliances, or in the cloud — are still great for hard-core business intelligence, where “hard-core” can refer to ad-hoc query complexity, reporting/dashboard concurrency, or both. But they aren’t good for much else.

Categories: Actian and Ingres, Amazon and its cloud, Aster Data, Business intelligence, Calpont, Cloud computing, Data warehouse appliances, Data warehousing, Databricks, Spark and BDAS, DATAllegro, EAI, EII, ETL, ELT, ETLT, EMC, Exasol, Greenplum, Hadapt, Hadoop, HP and Neoview, IBM and DB2, In-memory DBMS, Infobright, Kickfire, Kognitio, MemSQL, Microsoft and SQL*Server, Netezza, Open source, Oracle, ParAccel, Predictive modeling and advanced analytics, SAP AG, SQL/Hadoop integration, Sybase, Teradata, VectorWise, Vertica Systems

29 Comments

January 14, 2016

BI and quasi-DBMS

I’m on two overlapping posting kicks, namely “lessons from the past” and “stuff I keep saying so might as well also write down”. My recent piece on Oracle as the new IBM is an example of both themes. In this post, another example, I’d like to memorialize some points I keep making about business intelligence and other analytics. In particular:

BI relies on strong data access capabilities. This is always true. Duh.
Therefore, BI and other analytics vendors commonly reinvent the data management wheel. This trend ebbs and flows with technology cycles.

Similarly, BI has often been tied to data integration/ETL (Extract/Transform/Load) functionality.* But I won’t address that subject further at this time.

*In the Hadoop/Spark era, that’s even truer of other analytics than it is of BI.

My top historical examples include:

The 1970s analytic fourth-generation languages (RAMIS, NOMAD, FOCUS, et al.) commonly combined reporting and data management.
The best BI visualization technology of the 1980s, Executive Information Systems (EIS), was generally unsuccessful. The core reason was a lack of what we’d now call drilldown. Not coincidentally, EIS vendors — notably leader Comshare — didn’t do well at DBMS-like technology.
Business Objects, one of the pioneers of the modern BI product category, rose in large part on the strength of its “semantic layer” technology. (If you don’t know what that is, you can imagine it as a kind of virtual data warehouse modest enough in its ambitions to actually be workable.)
Cognos, the other pioneer of modern BI, depending on capabilities for which it needed a bundled MOLAP (Multidimensional OnLine Analytic Processing) engine.
But Cognos later stopped needing that engine, which underscores my point about technology ebbing and flowing.

Categories: Business intelligence, Business Objects, Cognos, Databricks, Spark and BDAS, EAI, EII, ETL, ELT, ETLT, Hadoop, Information Builders, MicroStrategy, Software as a Service (SaaS), Teradata

5 Comments

June 8, 2015

Teradata will support Presto

At the highest level:

Presto is, roughly speaking, Facebook’s replacement for Hive, at least for queries that are supposed to run at interactive speeds.
Teradata is announcing support for Presto with a classic open source pricing model.
Presto will also become, roughly speaking, Teradata’s replacement for Hive.
Teradata’s Presto efforts are being conducted by the former Hadapt.

Now let’s make that all a little more precise.

Regarding Presto (and I got most of this from Teradata)::

To a first approximation, Presto is just another way to write SQL queries against HDFS (Hadoop Distributed File System). However …
… Presto queries other data stores too, such as various kinds of RDBMS, and federates query results.
Facebook at various points in time created both Hive and now Presto.
Facebook started the Presto project in 2012 and now has 10 engineers on it.
Teradata has named 16 engineers – all from Hadapt – who will be contributing to Presto.
Known serious users of Presto include Facebook, Netflix, Groupon and Airbnb. Airbnb likes Presto well enough to have 1/3 of its employees using it, via an Airbnb-developed tool called Airpal.
Facebook is known to have a cluster cited at 300 petabytes and 4000 users where Presto is presumed to be a principal part of the workload.

Daniel Abadi said that Presto satisfies what he sees as some core architectural requirements for a modern parallel analytic RDBMS project: Read more

Categories: Cloudera, Columnar database management, Data integration and middleware, Data pipelining, Data warehousing, Facebook, Hadapt, Hadoop, MapReduce, Market share and customer counts, Open source, Petabyte-scale data management, SQL/Hadoop integration, Teradata, Workload management

19 Comments

December 7, 2014

Notes on the Hortonworks IPO S-1 filing

Given my stock research experience, perhaps I should post about Hortonworks’ initial public offering S-1 filing. 🙂 For starters, let me say:

Hortonworks’ subscription revenues for the 9 months ended last September 30 appear to be:
- $11.7 million from everybody but Microsoft, …
- … plus $7.5 million from Microsoft, …
- … for a total of $19.2 million.
Hortonworks states subscription customer counts (as per Page 55 this includes multiple “customers” within the same organization) of:
- 2 on April 30, 2012.
- 9 on December 31, 2012.
- 25 on April 30, 2013.
- 54 on September 30, 2013.
- 95 on December 31, 2013.
- 233 on September 30, 2014.
Per Page 70, Hortonworks’ total September 30, 2014 customer count was 292, including professional services customers.
Non-Microsoft subscription revenue in the quarter ended September 30, 2014 seems to have been $5.6 million, or $22.5 million annualized. This suggests Hortonworks’ average subscription revenue per non-Microsoft customer is a little over $100K/year.
This IPO looks to be a sharply “down round” vs. Hortonworks’ Series D financing earlier this year.
- In March and June, 2014, Hortonworks sold stock that subsequently was converted into 1/2 a Hortonworks share each at $12.1871 per share.
- The tentative top of the offering’s price range is $14/share.
- That’s also slightly down from the Series C price in mid-2013.

And, perhaps of interest only to me — there are approximately 50 references to YARN in the Hortonworks S-1, but only 1 mention of Tez.

Categories: Hadoop, Hortonworks, HP and Neoview, Market share and customer counts, Microsoft and SQL*Server, Pricing, Teradata, Yahoo

8 Comments

November 15, 2014

Technical differentiation

I commonly write about real or apparent technical differentiation, in a broad variety of domains. But actually, computers only do a couple of kinds of things:

Accept instructions.
Execute them.

And hence almost all IT product differentiation fits into two buckets:

Easier instruction-giving, whether that’s in the form of a user interface, a language, or an API.
Better execution, where “better” usually boils down to “faster”, “more reliable” or “more reliably fast”.

As examples of this reductionism, please consider:

Application development is of course a matter of giving instructions to a computer.
Database management systems accept and execute data manipulation instructions.
Data integration tools accept and execute data integration instructions.
System management software accepts and executes system management instructions.
Business intelligence tools accept and execute instructions for data retrieval, navigation, aggregation and display.

Similar stories are true about application software, or about anything that has an API (Application Programming Interface) or SDK (Software Development Kit).

Yes, all my examples are in software. That’s what I focus on. If I wanted to be more balanced in including hardware or data centers, I might phrase the discussion a little differently — but the core points would still remain true.

What I’ve said so far should make more sense if we combine it with the observation that differentiation is usually restricted to particular domains. Read more

Categories: Business intelligence, Data warehousing, Hadoop, Teradata

4 Comments

September 7, 2014

An idealized log management and analysis system — from whom?

I’ve talked with many companies recently that believe they are:

Focused on building a great data management and analytic stack for log management …
… unlike all the other companies that might be saying the same thing 🙂 …
… and certainly unlike expensive, poorly-scalable Splunk …
… and also unlike less-focused vendors of analytic RDBMS (which are also expensive) and/or Hadoop distributions.

At best, I think such competitive claims are overwrought. Still, it’s a genuinely important subject and opportunity, so let’s consider what a great log management and analysis system might look like.

Much of this discussion could apply to machine-generated data in general. But right now I think more players are doing product management with an explicit conception either of log management or event-series analytics, so for this post I’ll share that focus too.

A short answer might be “Splunk, but with more analytic functionality and more scalable performance, at lower cost, plus numerous coupons for free pizza.” A more constructive and bottoms-up approach might start with: Read more

Categories: Aleri and Coral8, Business intelligence, Cloudera, Data models and architecture, Data warehousing, Databricks, Spark and BDAS, Hadapt, Hadoop, Hortonworks, HP and Neoview, IBM and DB2, Log analysis, MapR, Microsoft and SQL*Server, Oracle, Platfora, Predictive modeling and advanced analytics, SAP AG, Schema on need, SenSage, Splunk, Streaming and complex event processing (CEP), Teradata, Vertica Systems, Web analytics, WibiData, Workload management

10 Comments

August 31, 2014

Notes from a visit to Teradata

I spent a day with Teradata in Rancho Bernardo last week. Most of what we discussed is confidential, but I think the non-confidential parts and my general impressions add up to enough for a post.

First, let’s catch up with some personnel gossip. So far as I can tell:

Scott Gnau runs most of Teradata’s development, product management, and product marketing, the big exception being that …
… Darryl McDonald run the apps part (Aprimo and so on), and no longer is head of marketing.
Oliver Ratzesberger runs Teradata’s software development.
Jeff Carter has returned to his roots and runs the hardware part, in place of Carson Schmidt.
Aster founders Mayank Bawa and Tasso Argyros have left Teradata (perhaps some earn-out period ended).
Carson is temporarily running Aster development (in place of Mayank), and has some sort of evangelism role waiting after that.
With the acquisition of Hadapt, Teradata gets some attention from Dan Abadi. Also, they’re retaining Justin Borgman.

The biggest change in my general impressions about Teradata is that they’re having smart thoughts about the cloud. At least, Oliver is. All details are confidential, and I wouldn’t necessarily expect them to become clear even in October (which once again is the month for Teradata’s user conference). My main concern about all that is whether Teradata’s engineering team can successfully execute on Oliver’s directives. I’m optimistic, but I don’t have a lot of detail to support my good feelings.

In some quick-and-dirty positioning and sales qualification notes, which crystallize what we already knew before:

The Teradata 1xxx series is focused on cost-per-bit.
The Teradata 2xxx series is focused on cost-per-query. It is commonly Teradata’s “lead” product, at least for new customers.
The Teradata 6xxx series is supposed to be able to do “everything”.
The Teradata Aster “Discovery Analytics” platform is sold mainly to customers who have a specific high-value problem to solve. (Randy Lea gave me a nice round dollar number, but I won’t share it.) I like that approach, as it obviates much of the concern about “Wait — is this strategic for us long-term, given that we also have both Teradata database and Hadoop clusters?”

2 Comments

July 23, 2014

Teradata bought Hadapt and Revelytix

My client Teradata bought my (former) clients Revelytix and Hadapt.* Obviously, I’m in confidentiality up to my eyeballs. That said — Teradata truly doesn’t know what it’s going to do with those acquisitions yet. Indeed, the acquisitions are too new for Teradata to have fully reviewed the code and so on, let alone made strategic decisions informed by that review. So while this is just a guess, I conjecture Teradata won’t say anything concrete until at least September, although I do expect some kind of stated direction in time for its October user conference.

*I love my business, but it does have one distressing aspect, namely the combination of subscription pricing and customer churn. When your customers transform really quickly, or even go out of existence, so sometimes does their reliance on you.

I’ve written extensively about Hadapt, but to review:

The HadoopDB project was started by Dan Abadi and two grad students.
HadoopDB tied a bunch of PostgreSQL instances together with Hadoop MapReduce. Lab benchmarks suggested it was more performant than the coyly named DBx (where x=2), but not necessarily competitive with top analytic RDBMS.
Hadapt was formed to commercialize HadoopDB.
After some fits and starts, Hadapt was a Cambridge-based company. Former Vertica CEO Chris Lynch invested even before he was a VC, and became an active chairman. Not coincidentally, Hadapt had a bunch of Vertica folks.
Hadapt decided to stick with row-based PostgreSQL, Dan Abadi’s previous columnar enthusiasm notwithstanding. Not coincidentally, Hadapt’s performance never blew anyone away.
Especially after the announcement of Cloudera Impala, Hadapt’s SQL-on-Hadoop positioning didn’t work out. Indeed, Hadapt laid off most or all of its sales and marketing folks. Hadapt pivoted to emphasize its schema-on-need story.
Chris Lynch, who generally seems to think that IT vendors are created to be sold, shopped Hadapt aggressively.

As for what Teradata should do with Hadapt: Read more

Categories: Aster Data, Citus Data, Cloudera, Columnar database management, Data warehousing, Hadapt, Hadoop, MapReduce, Oracle, SQL/Hadoop integration, Teradata

8 Comments

July 14, 2014

21st Century DBMS success and failure

As part of my series on the keys to and likelihood of success, I outlined some examples from the DBMS industry. The list turned out too long for a single post, so I split it up by millennia. The part on 20th Century DBMS success and failure went up Friday; in this one I’ll cover more recent events, organized in line with the original overview post. Categories addressed will include analytic RDBMS (including data warehouse appliances), NoSQL/non-SQL short-request DBMS, MySQL, PostgreSQL, NewSQL and Hadoop.

DBMS rarely have trouble with the criterion “Is there an identifiable buying process?” If an enterprise is doing application development projects, a DBMS is generally chosen for each one. And so the organization will generally have a process in place for buying DBMS, or accepting them for free. Central IT, departments, and — at least in the case of free open source stuff — developers all commonly have the capacity for DBMS acquisition.

In particular, at many enterprises either departments have the ability to buy their own analytic technology, or else IT will willingly buy and administer things for a single department. This dynamic fueled much of the early rise of analytic RDBMS.

Buyer inertia is a greater concern.

A significant minority of enterprises are highly committed to their enterprise DBMS standards.
Another significant minority aren’t quite as committed, but set pretty high bars for new DBMS products to cross nonetheless.
FUD (Fear, Uncertainty and Doubt) about new DBMS is often justifiable, about stability and consistent performance alike.

A particularly complex version of this dynamic has played out in the market for analytic RDBMS/appliances.

First the newer products (from Netezza onwards) were sold to organizations who knew they wanted great performance or price/performance.
Then it became more about selling “business value” to organizations who needed more convincing about the benefits of great price/performance.
Then the behemoth vendors became more competitive, as Teradata introduced lower-price models, Oracle introduced Exadata, Sybase got more aggressive with Sybase IQ, IBM bought Netezza, EMC bought Greenplum, HP bought Vertica and so on. It is now hard for a non-behemoth analytic RDBMS vendor to make headway at large enterprise accounts.
Meanwhile, Hadoop has emerged as serious competitor for at least some analytic data management, especially but not only at internet companies.

Otherwise I’d say: Read more

Categories: Business intelligence, Buying processes, Cloud computing, Columnar database management, Data warehouse appliances, Data warehousing, EMC, EnterpriseDB and Postgres Plus, Exadata, Greenplum, Hadoop, HP and Neoview, IBM and DB2, In-memory DBMS, MarkLogic, Microsoft and SQL*Server, Mid-range, MySQL, NewSQL, NoSQL, OLTP, Open source, PostgreSQL, SAP AG, Sybase, Teradata, Vertica Systems, Workload management

2 Comments

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in