Amazon and its cloud

Analysis of Amazon’s role in database and analytic technology, especially via the S3/EC2 cloud computing initiative. Also covered are SimpleDB and Amazon’s role as a technology user. Related subjects include:

December 15, 2017

The technology industry is under broad political attack

I apologize for posting a December downer, but this needs to be said.

The technology industry is under attack:

From politicians and political pundits …
… especially from “populists” and/or the political right …
… in the United States and other countries.

These attacks:

Are in some cases specific to internet companies such as Google and Facebook.
In some cases threaten the tech industry more broadly.
Are in some cases part of general attacks on the educated/ professional/“globalist”/”coastal” “elites”.

You’ve surely noticed some of these attacks. But you may not have noticed just how many different attacks and criticisms there are, on multiple levels.

Categories: Amazon and its cloud, Facebook, Google, Public policy

3 Comments

August 17, 2017

More notes on the transition to the cloud

Last year I posted observations about the transition to the cloud. Here are some further thoughts.

0. In case any doubt remained, the big questions about transitioning to the cloud are “When?” and “How?”. “Whether”, by way of contrast, is pretty much settled.

1. The answer to “When?” is generally “Over many years”. In particular, at most enterprises the cloud transition will span multiple CIO’s tenure in their positions.

Few enterprises will ever execute on simple, consistent, unchanging “cloud strategies”.

2. The SaaS (Software as a Service) vs. on-premises tradeoffs are being reargued, except that proponents now spell SaaS C-L-O-U-D. (Ali Ghodsi of Databricks made a particularly energetic version of that case in a recent meeting.)

3. In most countries (at least in the US and the rest of the West), the cloud vendors deemed to matter are Amazon, followed by Microsoft, followed by Google. And so, when it comes to the public cloud, Microsoft is much, much more enterprise-savvy than its key competitors.

Categories: Amazon and its cloud, Cloud computing, Databricks, Spark and BDAS, Google, Microsoft and SQL*Server, Storage

1 Comment

June 14, 2017

Cloudera Altus

I talked with Cloudera before the recent release of Altus. In simplest terms, Cloudera’s cloud strategy aspires to:

Provide all the important advantages of on-premises Cloudera.
Provide all the important advantages of native cloud offerings such as Amazon EMR (Elastic MapReduce, or at least come sufficiently close to that goal.
Benefit from customers’ desire to have on-premises and cloud deployments that work:
- Alike in any case.
- Together, to the extent that that makes use-case sense.

In other words, Cloudera is porting its software to an important new platform.* And this port isn’t complete yet, in that Altus is geared only for certain workloads. Specifically, Altus is focused on “data pipelines”, aka data transformation, aka “data processing”, aka new-age ETL (Extract/Transform/Load). (Other kinds of workload are on the roadmap, including several different styles of Impala use.) So what about that is particularly interesting? Well, let’s drill down.

*Or, if you prefer, improving on early versions of the port.

Categories: Amazon and its cloud, Cloud computing, Cloudera, Databricks, Spark and BDAS, Hadoop, Log analysis, MapReduce, Software as a Service (SaaS)

2 Comments

October 3, 2016

Notes on the transition to the cloud

1. The cloud is super-hot. Duh. And so, like any hot buzzword, “cloud” means different things to different marketers. Four of the biggest things that have been called “cloud” are:

The Amazon cloud, Microsoft Azure, and their competitors, aka public cloud.
Software as a service, aka SaaS.
Co-location in off-premises data centers, aka colo.
On-premises clusters (truly on-prem or colo as the case may be) designed to run a broad variety of applications, aka private cloud.

Further, there’s always the idea of hybrid cloud, in which a vendor peddles private cloud systems (usually appliances) running similar technology stacks to what they run in their proprietary public clouds. A number of vendors have backed away from such stories, but a few are still pushing it, including Oracle and Microsoft.

This is a good example of Monash’s Laws of Commercial Semantics.

2. Due to economies of scale, only a few companies should operate their own data centers, aka true on-prem(ises). The rest should use some combination of colo, SaaS, and public cloud.

This fact now seems to be widely understood.

Categories: Amazon and its cloud, Cloud computing, Data mart outsourcing, Data warehousing, Database diversity, Google, Hadoop, Health care, Investment research and trading, Log analysis, Microsoft and SQL*Server, NoSQL, Software as a Service (SaaS), Solid-state memory, Splunk, Surveillance and privacy, Text, Web analytics

5 Comments

August 28, 2016

Are analytic RDBMS and data warehouse appliances obsolete?

I used to spend most of my time — blogging and consulting alike — on data warehouse appliances and analytic DBMS. Now I’m barely involved with them. The most obvious reason is that there have been drastic changes in industry structure:

Many of the independent vendors were swooped up by acquisition.
- IBM bought Netezza.
- Microsoft bought DATAllegro.
- HP bought Vertica.
- Greenplum went to EMC/VMware/Pivotal.
- Teradata bought Aster.
- Actian bought both ParAccel and Vectorwise.
None of those acquisitions was a big success.
- Microsoft did little with DATAllegro.
- Netezza struggled with R&D after being bought by IBM. An IBMer recently told me that their main analytic RDBMS engine was BLU.
- I hear about Vertica more as a technology to be replaced than as a significant ongoing market player.
- Pivotal open-sourced Greenplum. I have detected few people who care.
- Ditto for Actian’s offerings.
- Teradata claimed a few large Aster accounts, but I never hear of Aster as something to compete or partner with.
Smaller vendors fizzled too. Hadapt and Kickfire went to Teradata as more-or-less acquihires. InfiniDB folded. Etc.
Impala and other Hadoop-based alternatives are technology options.
Oracle, Microsoft, IBM and to some extent SAP/Sybase are still pedaling along … but I rarely talk with companies that big. 🙂

Simply reciting all that, however, begs the question of whether one should still care about analytic RDBMS at all.

My answer, in a nutshell, is:

Analytic RDBMS — whether on premises in software, in the form of data warehouse appliances, or in the cloud — are still great for hard-core business intelligence, where “hard-core” can refer to ad-hoc query complexity, reporting/dashboard concurrency, or both. But they aren’t good for much else.

Categories: Actian and Ingres, Amazon and its cloud, Aster Data, Business intelligence, Calpont, Cloud computing, Data warehouse appliances, Data warehousing, Databricks, Spark and BDAS, DATAllegro, EAI, EII, ETL, ELT, ETLT, EMC, Exasol, Greenplum, Hadapt, Hadoop, HP and Neoview, IBM and DB2, In-memory DBMS, Infobright, Kickfire, Kognitio, MemSQL, Microsoft and SQL*Server, Netezza, Open source, Oracle, ParAccel, Predictive modeling and advanced analytics, SAP AG, SQL/Hadoop integration, Sybase, Teradata, VectorWise, Vertica Systems

29 Comments

July 19, 2016

Notes on vendor lock-in

Vendor lock-in is an important subject. Everybody knows that. But few of us realize just how complicated the subject is, nor how riddled it is with paradoxes. Truth be told, I wasn’t fully aware either. But when I set out to write this post, I found that it just kept growing longer.

1. The most basic form of lock-in is:

You do application development for a target set of platform technologies.
Your applications can’t run without those platforms underneath.
Hence, you’re locked into those platforms.

2. Enterprise vendor standardization is closely associated with lock-in. The core idea is that you have a mandate or strong bias toward having different apps run over the same platforms, because:

That simplifies your environment, requiring less integration and interoperability.
That simplifies your staffing; the same skill sets apply to multiple needs and projects.
That simplifies your vendor support relationships; there’s “one throat to choke”.
That simplifies your price negotiation.

3. That last point is double-edged; you have more power over suppliers to whom you give more business, but they also have more power over you. The upshot is often an ELA (Enterprise License Agreement), which commonly works:

For a fixed period of time, the enterprise may use as much of a given product set as they want, with costs fixed in advance.
A few years later, the price is renegotiated, based on then-current levels of usage.

Categories: Amazon and its cloud, Buying processes, Cassandra, Exadata, Facebook, IBM and DB2, Microsoft and SQL*Server, MongoDB, Neo Technology and Neo4j, Open source, Oracle, SAP AG

12 Comments

May 18, 2016

Governments vs. tech companies — it’s complicated

Numerous tussles fit the template:

A government wants access to data contained in one or more devices (mobile/personal or server as the case may be).
The computer’s manufacturer or operator doesn’t want to provide it, for reasons including:
- That’s what customers prefer.
- That’s what other governments require.
- Being pro-liberty is the right and moral choice. (Yes, right and wrong do sometimes actually come into play. 🙂 )

As a general rule, what’s best for any kind of company is — pricing and so on aside — whatever is best or most pleasing for their customers or users. This would suggest that it is in tech companies’ best interest to favor privacy, but there are two important quasi-exceptions: Read more

Categories: Amazon and its cloud, Google, Microsoft and SQL*Server, Surveillance and privacy, Web analytics

2 Comments

January 22, 2016

Cloudera in the cloud(s)

Cloudera released Version 2 of Cloudera Director, which is a companion product to Cloudera Manager focused specifically on the cloud. This led to a discussion about — you guessed it! — Cloudera and the cloud.

Making Cloudera run in the cloud has three major aspects:

Cloudera’s usual software, ported to run on the cloud platform(s).
Cloudera Director, which for example launches cloud instances.
Points of integration, e.g. taking information about security-oriented roles from the platform and feeding then to the role-based security that is specific to Cloudera Enterprise.

Features new in this week’s release of Cloudera Director include:

An API for job submission.
Support for spot and preemptable instances.
High availability.
Kerberos.
Some cluster repair.
Some cluster cloning.

I.e., we’re talking about some pretty basic/checklist kinds of things. Cloudera Director is evidently working for Amazon AWS and Google GCP, and planned for Windows Azure, VMware and OpenStack.

As for porting, let me start by noting: Read more

Categories: Amazon and its cloud, Business intelligence, Cloud computing, Cloudera, Data models and architecture, Databricks, Spark and BDAS, EAI, EII, ETL, ELT, ETLT, Google, Hadoop, Hortonworks, Market share and customer counts, Microsoft and SQL*Server, Predictive modeling and advanced analytics

8 Comments

December 7, 2015

Transitioning to the cloud(s)

There’s a lot of talk these days about transitioning to the cloud, by IT customers and vendors alike. Of course, I have thoughts on the subject, some of which are below.

1. The economies of scale of not running your own data centers are real. That’s the kind of non-core activity almost all enterprises should outsource. Of course, those considerations taken alone argue equally for true cloud, co-location or SaaS (Software as a Service).

2. When the (Amazon) cloud was newer, I used to hear that certain kinds of workloads didn’t map well to the architecture Amazon had chosen. In particular, shared-nothing analytic query processing was necessarily inefficient. But I’m not hearing nearly as much about that any more.

3. Notwithstanding the foregoing, not everybody loves Amazon pricing.

4. Infrastructure vendors such as Oracle would like to also offer their infrastructure to you in the cloud. As per the above, that could work. However:

Is all your computing on Oracle’s infrastructure? Probably not.
Do you want to move the Oracle part and the non-Oracle part to different clouds? Ideally, no.
Do you like the idea of being even more locked in to Oracle than you are now? [Insert BDSM joke here.]
Will Oracle do so much better of a job hosting its own infrastructure that you use its cloud anyway? Well, that’s an interesting question.

Actually, if we replace “Oracle” by “Microsoft”, the whole idea sounds better. While Microsoft doesn’t have a proprietary server hardware story like Oracle’s, many folks are content in the Microsoft walled garden. IBM has fiercely loyal customers as well, and so may a couple of Japanese computer manufacturers.

5. Even when running stuff in the cloud is otherwise a bad idea, there’s still: Read more

Categories: Amazon and its cloud, Cloud computing, Emulation, transparency, portability, IBM and DB2, Microsoft and SQL*Server, Oracle, Pricing

6 Comments

July 7, 2015

Zoomdata and the Vs

Let’s start with some terminology biases:

I dislike the term “big data” but like the Vs that define it — Volume, Velocity, Variety and Variability.
Though I think it’s silly, I understand why BI innovators flee from the term “business intelligence” (they’re afraid of not sounding new).

So when my clients at Zoomdata told me that they’re in the business of providing “the fastest visual analytics for big data”, I understood their choice, but rolled my eyes anyway. And then I immediately started to check how their strategy actually plays against the “big data” Vs.

It turns out that:

Zoomdata does its processing server-side, which allows for load-balancing and scale-out. Scale-out and claims of great query speed are relevant when data is of high volume.
Zoomdata depends heavily on Spark.
Zoomdata’s UI assumes data can be a mix of historical and streaming, and that if looking at streaming data you might want to also check history. This addresses velocity.
Zoomdata assumes data can be in a variety of data stores, including:
- Relational (operational RDBMS, analytic RDBMS, or SQL-on-Hadoop).
- Files (generic HDFS — Hadoop Distributed File System or S3).*
- NoSQL (MongoDB and HBase were mentioned).
- Search (Elasticsearch was mentioned among others).
Zoomdata also tries to detect data variability.
Zoomdata is OEM/embedding-friendly.

*The HDFS/S3 aspect seems to be a major part of Zoomdata’s current story.

Core aspects of Zoomdata’s technical strategy include: Read more

Categories: Amazon and its cloud, Business intelligence, Cloudera, Data warehousing, Databricks, Spark and BDAS, EAI, EII, ETL, ELT, ETLT, Kafka and Confluent, MongoDB, NoSQL, Predictive modeling and advanced analytics, QlikTech and QlikView, SQL/Hadoop integration, Tableau Software, Zoomdata

9 Comments

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in