December 15, 2017

The technology industry is under broad political attack

I apologize for posting a December downer, but this needs to be said.

The technology industry is under attack:

From politicians and political pundits …
… especially from “populists” and/or the political right …
… in the United States and other countries.

These attacks:

Are in some cases specific to internet companies such as Google and Facebook.
In some cases threaten the tech industry more broadly.
Are in some cases part of general attacks on the educated/professional/“globalist”/“coastal” “elites”.

You’ve surely noticed some of these attacks. But you may not have noticed just how many different attacks and criticisms there are, on multiple levels.

Categories: Amazon and its cloud, Facebook, Google, Public policy


December 12, 2017

Notes on artificial intelligence, December 2017

Most of my comments about artificial intelligence in December, 2015 still hold true. But there are a few points I’d like to add, reiterate or amplify.

1. As I wrote back then in a post about the connection between machine learning and the rest of AI,

It is my opinion that most things called “intelligence” — natural and artificial alike — have a great deal to do with pattern recognition and response.

2. Accordingly, it can be reasonable to equate machine learning and AI.

AI based on machine learning frequently works, on more than a toy level. (Examples: Various projects by Google)
AI based on knowledge representation usually doesn’t. (Examples: IBM Watson, 1980s expert systems)
“AI” can be the sexier marketing or fund-raising term.

3. Similarly, it can be reasonable to equate AI and pattern recognition. Glitzy applications of AI include:

Understanding or translation of language (written or spoken as the case may be).
Machine vision or autonomous vehicles.
Facial recognition.
Disease diagnosis via radiology interpretation.

4. The importance of AI and of recent AI advances differs greatly according to application or data category.

Categories: Cloud computing, Predictive modeling and advanced analytics, Public policy, Surveillance and privacy


August 22, 2017

Imanis Data

I talked recently with the folks at Imanis Data. For starters:

The point of Imanis is to make copies of your databases, for purposes such as backup/restore, test/analysis, or compliance-driven archiving. (That’s in declining order of current customer activity.) Another use is migration via restoring to a different cluster than the one that created the data in the first place.
The data can come from NoSQL database managers, from Hadoop, or from Vertica. (Again, that’s in declining order.)
As you might imagine, Imanis makes incremental backups; the only full backup is the first one you do for that database.
“Imanis” is a new name; the previous name was “Talena”.
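To make the incremental-backup point concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a database can be modeled as a dictionary of keys to values. `BackupChain`, `take_backup`, `restore` and `TOMBSTONE` are hypothetical names, not Imanis APIs. The first backup copies everything. Each later one records only the keys that were added, changed or deleted since the previous backup, and a restore replays the chain in order.

```python
from typing import Dict, List, Optional

# Hypothetical sketch: a database is modeled as a plain dict of key -> value.
# It illustrates incremental backup in general, not Imanis's implementation.

TOMBSTONE = object()  # marks a key deleted since the previous backup


class BackupChain:
    def __init__(self) -> None:
        self.backups: List[Dict] = []  # first entry is full, the rest are deltas
        self._last_state: Dict = {}    # the database as of the last backup

    def take_backup(self, current: Dict) -> Dict:
        if not self.backups:
            # The only full backup: the first one taken for this database.
            delta = dict(current)
        else:
            # Incremental backup: keep only added or changed keys...
            delta = {
                k: v
                for k, v in current.items()
                if k not in self._last_state or self._last_state[k] != v
            }
            # ...plus tombstones for keys that have disappeared.
            for k in self._last_state:
                if k not in current:
                    delta[k] = TOMBSTONE
        self.backups.append(delta)
        self._last_state = dict(current)
        return delta

    def restore(self, upto: Optional[int] = None) -> Dict:
        """Rebuild the database as of backup number `upto` (default: the latest)."""
        chain = self.backups if upto is None else self.backups[: upto + 1]
        state: Dict = {}
        for delta in chain:
            for k, v in delta.items():
                if v is TOMBSTONE:
                    state.pop(k, None)
                else:
                    state[k] = v
        return state


chain = BackupChain()
chain.take_backup({"a": 1, "b": 2})          # full copy
chain.take_backup({"a": 1, "b": 3, "c": 4})  # delta: b changed, c added
chain.take_backup({"a": 1, "c": 4})          # delta: b deleted
print(chain.restore())                       # {'a': 1, 'c': 4}
```

Restoring into a fresh dictionary stands in for the migration use case above: the same chain can be replayed onto a different cluster than the one that produced it.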

Categories: Cassandra, Hadoop, Market share and customer counts, NoSQL, Predictive modeling and advanced analytics, Vertica Systems


August 17, 2017

More notes on the transition to the cloud

Last year I posted observations about the transition to the cloud. Here are some further thoughts.

0. In case any doubt remained, the big questions about transitioning to the cloud are “When?” and “How?”. “Whether”, by way of contrast, is pretty much settled.

1. The answer to “When?” is generally “Over many years”. In particular, at most enterprises the cloud transition will span the tenures of multiple CIOs.

Few enterprises will ever execute on simple, consistent, unchanging “cloud strategies”.

2. The SaaS (Software as a Service) vs. on-premises tradeoffs are being reargued, except that proponents now spell SaaS C-L-O-U-D. (Ali Ghodsi of Databricks made a particularly energetic version of that case in a recent meeting.)

3. In most countries (at least in the US and the rest of the West), the cloud vendors deemed to matter are Amazon, followed by Microsoft, followed by Google. Of those three, when it comes to the public cloud, Microsoft is much, much more enterprise-savvy than its key competitors.

Categories: Amazon and its cloud, Cloud computing, Databricks, Spark and BDAS, Google, Microsoft and SQL*Server, Storage


August 10, 2017

Notes on data security

1. In June I wrote about burgeoning interest in data security. I’d now like to add:

Even more than I previously thought, demand seems to be driven largely by issues of regulatory compliance.
In an exception to that general rule, many enterprises have vague mandates for data encryption.
In awkward contradiction to that general rule, there’s a general sense that it’s just security’s “turn” to be a differentiating feature, since various other “enterprise” needs are already being well-addressed.

We can reconcile these anecdata pretty well if we postulate that:

Enterprises generally agree that data security is an important need.
Exactly how they meet this need depends upon what regulators choose to require.

2. My current impressions of the legal privacy vs. surveillance tradeoffs are basically: Read more

Categories: Data warehousing, Databricks, Spark and BDAS, EAI, EII, ETL, ELT, ETLT, Hadoop, Surveillance and privacy

Analytics on the edge?

There’s a theory going around to the effect that:

Compute power is and will be everywhere, for example in cars, robots, medical devices or microwave ovens. Let’s refer to these platforms collectively as “real-world appliances”.
Much more data will be created on these platforms than can reasonably be sent back to centralized/cloudy servers.
Therefore, cloud-centric architectures will soon be obsolete, perhaps before they’re ever dominant in the first place.

There’s enough truth to all that to make it worth discussing. But the strong forms of the claims seem overblown.

1. This story doesn’t even make sense except for certain new classes of application. Traditional business applications run all over the world, in dedicated or SaaSy modes as the case may be. E-commerce is huge. So is content delivery. Architectures for all those things will continue to evolve, but what we have now basically works.

2. When it comes to real-world appliances, this story is partially accurate. An automobile is a rolling network of custom Linux systems, each running hand-crafted real-time apps, a few of which also have minor requirements for remote connectivity. That’s OK as far as it goes, but there could be better support for real-time operational analytics. If something as flexible as Spark were capable of unattended operation, I think many engineers of real-world appliances would find great ways to use it.

3. There’s a case to be made for something better yet. I think the argument is premature, but it’s worth at least a little consideration.

Categories: Business intelligence, Cloud computing, Data warehousing, Database diversity, Databricks, Spark and BDAS, Log analysis, NoSQL, Predictive modeling and advanced analytics, Streaming and complex event processing (CEP)


June 16, 2017

Generally available Kudu

I talked with Cloudera about Kudu in early May. Besides giving me a lot of information about Kudu, Cloudera also helped confirm some trends I’m seeing elsewhere, including:

Security is an ever bigger deal.
There’s a lot of interest in data warehouses (perhaps really data marts) that are updated in human real-time.
- Prospects for that respond well to the actual term “data warehouse”, at least when preceded by some modifier to suggest that it’s modern/low-latency/non-batch or whatever.
- Flash is often — but not yet always — preferred over disk for that kind of use.
- Sometimes these data stores are greenfield. When they’re migrations, they come more commonly from analytic RDBMS or data warehouse appliances (the most commonly mentioned ones are Teradata, Netezza and Vertica, but that’s perhaps just due to those product lines’ market share) than from general-purpose DBMS such as Oracle or SQL Server.
Intel is making it ever easier to vectorize CPU operations, and analytic data managers are increasingly taking advantage of this possibility.

Now let’s talk about Kudu itself. As I discussed at length in September 2015, Kudu is:

A data storage system introduced by Cloudera (and subsequently open-sourced).
Columnar.
Updatable in human real-time.
Meant to serve as the data storage tier for Impala and Spark.
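For readers who want “columnar” and “updatable in human real-time” made concrete, here is a toy Python sketch. `ColumnarTable` and its methods are hypothetical, and Kudu’s real storage engine is far more involved and is not modeled here. The sketch shows only that each column lives in its own array, so an analytic scan touches just the column it needs, while a keyed update still takes effect immediately.

```python
from typing import Any, Dict, List

# Toy, hypothetical sketch of a column-oriented table that accepts updates.
# It does not describe how Kudu actually stores or updates data.


class ColumnarTable:
    def __init__(self, column_names: List[str]) -> None:
        # Each column is stored as its own list.
        self.columns: Dict[str, List[Any]] = {name: [] for name in column_names}
        self.row_of_key: Dict[Any, int] = {}  # primary key -> row position

    def upsert(self, key: Any, values: Dict[str, Any]) -> None:
        """Insert a new row, or update only the given columns of an existing one."""
        if key in self.row_of_key:
            pos = self.row_of_key[key]
            for name, value in values.items():
                self.columns[name][pos] = value
        else:
            self.row_of_key[key] = len(self.row_of_key)
            for name, column in self.columns.items():
                column.append(values.get(name))  # missing columns become None

    def scan(self, name: str) -> List[Any]:
        # An analytic scan reads just the one column it needs.
        return list(self.columns[name])

    def column_sum(self, name: str) -> Any:
        return sum(v for v in self.columns[name] if v is not None)


t = ColumnarTable(["region", "sales"])
t.upsert(1, {"region": "east", "sales": 10})
t.upsert(2, {"region": "west", "sales": 5})
t.upsert(1, {"sales": 12})    # the update is visible to the very next scan
print(t.column_sum("sales"))  # 17
```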

Kudu’s adoption and roll-out story starts: Read more

Categories: Cloudera, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Databricks, Spark and BDAS, Hadoop, Market share and customer counts, Netezza, NoSQL, Open source, Solid-state memory, SQL/Hadoop integration, Teradata, Vertica Systems


June 14, 2017

The data security mess

A large fraction of my briefings this year have included a focus on data security. This is the first year in the past 35 that that’s been true.* I believe that reasons for this trend include:

Security is an important aspect of being “enterprise-grade”. Other important checkboxes have been largely filled in. Now it’s security’s turn.
A major platform shift, namely to the cloud, is underway or at least being planned for. Security is an important thing to think about as that happens.
Even setting the cloud aside, technology trends have created new ways to lose data, which security technology needs to address.
Traditionally paranoid industries are still paranoid.
Other industries are newly (and rightfully) terrified of exposing customer data.
My clients at Cloudera thought they had a chance to get significant messaging leverage from emphasizing security. So far, it seems that they were correct.

*Not really an exception: I did once make it a project to learn about classic network security, including firewall appliances and so on.

Certain security requirements, desires or features keep coming up. These include (and as in many of my lists, these overlap):

Easy, comprehensive access control. More on this below.
Encryption. If other forms of security were perfect, encryption would never be needed. But they’re not.
Auditing. Ideally, auditing can alert you to trouble before (much) damage is done. If not, then it can at least help you do proactive damage control in the face of a breach.
Whatever regulators mandate.
Whatever is generally regarded as best practices. Security “best practices” generally keep enterprises out of legal and regulatory trouble, or at least minimize same. They also keep employees out of legal and career trouble, or minimize same. Hopefully, they even keep data safe.
Whatever the government is known to use. This is a common proxy for “best practices”.
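As an illustration of auditing that can alert you “before (much) damage is done”, here is a minimal Python sketch. Everything in it is an assumption for illustration: `AuditLog`, `AccessEvent`, the rows-read threshold and the sliding time window are hypothetical, and events are assumed to arrive in timestamp order. It keeps a complete trail for after-the-fact damage control, and it flags any user whose reads within the window exceed the threshold.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List

# Hypothetical sketch of threshold-based audit alerting; not any vendor's product.


@dataclass
class AccessEvent:
    user: str
    table: str
    rows_read: int
    timestamp: float  # seconds; events assumed to arrive in time order


class AuditLog:
    def __init__(self, max_rows: int, window_seconds: float) -> None:
        self.max_rows = max_rows
        self.window = window_seconds
        self.events: List[AccessEvent] = []  # full trail, kept for forensics
        self._recent: Dict[str, Deque[AccessEvent]] = defaultdict(deque)

    def record(self, event: AccessEvent) -> bool:
        """Log the event; return True if this user's recent reads exceed the limit."""
        self.events.append(event)
        recent = self._recent[event.user]
        recent.append(event)
        # Drop this user's events that have fallen out of the sliding window.
        while recent and event.timestamp - recent[0].timestamp > self.window:
            recent.popleft()
        return sum(e.rows_read for e in recent) > self.max_rows


log = AuditLog(max_rows=1000, window_seconds=60)
log.record(AccessEvent("alice", "customers", 500, 0.0))
log.record(AccessEvent("alice", "customers", 400, 30.0))
print(log.record(AccessEvent("alice", "customers", 200, 50.0)))  # True: 1,100 rows in 60 s
```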

More specific or extreme requirements include: Read more

Categories: Business intelligence, Data warehousing, EAI, EII, ETL, ELT, ETLT, Hadoop, QlikTech and QlikView, Tableau Software


June 14, 2017

Light-touch managed services

Cloudera recently introduced Cloudera Altus, a Hadoop-in-the-cloud offering with an interesting processing model:

Altus manages jobs for you.
But you actually run them on your own cluster, and so you never have to put your data under Altus’ control.

Thus, you avoid a potential security risk (shipping your data to Cloudera’s service). I’ve tentatively named this strategy light-touch managed services, and am interested in exploring how broadly applicable it might or might not be.
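Here is one hedged way to picture the light-touch model in Python. The split between a vendor-side `ManagedService` and a customer-side `CustomerCluster` is hypothetical, as are all the names, and the pull-based polling is an assumption rather than a description of how Altus works. What the sketch shows: the service queues jobs and records their status, but only the customer’s cluster ever reads the data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical sketch of "light-touch" managed services; not Cloudera Altus's design.


@dataclass
class JobSpec:
    job_id: str
    transform: Callable[[List[dict]], List[dict]]  # the work to run, not the data


@dataclass
class ManagedService:
    """Vendor side: sees job specs and status reports, never the data itself."""
    queue: List[JobSpec] = field(default_factory=list)
    status: Dict[str, dict] = field(default_factory=dict)

    def submit(self, spec: JobSpec) -> None:
        self.queue.append(spec)
        self.status[spec.job_id] = {"state": "queued"}

    def next_job(self) -> Optional[JobSpec]:
        return self.queue.pop(0) if self.queue else None

    def report(self, job_id: str, state: str, rows_out: int) -> None:
        # Only metadata (state, output row count) flows back to the vendor.
        self.status[job_id] = {"state": state, "rows_out": rows_out}


class CustomerCluster:
    """Customer side: holds the data and does the actual processing."""

    def __init__(self, data: List[dict]) -> None:
        self.data = data
        self.outputs: Dict[str, List[dict]] = {}

    def poll_and_run(self, service: ManagedService) -> bool:
        spec = service.next_job()
        if spec is None:
            return False
        try:
            result = spec.transform(self.data)
            self.outputs[spec.job_id] = result  # results stay on the customer side
            service.report(spec.job_id, "succeeded", len(result))
        except Exception:
            service.report(spec.job_id, "failed", 0)
        return True


service = ManagedService()
cluster = CustomerCluster([{"amount": 5}, {"amount": 50}])
service.submit(JobSpec("filter-large", lambda rows: [r for r in rows if r["amount"] > 10]))
cluster.poll_and_run(service)
print(service.status)  # {'filter-large': {'state': 'succeeded', 'rows_out': 1}}
```

One design choice worth noting: the customer’s cluster pulls work from the service, so in this sketch the vendor never needs a connection into the customer’s environment.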

For light-touch to be a good approach, there should be (sufficiently) little downside in performance, reliability and so on from having your service not actually control the data. That assumption is trivially satisfied in the case of Cloudera Altus, because it’s not an ordinary kind of app; rather, its whole function is to improve the job-running part of your stack. Most kinds of apps, however, want to operate on your data directly. For those, it is more challenging to meet acceptable SLAs (Service-Level Agreements) on a light-touch basis.

Let’s back up and consider what “light-touch” for data-interacting apps (i.e., almost all apps) would actually mean. The basics are: Read more

Categories: Cloud computing, Cloudera, Data warehousing, EAI, EII, ETL, ELT, ETLT, Hadoop, Software as a Service (SaaS), Surveillance and privacy


June 14, 2017

Cloudera Altus

I talked with Cloudera before the recent release of Altus. In simplest terms, Cloudera’s cloud strategy aspires to:

Provide all the important advantages of on-premises Cloudera.
Provide all the important advantages of native cloud offerings such as Amazon EMR (Elastic MapReduce), or at least come sufficiently close to that goal.
Benefit from customers’ desire to have on-premises and cloud deployments that work:
- Alike in any case.
- Together, to the extent that that makes use-case sense.

In other words, Cloudera is porting its software to an important new platform.* And this port isn’t complete yet, in that Altus is geared only for certain workloads. Specifically, Altus is focused on “data pipelines”, aka data transformation, aka “data processing”, aka new-age ETL (Extract/Transform/Load). (Other kinds of workload are on the roadmap, including several different styles of Impala use.) So what about that is particularly interesting? Well, let’s drill down.

*Or, if you prefer, improving on early versions of the port.

Categories: Amazon and its cloud, Cloud computing, Cloudera, Databricks, Spark and BDAS, Hadoop, Log analysis, MapReduce, Software as a Service (SaaS)


Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
