Discussion of Facebook’s data management technologies. Related subjects include:

January 8, 2012

Big data terminology and positioning

Recently, I observed that Big Data terminology is seriously broken. It is reasonable to reduce the subject to two quasi-dimensions:

given that

But the conflation should stop there.

*Low-volume/high-velocity problems are commonly referred to as “event processing” and/or “streaming”.

When people claim that bigness and structure are the same issue, they oversimplify into mush. So I think we need four pieces of terminology, reflective of a 2×2 matrix of possibilities. For want of better alternatives, my suggestions are:

Read more

September 22, 2011

DataStax pivots back to its original strategy

The DataStax and Cassandra stories are somewhat confusing. Unfortunately, DataStax chose to clarify them in what has turned out to be a crazy news week. I’m going to use this post just to report on the status of the DataStax product line, without going into any analysis beyond that.

Read more

September 19, 2011

Are there any remaining reasons to put new OLTP applications on disk?

Once again, I’m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include:

So I’m leaning to saying:   Read more

July 18, 2011

HBase is not broken

It turns out that my impression that HBase is broken was unfounded, in at least two ways. The smaller is that something wrong with the HBase/Hadoop interface or Hadoop’s HBase support cannot necessarily be said to be wrong with HBase (especially since HBase is no longer a Hadoop subproject). The bigger reason is that, according to consensus, HBase has worked pretty well since the .90 release in January of this year.

After Michael Stack of StumbleUpon beat me up for a while,* Omer Trajman of Cloudera was kind enough to walk me through HBase usage. He is informed largely by 18 Cloudera customers, plus a handful of other well-known HBase users such as Facebook, StumbleUpon, and Yahoo. Of the 18 Cloudera customers using HBase that Omer was thinking of, 15 are in HBase production, one is in HBase “early production”, one is still doing R&D in the area of HBase, and one is a classified government customer not providing such details. Read more

July 15, 2011

Soundbites: the Facebook/MySQL/NoSQL/VoltDB/Stonebraker flap, continued

As a follow-up to the latest Stonebraker kerfuffle, Derrick Harris asked me a bunch of smart followup questions. My responses and afterthoughts include:

Continuing with that discussion of DBMS alternatives:

And while we’re at it — going schema-free often makes a whole lot of sense. I need to write much more about the point, but for now let’s just say that I look favorably on the Big Four schema-free/NoSQL options of MongoDB, Couchbase, HBase, and Cassandra.

July 6, 2011

Petabyte-scale Hadoop clusters (dozens of them)

I recently learned that there are 7 Vertica clusters with a petabyte (or more) each of user data. So I asked around about other petabyte-scale clusters. It turns out that there are several dozen such clusters (at least) running Hadoop.

Cloudera can identify 22 CDH (Cloudera Distribution [of] Hadoop) clusters holding one petabyte or more of user data each, at 16 different organizations. This does not count Facebook or Yahoo, who are huge Hadoop users but not, I gather, running CDH. Meanwhile, Eric Baldeschwieler of Hortonworks tells me that Yahoo’s latest stated figures are:

Read more

June 1, 2011

The essence of an application

Once upon a time, information technology was strictly about — well, information. And by “information” what was meant was “data”.* An application boiled down to a database design, plus a straightforward user interface, in whatever the best UI technology of the day happened to be. Things rarely worked quite as smoothly as the design-database/press-button/generate-UI propaganda would have one believe, but database design was clearly at the center of application invention.

*Not coincidentally, two of the oldest names for “IT” were data processing and management information systems.

Eventually, there came to be three views of the essence of IT:

Graphical user interfaces were a major enabling technology for that evolution. Equally important, relational databases made some difficult problems easy(ier), freeing application designers to pursue more advanced functionality.

Based on further technical evolution, specifically in analytic and consumer technologies, I think we should now take that list up to five. The new members I propose are:

Read more

May 24, 2011

Notes from the Fusion-io S-1 filing

Fusion-io has filed for an initial public offering. With public offerings go S-1 filings which, along with 10-Ks, are the kinds of SEC filing that typically contain a few nuggets of business information. Notes from Fusion-io’s S-1 include:

Fusion-io is growing very, very fast, doubling or better in revenue every 6 months.

Fusion-io’s marketing message revolves around “data centralization”. Fusion-io is competing against storage-area networks and storage arrays.

Fusion-io’s list of application types includes

… systems dedicated to decision support, high performance financial analysis, web search, content delivery and enterprise resource planning.

Fusion-io says it has shipped over 20 petabytes of storage.

Fusion-io has a shifting array of big customers, including OEMs:  Read more

January 11, 2011

The technology of privacy threats

This post is the second of a series. The first one was an overview of privacy dangers, replete with specific examples of kinds of data that are stored for good reasons, but can also be repurposed for more questionable uses. More on this subject may be found in my August, 2010 post Big Data is Watching You!

There are two technology trends driving electronic privacy threats. Taken together, these trends raise scenarios such as the following:

Not all these stories are quite possible today, but they aren’t far off either.

Read more

January 10, 2011

Privacy dangers — an overview

This post is the first of a series. The second one delves into the technology behind the most serious electronic privacy threats.

The privacy discussion has gotten more active, and more complicated as well. A year ago, I still struggled to get people to pay attention to privacy concerns at all, at least in the United States, with my first public breakthrough coming at the end of January. But much has changed since then.

On the commercial side, Facebook modified its privacy policies, garnering great press attention and an intense user backlash, leading to a quick partial retreat. The Wall Street Journal then launched a long series of articles — 13 so far — recounting multiple kinds of privacy threats. Other media joined in, from Forbes to CNet. Various forms of US government rule-making to inhibit advertising-related tracking have been proposed as an apparent result.

In the US, the government had a lively year as well. The Transportation Security Administration (TSA) rolled out what have been dubbed “porn scanners,” and backed them up with “enhanced patdowns.” For somebody who is, for example, female, young, a sex abuse survivor, and/or a follower of certain religions, those can be highly unpleasant, if not traumatic. Meanwhile, the Wikileaks/Cablegate events have spawned a government reaction whose scope is only beginning to be seen. A couple of “highlights” so far are some very nasty laptop seizures, and the recent demand for information on over 600,000 Twitter accounts. (Christopher Soghoian provided a detailed, nuanced legal analysis of same.)

At this point, it’s fair to say there are at least six different kinds of legitimate privacy fear. Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:


Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.