Personally selected annual lists of 12 “Most influential” companies and 36 “Companies to watch” in analytics- and database-related sectors.
Made it clear that these are his personal selections.
Nonetheless has called it an Editors’ Choice list, rather than Editor’s Choice. 🙂

(Actually, he’s really called it an “award.”)

Categories: Actian and Ingres, Analytic technologies, Aster Data, Business intelligence, Cloudera, Data warehousing, Greenplum, HP and Neoview, IBM and DB2, Infobright, Intersystems and Cache', Jaspersoft, Kalido, MarkLogic, Microsoft and SQL*Server, Netezza, Open source, Oracle, Pentaho, QlikTech and QlikView, SAP AG, Tableau Software, Talend, Teradata, Vertica Systems

2 Comments

December 30, 2009

Clearing up MapReduce confusion, yet again

I’m frustrated by a constant need — or at least urge 🙂 — to correct myths and errors about MapReduce. Let’s try one more time: Read more

Categories: Analytic technologies, Aster Data, Cloudera, Data warehousing, Google, Hadoop, MapReduce, SenSage, Splunk

8 Comments

October 18, 2009

Three big myths about MapReduce

Once again, I find myself writing and talking a lot about MapReduce. But I suspect that MapReduce-related conversations would go better if we overcame three fairly common MapReduce myths:

MapReduce is something very new
MapReduce involves strict adherence to the Map-Reduce programming paradigm
MapReduce is a single technology

Categories: Analytic technologies, Aster Data, Cloudera, Data warehousing, Google, Greenplum, Hadoop, Log analysis, MapReduce, Michael Stonebraker, Parallelization, Web analytics

11 Comments

October 10, 2009

How 30+ enterprises are using Hadoop

MapReduce is definitely gaining traction, especially but by no means only in the form of Hadoop. In the aftermath of Hadoop World, Jeff Hammerbacher of Cloudera walked me quickly through 25 customers he pulled from Cloudera’s files. Facts and metrics ranged widely, of course:

Some are in heavy production with Hadoop, and closely engaged with Cloudera. Others are active Hadoop users but are very secretive. Yet others signed up for initial Hadoop training last week.
Some have Hadoop clusters in the thousands of nodes. Many have Hadoop clusters in the 50-100 node range. Others are just prototyping Hadoop use. And one seems to be “OEMing” a small Hadoop cluster in each piece of equipment sold.
Many export data from Hadoop to a relational DBMS; many others just leave it in HDFS (Hadoop Distributed File System), e.g. with Hive as the query language, or in exactly one case Jaql.
Some are household names, in web businesses or otherwise. Others seem to be pretty obscure.
Industries include financial services, telecom (Asia only, and quite new), bioinformatics (and other research), intelligence, and lots of web and/or advertising/media.
Application areas mentioned — and these overlap in some cases — include:
- Log and/or clickstream analysis of various kinds
- Marketing analytics
- Machine learning and/or sophisticated data mining
- Image processing
- Processing of XML messages
- Web crawling and/or text processing
- General archiving, including of relational/tabular data, e.g. for compliance
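As a rough illustration of the first application area above, here is a minimal single-process sketch of clickstream counting in the map/shuffle/reduce style. It is not Hadoop code: the log format ("<user> <url>") and all function names are assumptions made up for this example.

```python
from collections import defaultdict


def map_phase(log_lines):
    """Map: emit a (url, 1) pair for each well-formed log line."""
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 2:  # hypothetical format: "<user> <url>"
            yield parts[1], 1


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts collected for each URL."""
    return {url: sum(counts) for url, counts in groups.items()}


def count_clicks(log_lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(log_lines)))


# Made-up sample data; the last line is malformed and gets skipped.
sample_log = [
    "alice /home",
    "bob /home",
    "alice /pricing",
    "malformed",
]
click_counts = count_clicks(sample_log)
```

In a real cluster the map and reduce steps would run in parallel on many nodes, with the framework handling the shuffle; the sketch only shows the shape of the computation.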

Categories: Application areas, Aster Data, Cloudera, Data types, Data warehousing, Database diversity, EAI, EII, ETL, ELT, ETLT, Hadoop, Investment research and trading, Log analysis, MapReduce, Open source, Parallelization, Predictive modeling and advanced analytics, Scientific research, Structured documents, Telecommunications, Text, Vertica Systems, Web analytics

9 Comments

October 1, 2009

MapReduce tidbits

I’ve never had children, and so have never had to supervise squabbling siblings, each accusing the other of selfishness and insufficient sharing. Perhaps the MapReduce vendors are a form of karmic payback. Be that as it may, my client Cloudera has organized Hadoop World on October 2 in New York, and my other client Aster Data is hosting a MapReduce-centric Big Data Summit the night before, at the same venue. Even if you don’t go, both conferences’ agenda pages offer a peek into what’s going on in MapReduce applications. I’m not going either, but even so I hope to post an overview of MapReduce uses after the conferences serve to publicize some of them.

Even better, I plan to hold a couple of webinars on MapReduce, at 10 am (blech) and 1 pm Eastern time on October 15. They’re sponsored by Aster Data, and so will have a strong SQL/MapReduce orientation.

In connection with its conference, Aster is introducing an nCluster-Hadoop connector — i.e., a loader from HDFS (Hadoop Distributed File System) implemented in SQL/MapReduce. In particular: Read more

Categories: Aster Data, Cloudera, Data warehousing, Hadoop, MapReduce

7 Comments

August 4, 2009

Vertica’s version of MapReduce integration

I talked with Omer Trajman of Vertica Monday night about Vertica’s MapReduce integration, part of its Vertica 3.5 release. Highlights included:

By “integrating Vertica and MapReduce,” Vertica means “integrating Vertica and Hadoop.”
Vertica’s Hadoop integration is based on Cloudera’s DBInputFormat.
Omer called out for me several features of Vertica’s Hadoop integration that didn’t just come from Cloudera, namely:
- Cloudera’s DBInputFormat assumes the database runs on a single computer, or a single head node of an MPP system. Vertica’s technology, however, runs on peer parallel nodes with no head, and so Vertica adapted the DBInputFormat technology accordingly.
- Vertica lets you push down Map functions to the database. Omer reports a roughly even division among users and prospects between those who want to do this and ones who don’t.
- Vertica lets you run Reduce functions (or Map functions, if you don’t push them down to the database) on a different cluster from the one that runs the database software. Vertica asserts that its customers and prospects all want to do this. Right here is the big difference between Vertica’s MapReduce integration and Aster’s or Greenplum’s. (Aster would also say that Vertica’s weaker MapReduce/SQL programming integration is a big difference as well.)
- Indeed, Vertica lets you Reduce into a different DBMS than Vertica, if you choose.
- Vertica gives you flexibility on the size of the Map and Reduce clusters. Omer agreed with me when I said there were some limits on how fast one can add or subtract nodes in a Vertica grid, because there’s data redistribution involved. But one can add/change/delete Hadoop clusters extremely quickly.
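To make the head-node point above concrete, here is a small sketch of how input splits might be planned when map tasks can connect to several peer database nodes rather than to one head node. This is neither the Hadoop DBInputFormat API nor Vertica's actual code; `InputSplit`, `plan_splits`, the host names, and the round-robin assignment are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputSplit:
    host: str       # database node this map task would connect to
    row_start: int  # first row of the range (inclusive)
    row_end: int    # end of the range (exclusive)


def plan_splits(total_rows, num_splits, hosts):
    """Cut rows into contiguous ranges and assign them round-robin to hosts."""
    if num_splits <= 0 or not hosts:
        raise ValueError("need at least one split and one host")
    base, extra = divmod(total_rows, num_splits)
    splits = []
    start = 0
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        splits.append(InputSplit(hosts[i % len(hosts)], start, start + size))
        start += size
    return splits


# Single head node: every map task reads through the same machine.
single_head = plan_splits(1000, 4, ["head-node"])

# Peer nodes with no head: map tasks are spread across the database nodes.
peer_nodes = plan_splits(1000, 4, ["node-a", "node-b"])
```

With one host, all four splits point at the same machine, which is the bottleneck a head-node assumption implies; with two peers, the same four splits alternate between them.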

Apparently, the use cases for Vertica/Hadoop integration to date lie in algorithmic trading and two kinds of web analytics. Specifically: Read more

Categories: Analytic technologies, Cloudera, Columnar database management, Data warehousing, Hadoop, Investment research and trading, MapReduce, Parallelization, Theory and architecture, VectorWise, Vertica Systems, Web analytics

5 Comments

April 15, 2009

Cloudera presents the MapReduce bull case

Monday was fire-drill day regarding MapReduce vs. MPP relational DBMS. The upshot was that I was quoted in Computerworld and paraphrased in GigaOm as being a little more negative on MapReduce than I really am, in line with my comment:

Frankly, my views on MapReduce are more balanced than [my] weary negativity would seem to imply.

Tuesday afternoon the dial turned a couple notches more positive yet, when I talked with Michael Olson and Jeff Hammerbacher of Cloudera. Cloudera is a new company, built around the open source MapReduce implementation Hadoop. So far Cloudera gives away its Hadoop distribution, without charging for any sort of maintenance or subscription, and just gets revenue from professional services. Presumably, Cloudera plans for this business model to change down the road.

Much of our discussion revolved around Facebook, where Jeff directed a huge and diverse Hadoop effort. Apparently, Hadoop played much of the role of an enterprise data warehouse at Facebook — at least for clickstream/network data — including:

2 1/2 petabytes of data managed via Hadoop
10 terabytes/day of data ingested via Hadoop (Edit: Some of these metrics have been updated in a subsequent post about Facebook.)
Ad targeting queries run every 15 minutes in Hadoop
Dashboard roll-up queries run every hour in Hadoop
Ad-hoc research/analytic Hadoop queries run whenever
Anti-fraud analysis done in Hadoop
Text mining (e.g., of things written on people’s “walls”) done in Hadoop
100s or 1000s of simultaneous Hadoop queries
JSON-based social network analysis in Hadoop

Some Facebook data, however, was put into an Oracle RAC cluster for business intelligence. And Jeff does concede that query execution is slower in Hadoop than in a relational DBMS. Hadoop was also used to build the index for Facebook’s custom text search engine.

Jeff’s reasons for liking Hadoop over relational DBMS at Facebook included: Read more

Categories: Analytic technologies, Cloudera, Data warehousing, EAI, EII, ETL, ELT, ETLT, Facebook, Hadoop, MapReduce, Petabyte-scale data management, RDF and graphs, Specific users, Web analytics

27 Comments


More on NoSQL and HVSP (or OLRP)

Cloudera Enterprise and Hadoop evolution

Intelligent Enterprise’s Editors’/Editor’s Choice list for 2010