May 14, 2011

Alternatives for Hadoop/MapReduce data storage and management

There’s been a flurry of announcements recently in the Hadoop world. Much of it has been concentrated on Hadoop data storage and management. This is understandable, since HDFS (Hadoop Distributed File System) is quite a young (i.e. immature) system, with much strengthening and Bottleneck Whack-A-Mole remaining in its future.

Known HDFS and Hadoop data storage and management issues include but are not limited to:

Different entities have different ideas about how such deficiencies should be addressed.

May 13, 2011

Introduction to SnapLogic

I talked with the SnapLogic team last week, in connection with their SnapReduce Hadoop-oriented offering. This gave me an opportunity to catch up on what SnapLogic is up to overall. SnapLogic is a data integration/ETL (Extract/Transform/Load) company with a good pedigree: Informatica founder Gaurav Dhillon invested in and now runs SnapLogic, and VC Ben Horowitz is involved. SnapLogic company basics include:

SnapLogic’s core/hub product is called SnapCenter. In addition, for any particular kind of data one might want to connect, there are “snaps” which connect to — i.e. snap into — SnapCenter.

SnapLogic’s market position(ing) sounds like Cast Iron’s, by which I mean: Read more

May 12, 2011

Data integration vendors and Hadoop

There have been many recent announcements about how data integration/ETL (Extract/Transform/Load) vendors are going to work with MapReduce.  Most of what they say boils down to one or more of a few things:

Some additional twists include:

Finally, my former clients at Pervasive, who haven’t briefed me for a while, seem to have told Doug Henschen that they have pointed DataRush at MapReduce.* However, I couldn’t find evidence of same on the Pervasive DataRush website beyond some help in using all the cores on any one Hadoop node.

*Also see that article because it names a bunch of ETL vendors doing Hadoop-related things.

May 6, 2011

Elastra sinks into the dead pool

Elastra is an ex-company. I’m not surprised, except by the fact that it took so long.

May 6, 2011

DB2 OLTP scale-out: pureScale

Tim Vincent of IBM talked me through DB2 pureScale Monday. IBM DB2 pureScale is a kind of shared-disk scale-out parallel OLTP DBMS, with some interesting twists. IBM’s scalability claims for pureScale, on a 90% read/10% write workload, include:

More precisely, those are counts of cluster “members,” but the recommended configuration is one member per operating system instance — i.e. one member per machine — for reasons of availability. In an 80% read/20% write workload, scalability is less — perhaps 90% scalability over 16 members.
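As a rough illustration of the arithmetic, here is a minimal sketch that assumes "90% scalability" means 90% of ideal linear scaling. That reading is an assumption rather than IBM's stated definition, and the function name `effective_members` is hypothetical.

```python
def effective_members(members: int, scalability: float) -> float:
    """Return throughput in units of one member's throughput.

    Assumption: `scalability` is the fraction of ideal linear scaling
    actually achieved (1.0 would mean perfectly linear).
    """
    if members < 1:
        raise ValueError("members must be at least 1")
    if not 0.0 <= scalability <= 1.0:
        raise ValueError("scalability must be between 0 and 1")
    return members * scalability


# The 80% read/20% write case from the text: 16 members at roughly 90%.
print(effective_members(16, 0.90))
```

Under that reading, 16 members would deliver about the throughput of 14.4 ideal single members.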

Several elements of IBM’s DB2 pureScale architecture are pretty straightforward:

Something called GPFS (General Parallel File System), which comes bundled with DB2, sits underneath all this. It’s all based on the mainframe technology IBM Parallel Sysplex.

The weirdest part (to me) of DB2 pureScale is something called the Global Cluster Facility, which runs on its own set of boxes. (Edit: Actually, see Tim Vincent’s comment below.)

May 4, 2011

IBM InfoSphere Warehouse pricing, packaging, compression and more

IBM InfoSphere Warehouse 9.7.3 has been announced, and is planned for general availability late this month. IBM InfoSphere Warehouse is, in essence, DB2-plus, where the “plus” comprises:

The main news in this release of InfoSphere Warehouse is probably pricing. While IBM has long had a funky server-power-based pricing scheme, it is now adding per-terabyte pricing, with a twist: IBM InfoSphere Warehouse now can be bought per terabyte of compressed user data. Specifically:

Per-terabyte pricing is generally a good way to think about analytic DBMS costs, for at least two reasons: Read more

May 3, 2011

Oracle on active-active replication

I am beginning to understand better some of the reasons that Oracle likes to review analyst publications before they go out. Notwithstanding what an Oracle executive told me Friday, I received an email from Irem Radzik of Oracle which said in part:

I am the product marketing director for Oracle GoldenGate product. We have noticed your blog post on Exadata covering a description for Active Data Guard. It refers to ADG being the “preferred way of Active-Active Oracle replication”.

I’d like to request correction on this comment as ADG does not have bidirectional replication capabilities which is required for Active-Active replication. GoldenGate is a complementary product to Active Data Guard with its bidirectional replication capabilities (as well as heterogeneous database support) and it is the preferred solution for Active-Active database replication.

Please note also a correction on the product name’s spelling. At least one Oracle person had read the post before that and requested a different change, but didn’t notice that error.

May 3, 2011

Oracle and IBM workload management

When last night’s Oracle/Exadata post got too long — and before I knew Oracle would request a different section be cut — I set aside my comments on Oracle’s workload management story to post separately. Elements of Oracle’s workload management story include:

*Recall that “degrees of parallelism” in Oracle Parallel Query can now be set automagically.

One reason I split out this discussion of workload management is that I also talked with IBM’s Tim Vincent yesterday, who added some insight to what I already wrote last August about DB2/InfoSphere Warehouse workload management. Specifically:

May 3, 2011

Oracle and Exadata: Business and technical notes

Last Friday I stopped by Oracle for my first conversation since January, 2010, in this case for a chat with Andy Mendelsohn, Mark Townsend, Tim Shetler, and George Lumpkin, covering Exadata and the Oracle DBMS. Key points included:  Read more

April 21, 2011

Application areas for SAS HPA

When I talked with SAS about its forthcoming in-memory parallel SAS HPA offering, we talked briefly about application areas. The three SAS cited were:

Meanwhile, in another interview I heard about, SAS emphasized retailers. Indeed, that’s what spawned my recent post about logistic regression.

The mobile communications one is a bit scary. Your cell phone — and hence your cellular company — know where you are, pretty much from moment to moment. Even without advanced analytic technology applied to it, that’s a pretty direct privacy threat. Throw in some analytics, and your cell company might know, for example, who you hang out with (in person), where you shop, and how those things predict your future behavior. And so the government — or just your employer — might know those things too.
