May 7, 2010

Notes and cautions about new analytic technology

As previously noted, I headlined Aster’s Big Data Summit in Washington, DC last Thursday. More than others, that talk did reuse material I’d presented before. I promised the audience that when I got back I’d put up a blog post linking to supporting material for the talk.

Part of the time, I talked about things I’ve written about before. For example: Read more

Categories: Aster Data, Business intelligence, Data warehousing, Predictive modeling and advanced analytics, Presentations

3 Comments

May 7, 2010

Clarifying the state of MPP in-database SAS

I routinely am briefed way in advance of products’ introductions. For that reason and others, it can be hard for me to keep straight what’s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the SAS Institute’s multi-year effort to get SAS integrated into various MPP DBMS, specifically Teradata, Netezza Twinfin(i), and Aster Data nCluster.

However, I chatted briefly Thursday with Michelle Wilkie, who is the SAS product manager overseeing all this (and also some other stuff, like SAS running on grids without being integrated into a DBMS). As best I understood, the story is: Read more

Categories: Aster Data, Data warehouse appliances, MapReduce, Netezza, Parallelization, Predictive modeling and advanced analytics, SAS Institute, Specific users, Teradata

11 Comments

May 4, 2010

Clustrix may be doing something interesting

Clustrix launched without briefing me or, at least so far as I can tell, anybody else who knows much about database technology. But Clustrix did post a somewhat crunchy, no-registration-required, white paper. Based on that, I get the impression:

Clustrix is making OLTP DBMS.
The core problem Clustrix tries to solve is scale-out, without necessarily giving up SQL. (I couldn’t immediately tell whether Clustrix supports NoSQL-style key-value interfaces enthusiastically, grudgingly, or not at all.)
Unlike Akiban or VoltDB, Clustrix makes database appliances. The Clustrix software seems to assume a Clustrix appliance.
A key feature of Clustrix’s database appliances is that they rely on solid-state memory. I’m guessing that Clustrix appliances don’t even have disks, or that if they do the disks store some software or something, not actual data. (As previously noted, I agree with Oracle in thinking that much of the progress in database technology this decade will come from proper design for solid-state memory.)
Clustrix talks of things that sound like compiled queries and attempts to avoid locks. However, it doesn’t sound as extreme in these regards as VoltDB.
Clustrix also talks of things that sound like consistent hashing.
The brand name “Sierra” also shows up along with the brand name “Clustrix.”

Categories: Clustrix, Data warehouse appliances, DBMS product categories, NoSQL, Parallelization, Solid-state memory, Storage, Theory and architecture

2 Comments

May 4, 2010

Truviso evidently reinvents itself

When Aleri bought Coral8 last year, I wrote that the independent CEP (Complex Event Processing) vendors were floundering. Aleri quickly threw in the towel and sold out to Sybase, which hardly changed my opinion. StreamBase actually is persevering, but not with any kind of breakout success. Big vendors, such as Microsoft and IBM, have at least some aspirations of eventually filling the gap.

Meanwhile, Truviso — which never got much market traction in the first place — was in hiding; Roman Bukary never did keep his promise to brief me on the company’s new and improved strategy. Then Truviso had yet another management change, amidst rumors that it was repositioning away from CEP. As per a press release Truviso emailed today, that’s now official, with Truviso’s main business being something to do with web analytics.

Edit: It seems Truviso was at some point absorbed into Cisco.

Categories: Streaming and complex event processing (CEP), Truviso, Web analytics

8 Comments

May 4, 2010

Revolution Analytics seems very confused

Revolution Analytics is a relaunch of a company previously known as REvolution Computing, built around the open source R language. Last week they sent around email claiming they were a new company (false), and asking for briefings in connection with an embargo this morning. I talked to Revolution Analytics yesterday, and they told me the embargo had been moved to Thursday.* However, Revolution apparently neglected to tell the press the same thing, and there’s an article out today — quoting me, because I’d given quotes in line with the original embargo, before I’d had the briefing myself. And what’s all this botched timing about? Mainly, it seems to be for a “statement of direction” about software Revolution Analytics hasn’t actually developed yet.

*More precisely, they spoke as if the embargo had been Thursday all along.

Categories: Investment research and trading, Parallelization, Predictive modeling and advanced analytics, Revolution Analytics, SAS Institute

13 Comments

May 3, 2010

IBM puts Cast Iron Systems out of its misery

Long ago, the first enterprise application integration (EAI) vendors offered pairwise integrations between different specific packaged applications. That was, for example what was going on at Katrina Garnett’s Crossworlds/Crossroads, which eventually became one of IBM’s first data integration software acquisitions. Years later, Cast Iron Systems tried what seemed to be pretty much the same thing, only better implemented. Recently, however, Cast Iron has been pretty hard to get a hold of, and I also couldn’t find anybody (competitor, friend of management, whatever) who believed Cast Iron was doing particularly well. So today’s news that IBM is acquiring Cast Iron Systems comes as no big surprise.

Categories: Cast Iron Systems, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, IBM and DB2

1 Comment

May 2, 2010

Daniel Abadi on NoSQL design tradeoffs

In a thought-provoking post, Daniel Abadi points out NoSQL-related terminological problems similar to the ones I just railed against, and argues

To me, CAP should really be PACELC — if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system tradeoff between latency (L) and consistency (C)?

and goes on to say

For example, Amazon’s Dynamo (and related systems like Cassandra and SimpleDB) are PA/EL in PACELC — upon a partition, they give up consistency for availability; and under normal operation they give up consistency for lower latency. Giving up C in both parts of PACELC makes the design simpler — once the application is configured to be able to handle inconsistencies, it makes sense to give up consistency for both availability and lower latency.

However, I think Daniel’s improved formulation is still misleading, in at least two ways:

Daniel implicitly assumes any given NoSQL system makes a fixed set of tradeoffs, when actually — as he in fact notes in his post — some of them offer tradeoffs that are quite tunable.
I think Daniel is at best oversimplifying when he appears to assert that best-case network latency is an important design criterion for all that many NoSQL systems. Naively, anything that acknowledges reads or writes requires two hops. Two-phase commit (2PC) requires three hops. 33% latency reductions are not the kinds of goals that drive dramatic DBMS redesigns, even though tenths of seconds — i.e. 100s of milliseconds — matter in the kinds of environments where NoSQL is sprouting up.

Categories: Amazon and its cloud, Cassandra, NoSQL, Theory and architecture

2 Comments

May 1, 2010

Read-your-writes (RYW), aka immediate, consistency

In which we reveal the fundamental inequality of NoSQL, and why NoSQL folks are so negative about joins.

Discussions of NoSQL design philosophies tend to quickly focus in on the matter of consistency. “Consistency”, however, turns out to be a rather overloaded concept, and confusion often ensues.

In this post I plan to address one essential subject, while ducking various related ones as hard as I can. It’s what Werner Vogel of Amazon called read-your-writes consistency (a term to which I was actually introduced by Justin Sheehy of Basho). It’s either identical or very similar to what is sometimes called immediate consistency, and presumably also to what Amazon has recently called the “read my last write” capability of SimpleDB.

This is something every database-savvy person should know about, but most so far still don’t. I didn’t myself until a few weeks ago.

Considering the many different kinds of consistency outlined in the Werner Vogel link above or in the Wikipedia consistency models article — whose names may not always be used in, er, a wholly consistent manner — I don’t think there’s much benefit to renaming read-your-writes consistency yet again. Rather, let’s just call it RYW consistency, come up with a way to pronounce “RYW”, and have done with it. (I suggest “ree-ooh”, which evokes two syllables from the original phrase. Thoughts?)

Definition: RYW (Read-Your-Writes) consistency is achieved when the system guarantees that, once a record has been updated, any attempt to read the record will return the updated value.

Categories: Amazon and its cloud, NoSQL, OLTP, Parallelization, Theory and architecture

26 Comments

April 29, 2010

Vertica update

Last month, Vertica’s CEO Ralph Breslauer quit,* and Vertica made it sound like there would be a new CEO late in April. And indeed, as of April 29, there was. He’s a guy I’ve never heard of before named Chris Lynch, apparently quite the sales machine builder. The most substance I’ve found is a pair of Mass High Tech articles — the latter exceedingly typo-ridden — to the general effect that:

Vertica plans to build a massive, world-conquering sales force.
If Vertica dips back into negative cash flow to do that and has to raise more venture capital, so be it.
“Triple-digit” revenue growth is expected for this year.

1 Comment

April 27, 2010

Gear6 seems to have failed in the memcached market too

As previously noted, I’ve briefly cut back on blogging (and research) due to some family health issues. The first casualty was a post about memcached. One of the two companies to be featured were my new clients at Northscale. The other was Gear6. What they had in common was:

Both Northscale and Gear6 offered distributions of memcached.
Both Northscale and Gear6 also wanted to sell persistent versions of memcached — in essence, simple DBMS with the memcached API in place of a substantial DML (Data Manipulation Language).

Categories: Clustering, Couchbase, memcached, NoSQL

1 Comment

← Previous Page — Next Page →

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in

Notes and cautions about new analytic technology

Clarifying the state of MPP in-database SAS

Clustrix may be doing something interesting

Truviso evidently reinvents itself

Revolution Analytics seems very confused

IBM puts Cast Iron Systems out of its misery

Daniel Abadi on NoSQL design tradeoffs

Read-your-writes (RYW), aka immediate, consistency

Vertica update

Gear6 seems to have failed in the memcached market too

Search our blogs and white papers

Monash Research blogs

User consulting

Vendor advisory

Monash Research highlights

Recent posts

Categories

Date archives

Admin