Notes and cautions about new analytic technology
As previously noted, I headlined Aster’s Big Data Summit in Washington, DC last Thursday. More than most of my talks, that one reused material I’d presented before. I promised the audience that when I got back I’d put up a blog post linking to supporting material for the talk.
Part of the time, I talked about things I’ve written about before.
Categories: Aster Data, Business intelligence, Data warehousing, Predictive modeling and advanced analytics, Presentations
Clarifying the state of MPP in-database SAS
I’m routinely briefed well in advance of products’ introductions. For that reason and others, it can be hard for me to keep straight what’s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than SAS Institute’s multi-year effort to integrate SAS into various MPP DBMS, specifically Teradata, Netezza TwinFin(i), and Aster Data nCluster.
However, I chatted briefly on Thursday with Michelle Wilkie, the SAS product manager overseeing all this (and also some other stuff, like SAS running on grids without being integrated into a DBMS).
Categories: Aster Data, Data warehouse appliances, MapReduce, Netezza, Parallelization, Predictive modeling and advanced analytics, SAS Institute, Specific users, Teradata
Clustrix may be doing something interesting
Clustrix launched without briefing me or, at least so far as I can tell, anybody else who knows much about database technology. But Clustrix did post a somewhat crunchy, no-registration-required, white paper. Based on that, I get the impression:
- Clustrix makes an OLTP DBMS.
- The core problem Clustrix tries to solve is scale-out, without necessarily giving up SQL. (I couldn’t immediately tell whether Clustrix supports NoSQL-style key-value interfaces enthusiastically, grudgingly, or not at all.)
- Unlike Akiban or VoltDB, Clustrix makes database appliances. The Clustrix software seems to assume a Clustrix appliance.
- A key feature of Clustrix’s database appliances is that they rely on solid-state memory. I’m guessing that Clustrix appliances don’t even have disks, or that if they do the disks store some software or something, not actual data. (As previously noted, I agree with Oracle in thinking that much of the progress in database technology this decade will come from proper design for solid-state memory.)
- Clustrix talks of things that sound like compiled queries and attempts to avoid locks. However, it doesn’t sound as extreme in these regards as VoltDB.
- Clustrix also talks of things that sound like consistent hashing.
- The brand name “Sierra” also shows up along with the brand name “Clustrix.”
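For readers who haven’t met consistent hashing, here is a minimal Python sketch of the general technique. It is an illustration under my own assumptions, not a description of how Clustrix actually places data; the class and node names are hypothetical. Keys and nodes are hashed onto a ring, and each key belongs to the first node point clockwise from it, so adding a node moves only the keys that fall into that node’s new arcs.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map a string to a position on the ring (MD5 is used only for spread, not security).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical consistent-hashing ring; not Clustrix's actual design."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place several virtual points per node to even out the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        # Find the first virtual point at or after the key's position, wrapping around the ring.
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

If you add a fourth node to a three-node ring, only keys that now fall just before one of the new node’s points change owners, and each of them moves to the new node; removing that node again restores the original placement.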
Categories: Clustrix, Data warehouse appliances, DBMS product categories, NoSQL, Parallelization, Solid-state memory, Storage, Theory and architecture
Truviso evidently reinvents itself
When Aleri bought Coral8 last year, I wrote that the independent CEP (Complex Event Processing) vendors were floundering. Aleri quickly threw in the towel and sold out to Sybase, which hardly changed my opinion. StreamBase actually is persevering, but not with any kind of breakout success. Big vendors, such as Microsoft and IBM, have at least some aspirations of eventually filling the gap.
Meanwhile, Truviso — which never got much market traction in the first place — was in hiding; Roman Bukary never did keep his promise to brief me on the company’s new and improved strategy. Then Truviso had yet another management change, amidst rumors that it was repositioning away from CEP. As per a press release Truviso emailed today, that’s now official, with Truviso’s main business being something to do with web analytics.
Edit: It seems Truviso was at some point absorbed into Cisco.
Revolution Analytics seems very confused
Revolution Analytics is a relaunch of a company previously known as REvolution Computing, built around the open source R language. Last week they sent around email claiming they were a new company (false), and asking for briefings in connection with an embargo this morning. I talked to Revolution Analytics yesterday, and they told me the embargo had been moved to Thursday.* However, Revolution apparently neglected to tell the press the same thing, and there’s an article out today — quoting me, because I’d given quotes in line with the original embargo, before I’d had the briefing myself. And what’s all this botched timing about? Mainly, it seems to be for a “statement of direction” about software Revolution Analytics hasn’t actually developed yet.
*More precisely, they spoke as if the embargo had been Thursday all along.
Categories: Investment research and trading, Parallelization, Predictive modeling and advanced analytics, Revolution Analytics, SAS Institute
IBM puts Cast Iron Systems out of its misery
Long ago, the first enterprise application integration (EAI) vendors offered pairwise integrations between different specific packaged applications. That was, for example, what was going on at Katrina Garnett’s Crossworlds/Crossroads, which eventually became one of IBM’s first data integration software acquisitions. Years later, Cast Iron Systems tried what seemed to be pretty much the same thing, only better implemented. Recently, however, Cast Iron has been pretty hard to get a hold of, and I also couldn’t find anybody (competitor, friend of management, whatever) who believed Cast Iron was doing particularly well. So today’s news that IBM is acquiring Cast Iron Systems comes as no big surprise.
Categories: Cast Iron Systems, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, IBM and DB2
Daniel Abadi on NoSQL design tradeoffs
In a thought-provoking post, Daniel Abadi points out NoSQL-related terminological problems similar to the ones I just railed against, and argues
To me, CAP should really be PACELC — if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system tradeoff between latency (L) and consistency (C)?
and goes on to say
For example, Amazon’s Dynamo (and related systems like Cassandra and SimpleDB) are PA/EL in PACELC — upon a partition, they give up consistency for availability; and under normal operation they give up consistency for lower latency. Giving up C in both parts of PACELC makes the design simpler — once the application is configured to be able to handle inconsistencies, it makes sense to give up consistency for both availability and lower latency.
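To make the notation concrete, here is a small Python sketch that encodes a PACELC classification as two choices: one for when there is a partition, one for normal operation. The `Pacelc` class is hypothetical and only restates Daniel’s scheme. The `Literal` hints document the allowed letters but are not enforced at runtime.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Pacelc:
    """Hypothetical encoding of Abadi's PACELC classification."""
    if_partition: Literal["A", "C"]  # PA or PC: availability vs. consistency during a partition
    else_normal: Literal["L", "C"]   # EL or EC: latency vs. consistency when there is no partition

    def label(self) -> str:
        return f"P{self.if_partition}/E{self.else_normal}"


# Per the quoted post, Dynamo-style systems give up consistency in both cases.
dynamo_style = Pacelc(if_partition="A", else_normal="L")
print(dynamo_style.label())  # PA/EL
```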
However, I think Daniel’s improved formulation is still misleading, in at least two ways:
- Daniel implicitly assumes any given NoSQL system makes a fixed set of tradeoffs, when actually — as he in fact notes in his post — some of them offer tradeoffs that are quite tunable.
- I think Daniel is at best oversimplifying when he appears to assert that best-case network latency is an important design criterion for very many NoSQL systems. Naively, anything that acknowledges reads or writes requires two hops, while two-phase commit (2PC) requires three. A 33% latency reduction is not the kind of goal that drives dramatic DBMS redesigns, even though tenths of seconds (i.e., hundreds of milliseconds) matter in the kinds of environments where NoSQL is sprouting up.
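Both points can be illustrated with a little Python. On tunability: Dynamo-style systems let you choose, per cluster or per request, how many of N replicas must answer a read (R) and acknowledge a write (W). When R + W > N, every read quorum overlaps every write quorum, and when it doesn’t, you trade consistency for speed. On latency: under the naive hop count above, two hops versus three is a one-third reduction, whatever the per-hop cost. The function names and the 5 ms per-hop figure are hypothetical, and real systems add complications (sloppy quorums, retries, queueing) that this ignores.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Dynamo-style rule: with N replicas, read and write quorums intersect when R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


def round_trip_latency_ms(hops: int, per_hop_ms: float) -> float:
    """Naive latency model: total latency is hop count times a fixed per-hop cost."""
    return hops * per_hop_ms


# Tunability: the same N=3 cluster can be configured either way.
print(quorums_overlap(3, 2, 2))  # True: every read sees at least one replica with the latest write
print(quorums_overlap(3, 1, 1))  # False: faster, but a read may miss the latest write

# Two hops for a simple acknowledged operation vs. three for 2PC (per_hop_ms is an assumed figure).
ack = round_trip_latency_ms(2, per_hop_ms=5.0)
two_pc = round_trip_latency_ms(3, per_hop_ms=5.0)
print(f"{1 - ack / two_pc:.0%}")  # 33%
```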
Categories: Amazon and its cloud, Cassandra, NoSQL, Theory and architecture
Read-your-writes (RYW), aka immediate, consistency
In which we reveal the fundamental inequality of NoSQL, and why NoSQL folks are so negative about joins.
Discussions of NoSQL design philosophies tend to quickly focus in on the matter of consistency. “Consistency”, however, turns out to be a rather overloaded concept, and confusion often ensues.
In this post I plan to address one essential subject, while ducking various related ones as hard as I can. It’s what Werner Vogels of Amazon called read-your-writes consistency (a term to which I was actually introduced by Justin Sheehy of Basho). It’s either identical or very similar to what is sometimes called immediate consistency, and presumably also to what Amazon has recently called the “read my last write” capability of SimpleDB.
This is something every database-savvy person should know about, but most so far still don’t. I didn’t myself until a few weeks ago.
Considering the many different kinds of consistency outlined in the Werner Vogels link above or in the Wikipedia consistency models article (whose names may not always be used in, er, a wholly consistent manner), I don’t think there’s much benefit to renaming read-your-writes consistency yet again. Rather, let’s just call it RYW consistency, come up with a way to pronounce “RYW”, and have done with it. (I suggest “ree-ooh”, which evokes two syllables from the original phrase. Thoughts?)
Definition: RYW (Read-Your-Writes) consistency is achieved when the system guarantees that, once a record has been updated, any attempt to read the record will return the updated value.
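Here is a minimal Python sketch of what the definition rules out and one way to satisfy it. It assumes a single primary plus one asynchronously updated replica; all class and method names are hypothetical, and this is not how Amazon, Basho, or any particular product implements RYW. A plain replica read can miss a write that has already been acknowledged; the RYW read path uses the replica only once the replica has applied the key’s latest write, and otherwise falls back to the primary.

```python
class Node:
    def __init__(self):
        self.data = {}    # key -> value
        self.applied = 0  # number of log entries applied so far


class ReplicatedStore:
    """Toy store: writes hit the primary at once; the replica applies the log only on sync()."""

    def __init__(self):
        self.primary = Node()
        self.replica = Node()
        self.log = []          # ordered (key, value) writes
        self.key_version = {}  # key -> 1-based log position of its latest write

    def write(self, key, value):
        self.log.append((key, value))
        self.key_version[key] = len(self.log)
        self.primary.data[key] = value
        self.primary.applied = len(self.log)

    def sync(self):
        # Replication catches up by applying every log entry the replica has not yet seen.
        for key, value in self.log[self.replica.applied:]:
            self.replica.data[key] = value
        self.replica.applied = len(self.log)

    def read_eventual(self, key):
        # Cheap read from the replica; may return a stale value or nothing at all.
        return self.replica.data.get(key)

    def read_ryw(self, key):
        # RYW-style read: use the replica only if it has applied the key's latest write.
        if self.replica.applied >= self.key_version.get(key, 0):
            return self.replica.data.get(key)
        return self.primary.data.get(key)
```

Right after `write("cart", "book")`, `read_eventual("cart")` still returns `None`, while `read_ryw("cart")` returns `"book"`. Once `sync()` has run, both paths agree until the next write.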
Categories: Amazon and its cloud, NoSQL, OLTP, Parallelization, Theory and architecture
Vertica update
Last month, Vertica’s CEO Ralph Breslauer quit,* and Vertica made it sound like there would be a new CEO late in April. And indeed, as of April 29, there was. He’s a guy I’ve never heard of before named Chris Lynch, apparently quite the sales machine builder. The most substance I’ve found is a pair of Mass High Tech articles — the latter exceedingly typo-ridden — to the general effect that:
- Vertica plans to build a massive, world-conquering sales force.
- If Vertica dips back into negative cash flow to do that and has to raise more venture capital, so be it.
- “Triple-digit” revenue growth is expected for this year.
Gear6 seems to have failed in the memcached market too
As previously noted, I’ve temporarily cut back on blogging (and research) due to some family health issues. The first casualty was a post about memcached. One of the two companies to be featured was my new client Northscale; the other was Gear6. What they had in common was:
- Both Northscale and Gear6 offered distributions of memcached.
- Both Northscale and Gear6 also wanted to sell persistent versions of memcached — in essence, simple DBMS with the memcached API in place of a substantial DML (Data Manipulation Language).
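To show what “a simple DBMS with the memcached API” can amount to, here is a minimal Python sketch: memcached-style `get`/`set`/`delete` calls backed by an append-only log that is replayed on restart. It is a hypothetical illustration, not how Northscale or Gear6 built their products. Real memcached also has commands such as add, replace, and incr, plus expiration times, all of which this sketch omits.

```python
import json
import os


class PersistentKV:
    """Hypothetical memcached-style get/set/delete over an append-only JSON-lines log."""

    def __init__(self, path):
        self.path = path
        self.cache = {}
        if os.path.exists(path):
            # Recovery: replay the log so later entries override earlier ones.
            with open(path, "r", encoding="utf-8") as f:
                for line in f:
                    rec = json.loads(line)
                    if rec["op"] == "set":
                        self.cache[rec["key"]] = rec["value"]
                    else:
                        self.cache.pop(rec["key"], None)
        self._log = open(path, "a", encoding="utf-8")

    def _append(self, rec):
        self._log.write(json.dumps(rec) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # make the write durable before acknowledging it

    def set(self, key, value):
        self._append({"op": "set", "key": key, "value": value})
        self.cache[key] = value

    def get(self, key):
        return self.cache.get(key)

    def delete(self, key):
        existed = key in self.cache
        self._append({"op": "delete", "key": key})
        self.cache.pop(key, None)
        return existed

    def close(self):
        self._log.close()
```

Unlike plain memcached, where a restart empties the cache, reopening the same log file brings back every key that was set and not later deleted.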
Categories: Clustering, Couchbase, memcached, NoSQL