The Clustrix story
After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included: Read more
| Categories: Application areas, Clustrix, Emulation, transparency, portability, Games and virtual worlds, MySQL, NoSQL, OLTP, Parallelization, Solid-state memory | 8 Comments |
8 not very technical problems with analytic technology
In a couple of talks, including last Thursday’s, I’ve rattled off a list of eight serious problems with analytic technology, all of them human or organizational much more than purely technical. At best, these problems stand in the way of analytic success, and at least one is a lot worse than that.
The bulleted list in my notes is:
-
Individual-human
-
Expense of expertise
-
Limited numeracy
-
-
Organizational
-
Limited budgets
-
Legacy systems
-
General inertia
-
-
Political
-
Obsolete systems
-
Clueless lawmakers
-
Obsolete legal framework
-
I shall explain. Read more
| Categories: Analytic technologies, Business intelligence, Data integration and middleware, Data warehousing, EAI, EII, ETL, ELT, ETLT, Surveillance and privacy | Leave a Comment |
Revisiting disk vibration as a data warehouse performance problem
Last April, I wrote about the problems disk vibration can cause for data warehouse performance. Possible performance hits exceeded 10X, wild as that sounds.
Now Slashdot and ZDnet have weighed in, although for the most part they only are suggesting 50-100% performance hits. Read more
| Categories: Data warehousing, Storage | 3 Comments |
Notes and cautions about new analytic technology
As previously noted, I headlined Aster’s Big Data Summit in Washington, DC last Thursday. More than others, that talk did reuse material I’d presented before. I promised the audience that when I got back I’d put up a blog post linking to supporting material for the talk.
Part of the time, I talked about things I’ve written about before. For example: Read more
| Categories: Aster Data, Business intelligence, Data warehousing, Predictive modeling and advanced analytics, Presentations | 3 Comments |
Clarifying the state of MPP in-database SAS
I routinely am briefed way in advance of products’ introductions. For that reason and others, it can be hard for me to keep straight what’s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the SAS Institute’s multi-year effort to get SAS integrated into various MPP DBMS, specifically Teradata, Netezza Twinfin(i), and Aster Data nCluster.
However, I chatted briefly Thursday with Michelle Wilkie, who is the SAS product manager overseeing all this (and also some other stuff, like SAS running on grids without being integrated into a DBMS). As best I understood, the story is: Read more
| Categories: Aster Data, Data warehouse appliances, MapReduce, Netezza, Parallelization, Predictive modeling and advanced analytics, SAS Institute, Specific users, Teradata | 11 Comments |
Clustrix may be doing something interesting
Clustrix launched without briefing me or, at least so far as I can tell, anybody else who knows much about database technology. But Clustrix did post a somewhat crunchy, no-registration-required, white paper. Based on that, I get the impression:
- Clustrix is making OLTP DBMS.
- The core problem Clustrix tries to solve is scale-out, without necessarily giving up SQL. (I couldn’t immediately tell whether Clustrix supports NoSQL-style key-value interfaces enthusiastically, grudgingly, or not at all.)
- Unlike Akiban or VoltDB, Clustrix makes database appliances. The Clustrix software seems to assume a Clustrix appliance.
- A key feature of Clustrix’s database appliances is that they rely on solid-state memory. I’m guessing that Clustrix appliances don’t even have disks, or that if they do the disks store some software or something, not actual data. (As previously noted, I agree with Oracle in thinking that much of the progress in database technology this decade will come from proper design for solid-state memory.)
- Clustrix talks of things that sound like compiled queries and attempts to avoid locks. However, it doesn’t sound as extreme in these regards as VoltDB.
- Clustrix also talks of things that sound like consistent hashing.
- The brand name “Sierra” also shows up along with the brand name “Clustrix.”
| Categories: Clustrix, Data warehouse appliances, DBMS product categories, NoSQL, Parallelization, Solid-state memory, Storage, Theory and architecture | 2 Comments |
Truviso evidently reinvents itself
When Aleri bought Coral8 last year, I wrote that the independent CEP (Complex Event Processing) vendors were floundering. Aleri quickly threw in the towel and sold out to Sybase, which hardly changed my opinion. StreamBase actually is persevering, but not with any kind of breakout success. Big vendors, such as Microsoft and IBM, have at least some aspirations of eventually filling the gap.
Meanwhile, Truviso — which never got much market traction in the first place — was in hiding; Roman Bukary never did keep his promise to brief me on the company’s new and improved strategy. Then Truviso had yet another management change, amidst rumors that it was repositioning away from CEP. As per a press release Truviso emailed today, that’s now official, with Truviso’s main business being something to do with web analytics.
Edit: It seems Truviso was at some point absorbed into Cisco.
Revolution Analytics seems very confused
Revolution Analytics is a relaunch of a company previously known as REvolution Computing, built around the open source R language. Last week they sent around email claiming they were a new company (false), and asking for briefings in connection with an embargo this morning. I talked to Revolution Analytics yesterday, and they told me the embargo had been moved to Thursday.* However, Revolution apparently neglected to tell the press the same thing, and there’s an article out today — quoting me, because I’d given quotes in line with the original embargo, before I’d had the briefing myself. And what’s all this botched timing about? Mainly, it seems to be for a “statement of direction” about software Revolution Analytics hasn’t actually developed yet.
*More precisely, they spoke as if the embargo had been Thursday all along.
| Categories: Investment research and trading, Parallelization, Predictive modeling and advanced analytics, Revolution Analytics, SAS Institute | 13 Comments |
IBM puts Cast Iron Systems out of its misery
Long ago, the first enterprise application integration (EAI) vendors offered pairwise integrations between different specific packaged applications. That was, for example what was going on at Katrina Garnett’s Crossworlds/Crossroads, which eventually became one of IBM’s first data integration software acquisitions. Years later, Cast Iron Systems tried what seemed to be pretty much the same thing, only better implemented. Recently, however, Cast Iron has been pretty hard to get a hold of, and I also couldn’t find anybody (competitor, friend of management, whatever) who believed Cast Iron was doing particularly well. So today’s news that IBM is acquiring Cast Iron Systems comes as no big surprise.
| Categories: Cast Iron Systems, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, IBM and DB2 | 1 Comment |
Daniel Abadi on NoSQL design tradeoffs
In a thought-provoking post, Daniel Abadi points out NoSQL-related terminological problems similar to the ones I just railed against, and argues
To me, CAP should really be PACELC — if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system tradeoff between latency (L) and consistency (C)?
and goes on to say
For example, Amazon’s Dynamo (and related systems like Cassandra and SimpleDB) are PA/EL in PACELC — upon a partition, they give up consistency for availability; and under normal operation they give up consistency for lower latency. Giving up C in both parts of PACELC makes the design simpler — once the application is configured to be able to handle inconsistencies, it makes sense to give up consistency for both availability and lower latency.
However, I think Daniel’s improved formulation is still misleading, in at least two ways:
- Daniel implicitly assumes any given NoSQL system makes a fixed set of tradeoffs, when actually — as he in fact notes in his post — some of them offer tradeoffs that are quite tunable.
- I think Daniel is at best oversimplifying when he appears to assert that best-case network latency is an important design criterion for all that many NoSQL systems. Naively, anything that acknowledges reads or writes requires two hops. Two-phase commit (2PC) requires three hops. 33% latency reductions are not the kinds of goals that drive dramatic DBMS redesigns, even though tenths of seconds — i.e. 100s of milliseconds — matter in the kinds of environments where NoSQL is sprouting up.
| Categories: Amazon and its cloud, Cassandra, NoSQL, Theory and architecture | 2 Comments |
