May 12, 2010

The Clustrix story

After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included: Read more

Categories: Application areas, Clustrix, Emulation, transparency, portability, Games and virtual worlds, MySQL, NoSQL, OLTP, Parallelization, Solid-state memory

8 Comments

May 8, 2010

8 not very technical problems with analytic technology

In a couple of talks, including last Thursday’s, I’ve rattled off a list of eight serious problems with analytic technology, all of them human or organizational much more than purely technical. At best, these problems stand in the way of analytic success, and at least one is a lot worse than that.

The bulleted list in my notes is:

Individual-human
- Expense of expertise
- Limited numeracy
Organizational
- Limited budgets
- Legacy systems
- General inertia
Political
- Obsolete systems
- Clueless lawmakers
- Obsolete legal framework

I shall explain. Read more

Categories: Analytic technologies, Business intelligence, Data integration and middleware, Data warehousing, EAI, EII, ETL, ELT, ETLT, Surveillance and privacy

Revisiting disk vibration as a data warehouse performance problem

Last April, I wrote about the problems disk vibration can cause for data warehouse performance. Possible performance hits exceeded 10X, wild as that sounds.

Now Slashdot and ZDNet have weighed in, although for the most part they are suggesting only 50-100% performance hits. Read more

Categories: Data warehousing, Storage

3 Comments

May 7, 2010

Notes and cautions about new analytic technology

As previously noted, I headlined Aster’s Big Data Summit in Washington, DC last Thursday. More than others, that talk did reuse material I’d presented before. I promised the audience that when I got back I’d put up a blog post linking to supporting material for the talk.

Part of the time, I talked about things I’ve written about before. For example: Read more

Categories: Aster Data, Business intelligence, Data warehousing, Predictive modeling and advanced analytics, Presentations

3 Comments

May 7, 2010

Clarifying the state of MPP in-database SAS

I routinely am briefed way in advance of products’ introductions. For that reason and others, it can be hard for me to keep straight what’s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the SAS Institute’s multi-year effort to get SAS integrated into various MPP DBMS, specifically Teradata, Netezza Twinfin(i), and Aster Data nCluster.

However, I chatted briefly Thursday with Michelle Wilkie, who is the SAS product manager overseeing all this (and also some other stuff, like SAS running on grids without being integrated into a DBMS). As best I understood, the story is: Read more

Categories: Aster Data, Data warehouse appliances, MapReduce, Netezza, Parallelization, Predictive modeling and advanced analytics, SAS Institute, Specific users, Teradata

11 Comments

May 4, 2010

Clustrix may be doing something interesting

Clustrix launched without briefing me or, at least so far as I can tell, anybody else who knows much about database technology. But Clustrix did post a somewhat crunchy, no-registration-required, white paper. Based on that, I get the impression:

- Clustrix is making an OLTP DBMS.
- The core problem Clustrix tries to solve is scale-out, without necessarily giving up SQL. (I couldn’t immediately tell whether Clustrix supports NoSQL-style key-value interfaces enthusiastically, grudgingly, or not at all.)
- Unlike Akiban or VoltDB, Clustrix makes database appliances. The Clustrix software seems to assume a Clustrix appliance.
- A key feature of Clustrix’s database appliances is that they rely on solid-state memory. I’m guessing that Clustrix appliances don’t even have disks, or that if they do the disks store some software or something, not actual data. (As previously noted, I agree with Oracle in thinking that much of the progress in database technology this decade will come from proper design for solid-state memory.)
- Clustrix talks of things that sound like compiled queries and attempts to avoid locks. However, it doesn’t sound as extreme in these regards as VoltDB.
- Clustrix also talks of things that sound like consistent hashing.
- The brand name “Sierra” also shows up along with the brand name “Clustrix.”
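For readers who haven't run into consistent hashing, here is a minimal generic sketch in Python. To be clear, it is not taken from the Clustrix white paper and says nothing about how Clustrix actually places data; the class name `HashRing`, the node names, and the `vnodes` parameter are all made up for illustration. The idea is that nodes and keys are hashed onto the same ring, and each key belongs to the first node point at or after its position.

```python
import bisect
import hashlib


def _ring_position(text: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual points per physical node (assumed value)
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Place `vnodes` points for this node on the ring."""
        for i in range(self.vnodes):
            point = _ring_position(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        """Drop every point owned by this node."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Return the node owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_right(self._points, _ring_position(key))
        return self._owner[self._points[idx % len(self._points)]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"row-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The property that matters for scale-out is that adding a node only moves keys onto the new node; keys never shuffle among the existing ones, which is what keeps rebalancing cheap.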

Categories: Clustrix, Data warehouse appliances, DBMS product categories, NoSQL, Parallelization, Solid-state memory, Storage, Theory and architecture

2 Comments

May 4, 2010

Truviso evidently reinvents itself

When Aleri bought Coral8 last year, I wrote that the independent CEP (Complex Event Processing) vendors were floundering. Aleri quickly threw in the towel and sold out to Sybase, which hardly changed my opinion. StreamBase actually is persevering, but not with any kind of breakout success. Big vendors, such as Microsoft and IBM, have at least some aspirations of eventually filling the gap.

Meanwhile, Truviso — which never got much market traction in the first place — was in hiding; Roman Bukary never did keep his promise to brief me on the company’s new and improved strategy. Then Truviso had yet another management change, amidst rumors that it was repositioning away from CEP. As per a press release Truviso emailed today, that’s now official, with Truviso’s main business being something to do with web analytics.

Edit: It seems Truviso was at some point absorbed into Cisco.

Categories: Streaming and complex event processing (CEP), Truviso, Web analytics

8 Comments

May 4, 2010

Revolution Analytics seems very confused

Revolution Analytics is a relaunch of a company previously known as REvolution Computing, built around the open source R language. Last week they sent around email claiming they were a new company (false), and asking for briefings in connection with an embargo this morning. I talked to Revolution Analytics yesterday, and they told me the embargo had been moved to Thursday.* However, Revolution apparently neglected to tell the press the same thing, and there’s an article out today — quoting me, because I’d given quotes in line with the original embargo, before I’d had the briefing myself. And what’s all this botched timing about? Mainly, it seems to be for a “statement of direction” about software Revolution Analytics hasn’t actually developed yet.

*More precisely, they spoke as if the embargo had been Thursday all along.

Categories: Investment research and trading, Parallelization, Predictive modeling and advanced analytics, Revolution Analytics, SAS Institute

13 Comments

May 3, 2010

IBM puts Cast Iron Systems out of its misery

Long ago, the first enterprise application integration (EAI) vendors offered pairwise integrations between different specific packaged applications. That was, for example, what was going on at Katrina Garnett’s Crossworlds/Crossroads, which eventually became one of IBM’s first data integration software acquisitions. Years later, Cast Iron Systems tried what seemed to be pretty much the same thing, only better implemented. Recently, however, Cast Iron has been pretty hard to get a hold of, and I also couldn’t find anybody (competitor, friend of management, whatever) who believed Cast Iron was doing particularly well. So today’s news that IBM is acquiring Cast Iron Systems comes as no big surprise.

Categories: Cast Iron Systems, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, IBM and DB2

1 Comment

May 2, 2010

Daniel Abadi on NoSQL design tradeoffs

In a thought-provoking post, Daniel Abadi points out NoSQL-related terminological problems similar to the ones I just railed against, and argues

To me, CAP should really be PACELC — if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system tradeoff between latency (L) and consistency (C)?

and goes on to say

For example, Amazon’s Dynamo (and related systems like Cassandra and SimpleDB) are PA/EL in PACELC — upon a partition, they give up consistency for availability; and under normal operation they give up consistency for lower latency. Giving up C in both parts of PACELC makes the design simpler — once the application is configured to be able to handle inconsistencies, it makes sense to give up consistency for both availability and lower latency.

However, I think Daniel’s improved formulation is still misleading, in at least two ways:

- Daniel implicitly assumes any given NoSQL system makes a fixed set of tradeoffs, when actually — as he in fact notes in his post — some of them offer tradeoffs that are quite tunable.
- I think Daniel is at best oversimplifying when he appears to assert that best-case network latency is an important design criterion for all that many NoSQL systems. Naively, anything that acknowledges reads or writes requires two hops. Two-phase commit (2PC) requires three hops. 33% latency reductions are not the kinds of goals that drive dramatic DBMS redesigns, even though tenths of seconds — i.e. 100s of milliseconds — matter in the kinds of environments where NoSQL is sprouting up.
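The 33% figure is just hop-count arithmetic, and a tiny Python sketch makes it explicit. The function name and constants are mine, and the sketch assumes every hop costs the same, which is the same naive simplification as in the text.

```python
def latency_saving(hops_before: int, hops_after: int) -> float:
    """Fraction of latency saved when every hop costs the same (naive model)."""
    if hops_before <= 0 or not 0 <= hops_after <= hops_before:
        raise ValueError("need 0 <= hops_after <= hops_before and hops_before > 0")
    return (hops_before - hops_after) / hops_before


TWO_PHASE_COMMIT_HOPS = 3   # naive count from the text
ACKNOWLEDGED_OP_HOPS = 2    # naive count from the text

saving = latency_saving(TWO_PHASE_COMMIT_HOPS, ACKNOWLEDGED_OP_HOPS)
print(f"{saving:.0%}")  # prints 33%
```

Dropping one hop out of three saves a third of the best-case latency, no more, whatever the per-hop cost happens to be.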

Categories: Amazon and its cloud, Cassandra, NoSQL, Theory and architecture

2 Comments
