Analytic technologies
Discussion of technologies related to information query and analysis. Related subjects include:
- Business intelligence
- Data warehousing
- (in Text Technologies) Text mining
- (in The Monash Report) Data mining
- (in The Monash Report) General issues in analytic technology
Vertica’s innovative architecture for flash, plus more about temp space than you perhaps wanted to know
Vertica is announcing:
- Technology it has already released*, but has not yet published reference architectures for.
- A Barney partnership.**
In other words, Vertica has succumbed to the common delusion that it’s a good idea to put out half-baked press releases the week of TDWI conferences. But if we look past that kind of all-too-common nonsense, Vertica is highlighting an interesting technical story, about how the analytic DBMS industry can exploit solid-state memory technology.
*Upgrades to Vertica FlexStore to handle flash memory, actually released as part of Vertica 4.0
** With Fusion I/O
To set the context, let’s recall a few points I’ve noted in the past:
- Solid-state memory’s price/throughput tradeoffs obviously make it the future of database storage.
- The flash future is coming soon, in part because flash’s propensity to wear out is overstated. This is especially true of modern analytic DBMS, which tend to write whole blocks at once, and most particularly true of append-only systems such as Vertica.
- Being able to intelligently split databases among various cost tiers of storage – e.g. flash and disk – makes a whole lot of sense.
Taken together, those points tell us:
For optimal price/performance, analytic DBMS should support databases that run part on flash, part on disk.
While all this is still in the future for some other analytic DBMS vendors, Vertica is shipping it today.* What’s more, three aspects of Vertica’s architecture make it particularly well-suited for hybrid flash/disk storage, in each case for a similar reason – you can get most of the performance benefit of all-flash for a relatively low actual investment in flash chips: Read more
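To make the tiering argument concrete, below is a minimal sketch of the kind of greedy placement policy a hybrid flash/disk system might use – the hottest data per gigabyte goes on flash until the flash budget runs out. This is illustrative Python only; nothing here is Vertica’s actual FlexStore interface, and the object names and heat metric are invented for the example.

```python
# Hypothetical sketch of tiered storage placement; NOT Vertica's actual
# FlexStore API. Assumes per-object access counts are tracked somewhere.

from dataclasses import dataclass

@dataclass
class StorageObject:
    name: str
    size_gb: float
    reads_per_day: int  # observed access frequency ("heat")

def place_objects(objects, flash_budget_gb):
    """Greedily put the hottest data per GB on flash, the rest on disk."""
    tiers = {"flash": [], "disk": []}
    remaining = flash_budget_gb
    # Rank by heat density: reads per day per gigabyte stored.
    for obj in sorted(objects, key=lambda o: o.reads_per_day / o.size_gb,
                      reverse=True):
        if obj.size_gb <= remaining:
            tiers["flash"].append(obj.name)
            remaining -= obj.size_gb
        else:
            tiers["disk"].append(obj.name)
    return tiers

if __name__ == "__main__":
    catalog = [
        StorageObject("recent_sales_projection", 40, 5000),
        StorageObject("clickstream_2009_archive", 900, 20),
        StorageObject("customer_dim", 5, 3000),
    ]
    # A small flash budget still captures most of the hot reads.
    print(place_objects(catalog, flash_budget_gb=50))
```

The point of the greedy ordering is exactly the claim above: because access skew is severe, a relatively small flash investment captures most of the I/O that matters.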
| Categories: Columnar database management, Data warehousing, Database compression, Solid-state memory, Vertica Systems | 10 Comments |
Teradata’s future product strategy
I think Teradata’s future product strategy is coming into focus. I’ll start by outlining some particular aspects, and then show how I think it all ties together.
Read more
| Categories: Business intelligence, Data warehouse appliances, Data warehousing, Kickfire, MicroStrategy, Solid-state memory, Storage, Teradata | 5 Comments |
Big Data is Watching You!
There’s a boom in large-scale analytics. The subjects of this analysis may be categorized as:
- People
- Financial trades
- Electronic networks
- Everything else
The most varied, interesting, and valuable of those four categories is the first one.
Links and observations
I’m back from a trip to the SF Bay area, with a lot of writing ahead of me. I’ll dive in with some quick comments here, then write at greater length about some of these points when I can. From my trip: Read more
| Categories: Analytic technologies, Aster Data, Calpont, Cassandra, Couchbase, Data warehouse appliances, Data warehousing, EMC, Exadata, Facebook, Greenplum, HP and Neoview, Kickfire, NoSQL, OLTP, ParAccel, Sybase, XtremeData | 1 Comment |
Notes on EMC’s Greenplum subsidiary
I spent considerable time last week with my clients at both Greenplum and EMC (if we ignore the fact that the deal has closed and they’re now the same company). I also had more of a hardcore engineering discussion than I’ve had with Greenplum for quite a while (I should have been pushier about that earlier). Takeaways included:
- This is starting off as a honeymoon deal. Everything Greenplum was planning to do is being continued. Additional resources are being poured into Greenplum to do more.
- Some Greenplum execs seem to envision staying long term, some seem to envision moving on to their next startups. The ones who envision moving on are, however, going to work hard first to make the merger a success.
- Greenplum has, for quite a while, had more of an advanced analytics/embedded predictive modeling story than I realized. Bad on them for not fleshing it out more in marketing and product packaging alike.
- Greenplum both denies the concurrency problems I previously noted and also has a very credible story as to how it will eliminate them. 🙂 Seriously, Greenplum tells of one customer that routinely runs 150 simultaneous queries – on what I think is not a terribly big system – and a number of POCs (Proofs of Concept) that simulated similar levels of concurrency.
| Categories: Analytic technologies, Data warehousing, EMC, Greenplum | 1 Comment |
Teradata, Xkoto Gridscale (RIP), and active-active clustering
Having gotten a number of questions about Teradata’s acquisition of Xkoto, I leaned on Teradata for an update, and eventually connected with Scott Gnau. Takeaways included:
- Teradata is discontinuing Xkoto’s existing product Gridscale, which Scott characterized as being too OLTP-focused to be a good fit for Teradata. Teradata hopes and expects that existing Xkoto Gridscale customers won’t renew maintenance. (I’m not sure that they’ll even get the option to do so.)
- The point of Teradata’s technology + engineers acquisition of Xkoto is to enhance Teradata’s active-active or multi-active data warehousing capabilities, which it has had in some form for several years.
- In particular, Teradata wants to tie together different products in the Teradata product line. (Note: Those typically all run pretty much the same Teradata database management software, except insofar as they might be on different releases.)
- Scott rattled off all the plausible areas of enhancement, with multiple phrasings – performance, manageability, ease of use, tools, features, etc.
- Teradata plans to have one or two releases based on Xkoto technology in 2011.
Frankly, I’m disappointed at the struggles of clustering efforts such as Xkoto Gridscale or Continuent’s pre-Tungsten products, but if the DBMS vendors meet the same needs themselves, that’s OK too.
The logic behind active-active database implementations actually seems pretty compelling: Read more
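To illustrate what I mean, here’s a toy sketch of the active-active idea: every replica accepts work, writes are applied to all healthy replicas, and reads fail over transparently when a site goes down. This is illustrative Python, not Teradata’s or Xkoto’s actual technology; the classes and method names are invented for the example.

```python
# Hypothetical illustration of active-active clustering; not any vendor's
# real implementation. Both replicas accept work, so one can fail (or be
# taken down for maintenance) while queries keep running on the other.

import random

class Replica:
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def execute(self, sql):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name}: ran {sql!r}"

class ActiveActiveRouter:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, sql):
        # Writes (loads) go to every healthy replica to keep them in sync.
        return [r.execute(sql) for r in self.replicas if r.healthy]

    def read(self, sql):
        # Reads are load-balanced; fail over if the chosen replica is down.
        for r in sorted(self.replicas, key=lambda _: random.random()):
            try:
                return r.execute(sql)
            except ConnectionError:
                continue
        raise RuntimeError("no healthy replica available")

router = ActiveActiveRouter([Replica("site_a"), Replica("site_b")])
router.write("INSERT INTO facts ...")
router.replicas[0].healthy = False                 # simulate a site failure
print(router.read("SELECT count(*) FROM facts"))   # still answered by site_b
```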
| Categories: Clustering, Continuent, Data warehousing, Solid-state memory, Teradata, Theory and architecture, Xkoto | 9 Comments |
Advice for some non-clients
Edit: Any further anonymous comments to this post will be deleted. Signed comments are permitted as always.
Most of what I get paid for is, in some form or other, consulting. (The same would be true for many other analysts.) And so I can be a bit stingy with my advice toward non-clients. But my non-clients are a distinguished and powerful group, including in their number Oracle, IBM, Microsoft, and most of the BI vendors. So here’s a bit of advice for them too.
Oracle. On the plus side, you guys have been making progress against your reputation for untruthfulness. Oh, I’ve dinged you for some past slip-ups, but on the whole they’ve been no worse than other vendors’. But recently you pulled a doozy. The analyst reports section of your website fails to distinguish between unsponsored and sponsored work.* That is a horrible ethical stumble. Fix it fast. Then put processes in place to ensure nothing that dishonest happens again for a good long time.
*Merv Adrian’s “report” listed high on that page is actually a sponsored white paper. That Merv himself screwed up by not labeling it clearly as such in no way exonerates Oracle. Besides, I’m sure Merv won’t soon repeat the error — but for Oracle, this represents a whole pattern of behavior.
Oracle. And while I’m at it, outright dishonesty isn’t your only unnecessary credibility problem. You’re also playing too many games in analyst relations.
HP. Neoview will never succeed. Admit it to yourselves. Go buy something that can. Read more
Microstrategy technology notes
Earlier this week, Microstrategy made Mark LaRow available to talk about technology. The proximate reason was my recent mention of Microstrategy’s mobile BI emphasis, but we also touched on Microstrategy’s approach to in-memory business intelligence and some other subjects. We didn’t go into as much depth as in a similar conversation I had recently with Qlik Technologies, but I found it quite interesting even so.
Highlights of the in-memory BI discussion included:
- Microstrategy’s in-memory BI data structure is some kind of simple array, redundantly called a “vector array.” A more precise description was not available.
- While early versions of the capability have been around since 2002, Microstrategy’s in-memory BI capability only got serious with Microstrategy 9, which was released in Q1 of 2009. In particular, Microstrategy 9 was the first time in-memory BI had full security.
- Mark says a core reason for having their own in-memory BI is that Microstrategy has more smarts to predict which aggregates will or won’t be needed (see the sketch after this list). Strictly speaking, that can’t be argued with. Vendors like Infobright would argue they come close enough to that ideal to make little practical difference – but I’m also cheating by naming Infobright, which is particularly focused in that direction.
- Microstrategy in-memory BI compresses data by about 2X. Mark didn’t know which compression algorithm was used.
- The limitation on what’s in-memory is, of course, how much RAM you can fit on an SMP box. Microstrategy has seen up to ½ terabyte deployments.
- In-memory Microstrategy data structures are typically built during the batch window, for performance reasons. This is not, strictly speaking, mandatory, but I didn’t get a sense that Microstrategy was being used for much that resembled real-time business intelligence.
- Mark said Microstrategy has no interest in using solid-state memory to expand the reach of its in-memory BI. Frankly, if Microstrategy doesn’t change that stance, its in-memory BI capabilities are unlikely to stay significant for too many years.
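For readers who want the aggregate-oriented approach in concrete form, here’s a toy sketch: aggregates get precomputed into a simple in-memory structure during the batch window, after which each dashboard query becomes a cheap lookup rather than a table scan. This is illustrative Python only; Microstrategy hasn’t disclosed the details of its “vector array” structure, and everything named below is invented for the example.

```python
# Hypothetical sketch of the general technique described above, NOT
# Microstrategy's actual "vector array" implementation.

from collections import defaultdict

def build_aggregates(fact_rows, group_keys):
    """Precompute one running sum per combination of dimension values."""
    agg = defaultdict(float)
    for row in fact_rows:
        key = tuple(row[k] for k in group_keys)
        agg[key] += row["revenue"]
    return dict(agg)  # plain dict standing in for a packed in-memory array

facts = [
    {"region": "east", "quarter": "Q1", "revenue": 120.0},
    {"region": "east", "quarter": "Q2", "revenue": 95.0},
    {"region": "west", "quarter": "Q1", "revenue": 210.0},
]

# Built once during the batch window...
by_region_quarter = build_aggregates(facts, ("region", "quarter"))

# ...then each dashboard query is an O(1) lookup instead of a scan.
print(by_region_quarter[("east", "Q1")])  # 120.0
```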
Another key subject we discussed was Microstrategy’s view of dashboards. Read more
| Categories: Business intelligence, Data warehousing, Memory-centric data management, MicroStrategy | Leave a Comment |
How should somebody teach themselves database and programming skills?
From time to time, I get in a conversation with somebody who is:
- Unemployed, underemployed, or otherwise desirous of having more commercial skills.
- Not a programmer, but desirous of having some technical skills.
- Astute enough to realize s/he will never be a serious techie.
I generally have two models in mind when guiding such a person:
- Analytics/business intelligence/stats.
- Website building.
Those are both useful skill sets for people who aren’t full-time techies, the first perhaps best for those who are more quantitative and big-company-friendly, the second perhaps better for the creative and/or rebellious types.
So what SPECIFICALLY should one guide them to do? My initial thoughts include: Read more
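To give a flavor of what the analytics track’s very first exercise might look like – a hypothetical example, not a prescribed curriculum – here’s about the simplest possible database starting point, using nothing but Python’s built-in SQLite support:

```python
# A hypothetical first exercise for the analytics/BI track: load a tiny
# dataset and answer a business question in SQL. Uses only Python's
# standard library, so there is nothing to install.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 120.0), ("west", 210.0), ("east", 95.0)])

# The first real "BI" question: revenue by region, largest first.
query = ("SELECT region, SUM(amount) AS total FROM sales "
         "GROUP BY region ORDER BY total DESC")
for region, total in conn.execute(query):
    print(region, total)
```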
| Categories: Business intelligence, MicroStrategy, MySQL, Open source | 35 Comments |
False-positive alerts, non-collaborative BI, inaccurate metrics, and what to do about them
I’ve been hinting at some points for quite a long time, without really spelling them out in written form. So let’s fix that. I believe:
- “Push” alerting technology could be much more granular and useful, but is being held back by the problem of false positives.
- Metrics passed down from on high didn’t work too well in Stalin’s USSR, and haven’t improved sufficiently since.
- A large, necessary piece of the solution to both problems is a great engine for setting and modifying metrics definitions.
I shall explain. Read more
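As a preview of that third point, here’s a toy sketch of one damping technique: a metric definition carries its own threshold plus a minimum run of consecutive breaches before an alert fires, so one-off noise doesn’t page anybody. It’s illustrative Python; the rule structure and all names are invented, not drawn from any particular product.

```python
# Hypothetical sketch of damping false-positive alerts: require a metric
# to breach its threshold for several consecutive observations, with the
# threshold itself living in an editable rule rather than hard-coded.

from dataclasses import dataclass

@dataclass
class MetricRule:
    name: str
    threshold: float
    min_consecutive_breaches: int  # damping against one-off noise

def evaluate(rule, observations):
    """Fire only if the last N observations all breach the threshold."""
    recent = observations[-rule.min_consecutive_breaches:]
    if len(recent) < rule.min_consecutive_breaches:
        return False
    return all(v > rule.threshold for v in recent)

rule = MetricRule("support_ticket_backlog", threshold=100.0,
                  min_consecutive_breaches=3)

print(evaluate(rule, [90, 140, 95, 130]))   # False: a spike, not a trend
print(evaluate(rule, [90, 120, 130, 150]))  # True: a sustained breach
```

Keeping the thresholds in editable rule objects rather than code is a small instance of the larger point: the definitions themselves need an engine for setting and modifying them.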
