Analytic technologies

Discussion of technologies related to information query and analysis. Related subjects include:

Business intelligence
Data warehousing
(in Text Technologies) Text mining
(in The Monash Report) Data mining
(in The Monash Report) General issues in analytic technology

December 27, 2009

Introduction to Gooddata

Around the end of the Cold War, Esther Dyson took it upon herself to go repeatedly to Eastern Europe and do a lot of rah-rah and catalysis, hoping to spark software and other computer entrepreneurs. I don’t know how many people’s lives she significantly affected – I’d guess it’s actually quite a few – but in any case the number is not zero. Roman Stanek, who has built and sold a couple of software businesses, cites her as a key influence setting him on his path.

Roman’s latest venture is business intelligence firm Gooddata. Gooddata was founded in 2007 and has been soliciting and getting attention for a while, so I was surprised to learn that Gooddata officially launched just a few weeks ago. Anyhow, some less technical highlights of the Gooddata story include: Read more

Categories: Amazon and its cloud, Analytic technologies, Business intelligence, Cloud computing, Games and virtual worlds, Gooddata, Jaspersoft, Market share and customer counts, Memory-centric data management, Pricing, Software as a Service (SaaS)

13 Comments

December 11, 2009

Ray Wang on SAP

Ray Wang made a terrific post based on SAP’s annual influencer love-in, an event which I no longer attend. Ray believes SAP has been in a “crisis”, and sums up his views as:

The Bottom Line – SAP’s Turning The Corner

Credit must be given to SAP for charting a new course. A shift in the management philosophy and product direction will take years to realize, however, its not too late for change. SAP must remember its roots and become more German and less American. The renewed focus must put customer requests and priorities ahead of SAP’s bureaucracy. The emphasis must focus on the relationship. When that reemerges in how SAP works with customers, partners, influencers, and its own employees, SAP will be back in good graces. In the meantime, its time to get to work and deliver. Oracle’s Fusions Apps are coming soon and competitors such as IBM, Microsoft, Epicor, IFS, and SalesForce.com will not relent.

I recall the 1980s, when SAP’s main differentiator, at least in the English-speaking US, was a total commitment to customer success, and when it could be taken for granted that SAP would do business ethically. Things change, and not always for the better.

Anyhow, the reason I’m highlighting Ray’s post is that he makes reference to a number of interesting SAP-centric technology trends or initiatives. Read more

Categories: Analytic technologies, Business intelligence, Memory-centric data management, MOLAP, SAP AG, Solid-state memory

1 Comment

December 7, 2009

A framework for thinking about data warehouse growth

There are only three ways that the amount of data stored in data warehouses can grow:

The same kinds of data are stored as before, with more being added over time.
The same kinds of data are stored as before, but in more detail.
New kinds of data are stored.
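To make the three growth paths concrete, here is a rough arithmetic sketch. It assumes warehouse volume can be approximated as rows per day times bytes per row times days retained, summed over kinds of data. The `Feed` class and every number in it are made up for illustration; none of them come from a real warehouse.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    """One kind of data in the warehouse (hypothetical model)."""
    name: str
    rows_per_day: int
    bytes_per_row: int
    days_retained: int

    def size_bytes(self) -> int:
        return self.rows_per_day * self.bytes_per_row * self.days_retained


def warehouse_bytes(feeds):
    """Total volume across all kinds of data."""
    return sum(feed.size_bytes() for feed in feeds)


# Illustrative baseline: one kind of data, one year of history.
base = [Feed("orders", 1_000_000, 200, 365)]

# 1. Same kind of data, more of it accumulated over time.
more_history = [Feed("orders", 1_000_000, 200, 730)]

# 2. Same kind of data, finer detail (modeled here as 5x the rows).
more_detail = [Feed("orders", 5_000_000, 200, 365)]

# 3. A new kind of data stored alongside the old.
new_kinds = base + [Feed("web_clicks", 20_000_000, 100, 365)]

for label, feeds in [("base", base), ("more history", more_history),
                     ("more detail", more_detail), ("new kinds", new_kinds)]:
    print(f"{label:>12}: {warehouse_bytes(feeds) / 1e9:,.0f} GB")
```

In this toy model each growth path changes a different term: retention, row count or width, or the number of feeds.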

Categories: Analytic technologies, Application areas, Data warehousing, Investment research and trading, Log analysis, Solid-state memory, Storage, Telecommunications, Text, Web analytics

9 Comments

December 2, 2009

Webinar on MapReduce for complex analytics (Thursday, December 3, 10 am and 2 pm Eastern)

The second in my two-webinar series for Aster Data will occur tomorrow, twice (both live), at 10 am and 2 pm Eastern time. The other presenters will be Jonathan Goldman, who was a Principal Scientist at LinkedIn but now has joined Aster himself, and Steve Wooledge of Aster (playing host). Key links are:

Registration for tomorrow’s webinars
Replay of the first webinar
My slides from the first webinar

The main subjects of the webinar will be:

Some review of material from the first webinar (all three presenters)
Discussion of how MapReduce can help with three kinds of analytics:
- Pattern matching (Jonathan will give detail)
- Number-crunching (I’ll cover that, and it will be short)
- Graph analytics (I haven’t written the slides yet, but my starting point will be some of the relationship analytics ideas we discussed in August)

Arguably, aspects of data transformation fit into each of those three categories, which may help explain why data transformation has been so prominent among the early applications of MapReduce.
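For readers new to the model, here is a minimal single-process sketch of the map/shuffle/reduce pattern, applied to toy versions of number-crunching and pattern matching. It is plain Python, not Aster's SQL/MapReduce or Hadoop; the `map_reduce` helper, the sample data, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over each record, group emitted values by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)          # the "shuffle" step
    return {key: reducer(key, values) for key, values in groups.items()}


# Number-crunching: average session length per user.
sessions = [("alice", 3.0), ("bob", 5.0), ("alice", 1.0)]

def emit_session(record):
    user, seconds = record
    yield user, seconds

def mean(_key, values):
    return sum(values) / len(values)

averages = map_reduce(sessions, emit_session, mean)


# Pattern matching: did a user search and then immediately buy?
events = [("alice", 1, "search"), ("alice", 2, "buy"),
          ("bob", 1, "search"), ("bob", 2, "search")]

def emit_event(event):
    user, timestamp, action = event
    yield user, (timestamp, action)

def searched_then_bought(_user, values):
    actions = [action for _, action in sorted(values)]  # order by timestamp
    return any(a == "search" and b == "buy"
               for a, b in zip(actions, actions[1:]))

matches = map_reduce(events, emit_event, searched_then_bought)

print(averages)
print(matches)
```

In a real MPP system the map and reduce calls would run in parallel across nodes; the grouping by key is what lets per-user logic like the pattern check be distributed.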

As you can see from Aster’s title for the webinar (which they picked while I was on vacation), at least their portion will be focused on customer analytics, e.g. web analytics.

Categories: Analytic technologies, Aster Data, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, MapReduce, RDF and graphs, Web analytics

4 Comments

November 25, 2009

New England Database Summit (January 28, 2010)

New England Database Day has now, in its third year, become a “Summit.” It’s a nice event, providing an opportunity for academics and business folks to mingle. The organizers are basically the local branch of the Mike Stonebraker research tree, with this year’s programming head being Daniel Abadi. It will be on Thursday, January 28, 2010, once again in the Stata Center at MIT. It would be reasonable to park in the venerable 4/5 Cambridge Center parking lot, especially if you’d like to eat at Legal Sea Foods afterwards.

So far there are two confirmed speakers — Raghu Ramakrishnan of Yahoo and me. My talk title will be something like “Database and analytic technology: The state of the union”, with all wordplay intended.

There’s more information at the official New England Database Summit website. There’s also a post with similar information on Daniel Abadi’s DBMS Musings blog.

Edit after the event:

Posts based on my January, 2010 New England Database Summit keynote address

Categories: Analytic technologies, Data warehousing, Michael Stonebraker, Presentations, Theory and architecture

4 Comments

November 23, 2009

Comments on a fabricated press release quote

My clients at Kickfire put out a press release last week quoting me as saying things I neither said nor believe. The press release is about a “Queen For A Day” kind of contest announced way back in April, in which users were invited to submit stories of their data warehouse problems, with the biggest sob stories winning free Kickfire appliances. The fabricated “quote” reads: Read more

Categories: About this blog, Data warehouse appliances, Data warehousing, Kickfire, Market share and customer counts, Sybase

3 Comments

November 23, 2009

Boston Big Data Summit keynote outline

Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I’m posting them below.

Categories: Analytic technologies, Archiving and information preservation, Business intelligence, Cloud computing, Clustering, Columnar database management, Data warehouse appliances, Data warehousing, DBMS product categories, Humor, Investment research and trading, Log analysis, MapReduce, Market share and customer counts, NoSQL, OLTP, Open source, Parallelization, Presentations, Pricing, Solid-state memory, Storage, Telecommunications, Theory and architecture, Web analytics

6 Comments

November 7, 2009

Calpont’s InfiniDB

Since its inception, Calpont has gone through multiple management teams, strategies, and investor groups. What it hadn’t done, ever, is actually ship a product. Last week, however, Calpont introduced a free/open source DBMS, InfiniDB, with technical details somewhat reminiscent of what Calpont was promising last April. Highlights include:

Like Infobright, Calpont’s InfiniDB is a columnar DBMS consisting of a MySQL front end and a columnar storage engine.
Community edition InfiniDB runs on a single server.
One of commercial/enterprise edition InfiniDB’s main claims to fame will be MPP support.
There’s no announced time frame for commercial edition InfiniDB.
InfiniDB’s current compression story is dictionary/token only, with decompression occurring before joins are executed. Improvement is a roadmap item.
Indeed, InfiniDB has many roadmap items, a few of which can be found here. Also, a great overview of InfiniDB’s current state and roadmap can be found in this MySQL Performance Blog thread. (And follow the links there to find performance discussions of other free analytic DBMS.)
One thing InfiniDB already has that is still a roadmap item for Infobright is the ability to run a query across multiple cores at once.
One thing free InfiniDB has that Infobright only offers in its Enterprise Edition is ACID-compliant Insert/Update/Delete. (Note: I wish people would stop saying that Infobright Enterprise Edition isn’t ACID-compliant, since that point was cleared up a while ago.)
InfiniDB has no indexes or materialized views.
However, InfiniDB’s retrieval is expedited by something called “Extents,” which sounds a lot like Netezza’s zone maps.
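Two of the ideas in the list above, dictionary/token compression and min/max metadata per block of rows, can be sketched generically. This is not InfiniDB's or Netezza's actual implementation; the function names, extent size, and sample data are assumptions for illustration only.

```python
def dictionary_encode(values):
    """Replace each distinct value with a small integer token."""
    dictionary = {}
    tokens = [dictionary.setdefault(v, len(dictionary)) for v in values]
    return dictionary, tokens


def dictionary_decode(dictionary, tokens):
    """Restore the original values (e.g. before a join, as described above)."""
    reverse = {token: value for value, token in dictionary.items()}
    return [reverse[t] for t in tokens]


def build_extents(column, extent_size):
    """Record (start, min, max) for each fixed-size block of a column."""
    extents = []
    for start in range(0, len(column), extent_size):
        chunk = column[start:start + extent_size]
        extents.append((start, min(chunk), max(chunk)))
    return extents


def scan_equal(column, extents, extent_size, target):
    """Find rows equal to target, skipping blocks whose range excludes it."""
    hits, blocks_scanned = [], 0
    for start, low, high in extents:
        if target < low or target > high:
            continue                      # whole block eliminated by metadata
        blocks_scanned += 1
        for offset, value in enumerate(column[start:start + extent_size]):
            if value == target:
                hits.append(start + offset)
    return hits, blocks_scanned


states = ["MA", "NY", "MA"]
dictionary, tokens = dictionary_encode(states)

order_ids = [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]
extents = build_extents(order_ids, extent_size=4)
hits, blocks_scanned = scan_equal(order_ids, extents, 4, target=11)
print(tokens, hits, blocks_scanned)
```

Block-level min/max pruning works best when data arrives roughly sorted on the filtered column (dates, sequential IDs), which is one reason it can substitute for indexes in analytic workloads.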

Being on vacation, I’ll stop there for now. (If it weren’t for Tropical Storm/Depression Ida, I might not even be posting this much until I get back.)

Categories: Analytic technologies, Calpont, Columnar database management, Data warehousing, Database compression, Infobright, MySQL, Open source

3 Comments

October 30, 2009

Aster Data 4.0 and the evolution of “advanced analytic(s) servers”

Since Linda and I are leaving on vacation in a few hours, Aster Data graciously gave me permission to morph its “12:01 am Monday, November 2” embargo into “late Friday night.”

Aster Data is officially announcing the 4.0 release of nCluster. There are two big pieces to this announcement:

Aster is offering a slick vision for integrating big-database management and general analytic processing on the same MPP cluster, under the not-so-slick name “Data-Application Server.”
Aster is also offering a sophisticated vision for workload management.

In addition, Aster has matured nCluster in various ways, for example cleaning up a performance problem with single-row updates.

Highlights of the Aster “Data-Application Server” story include: Read more

Categories: Aster Data, Cloud computing, Data warehousing, EAI, EII, ETL, ELT, ETLT, MapReduce, Market share and customer counts, Teradata, Theory and architecture, Workload management

9 Comments

October 30, 2009

A question on MDX performance

An enterprise user wrote in with a question that boils down to:

What are reasonable MDX performance expectations?

MDX doesn’t come up in my life very much, and I don’t have much intuition about it. E.g., I don’t know whether one can slap an MDX-to-SQL converter on top of a fast analytic RDBMS and go to town. What’s more, I’m heading off on vacation and don’t feel like researching the matter myself in the immediate future. 🙂

So here’s the long form of the question. Any thoughts?

I have a general question on assessing the performance of an OLAP technology using a set of MDX queries. I would be interested to know if there are any benchmark MDX performance tests/results comparing different OLAP technologies (which may be based on different underlying DBMS’s if appropriate) on similar hardware setup, or even comparisons of complete appliance solutions. More generally, I want to determine what performance limits I could reasonably expect on what I think are fairly standard servers.

In my own work, I have set up a star schema model centered on a Fact table of 100 million rows (approx 60 columns), with dimensions ranging in cardinality from 5 to 10,000. In ad hoc analytics, is it expected that any query against such a dataset should return a result within a minute or two (i.e. before a user gets impatient), regardless of whether that query returns 100 cells or 50,000 cells (without relying on any aggregate table or caching mechanism)? Or is that level of performance only expected with a high end massively parallel software/hardware solution? The server specs I’m testing with are: 32-bit 4 core, 4GB RAM, 7.2k RPM SATA drive, running Windows Server 2003; 64-bit 8 core, 32GB RAM, 3 Gb/s SAS drive, running Windows Server 2003 (x64).

I realise that caching of query results and pre-aggregation mechanisms can significantly improve performance, but I’m coming from the viewpoint that in purely exploratory analytics, it is not possible to have all combinations of dimensions calculated in advance, in addition to being maintained.
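A back-of-envelope calculation helps frame the question. The row count, column count, and RAM sizes below come from the question itself; the 8 bytes per value, the 100 MB/s sequential scan rate, and the "query touches 5 columns" case are assumptions picked purely for illustration, not measurements of any product.

```python
ROWS = 100_000_000          # from the question
COLUMNS = 60                # from the question
BYTES_PER_VALUE = 8         # assumption: average uncompressed width
SCAN_BYTES_PER_SEC = 100_000_000   # assumption: ~100 MB/s from a single disk
GB = 10**9


def table_bytes(rows, columns, bytes_per_value=BYTES_PER_VALUE):
    """Uncompressed size of the columns a query must read."""
    return rows * columns * bytes_per_value


def scan_seconds(total_bytes, bytes_per_sec=SCAN_BYTES_PER_SEC):
    """Time to read that many bytes sequentially from disk."""
    return total_bytes / bytes_per_sec


full_table = table_bytes(ROWS, COLUMNS)
five_columns = table_bytes(ROWS, 5)     # assumption: columnar store, 5 columns read

print(f"full fact table: {full_table / GB:.0f} GB, "
      f"full scan ~{scan_seconds(full_table):.0f} s")
print(f"5 columns only:  {five_columns / GB:.0f} GB, "
      f"scan ~{scan_seconds(five_columns):.0f} s")

for ram_gb in (4, 32):                  # the two servers in the question
    fits = full_table <= ram_gb * GB
    print(f"{ram_gb} GB RAM holds the uncompressed table: {fits}")
```

Under these assumptions, a full row-store scan of the fact table would take minutes on either server, while reading only the columns a query needs could bring it under a minute. Compression, faster storage, or more spindles would change the numbers considerably.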

Categories: Analytic technologies, Benchmarks and POCs, Data warehousing, MOLAP

16 Comments
