Greenplum update — Release 3.3 and so on
I visited Greenplum in early April, and talked with them again last night. As I noted in a separate post, there are a couple of subjects I won’t write about today. But that still leaves me free to cover a number of other points about Greenplum.
Categories: Data warehousing, Database compression, EAI, EII, ETL, ELT, ETLT, Greenplum, MapReduce, Market share and customer counts, Parallelization, PostgreSQL, Pricing | 11 Comments |
Greenplum will be announcing some stuff
Greenplum is having a webinar Monday to announce “The Next Big Leap in Data Warehousing” (capitalization theirs). The idea they’ll be talking about is a genuinely good one. And off the top of my head I can only think of a few vendors who implemented it before Greenplum, and even fewer who emphasize it explicitly. So if you like webinars, you might want to listen in. I plan to blog about the general concept soon after the 12:01 am Monday embargo lifts. (Uh, guys, it is Monday rather than Tuesday, right?)
Categories: Data warehousing, Greenplum, Specific users | 1 Comment |
What statistics texts and other analytics books should we recommend to people?
On a message board I frequent, two different guys have asked for recommendations for statistics textbooks, in a kind of general knowledge vein. One phrases it as:
I’m looking for a general purpose statistics textbook for reference purposes.
giving his background as
I took Calculus-level Statistics in college. (i.e. 2 semesters of Calc was a prerequisite; this was the stats class that stat majors took.)
He was a computer science major and is now a professional programmer. (And if somebody can use a tournament-chess-smart programmer with outstandingly clear communication skills in the Buffalo area, I’m pretty sure he’d be glad to know about the opportunity. But I digress …)
The other is a law student with a more general need, which he phrases as
I want to use them for work to help identify trends; do multiple regressions; put values on things that aren’t easy to quantify, etc.
Economics I already know most of the basics from my undergrad studies, but I need more advanced economic theory and such.
He’s interested in what I’d call “pop” analytics books as well as hardcore stuff; e.g., the one book he’s identified already is “Competing on Analytics.” I’m thinking some good vendor white papers might be just as useful for him as that class of books. But he obviously wants to learn the hardcore stuff as well.
I haven’t attended or taught a college course since 1981, and I tend to find the business books on analytics too simple for my tastes, so I’m not the right guy to answer from my own experience.
Does anybody have any helpful thoughts? Thanks!
Categories: Analytic technologies | 11 Comments |
Reinventing business intelligence
I’ve felt for quite a while that business intelligence tools are due for a revolution. But I’ve found the subject daunting to write about because — well, because it’s so multifaceted and big. So to break that logjam, here are some thoughts on the reinvention of business intelligence technology, with no pretense of being in any way comprehensive.
Natural language and classic science fiction
Actually, there’s a pretty well-known example of BI near-perfection — the Star Trek computers, usually voiced by the late Majel Barrett Roddenberry. They didn’t have a big role in the recent movie, which was so fast-paced nobody had time to analyze very much, but were a big part of the Star Trek universe overall. Star Trek’s computers integrated analytics, operations, and authentication, all with a great natural language/voice interface and visual displays. That example is at the heart of a 1998 article on natural language recognition I just re-posted.
As for reality: For decades, dating back at least to Artificial Intelligence Corporation’s Intellect, there have been offerings that provided “natural language” command, control, and query against otherwise fairly ordinary analytic tools. Such efforts have generally fizzled, for reasons outlined at the link above. Wolfram Alpha is the latest try; fortunately for its prospects, natural language is really only a small part of the Wolfram Alpha story.
A second theme has more recently emerged — using text indexing to get at data more flexibly than a relational schema would normally allow, either by searching on data values themselves (stressed by Attivio) or by searching on the definitions of pre-built reports (the Google OneBox story). SAP’s Explorer is the latest such offering, but I find Doug Henschen’s skepticism about SAP Explorer more persuasive than Cindi Howson’s cautiously favorable view. Partly that’s because I know SAP (and Business Objects); partly it’s because of difficulties such as those I already noted.
Flexibility and data exploration
It’s a truism that each generation of dashboard-like technology fails because it’s too inflexible. Users are shown the information that will provide them with the most insight. They appreciate it at first. But eventually it’s old hat, and when they want to do something new, the baked-in data model doesn’t support it.
The latest attempts to overcome this problem lie in two overlapping trends — cool data exploration/visualization tools, and in-memory analytics.
Categories: Analytic technologies, Business intelligence, Google, Memory-centric data management, Microsoft and SQL*Server, SAP AG | 19 Comments |
Sneakernet to the cloud
Recently, Amazon CTO Werner Vogels put up a blog post which suggested that, now and in the future, the best way to get large databases into the cloud is via sneakernet. In some circumstances, he is surely right. Possible implications include:
- When sending data to the cloud, you probably want to compress it to the max before sending. Clearpace’s new RainStor structured-data archiving service emphasizes that idea. RainStor marketing says cloud, cloud, cloud — but Clearpace thinks you really should have a bit of its software onsite too, to compress the data before sending it across the wire.
- Getting data from one cloud to another cloud could be problematic. I’m fond of saying that weblog data naturally lives in the cloud at your hosting company’s location, so you should analyze it there too. But this makes the most sense if you analyze it or at least filter/reduce it in place. (That said, the really, really big web companies have lots of different data centers, and presumably do move huge amounts of log data from place to place.)
But for one-time moves of data sets — sure, sneakernet/snail mail should work just fine.
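For what it’s worth, the arithmetic behind the sneakernet argument is easy to sanity-check. Below is a minimal Python sketch; the data set size, link speed, and compression ratio are illustrative assumptions of mine, not figures from Vogels’ post or Clearpace’s marketing:

```python
# Back-of-the-envelope: moving a big data set over the wire vs. shipping it.
# All numbers below are illustrative assumptions.

dataset_tb = 10          # assumed data set size, in (decimal) terabytes
link_mbps = 100          # assumed sustained network throughput, megabits/second
compression_ratio = 5    # assumed compress-before-sending ratio
shipping_days = 1        # overnight courier

bits = dataset_tb * 1e12 * 8
wire_days = bits / (link_mbps * 1e6) / 86_400
compressed_wire_days = wire_days / compression_ratio
sneakernet_mbps = bits / (shipping_days * 86_400) / 1e6

print(f"Uncompressed wire transfer: {wire_days:.1f} days")             # ~9.3 days
print(f"Compressed 5:1 first:       {compressed_wire_days:.1f} days")  # ~1.9 days
print(f"Sneakernet effective rate:  {sneakernet_mbps:.0f} Mbit/s")     # ~926 Mbit/s
```

At those made-up but plausible numbers, the overnight package sustains roughly nine times the throughput of the network link, and compressing before sending narrows the gap without quite closing it.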
Categories: Amazon and its cloud, Cloud computing, Database compression, EAI, EII, ETL, ELT, ETLT, Web analytics | 2 Comments |
Song of the contract programming firm, and other filks
I heard a different version of the same idea at Boskone once, but here is a pretty good send-up of what might occur at a customer review session. (Warning, however: Low production values.) Also, in case you missed them, considerably funnier are a couple of classic Star Trek filksongs, especially the first.
While I’m on the subject, a couple of more serious filksongs I really like are:
- Jordin Kare’s Fire in the Sky
- Heather Alexander singing Demonbane.
Other great serious filksongs are “Queen of Air and Darkness” (Poul Anderson lyrics) and Jordin Kare’s “When the Ship Lifts, All Debts Are Paid”, but I can’t find recordings of those now.
And finally, back to the humor: I just found a video to a song I posted previously.
Categories: Fun stuff, Humor | 2 Comments |
Teradata Developer Exchange (DevX) begins to emerge
Every vendor needs developer-facing web resources, and Teradata turns out to have been working on a new umbrella site for its own. It’s called Teradata Developer Exchange — DevX for short. Teradata DevX seems to be in a low-volume beta now, with a press release/bigger roll-out coming next week or so. Major elements are about what one would expect:
- Articles
- Blogs
- Downloads
- Surprisingly, so far as I can tell, no forums
If you’re a Teradata user, you absolutely should check out Teradata DevX. If you just research Teradata — my situation 🙂 — there are some aspects that might be of interest anyway. In particular, I found Teradata’s downloads instructive, most particularly those in the area of extensibility. Mainly, these are UDFs (User-Defined Functions), in areas such as:
- Compression
- Geospatial data
- Imitating Oracle or DB2 UDFs (as migration aids)
Also of potential interest is a custom-portlet framework for Teradata’s management tool Viewpoint. A straightforward use would be to plunk some Viewpoint data into a more general system management dashboard. A yet cooler use — and I couldn’t get a clear sense of whether anybody’s ever done this yet — would be to offer end users some insight as to how long their queries are apt to run.
Categories: Database compression, Emulation, transparency, portability, GIS and geospatial, Teradata | 2 Comments |
Yet more on MySQL forks and storage engines
The issue of MySQL forks and their possible effect on closed-source storage engine vendors continues to get attention. The underlying question is:
Suppose Oracle wants to make life difficult for third-party storage engine vendors via its incipient control of MySQL. Can the storage engine vendors insulate themselves from this risk by working with a MySQL fork?
Categories: MySQL, Open source, PostgreSQL | 11 Comments |
How big are the intelligence agencies’ data warehouses?
Edit: The relevant part of the article cited has now been substantially changed, in line with Jeff Jonas’ remarks in the comment thread below.
Joe Harris linked me to an article that made a rather extraordinary claim:
At another federal agency Jonas worked at (he wouldn’t say which), they had a very large data warehouse in the basement. The size of the data warehouse was a secret, but Jonas estimated it at 4 exabytes (EB), and increasing at the rate of 5 TB per day.
Now, if one does the division, the quote implies it would take 800,000 days — over 2,000 years — for the database to double in size, which is absurd. Perhaps this (Jeff) Jonas guy was just talking about a 4 petabyte system and got confused. (Of course, that would still be pretty big.) But before I got my arithmetic straight, I ran the 4 exabyte figure past a couple of folks, as a target for the size of the US government’s largest classified database. Best guess turns out to be that it’s 1-2 orders of magnitude too high for the government’s largest database, not 3. But that’s only a guess …
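For the record, here’s that division as a quick Python sketch, using decimal units (1 EB = 1,000,000 TB):

```python
# Reality-checking the quoted claim: 4 exabytes, growing at 5 TB/day.
EB_IN_TB = 1_000_000               # decimal units: 1 EB = 1,000,000 TB

claimed_size_tb = 4 * EB_IN_TB     # 4 EB expressed in TB
growth_tb_per_day = 5

days_to_double = claimed_size_tb / growth_tb_per_day
print(days_to_double)              # 800000.0 days
print(days_to_double / 365.25)     # ~2,190 years -- absurd, as noted

# If Jonas actually meant 4 petabytes, doubling takes a plausible ~2.2 years.
print((4 * 1_000) / growth_tb_per_day / 365.25)   # ~2.19 years
```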
Categories: Data warehousing, Specific users | 5 Comments |
Notes on CEP application development
While performance may not be all that great a source of CEP competitive differentiation, event processing vendors find plenty of other bases for technological competition, including application development, analytics, packaged applications, and data integration. In particular:
- Most independent CEP vendors have some kind of application story in the capital markets vertical, such as packaged applications, ISV partners with packaged applications, application frameworks, and so on.
- CEP vendors offer lots of connectors to specific financial industry price/quote/trade feeds, as well as the usual other kinds of database connectivity (SQL, XML, etc.).
- Aleri/Coral8 (separately and now together) like to call attention to their business intelligence/analytics offerings. Analytics is front-and-center on Truviso’s web site too, not that Truviso does much to call attention to itself, period. (Roman Bukary once said he’d outline Truviso’s new strategy to me in 6-8 weeks or so … it’s now 14 months and counting.)
So far as I can tell, the areas of applications and analytics are fairly uncontroversial. Different CEP vendors have implemented different kinds of things, no doubt focusing on those they thought they would find easiest to build and then sell. But these seem to be choices in business execution, not in core technical philosophy.
In CEP application development, however, real philosophical differences do seem to arise. There are at least three different CEP application development paradigms.