June 27, 2011

What colleges should teach in analytics

Based on a Teradata press release calling attention to the small amount of explicit university instruction in business intelligence, I was asked:

Does BI really need a dedicated undergrad track? What sort of BI and analytics-related skills should students look to obtain now in order to be viable in the job marketplace five years out?

My answers were (slightly edited):

Most important is a basic, intuitive understanding of statistical significance. If you’re looking at an apparent trend, is it real or just random variation?
Also crucial are general analytic and quantitative problem-solving skills.
One should also be comfortable learning how to use new software tools.
Everybody in business should have those skillsets. So should people in science, medicine, teaching, journalism, government, and most other vocations.
The more analytically oriented should add basic programming skills, and basic knowledge of SQL. While SQL’s utter dominance is ebbing a bit, it still will be with us for a very long time.
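
To make the "is it real or just random variation?" question concrete, here is a minimal sketch of one way to check an apparent trend: a permutation test on the slope of a short series. The function names and the sample numbers are illustrative assumptions, not anything from the press release or my original answer.

```python
import random

def slope(ys):
    """Least-squares slope of ys against the positions 0..n-1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def trend_p_value(ys, trials=2000, seed=0):
    """Fraction of random reorderings whose slope is at least as steep
    (in either direction) as the slope of the series as observed."""
    rng = random.Random(seed)
    observed = abs(slope(ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(slope(shuffled)) >= observed:
            hits += 1
    return hits / trials

# Hypothetical data: one series that climbs steadily, one that just bounces around.
steady_growth = [10, 12, 13, 15, 16, 18, 19, 21, 22, 24]
noisy_flat = [15, 11, 18, 12, 17, 10, 16, 13, 19, 14]

print("steady growth p-value:", trend_p_value(steady_growth))
print("noisy flat p-value:   ", trend_p_value(noisy_flat))
```

A small p-value says that random reshuffling rarely produces a trend that steep, so the trend is probably real; a large one says the "trend" looks like ordinary noise.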

Of course, there are more specialized skills also worth teaching, in a number of areas, starting with statistics and other predictive modeling technologies. But it’s OK to go through life not knowing those.

Categories: Analytic technologies, Business intelligence, Data warehousing, NoSQL, Predictive modeling and advanced analytics, Teradata

1 Comment

June 26, 2011

What to think about BEFORE you make a technology decision

When you are considering technology selection or strategy, there are a lot of factors that can each have bearing on the final decision — a whole lot. Below is a very partial list.

In almost any IT decision, there are a number of environmental constraints that need to be acknowledged. Organizations may have standard vendors, favored vendors, or simply vendors who give them particularly deep discounts. Legacy systems are in place, application and system alike, and may or may not be open to replacement. Enterprises may have on-premise or off-premise preferences; SaaS (Software as a Service) vendors probably have multitenancy concerns. Your organization can determine which aspects of your system you’d ideally like to see be tightly integrated with each other, and which you’d prefer to keep only loosely coupled. You may have biases for or against open-source software. You may be pro- or anti-appliance. Some applications have a substantial need for elastic scaling. And some kinds of issues cut across multiple areas, such as budget, timeframe, security, or trained personnel.

Multitenancy is particularly interesting, because it has numerous implications. Read more

Categories: Analytic technologies, Business intelligence, Buying processes, Cloud computing, Columnar database management, Data warehouse appliances, Data warehousing, EAI, EII, ETL, ELT, ETLT, Predictive modeling and advanced analytics, Software as a Service (SaaS)

3 Comments

June 24, 2011

Observations on Oracle pricing

A couple of months ago, Oracle asked me to pull some observations on pricing until after the earnings call that just occurred, and I grudgingly acquiesced. In the interim, more information on Oracle pricing has emerged (including in the comment thread to that post). The original notes are:

Oracle disputes some common claims about its cost and pricing. In particular, Oracle software maintenance costs a fixed 22% of your annual license price, so if you get a discount on your licenses, it ripples through to your maintenance. This is true even if you have an all-you-can-eat ULA (Unlimited License Agreement).

Based on that, Oracle contends that Exadata isn’t all that expensive if you have a suitable ULA. You have to buy the hardware and the storage software, but the database server software is effectively free. (Whether your use of additional licenses affects the price of your ULA when it comes up for renewal might, of course, be a different matter.)
Nothing in that discussion obviates the point that if you’re just using Oracle Standard Edition, upgrading to Oracle Enterprise Edition, associated chargeable options, and/or Exadata can be seriously expensive.
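
As a rough illustration of how a license discount ripples through to maintenance, the sketch below applies the 22% rate from the notes above to a made-up list price. The dollar figures and the function name are hypothetical; only the 22% rate comes from Oracle's claim.

```python
MAINTENANCE_RATE = 0.22  # the fixed rate Oracle cited in the notes above

def annual_maintenance(list_price, discount):
    """Maintenance charged on the discounted license price.

    discount is a fraction between 0.0 and 1.0 (hypothetical input).
    """
    net_license = list_price * (1 - discount)
    return net_license * MAINTENANCE_RATE

# Hypothetical figures: a $1,000,000 list price at no discount and at 50% off.
full = annual_maintenance(1_000_000, 0.0)
half = annual_maintenance(1_000_000, 0.5)

print(f"maintenance at list price: ${full:,.0f}")
print(f"maintenance at 50% off:    ${half:,.0f}")
```

If maintenance were instead pegged to list price, the second figure would not shrink with the discount, which is the common claim Oracle disputes.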

Categories: Exadata, Oracle, Pricing

1 Comment

June 24, 2011

Forthcoming Oracle appliances

Edit: I checked with Oracle, and it’s indeed TimesTen that’s supposed to be the basis of this new appliance, as per a comment below. That would be less cool, alas.

Oracle seems to have said on yesterday’s conference call that Oracle OpenWorld (first week in October) will feature appliances based on Tangosol and Hadoop. As I post this, the Seeking Alpha transcript of Oracle’s call is riddled with typos. Bolded comments below are by me. Read more

Categories: Data warehouse appliances, Hadoop, In-memory DBMS, MapReduce, Memory-centric data management, Object, Oracle

8 Comments

June 22, 2011

Citrusleaf RTA

Citrusleaf has released an add-on product called Citrusleaf RTA (Real-Time Attribution). It’s to be used when:

You want to update dashboards within a minute.
You want to update predictive models fairly quickly (within the hour?), although it’s not clear to me how much the models are being updated or changed with that latency.

The metrics envisioned are:

100 or so ad impressions per person …
… for 1 billion or so people …
… stored for 30-90 days …
… where each ad impression is a fairly short record …
… stored on disk …
… but indexed in a way so that the index can fit into RAM.
50,000-100,000 writes per second. (I didn’t ask on what amount of hardware.)
Several hundred reads per second.

A consistent relational schema is NOT assumed.

Citrusleaf’s solution is:

Have one index entry for each of the 1 billion people.
Bang each new object/record to disk. Include in it a pointer to the previous object/record for the same person.
Each time a new object/record is added, update the index in place so that it now points to the new one. Hence, the index is sized according to the number of people, not according to the total number of objects/records.
Eventually let objects/records age off in the obvious way.

The downside is that when you do read 100 objects/records per person, you might need to do 100 seeks.
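
The scheme described above can be sketched as a toy model in a few lines. This is only an illustration under stated assumptions, not Citrusleaf's implementation: a Python list stands in for the disk, a dict stands in for the RAM index, aging-off is not modeled, and the class and method names are invented for the example.

```python
class ChainedStore:
    """Toy model: one index entry per person, pointing at that person's newest
    record; each record carries a pointer back to the previous one."""

    def __init__(self):
        self.log = []    # stands in for disk: an append-only run of records
        self.index = {}  # stands in for the RAM index: person -> newest position

    def write(self, person, payload):
        previous = self.index.get(person)        # None for a person's first record
        self.log.append((payload, previous))     # "bang" the record to disk
        self.index[person] = len(self.log) - 1   # update the index entry in place

    def read(self, person):
        """Walk the chain newest-first. Returns (payloads, seeks)."""
        payloads, seeks = [], 0
        position = self.index.get(person)
        while position is not None:
            payload, previous = self.log[position]
            seeks += 1  # each hop would be a separate disk seek
            payloads.append(payload)
            position = previous
        return payloads, seeks


store = ChainedStore()
for i in range(100):
    store.write("alice", f"impression-{i}")
    store.write("bob", f"impression-{i}")

payloads, seeks = store.read("alice")
print(len(store.index), "index entries for", len(store.log), "records")
print(len(payloads), "records read in", seeks, "seeks")
```

The index grows with the number of people while the log grows with the number of records, and reading one person's full history costs one hop per record, which is the 100-seek downside noted above.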

Categories: Aerospike, Analytic technologies, Business intelligence, Data models and architecture, Data warehousing, Log analysis, Predictive modeling and advanced analytics, Theory and architecture, Web analytics

3 Comments

June 21, 2011

It’s official — the grand central EDW will never happen

I pointed out last year that the grand central enterprise data warehouse couldn’t happen; the post started:

An enterprise data warehouse should:

Manage data to high standards of accuracy, consistency, cleanliness, clarity, and security.

Manage all the data in your organization.

Pick ONE.

IBM’s main theme at the Enzee Universe conference has been to say the same thing.

Merv Adrian’s talk at the same conference made it clear that Gartner feels the same way, as does he personally. Indeed, like me, he’s racked up multiple decades of industry experience without ever finding a single theoretically ideal grand central EDW.

Forrester Research has been a little less clear on the point, but generally seems to be on the correct side of the issue as well.

If somebody is still saying that one central enterprise data warehouse can hold all the information or data you need on which to base your business decisions, they’re probably not somebody you should be listening to very hard.

Is that clear, or should I hammer home the point even harder? 😀

Categories: Data warehousing, IBM and DB2, Netezza

8 Comments

June 20, 2011

The Vertica story (with soundbites!)

I’ve blogged separately that:

Vertica has a bunch of customers, including seven with 1 or more petabytes of data each.
Vertica has progressed down the analytic platform path, with Monday’s release of Vertica 5.0.

And of course you know:

Vertica (the product) is columnar, MPP, and fast.*
Vertica (the company) was recently acquired by HP.**

Categories: Benchmarks and POCs, Columnar database management, ParAccel, Parallelization, Vertica Systems

4 Comments

June 20, 2011

Vertica as an analytic platform

Vertica 5.0 is coming out today, and delivering the down payment on Vertica’s analytic platform strategy. In Vertica lingo, there’s now a Vertica SDK (Software Development Kit), featuring Vertica UDT(F)s* (User-Defined Transform Functions). Vertica UDT syntax basics start: Read more

Categories: Analytic technologies, Data warehousing, GIS and geospatial, Predictive modeling and advanced analytics, RDF and graphs, Vertica Systems, Workload management

7 Comments

June 20, 2011

Temporal data, time series, and imprecise predicates

I’ve been confused about temporal data management for a while, because there are several different things going on.

Date arithmetic. This of course has been around for a very long — er, for a very long time.
Time-series-aware compression. This has been around for quite a while too.
“Time travel”/snapshotting — preserving the state of the database at previous points in time. This is a matter of exposing (and not throwing away) the information you capture via MVCC (Multi-Version Concurrency Control) and/or append-only updates (as opposed to update-in-place). Those update strategies are increasingly popular for pretty much anything except update-intensive OLTP (OnLine Transaction Processing) DBMS, so time-travel/snapshotting is an achievable feature for most vendors.
Bitemporal data access. This occurs when a fact has both a transaction timestamp and a separate validity duration. A Wikipedia article seems to cover the subject pretty well, and I touched on Teradata’s bitemporal plans back in 2009.
Time series SQL extensions. Vertica explained its version of these to me a few days ago. I imagine Sybase IQ and other serious financial-trading market players have similar features.
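
To illustrate the bitemporal idea from the list above, here is a minimal sketch in which each fact carries both a validity period and a transaction date, and a query asks what was believed on one date about the state of the world on another. The `Fact` type, the `as_of` function, and the sample rows are all hypothetical; this is not Teradata's or any other vendor's design.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Fact:
    value: str
    valid_from: date   # start of the period the fact describes
    valid_to: date     # end of that period (exclusive)
    recorded_at: date  # transaction time: when the database learned the fact

def as_of(facts, valid_on, known_on):
    """What did we believe on known_on about the state of the world on valid_on?"""
    candidates = [
        f for f in facts
        if f.valid_from <= valid_on < f.valid_to and f.recorded_at <= known_on
    ]
    if not candidates:
        return None
    # Among overlapping beliefs, the most recently recorded one wins.
    return max(candidates, key=lambda f: f.recorded_at).value

FOREVER = date(9999, 12, 31)
facts = [
    Fact("Boston", date(2010, 1, 1), FOREVER, recorded_at=date(2010, 1, 5)),
    # Recorded late: the customer had actually moved on March 1, 2011.
    Fact("Chicago", date(2011, 3, 1), FOREVER, recorded_at=date(2011, 6, 1)),
]

# Same real-world date, two different answers depending on when we ask.
print(as_of(facts, valid_on=date(2011, 4, 1), known_on=date(2011, 5, 1)))
print(as_of(facts, valid_on=date(2011, 4, 1), known_on=date(2011, 7, 1)))
```

Plain time travel/snapshotting varies only `known_on`; bitemporal access varies both dates independently.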

In essence, the point of time series/event series SQL functionality is to do SQL against incomplete, imprecise, or derived data.* Read more

Categories: Analytic technologies, Data types, Investment research and trading, Log analysis, Sybase, Telecommunications, Theory and architecture, Vertica Systems

2 Comments

June 20, 2011

Columnar DBMS vendor customer metrics

Last April, I asked some columnar DBMS vendors to share customer metrics. They answered, but it took until now to iron out a couple of details. Overall, the answers are pretty impressive. Read more

Categories: Columnar database management, Data warehousing, Games and virtual worlds, Infobright, Investment research and trading, Log analysis, Market share and customer counts, Open source, ParAccel, Petabyte-scale data management, SAND Technology, Sybase, Telecommunications, Vertica Systems, Web analytics

5 Comments

