Analytic technologies
Discussion of technologies related to information query and analysis. Related subjects include:
- Business intelligence
- Data warehousing
- (in Text Technologies) Text mining
- (in The Monash Report) Data mining
- (in The Monash Report) General issues in analytic technology
Examples and definition of machine-generated data
In posts made last December, January, and April, I argued:
- Much of the growth in analytic data volumes will come in the form of machine-generated data.
- Unlike human-generated data, machine-generated data will grow at Moore’s Law kinds of speeds.
- Thus, unlike human-generated data, which I advocate keeping pretty much in all its detail, machine-generated data will continue to be in large part thrown away.
Recently and rather belatedly, I added a fairly obvious point: if we don’t keep all or even most of our machine-generated data, then what we keep is likely to be in some way massaged, extracted, or derived. The purpose of this post is to address a second oversight by giving a hopefully clear definition of what I actually mean by “machine-generated data.” Read more
| Categories: Data warehousing | 28 Comments |
Evolving definitions and technology categories for 2011
It seems my prediction of a limited blogging schedule in December came emphatically true. I shall re-start with a collection of quick thoughts, clearing the decks for more detailed posts to follow. Read more
| Categories: Analytic technologies, Data types, Data warehousing, DBMS product categories, MOLAP, Theory and architecture | 6 Comments |
Data that is derived, augmented, enhanced, adjusted, or cooked
On this food-oriented weekend, I could easily go on long metaphorical flights about the distinction between “raw” and “cooked” data. I’ll spare you that part — reluctantly, given my fondness for fresh fruit, sushi, and steak tartare — but there’s no escaping the importance of derived/augmented/enhanced/cooked/adjusted data for analytic data processing. The five areas I have in mind are, loosely speaking:
- Aggregates, when they are maintained, generally for reasons of performance or response time.
- Calculated scores, commonly based on data mining/predictive analytics.
- Text analytics.
- The kinds of ETL (Extract/Transform/Load) Hadoop and other forms of MapReduce are commonly used for.
- Adjusted data, especially in scientific contexts.
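To make the raw/cooked distinction concrete, here is a minimal sketch covering two of the five areas, adjusted data and maintained aggregates. Everything in it is an assumption made for illustration: the sensor readings, the device names, and the calibration offsets are all invented, and real pipelines would of course be far more involved.

```python
from collections import defaultdict

# Hypothetical raw readings: (device_id, hour, value). All values are made up.
RAW_READINGS = [
    ("dev-a", 0, 20.0),
    ("dev-a", 0, 22.0),
    ("dev-a", 1, 21.0),
    ("dev-b", 0, 30.0),
    ("dev-b", 1, 34.0),
]

# Hypothetical per-device calibration offsets (the "adjusted data" case).
CALIBRATION_OFFSET = {"dev-a": 0.5, "dev-b": -1.0}


def adjust(readings, offsets):
    """Return readings with each value shifted by its device's offset."""
    return [(dev, hour, value + offsets.get(dev, 0.0))
            for dev, hour, value in readings]


def hourly_averages(readings):
    """Derive one average per (device, hour), i.e. a maintained aggregate."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for dev, hour, value in readings:
        totals[(dev, hour)][0] += value
        totals[(dev, hour)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


adjusted = adjust(RAW_READINGS, CALIBRATION_OFFSET)
aggregates = hourly_averages(adjusted)
```

Note that once only `aggregates` is kept, the individual raw readings can no longer be recovered, which is exactly why the choice of what to derive matters when the raw data is thrown away.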
| Categories: Analytic technologies, Data warehousing, Derived data | 12 Comments |
Teradata announcements made very simple
For reasons of health,* I very regretfully canceled my trip to what is the first conference to go on my schedule every year — Teradata Partners. From afar, I’m not plugged into the details of Teradata’s announcement/embargo schedule. But what you need to know starts with this:
- Teradata signaled a year ago that its software focus was on adding analytic functionality, including specifically in the temporal area.
- Teradata likes to refresh its hardware annually, with a 50%+ price/performance improvement. (This year Teradata is going to 6-core Xeon processors.)
*Just a cough, but I’m both exhausted and potentially contagious, and this wasn’t a trip on which I had any truly urgent obligations (speeches, packed-room consulting sessions, whatever).
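For a rough sense of what an annual 50% price/performance improvement adds up to, here is a small sketch. It assumes, purely for illustration, that the gain is exactly 50% and compounds cleanly year over year; the function name is hypothetical and nothing here comes from Teradata.

```python
def cumulative_price_performance(annual_gain, years):
    """Compound a fractional annual gain (0.5 means 50%) over whole years."""
    if annual_gain < 0 or years < 0:
        raise ValueError("annual_gain and years must be non-negative")
    return (1.0 + annual_gain) ** years


# Assumed inputs: a 50% gain per refresh, three annual refreshes.
three_year_factor = cumulative_price_performance(0.5, 3)
```

Under those assumptions, three refreshes compound to a bit more than a threefold improvement.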
| Categories: Analytic technologies, Data warehousing, Teradata | Leave a Comment |
Notes and links October 22, 2010
A number of recent posts have had good comments. This time, I won’t call them out individually.
Evidently Mike Olson of Cloudera is still telling the machine-generated data story, exactly as he should be. The Information Arbitrage/IA Ventures folks said something similar, focusing specifically on “sensor data” …
… and, even better, went on to say: Read more
Notes on data warehouse appliance prices
I’m not terribly motivated to do a detailed analysis of data warehouse appliance list prices, in part because:
- Everybody knows that in practice data warehouse appliances tend to be deeply discounted from list price.
- The only realistic metric to use for pricing data warehouse appliances is price-per-terabyte, and people have gotten pretty sick of that one.
That said, here are some notes on data warehouse appliance prices. Read more
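As a reminder of how much a price-per-terabyte figure depends on what goes into it, here is a minimal sketch of the arithmetic. The function name, the discount, the capacity, and the compression ratio are all hypothetical inputs chosen for illustration; none of them are actual vendor figures.

```python
def price_per_terabyte(list_price, discount, raw_tb, compression_ratio=1.0):
    """Price per terabyte of user data.

    discount is a fraction off list price (0.5 means 50% off).
    compression_ratio is user data size divided by stored size.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    if raw_tb <= 0 or compression_ratio <= 0:
        raise ValueError("raw_tb and compression_ratio must be positive")
    street_price = list_price * (1.0 - discount)
    user_data_tb = raw_tb * compression_ratio
    return street_price / user_data_tb


# Hypothetical appliance: $1,000,000 list price, 50 TB of raw capacity.
at_list = price_per_terabyte(1_000_000, 0.0, 50)
discounted_compressed = price_per_terabyte(1_000_000, 0.5, 50, compression_ratio=2.0)
```

With these made-up inputs the same box comes out at either 20,000 or 5,000 per terabyte, depending on whether one quotes list price on raw capacity or a discounted price on compressed user data.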
| Categories: Data warehouse appliances, Data warehousing, Database compression, EMC, Exadata, Greenplum, Netezza, Oracle, Pricing | 8 Comments |
Notes on the EMC Greenplum Data Computing Appliance
The big confidential part of my visit last week to EMC’s Data Computing Division, née Greenplum, was of course this week’s announcement of the first EMC/Greenplum “Data Computing Appliance.” Basics include: Read more
| Categories: Analytic technologies, Data warehousing, EMC, Exadata, Greenplum, Oracle, Parallelization, Storage | 1 Comment |
Vertica-Hadoop integration
DBMS/Hadoop integration is a confusing subject. My post on the Cloudera/Aster Data partnership awaits some clarification in the comment thread. A conversation with Vertica left me unsure about some Hadoop/Vertica Year 2 details as well, although I’m doing better after a follow-up call. On the plus side, we also covered some rather cool Hadoop/Vertica product futures, and those seemed easier to understand. 🙂
I say “Year 2” because Hadoop/Vertica integration has been going on since last year. Indeed, Vertica says that there are now over 25 users of the Hadoop/Vertica combination, and hence of Vertica’s Hadoop connector. Vertica is now introducing a new version of its Hadoop connector, for immediate GA. So far as I understood: Read more
| Categories: Analytic technologies, Cloudera, EAI, EII, ETL, ELT, ETLT, Hadoop, MapReduce, Market share and customer counts, SQL/Hadoop integration, Text, Vertica Systems | 6 Comments |
A few notes from XLDB 4
As much as I believe in the XLDB conferences, I found time to attend only (a big) part of one day of XLDB 4 myself. In general: Read more
| Categories: Analytic technologies, Health care, Michael Stonebraker, MySQL, Open source, Parallelization, Petabyte-scale data management, Scientific research, Surveillance and privacy | 2 Comments |
Partnering with Cloudera
After I criticized the marketing of the Aster/Cloudera partnership, my clients at Aster Data and Cloudera ganged up on me and tried to persuade me I was wrong. Be that as it may, that conversation and others were helpful to me in understanding the core thesis: Read more
