Analytic technologies
Discussion of technologies related to information query and analysis. Related subjects include:
- Business intelligence
- Data warehousing
- (in Text Technologies) Text mining
- (in The Monash Report) Data mining
- (in The Monash Report) General issues in analytic technology
Hybrid-columnar soundbites
Busy couple of days talking with reporters. A few notes on hybrid-columnar analytic DBMS, all backed up by yesterday’s post on Teradata columnar:
- Oracle does not actually offer columnar I/O; the other three systems do. But see the “I won’t be surprised” part in yesterday’s Teradata post.
- Aster does not offer columnar compression; the other three do.
- EMC Greenplum and Teradata offer different ways to mix column and row storage in the same table; each approach has its advantages.
- For most purposes, Teradata generally has a more mature and capable offering than EMC Greenplum, whichever way you choose to organize your tables.
Edit: The Wall Street Journal got this wrong, writing that Teradata was the first-ever hybrid columnar system. Specifically, they wrote:
While columnar technology has been around for years, Teradata says its product is unique because it allows users to include both columns and rows in the same database.
Googling on “Teradata To Unveil New Analytics Product To Speed Business Adoption” might get you around the paywall to see the offending piece.
Aster Database Release 5 and Teradata Aster appliance
It was obviously just a matter of time before there would be an Aster appliance from Teradata and some tuned bidirectional Teradata-Aster connectivity. These have now been announced. I didn’t notice anything particularly surprising in the details of either. About the biggest excitement is that Aster is traditionally a Red Hat shop, but for the purposes of appliance delivery has now embraced SUSE Linux.
Along with the announcements comes updated positioning such as:
- Better SQL than the MapReduce alternatives have.
- Better MapReduce than the SQL alternatives have. (A sketch of what that looks like follows this list.)
- Easy(ier) way to do complex analytics on multi-structured data. (Aster has embraced that term.)
and of course
- Now also with Teradata’s beautifully engineered hardware and system management software!
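To give a concrete feel for the SQL-meets-MapReduce pitch, here is a sketch of invoking Aster's nPath SQL-MapReduce function from ordinary SQL. nPath is a real Aster function, but the table, the columns, and the exact clause details below are illustrative assumptions, not copied from Aster documentation:

```sql
-- Illustrative sketch (hypothetical table and columns): use Aster's
-- nPath SQL-MapReduce function to find, per user, a search event
-- followed by a purchase event in a clickstream.
SELECT userid, purchase_ts
FROM nPath(
    ON clicks                           -- hypothetical clickstream table
    PARTITION BY userid                 -- MapReduce-style partitioning
    ORDER BY click_ts                   -- pattern matching in time order
    MODE (NONOVERLAPPING)
    PATTERN ('S.P')                     -- a search followed by a purchase
    SYMBOLS (pagetype = 'search' AS S,
             pagetype = 'purchase' AS P)
    RESULT (FIRST(userid OF S) AS userid,
            FIRST(click_ts OF P) AS purchase_ts)
);
```

The positioning point is that the procedural, path-matching part runs as parallel MapReduce under the covers, while everything around it remains plain SQL.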
Teradata Columnar and Teradata 14 compression
Teradata is pre-announcing Teradata 14, for delivery by the end of this year, where by “Teradata 14” I mean the latest version of the DBMS that drives the classic Teradata product line. Teradata 14’s flagship feature is Teradata Columnar, a hybrid-columnar offering that follows in the footsteps of Greenplum (now part of EMC) and Aster Data (now part of Teradata).
The basic idea of Teradata Columnar is:
- Each table can be stored in Teradata in row format, column format, or a mix. (A syntax sketch follows this list.)
- You can do almost anything with a Teradata columnar table that you can do with a row-based one.
- If you choose column storage, you also get some new compression choices.
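To make the row/column mixing concrete, here is a sketch of what such a table declaration might look like. My understanding is that the mechanism is column partitioning on a NoPI (no primary index) table, but the exact syntax, and all the names below, should be treated as assumptions:

```sql
-- Assumed Teradata 14 sketch: a column-partitioned table in which
-- most columns are stored columnar while one group of columns stays
-- together in row format, all within the same table.
CREATE TABLE sales (
    sale_id     INTEGER,
    customer_id INTEGER,
    sale_date   DATE,
    amount      DECIMAL(10,2),
    notes       VARCHAR(1000)
)
NO PRIMARY INDEX
PARTITION BY COLUMN (
    sale_id,                        -- its own columnar partition
    ROW (customer_id, sale_date),   -- kept together, row-style
    amount,                         -- columnar
    notes                           -- columnar
);
```

The columns stored columnar are the ones to which the new compression choices apply.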
XLDB: The one conference I like to attend
I’m not a big fan of conferences, but I really like XLDB. Last year I got a lot out of XLDB, even though I couldn’t stay long (my elder care issues were in full swing). The year before I attended the whole thing — in Lyon, France, no less — and learned a lot more. This year’s XLDB conference is at SLAC — the organization formerly known as the Stanford Linear Accelerator Center — on Sand Hill Road in Menlo Park, October 18-19. As of right now, I plan to be there, at least on the first day. XLDB’s agenda and registration details (inexpensive) can be found on the XLDB conference website.
The only reason I wouldn’t go is if that turned out to be a lousy week for me to travel to California.
The people who go to XLDB tend to be really smart — research scientists, hardcore database technologists, or others who can hold their own with those folks. Audience participation can be intense; the most talkative members I can recall were Mike Stonebraker, Martin Kersten, Michael McIntire, and myself. Even the vendor folks tend to be smart — past examples include Stephen Brobst, Jeff Hammerbacher, Luke Lonergan, and IBM Fellow Laura Haas. When we had a datageek bash on my last trip to the SF area, several guys said they were planning to attend XLDB as well.
XLDB stands for eXtremely Large DataBases, and those are indeed what get talked about there. Read more
Are there any remaining reasons to put new OLTP applications on disk?
Once again, I’m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include:
- 100s of gigabytes of data at first, growing to >1 terabyte over time.
- High peak loads.
- Public cloud portability (but they have private data centers they can use today).
- Simple database design — not a lot of tables, not a lot of columns, not a lot of joins, and everything can be distributed on the same customer_ID key. (A sketch follows this list.)
- Stream the data to a data warehouse, which will grow to a few terabytes. (Keeping only one year of OLTP data online actually makes sense in this application, but of course everything should go into the DW.)
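To illustrate the everything-on-one-key point: if every table carries customer_ID and is hash-distributed on it, then per-customer transactions and joins stay on a single node. A minimal sketch, with hypothetical tables, using Greenplum-style DISTRIBUTED BY to stand in for whatever distribution clause the chosen DBMS offers:

```sql
-- Minimal sketch: every table hash-distributed on customer_id, so
-- per-customer OLTP work is single-node. (Distribution syntax varies
-- by DBMS; DISTRIBUTED BY is Greenplum's wording.)
CREATE TABLE customers (
    customer_id BIGINT,
    name        VARCHAR(200)
) DISTRIBUTED BY (customer_id);

CREATE TABLE orders (
    order_id    BIGINT,
    customer_id BIGINT,             -- same distribution key everywhere
    order_ts    TIMESTAMP
) DISTRIBUTED BY (customer_id);

-- This join never needs to cross nodes:
SELECT c.name, o.order_ts
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id = 12345;
```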
So I’m leaning to saying: Read more
“Big data” has jumped the shark
I frequently observe that no market categorization is ever precise and, in particular, that bad jargon drives out good. But when it comes to “big data” or “big data analytics”, matters are worse yet. The definitive shark-jumping moment may be Forrester Research’s Brian Hopkins’ claim that:
… typical data warehouse appliances, even if they are petascale and parallel, [are] NOT big data solutions.
Nonsense almost as bad can be found in other venues.
Forrester seems to claim that "big data" is characterized by Volume, Velocity, Variety, and Variability. Others, less alliteratively inclined, might put Complexity in the mix. So far, so good; after all, much of what people call "big data" is collections of disparate data streams, all collected somewhere in a big bit bucket. But when people make Variety and/or Variability part of the very definition of "big data", they've gone too far.
Aster Data business trends
Last month, I reviewed with the Aster Data folks which markets they were targeting and selling into, subsequent to acquisition by their new orange overlords. The answers aren't what they used to be. Aster no longer focuses much on what it used to call frontline (i.e., low-latency, operational) applications; those are of course a key strength for Teradata. Rather, Aster focuses on investigative analytics — they've long endorsed my use of the term — and on the batch-run scoring kinds of applications that inform operational systems.
Vertica projections — an overview
Partially at my suggestion, Vertica has blogged a three-part series explaining the "projections" that are central to a Vertica database. This is important, because in Vertica projections play the roles that in many analytic DBMS might be filled by base tables, indexes, AND materialized views. Highlights include:
- A Vertica projection can contain:
- All the columns in a table.
- Some of the columns in a table.
- A prejoin among tables.
- Vertica projections are updated and maintained just as base tables are. (I.e., there's no batch lag of any kind.)
- You can import the same logical schema you use elsewhere. Vertica puts no constraints on your logical schema. Note: Vertica has been claiming good support for all logical schemas since Vertica 4.0 came out in early 2010.
- Vertica (the product) will automatically generate a physical schema for you — i.e. a set of projections — that Vertica (the company) thinks will do a great job for you. Note: That also dates back to Vertica 4.0.
- Vertica claims that queries are very fast even when you haven’t created projections explicitly for them. Note: While the extent to which this is true may be a matter of dispute, competitors clearly overreach when they make assertions like “every major Vertica query needs a projection prebuilt for it.”
- On the other hand, it is advisable to build projections (automatically or manually) that optimize performance of certain parts of your query load, as sketched below.
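For a feel of the mechanics, here is a minimal sketch of defining a projection by hand. CREATE PROJECTION is genuine Vertica syntax, but the table and the particular sort and segmentation choices are hypothetical:

```sql
-- A hand-built Vertica projection: a sorted, segmented copy of
-- selected columns, maintained automatically as the logical table
-- is updated (no batch lag, per the list above).
CREATE PROJECTION sales_by_customer
AS SELECT customer_id, sale_date, amount
   FROM sales
   ORDER BY customer_id, sale_date           -- sort order drives scan speed and compression
   SEGMENTED BY HASH(customer_id) ALL NODES; -- spread rows across the cluster
```

Queries that filter or aggregate along this sort order can be satisfied from the projection directly — the role an index or materialized view would play elsewhere.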
The blog posts contain a lot more than that, of course, both rah-rah and technical detail, including reminders of other Vertica advantages (compression, no logging, etc.). If you’re interested in analytic DBMS, they’re worth a look.
Derived data, progressive enhancement, and schema evolution
The emphasis I’m putting on derived data is leading to a variety of questions, especially about how to tease apart several related concepts:
- Derived data.
- Many-step processes to produce derived data.
- Schema evolution.
- Temporary data constructs.
So let’s dive in. Read more
Virtual data marts in Sybase IQ
I made a few remarks about Sybase IQ 15.3 when it became generally available in July. Now that I’ve had a current briefing, I’ll make a few more.
The key enhancement in Sybase IQ 15.3 is distributed query — what others might call parallel query — aka PlexQ. A Sybase IQ query can now be distributed among many nodes, all talking to the same SAN (Storage-Area Network). Any Sybase IQ node can take the responsibility of being the “leader” for that particular query.
In itself, this isn’t that impressive; all the same things could have been said about pre-Exadata Oracle.* But PlexQ goes somewhat further than just removing a bottleneck from Sybase IQ. Notably, Sybase has rolled out a virtual data mart capability. Highlights of the Sybase IQ virtual data mart story include: Read more
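To sketch what I understand the mechanics to be: a virtual data mart amounts to dedicating a subset of multiplex nodes to a particular workload, with login policies steering users onto those nodes. The logical-server syntax below reflects my best understanding and should be treated as an assumption, as should all the names:

```sql
-- Assumed Sybase IQ 15.3 sketch: group two multiplex nodes into a
-- logical server (the "virtual data mart"), then route a user
-- population onto it via a login policy.
CREATE LOGICAL SERVER finance_mart
    MEMBERSHIP ( mpx_node3, mpx_node4 );

ALTER LOGIN POLICY finance_policy
    LOGICAL SERVER finance_mart;
```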
